HebMorph | source effort for making Hebrew | Natural Language Processing library

by synhershko | Java | Version: Current | License: Non-SPDX

kandi X-RAY | HebMorph Summary

HebMorph is a Java library typically used in artificial intelligence and natural language processing applications. HebMorph has no reported bugs or vulnerabilities, and it has low support. However, its build file is not available and it has a Non-SPDX license. You can download it from GitHub or Maven.

HebMorph is an open-source effort for making Hebrew properly searchable by various IR software libraries, while maintaining decent recall, precision and relevancy in retrievals. All code and files are released under the GNU Affero General Public License version 3. More details at

            Support

              HebMorph has a low-activity ecosystem.
              It has 88 star(s) with 41 fork(s). There are 15 watchers for this library.
              It had no major release in the last 6 months.
              There are 6 open issues and 20 have been closed. On average issues are closed in 148 days. There are 3 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of HebMorph is current.

            Quality

              HebMorph has 0 bugs and 0 code smells.

            Security

              HebMorph has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              HebMorph code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              HebMorph has a Non-SPDX License.
              A Non-SPDX license may be an open-source license that is not SPDX-compliant, or it may not be an open-source license at all, so review it closely before use.

            Reuse

              HebMorph releases are not available. You will need to build from source code and install.
              A deployable package is available in Maven.
              HebMorph has no build file. You will need to create the build yourself to build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              HebMorph saves you 2428 person hours of effort in developing the same functionality from scratch.
              It has 5290 lines of code, 413 functions and 128 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed HebMorph and identified the functions below as its top functions. The list is intended to give you a quick view of the functionality HebMorph implements and to help you decide whether it suits your requirements.
            • Increments the token
            • Lemmatize the Hebrew token
            • Lemmatize a word from the Hebrew token
            • Gets the next token
            • Parse a multi-field query
            • Parse a multi-field query
            • Constructs a query based on the given parameters
            • Makes the next token
            • Get next token from input stream
            • Reset the stream
            • Set the custom tokenization cases
            • Calculate a hash code
            • Loads a dictionary from the specified path
            • Checks if the given object is equal in the dictionary
            • Subclasses can override this method to modify the query string
            • Resets the reader
            • Returns true if this object is equal to the morph data
            • Creates a hash code
            • Concatenate two arrays
            • Returns a token stream for the original word
            • Creates TokenStreamComponents
            • Creates a token stream for the given analyzer
            • Creates the token stream
            • Tries to parse a string as an integer
            • Compares this token with another instance
            • Load a custom word from a stream

            HebMorph Key Features

            No Key Features are available at this moment for HebMorph.

            HebMorph Examples and Code Snippets

            No Code Snippets are available at this moment for HebMorph.

            Community Discussions

            Trending Discussions on HebMorph

            QUESTION

            HebMorph with solr: how to use stopwords
            Asked 2018-May-22 at 04:15

            I am developing an application that supports indexing and searching of multi-language texts, including Hebrew, using the Solr engine.

            After a lot of searching, I found that HebMorph is the best plugin to use for the Hebrew language.

            My problem is that the behavior of HebMorph with Hebrew stopwords seems to be different from Solr's:

            • With Solr (any language): when I search for a stopword, the returned results don't include any of the stopwords present in the query.

            • Whereas when I search for Hebrew terms (after plugging HebMorph into Solr following this link), the returned results include all the stopwords present in the query.

            1) Is this the normal behavior for HebMorph? If yes, how can I alter it? If no, what should I change?

            2) HebMorph doesn't support synonyms (I read in their documentation that this is future work). Is there a way to use synonyms for Hebrew, as Solr supports for other languages (i.e. by adding the proper filter in solrconfig and pointing it to the synonyms file)?

            Thanks in advance for your help.

            ...

            ANSWER

            Answered 2018-May-22 at 04:15

            I'm the author of HebMorph.

            Stop words are indeed supported, but you need to filter them out before the lemmatizer kicks in. Assuming a recent version of HebMorph, your stop-words filter needs to come right after the tokenizer, which means it also needs to take care of the בחל"מ letters attached to the stop words.
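
As an illustration of the point above, here is a minimal sketch of a stop-word check that also matches stop words carrying attached prefix letters. It is plain Java and does not use the HebMorph or Lucene APIs: the class name `PrefixAwareStopFilter`, the prefix-letter set, and the two-letter stripping limit are all assumptions made for this example. Hebrew strings are written as Unicode escapes with transliterations in the comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: not a HebMorph or Lucene class.
class PrefixAwareStopFilter {
    // ASSUMPTION: one-letter prefixes bet, he, vav, kaf, lamed, mem, shin.
    private static final String PREFIX_LETTERS =
            "\u05D1\u05D4\u05D5\u05DB\u05DC\u05DE\u05E9";
    // ASSUMPTION: strip at most two stacked prefix letters.
    private static final int MAX_PREFIX_LETTERS = 2;

    private final Set<String> stopwords;

    PrefixAwareStopFilter(Set<String> stopwords) {
        this.stopwords = new HashSet<>(stopwords);
    }

    // True if the token is a stop word, either bare or behind prefix letters.
    boolean isStopword(String token) {
        if (stopwords.contains(token)) {
            return true;
        }
        String rest = token;
        for (int i = 0; i < MAX_PREFIX_LETTERS; i++) {
            if (rest.length() < 2 || PREFIX_LETTERS.indexOf(rest.charAt(0)) < 0) {
                break;
            }
            rest = rest.substring(1);
            if (stopwords.contains(rest)) {
                return true;
            }
        }
        return false;
    }

    // Drops stop words from a token list; meant to run before any lemmatizer.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!isStopword(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList(
                "\u05E9\u05DC",           // "shel" (of)
                "\u05D4\u05D5\u05D0"));   // "hu" (he)
        PrefixAwareStopFilter filter = new PrefixAwareStopFilter(stops);
        List<String> tokens = Arrays.asList(
                "\u05D5\u05D4\u05D5\u05D0",  // "ve-hu" (and he): prefixed stop word
                "\u05E1\u05E4\u05E8");       // "sefer" (book): regular word
        System.out.println(filter.filter(tokens).size() + " token(s) kept");
    }
}
```

In a real Solr or Lucene analysis chain the same idea would live in a token filter placed directly after the tokenizer; the list-based `filter` method here only stands in for that step.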

            The general advice nowadays, for all languages, is NOT to drop stopwords - at least not in indexing, so I'd recommend not applying a stop-words filter here either.

            With regard to synonyms, the root issue is that the HebMorph lemmatizer sometimes expands a word to multiple lemmas, which makes applying synonyms a bit more challenging. With the (relatively) new graph-based analyzers this is now possible, so we will likely implement that too, and Lucene's Synonym filters will be supported OOTB.
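
To make the multiple-lemma issue concrete, the following is a simplified sketch of synonym expansion when one surface word yields several candidate lemmas. It is not HebMorph's or Lucene's implementation: the class `LemmaSynonymSketch`, its `expand` method, and the transliterated lemma strings are hypothetical placeholders. The point is only that every candidate lemma has to be looked up separately, and all resulting terms belong to the same position in the token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not part of HebMorph or Lucene.
class LemmaSynonymSketch {
    // Returns every candidate lemma plus its synonyms, in first-seen order
    // and without duplicates. All returned terms share one token position.
    static List<String> expand(List<String> lemmas, Map<String, List<String>> synonyms) {
        Set<String> terms = new LinkedHashSet<>();
        for (String lemma : lemmas) {
            terms.add(lemma);
            List<String> found = synonyms.get(lemma);
            if (found != null) {
                terms.addAll(found);
            }
        }
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        // Placeholder lemmas for one ambiguous surface word (transliterated).
        List<String> lemmas = Arrays.asList("sefer", "sapar");
        // Placeholder synonym table keyed by lemma.
        Map<String, List<String>> synonyms =
                Collections.singletonMap("sapar", Arrays.asList("barber"));
        System.out.println(expand(lemmas, synonyms));
    }
}
```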

            In the commercial version there is already a way to customize word lists and override dictionary definitions, which is useful in an ambiguous language like Hebrew. Many use this as their way of creating synonyms.

            Source https://stackoverflow.com/questions/50444965

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install HebMorph

            You can download it from GitHub, Maven.
            You can use HebMorph like any standard Java library. Please include the jar files in your classpath. You can also use any IDE to run and debug the HebMorph component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, please refer to maven.apache.org. For Gradle installation, please refer to gradle.org.

            Support

            For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.
            Find more information at:

            CLONE
          • HTTPS

            https://github.com/synhershko/HebMorph.git

          • CLI

            gh repo clone synhershko/HebMorph

          • sshUrl

            git@github.com:synhershko/HebMorph.git

