HebMorph | Open-source effort for making Hebrew searchable | Natural Language Processing library
kandi X-RAY | HebMorph Summary
HebMorph is an open-source effort for making Hebrew properly searchable by various IR software libraries, while maintaining decent recall, precision and relevancy in retrievals. All code and files are released under the GNU Affero General Public License version 3. More details at
Top functions reviewed by kandi - BETA
- Increments the token
- Lemmatize the Hebrew token
- Lemmatize a word from the Hebrew token
- Gets the next token
- Parse a multi-field query
- Parse a multi-field query
- Constructs a query based on the given parameters
- Makes the next token
- Get next token from input stream
- Reset the stream
- Set the custom tokenization cases
- Calculate a hash code
- Loads a dictionary from the specified path
- Checks if the given object is equal in the dictionary
- Subclasses can override this method to modify the query string
- Resets the reader
- Returns true if this object is equal to the morph data
- Creates a hash code
- Concatenate two arrays
- Returns a token stream for the original word
- Creates TokenStreamComponents
- Creates a token stream for the given analyzer
- Creates the token stream
- Tries to parse a string as an integer
- Compares this token with another instance
- Load a custom word from a stream
Community Discussions
Trending Discussions on HebMorph
QUESTION
I am developing an application that supports indexing and searching of multi-language texts, including Hebrew, using the Solr engine.
After a lot of searching, I found that HebMorph is the best plugin to use for the Hebrew language.
My problem is that HebMorph seems to handle Hebrew stopwords differently than Solr does:
With Solr (any language): when I search for a stopword, the returned results don't include any of the stopwords existing in the query.
Whereas when I search for Hebrew terms (after plugging HebMorph into Solr following this link), the returned results include all stopwords existing in the query.
1) Is this the normal behavior for HebMorph? If yes, how can I alter it? If no, what should I change?
2) HebMorph doesn't support synonyms (I read in their documentation that this is future work). Is there a way to use synonyms for Hebrew the way Solr supports them for other languages (i.e. by adding the proper filter in solrconfig and pointing it to the synonyms file)?
Thanks in advance for your help.
ANSWER
Answered 2018-May-22 at 04:15
I'm the author of HebMorph.
Stopwords are indeed supported, but you need to filter them out before the lemmatizer kicks in. Assuming a recent version of HebMorph, your stopwords filter needs to come right after the tokenizer, which means it also needs to take care of בחל"מ letters attached to the stopwords.
The general advice nowadays, for all languages, is NOT to drop stopwords - at least not in indexing, so I'd recommend not applying a stop-words filter here either.
With regard to synonyms: the root issue is that the HebMorph lemmatizer sometimes expands a word to multiple lemmas, which makes applying synonyms a bit more challenging. With the (relatively) new graph-based analyzers this is now possible, so we will likely implement that too, and Lucene's synonym filters will be supported OOTB.
In the commercial version there is already a way to customize word lists and override dictionary definitions, which is useful in an ambiguous language like Hebrew. Many use this as their way of creating synonyms.
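To illustrate the answer's point that a stopword filter placed right after the tokenizer must also catch stopwords with prefix letters attached, here is a minimal plain-Java sketch. It does not use HebMorph's or Lucene's API. The class name, the method names, the prefix-letter set, and the two-letter prefix limit are all assumptions made for the example, and a real filter would need a prefix set matching the tokenizer's output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of HebMorph: drops stopwords from a token list,
// including stopwords that carry one or two attached prefix letters.
public class PrefixAwareStopFilter {
    // Assumption: an illustrative set of Hebrew one-letter prefixes.
    static final String PREFIX_LETTERS = "משהוכלב";
    // Assumption: strip at most two leading prefix letters.
    static final int MAX_PREFIX_LENGTH = 2;

    static boolean isStopword(String token, Set<String> stopwords) {
        if (stopwords.contains(token)) {
            return true; // exact match, no prefix attached
        }
        // Always leave at least one character after the stripped prefix.
        int limit = Math.min(MAX_PREFIX_LENGTH, token.length() - 1);
        for (int i = 0; i < limit; i++) {
            if (PREFIX_LETTERS.indexOf(token.charAt(i)) < 0) {
                return false; // leading character is not a prefix letter
            }
            if (stopwords.contains(token.substring(i + 1))) {
                return true; // remainder after the prefix is a stopword
            }
        }
        return false;
    }

    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!isStopword(token, stopwords)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopwords = Set.of("של", "על", "זה");
        List<String> tokens = List.of("הספר", "של", "דני", "ועל", "השולחן");
        // "של" is removed as an exact match, "ועל" as prefix "ו" + stopword "על".
        System.out.println(removeStopwords(tokens, stopwords));
    }
}
```

Stripping prefixes this naively can also remove legitimate words whose remainder happens to be a stopword, which is one more reason to weigh the answer's advice against dropping stopwords at index time.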
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install HebMorph
You can use HebMorph like any standard Java library. Include the jar files in your classpath. You can use any IDE, and you can run and debug the HebMorph component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.