elyzer | Stop worrying about Elasticsearch analyzers | Natural Language Processing library
kandi X-RAY | elyzer Summary
See step-by-step how Elasticsearch custom analyzers decompose your text into tokens. My therapist said this would be a good idea...
Top functions reviewed by kandi - BETA
- Analyze a single index
- Prints tokens to stdout
- Retrieve an analyzer from the index
- Convert str to list
- Normalize analyzer
- Parse command line arguments
Community Discussions
Trending Discussions on elyzer
QUESTION
I have set up a custom analyser in Elasticsearch that uses an edge-ngram tokeniser and I'm experimenting with filters and char_filters to refine the search experience.
I've been pointed to the excellent tool elyzer, which lets you test the effect your custom analyser has on a specific term, but it throws errors when I combine a custom analyser with a char_filter, specifically html_strip.
The error I get from elyzer is:
illegal_argument_exception', 'reason': 'Custom normalizer may not use char filter [html_strip]'
I would like to know whether this is a legitimate error message or whether it represents a bug in the tool.
I've referred to the main documentation, and even their custom analyser example throws an error in elyzer:
...
ANSWER
Answered 2019-Mar-06 at 14:24
This bug is in elyzer. In order to show the state of the tokens at each step of the analysis process, elyzer performs an analyze query for each stage: first char filters, then tokenizer, and finally token filters.
The problem is that on the ES side, the analysis process changed in a non-backward-compatible way when normalizers were introduced. ES now assumes that if an analyze request contains no normalizer, no analyzer and no tokenizer, but does contain a token filter or a char_filter, then the request should behave like a normalizer.
In your case, elyzer will first perform a request for the html_strip character filter and ES will think it is about a normalizer, hence the error you're getting, since html_strip is not a valid char_filter for normalizers.
I know elyzer's developer (Doug Turnbull) pretty well, so I've already filed a bug. We'll see what unfolds.
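To make the failure mode concrete, here is a rough sketch of the per-stage request bodies described above. It is not elyzer's actual code: `build_stage_bodies`, `looks_like_normalizer_request` and the `pin_tokenizer` flag are hypothetical names made up for illustration, and the check simply mirrors the rule as stated in this answer rather than the ES source. The one thing taken from ES itself is that the `_analyze` API accepts `text`, `char_filter`, `tokenizer` and `filter` in the request body. Pinning the `keyword` tokenizer on char-filter-only requests is an assumed workaround, not a confirmed fix.

```python
from typing import Any, Dict, List


def build_stage_bodies(
    text: str,
    char_filters: List[str],
    tokenizer: str,
    token_filters: List[str],
    pin_tokenizer: bool = True,
) -> List[Dict[str, Any]]:
    """Build one _analyze request body per analysis stage (hypothetical helper)."""
    bodies: List[Dict[str, Any]] = []

    # Stage 1: char filters, applied cumulatively.
    for n in range(1, len(char_filters) + 1):
        body: Dict[str, Any] = {"text": text, "char_filter": char_filters[:n]}
        if pin_tokenizer:
            # Assumed workaround: the keyword tokenizer emits the whole input
            # as a single token, so the request is no longer tokenizer-less.
            body["tokenizer"] = "keyword"
        bodies.append(body)

    # Stage 2: all char filters plus the real tokenizer.
    bodies.append(
        {"text": text, "char_filter": char_filters, "tokenizer": tokenizer}
    )

    # Stage 3: token filters, applied cumulatively on top of the tokenizer.
    for n in range(1, len(token_filters) + 1):
        bodies.append(
            {
                "text": text,
                "char_filter": char_filters,
                "tokenizer": tokenizer,
                "filter": token_filters[:n],
            }
        )
    return bodies


def looks_like_normalizer_request(body: Dict[str, Any]) -> bool:
    """Mirror the rule described above: filters present, but no analyzer,
    normalizer or tokenizer, so ES would treat the request as a normalizer."""
    has_filters = "char_filter" in body or "filter" in body
    names_pipeline = any(k in body for k in ("analyzer", "normalizer", "tokenizer"))
    return has_filters and not names_pipeline


if __name__ == "__main__":
    for b in build_stage_bodies(
        "<b>Quick</b> fox", ["html_strip"], "standard", ["lowercase"],
        pin_tokenizer=False,
    ):
        print(looks_like_normalizer_request(b), b)
```

With `pin_tokenizer=False`, the first body contains only `text` and `char_filter`, which is the shape that triggers the normalizer path described in the answer. Each body could then be sent as the JSON payload of a `POST /<index>/_analyze` request.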
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported