Text-Search-Engine | crappy search engine for text files
kandi X-RAY | Text-Search-Engine Summary
A search engine for text files. Made on a plane flight.
Top functions reviewed by kandi - BETA
- Perform one-word query
- Rank the results of a query
- Generate vectors for a list of documents
- Return the frequency of the query
- Performs one-word query
- Compute term frequency
- Computes the dot product of two documents
- Generate a score for a given term
- Returns a list of uniques
- Compute the term vector for the query
- Populate the term_frequency
- The size of the collection
- Returns the term frequency of a document
- Returns the inverse function of the IDF function
- Returns the indices of terms in the file
- Returns a dictionary mapping term to word index
- Construct a dictionary of indices for each term
- Execute the query
- Calculates the total index for each word
- Free text query
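The function names above (term frequency, IDF, dot product, ranking) suggest a TF-IDF vector-space model. The following is a minimal sketch of that general technique, not the library's actual code: every function name, the whitespace tokenizer, and the smoothed IDF formula are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_frequency(tokens):
    """Relative frequency of each term within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(docs_tokens):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, so every weight is positive."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (term -> weight); terms unknown to the corpus are dropped."""
    tf = term_frequency(tokens)
    return {term: w * idf[term] for term, w in tf.items() if term in idf}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b[term] for term, w in a.items() if term in b)


def rank(query, documents):
    """Return (document index, score) pairs with non-zero score, best first."""
    docs_tokens = [tokenize(d) for d in documents]
    idf = inverse_document_frequency(docs_tokens)
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = []
    for i, tokens in enumerate(docs_tokens):
        score = dot(q_vec, tfidf_vector(tokens, idf))
        if score > 0:
            scored.append((i, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = [
    "the cat sat on the mat",
    "dogs chase the cat",
    "search engines rank documents",
]
results = rank("cat", docs)
```

Here the shorter document ranks first for the query "cat" because the term makes up a larger share of its tokens. A production engine would typically also normalize vector length (cosine similarity) and use a real tokenizer.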
Community Discussions
Trending Discussions on Text-Search-Engine
QUESTION
I am working on my diploma exam on the topic of full-text search in Apache SOLR. In the introduction I should explain the purpose and advantages of Apache SOLR, i.e. why one would opt for a full-text search engine like SOLR instead of MySQL, for instance. Going by literature like "SOLR in action (2013)", it was rather easy, for that era, to determine when to use SOLR, ElasticSearch or something else instead of MySQL.
There is also this great question from 2010 on SO: "Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?". Alas, as great as it was around 2010, its answers now seem painfully obsolete, e.g. "MySQL MyISAM table type supports Full-Text Search, but InnoDB does not". Several years later, InnoDB also added full-text search support.
Now, there are some articles that manage to shed some light on this, like https://lucidworks.com/post/full-text-search-engines-vs-dbms/, which states that the advantages of full-text search systems are
search speed, variety of indexing and querying options, ranking and relevancy capabilities...
Yet there are a lot of other articles stating things like
MySQL Full-Text Search will now fit your needs in 80% of cases
and so on, and it seems that over the past 10 years the full-text search capabilities of MySQL, MongoDB, PostgreSQL and other databases have increased dramatically.
Yet the graph at https://db-engines.com/en/ranking_trend/system/Elasticsearch%3BMySQL%3BSolr shows that full-text search engines are not losing popularity; their usage is growing, and even SOLR, which had been steadily losing pace, now seems to be waking up.
So, there must be something to it? Is it that:
- SOLR, Elastic, Sphinx... are still considerably faster than their relational counterparts?
- there is a larger variety of options, like advanced, customizable tokenization and faceting? Maybe better language support?
- relational databases can't handle search over a very large number of documents well enough?
etc.
In short, what would make you take Apache SOLR or Elastic nowadays, instead of MySQL or other relational database with their increased Full-Text search capabilities? Why are Apache SOLR and Elasticsearch still so popular when using them requires another stack of resources and administration, even though you already have the data in your relational or NoSQL database?
So the central question is: if I have a system that uses a MySQL database for data storage, and I need to add full-text search capabilities for one or several fields, including fuzzy search (typos), synonyms, stemming, and custom relevancy and ranking, is it generally better to use MySQL FTS (so there is no need for another stack of resources and administration), or is a dedicated full-text search engine like Apache SOLR or Elasticsearch significantly better at this?
ANSWER
Answered 2021-Nov-09 at 19:30
Specialized indexing solutions like Apache Solr, ElasticSearch, and Sphinx Search are usually faster than the built-in full-text indexing of MySQL, GiST of PostgreSQL, etc. The specialized solutions often have more features, such as stemming, more sophisticated searching including faceting, and the ability to store extra data in a "document" associated with the indexed text.
On the other hand, using one of those complementary solutions means extra complexity to copy data into the indexing solution. How frequently do you need to update the index? Is it efficient to update the index incrementally, or do you basically need to clobber the index and create a fresh index from your whole dataset?
By contrast, using the built-in indexing features of your RDBMS has the advantage that the index is probably kept in sync with the most recent data updates automatically. And the search capabilities may be good enough for your needs. Keeping the index maintenance simple and automated has a lot of positive value.
Besides, any of the solutions, even a sub-optimal one, is orders of magnitude better than the naïve approach many developers use: textcolumn LIKE '%keyword%'
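To illustrate why an index beats the LIKE '%keyword%' pattern, here is a minimal Python sketch, assuming a toy in-memory table. The function names and sample rows are hypothetical, and real engines use far more elaborate structures. The naive scan must inspect every row on every query, while the inverted index does the tokenizing once and answers a query with a single dictionary lookup.

```python
from collections import defaultdict


def naive_scan(rows, keyword):
    """Substring test against every row, analogous to LIKE '%keyword%'."""
    keyword = keyword.lower()
    return [i for i, text in enumerate(rows) if keyword in text.lower()]


def build_inverted_index(rows):
    """Map each whitespace-separated token to the set of row ids containing it."""
    index = defaultdict(set)
    for i, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(i)
    return index


def indexed_lookup(index, keyword):
    """Whole-token lookup: one dictionary access instead of a full scan."""
    return sorted(index.get(keyword.lower(), ()))


rows = [
    "Solr is a search server",
    "MySQL supports full-text search",
    "research notes",
]
index = build_inverted_index(rows)
scan_hits = naive_scan(rows, "search")
index_hits = indexed_lookup(index, "search")
```

Note that the two approaches are not equivalent in what they match: the substring scan also hits "research", whereas the token index matches whole words only. Full-text engines make that trade deliberately and recover flexibility through stemming, fuzzy matching, and similar features.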
what would make you take Apache SOLR or Elastic nowadays, instead of MySQL or other relational database with their increased Full-Text search capabilities?
Better performance, more sophisticated search support, and it helps to move those expensive search queries to a dedicated search engine, and lighten the load on your RDBMS.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install Text-Search-Engine
You can use Text-Search-Engine like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.