opensearch | simple formats for the sharing of search results | Search Engine library

by dewitt | Python | Version: Current | License: CC-BY-SA-4.0

kandi X-RAY | opensearch Summary

opensearch is a Python library typically used in Database, Search Engine applications. opensearch has no bugs, it has a Strong Copyleft License, and it has high support. However, opensearch has 9 reported vulnerabilities, and its build file is not available. You can download it from GitHub.

OpenSearch is a collection of simple formats for the sharing of search results. The most recent version of the specification is OpenSearch 1.1 Draft 6.

            kandi-support Support

              opensearch has a highly active ecosystem.
It has 632 stars and 132 forks. There are 52 watchers for this library.
              It had no major release in the last 6 months.
There are 17 open issues and 7 closed ones; on average, issues are closed in 92 days. There are 4 open pull requests and 0 closed ones.
              It has a negative sentiment in the developer community.
              The latest version of opensearch is current.

            kandi-Quality Quality

              opensearch has 0 bugs and 0 code smells.

            kandi-Security Security

              opensearch has 9 vulnerability issues reported (0 critical, 3 high, 6 medium, 0 low).
              opensearch code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              opensearch is licensed under the CC-BY-SA-4.0 License. This license is Strong Copyleft.
              Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

            kandi-Reuse Reuse

              opensearch releases are not available. You will need to build from source code and install.
opensearch has no build file. You will need to create the build yourself to build the component from source.
              It has 92206 lines of code, 2 functions and 3 files.
              It has low code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

kandi has reviewed opensearch and discovered the following top functions. This is intended to give you an instant insight into the functionality opensearch implements, and help you decide if it suits your requirements.
            • Extract MediaWiki into a directory .
            • Convert to markdown .

            opensearch Key Features

            No Key Features are available at this moment for opensearch.

            opensearch Examples and Code Snippets

            No Code Snippets are available at this moment for opensearch.

            Community Discussions

            QUESTION

            aws opensearch: Why are similar sets of data ranked differently
            Asked 2022-Apr-01 at 08:57

I have set up an AWS OpenSearch instance with pretty much everything set to default values. I then inserted some data about hotels. When a user searches for something like "Good Morning B", my resulting query POST request looks like this:

            ...

            ANSWER

            Answered 2022-Apr-01 at 08:57

The number of documents is counted not for the whole index by Elasticsearch but by the underlying Lucene engine, and it is done per shard (each shard is a complete Lucene index). Since your documents are (probably) in different shards, their scores turn out slightly different.
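
A common mitigation, not mentioned in the answer itself, is to run the query with search_type=dfs_query_then_fetch, which gathers term statistics from all shards before scoring. Below is a minimal sketch assuming the opensearch-py client and a hypothetical "hotels" index:

```python
# Minimal sketch, assuming opensearch-py and an invented index/field name:
# dfs_query_then_fetch computes global term statistics before scoring,
# so similar documents on different shards score consistently.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://my-domain.example.com:443"])  # assumed endpoint

response = client.search(
    index="hotels",                           # hypothetical index name
    search_type="dfs_query_then_fetch",
    body={"query": {"match": {"name": "Good Morning B"}}},
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("name"))
```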

            Source https://stackoverflow.com/questions/71703677

            QUESTION

            Uncaught Type error after adding context component in react.js
            Asked 2022-Mar-31 at 09:25

I have added a new context component, MovieContext.js, but it is causing an uncaught type error. I had a look online, and it is apparently caused when trying to render multiple children. This I do not understand because, as far as I can work out, I am only trying to render one.

            Error

            ...

            ANSWER

            Answered 2022-Mar-31 at 09:25

You have a named export for MovieProvider and, in the same file, a default export for MovieContext. The imports must match: a named export is imported with braces (import { MovieProvider } ...), while a default export is imported without them.

            Source https://stackoverflow.com/questions/71689410

            QUESTION

            AWS OpenSearch Instance Types - better to have few bigger or more smaller instances?
            Asked 2022-Mar-30 at 09:20

            I am a junior dev ops engineer and have this very basic question.

My team is currently working on providing an AWS OpenSearch cluster. Due to the nature of our problem, we require the storage-optimized instances. From the Amazon documentation I found that they recommend a minimum of 3 nodes. The required storage size is known to me; in the OpenSearch Service pricing calculator I found that I can choose either 10 i3.large instances or 5 i3.xlarge ones. I checked the prices, and they are the same.

So my question is: when I am faced with such a problem, do I choose the smaller number of bigger instances or the bigger number of smaller ones? I am particularly interested in the reasoning.

            Thank you!

            ...

            ANSWER

            Answered 2022-Mar-30 at 09:20

            Each VM has some overhead for the OS so 10 smaller instances would have less compute and RAM available for ES in total than 5 larger instances. Also, if you just leave the default index settings (5 primary shards, 1 replica) and actively write to only 1 index at a time, you'll effectively have only 5 nodes indexing data for you (and these nodes will have less bandwidth because they are smaller).

            So, I would usually recommend running a few larger instances instead of many smaller ones. There are some special cases where it won't be true (like a concurrent-search-heavy cluster) but for those, I'd recommend going with even larger instances in the first place.
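
As a hedged illustration of the shard-count point above (client setup and index name are assumptions, not from the answer), you can size the number of primary shards explicitly instead of relying on the default, so every data node participates in indexing:

```python
# A minimal sketch (assumed endpoint and index name): matching the number of
# primary shards to the number of data nodes so all nodes index in parallel.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://my-domain.example.com:443"])

client.indices.create(
    index="my-index",  # hypothetical
    body={
        "settings": {
            "number_of_shards": 5,    # e.g. one primary per i3.xlarge node
            "number_of_replicas": 1,  # keeps a copy on a second node
        }
    },
)
```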

            Source https://stackoverflow.com/questions/71653510

            QUESTION

            Getting mapper_parsing_exception in OpenSearch
            Asked 2022-Mar-30 at 03:46

            I'm new to OpenSearch, and I'm following the indexing pattern mentioned here for a POC.

            I'm trying to test the mapping mentioned here : https://github.com/spryker/search/blob/master/src/Spryker/Shared/Search/IndexMap/search.json in OpenSearch dev console.

            ...

            ANSWER

            Answered 2022-Mar-30 at 03:46

You need to replace page with _doc (or remove it altogether), as mapping types no longer exist.
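
A hedged sketch of what that looks like from Python (the URL, credentials, and field names here are assumptions, not taken from the linked mapping):

```python
# The mapping body goes directly under "mappings", with no type name
# ("page") level in between; endpoint and fields are invented.
import requests

resp = requests.put(
    "https://localhost:9200/search_index",  # hypothetical index
    json={
        "mappings": {  # note: no "page" level here
            "properties": {
                "fulltext": {"type": "text"},
            }
        }
    },
    auth=("admin", "admin"),
    verify=False,
)
print(resp.status_code, resp.json())
```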

            Source https://stackoverflow.com/questions/71671231

            QUESTION

            How can I set compatibility mode for Amazon OpenSearch using CloudFormation?
            Asked 2022-Mar-07 at 12:37

            Since AWS has replaced ElasticSearch with OpenSearch, some clients have issues connecting to the OpenSearch Service.

            To avoid that, we can enable compatibility mode during the cluster creation.

            Certain Elasticsearch OSS clients, such as Logstash, check the cluster version before connecting. Compatibility mode sets OpenSearch to report its version as 7.10 so that these clients continue to work with the service.

            I'm trying to use CloudFormation to create a cluster using AWS::OpenSearchService::Domain instead of AWS::Elasticsearch::Domain but I can't see a way to enable compatibility mode.

            ...

            ANSWER

            Answered 2021-Nov-10 at 11:23

            The AWS::OpenSearchService::Domain CloudFormation resource has a property called AdvancedOptions.

            As per documentation, you should pass override_main_response_version to the advanced options to enable compatibility mode.

            Example:
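
The original snippet is not preserved above. As a hedged stand-in, here is the equivalent via boto3 rather than a CloudFormation template; in a template the same key goes under the AdvancedOptions property of AWS::OpenSearchService::Domain. Domain name and engine version below are assumptions:

```python
# Hedged stand-in for the missing example: enabling compatibility mode by
# passing override_main_response_version in AdvancedOptions via boto3.
import boto3

client = boto3.client("opensearch")
client.create_domain(
    DomainName="my-domain",  # hypothetical
    EngineVersion="OpenSearch_1.0",
    AdvancedOptions={
        # reports the version as 7.10 so Elasticsearch OSS clients connect
        "override_main_response_version": "true"
    },
)
```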

            Source https://stackoverflow.com/questions/69911285

            QUESTION

            How can AWS Kinesis Firehose lambda send update and delete requests to ElasticSearch?
            Asked 2022-Mar-03 at 17:39

            I'm not seeing how an AWS Kinesis Firehose lambda can send update and delete requests to ElasticSearch (AWS OpenSearch service).

            Elasticsearch document APIs provides for CRUD operations: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html

The examples I've found deal with the Create case but don't show how to do delete or update requests. https://aws.amazon.com/blogs/big-data/ingest-streaming-data-into-amazon-elasticsearch-service-within-the-privacy-of-your-vpc-with-amazon-kinesis-data-firehose/ https://github.com/amazon-archives/serverless-app-examples/blob/master/python/kinesis-firehose-process-record-python/lambda_function.py

The output format in the examples does not show a way to specify create, update or delete requests:

            ...

            ANSWER

            Answered 2022-Mar-03 at 04:20

Firehose uses a Lambda function to transform records before they are delivered to the destination (in your case OpenSearch/ES), so the function can only modify the structure of the data; it cannot influence CRUD actions. Firehose can only insert records into a specific index. If you need a simple way to remove records from an ES index after a certain period of time, have a look at the "Index rotation" option when specifying the destination for your Firehose stream.

If you want CRUD actions against ES and want to keep using Firehose, I would suggest sending records to an S3 bucket in raw format and then triggering a Lambda function on the object-upload event that performs a CRUD action depending on fields in your payload.

A good example of performing CRUD actions against ES from Lambda: https://github.com/chankh/ddb-elasticsearch/blob/master/src/lambda_function.py

That particular example is built to send data from DynamoDB streams into ES, but it should be a good starting point for you.
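
A hedged sketch of that suggested pattern follows. The "action" and "id" payload fields, the index name, and the endpoint are all invented for illustration; only the overall S3-trigger-then-CRUD shape comes from the answer:

```python
# An S3-triggered Lambda that reads the raw record and performs a CRUD
# action based on an assumed "action" field in the payload.
import json

import boto3
from opensearchpy import OpenSearch

s3 = boto3.client("s3")
es = OpenSearch(hosts=["https://my-domain.example.com:443"])  # assumed endpoint

def handler(event, context):
    for record in event["Records"]:
        obj = s3.get_object(
            Bucket=record["s3"]["bucket"]["name"],
            Key=record["s3"]["object"]["key"],
        )
        payload = json.loads(obj["Body"].read())
        action = payload.pop("action", "create")  # hypothetical field
        doc_id = payload.pop("id", None)          # hypothetical field
        if action == "delete":
            es.delete(index="my-index", id=doc_id)
        elif action == "update":
            es.update(index="my-index", id=doc_id, body={"doc": payload})
        else:
            es.index(index="my-index", id=doc_id, body=payload)
```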

            Source https://stackoverflow.com/questions/71326537

            QUESTION

            If all fields for a column are null, OpenSearch does not include that field, and so sorting on that field fails
            Asked 2022-Mar-03 at 07:33

When adding sorting configuration for data in OpenSearch, I came across a situation where the field I want to sort on had only null values. OpenSearch returns an error that says [query_shard_exception] Reason: No mapping found for [MY_NULL_FIELD] in order to sort on. That said, if I add ONE value, the sort functions as expected. Is there a way around this?

            ...

            ANSWER

            Answered 2022-Mar-03 at 04:41

You can define a null_value property while configuring the index mapping.
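
A minimal sketch of that, assuming opensearch-py and invented names; declaring the field in the mapping (with a null_value placeholder) means the field is mapped even when every document's value is null, so the sort no longer fails:

```python
# Mapping a keyword field with null_value so explicit nulls are indexed
# as a sortable placeholder; client setup and names are assumptions.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://localhost:9200"])

client.indices.create(
    index="my-index",  # hypothetical
    body={
        "mappings": {
            "properties": {
                "MY_NULL_FIELD": {
                    "type": "keyword",
                    "null_value": "NULL",  # indexed in place of explicit nulls
                }
            }
        }
    },
)
```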

            Source https://stackoverflow.com/questions/71332151

            QUESTION

            Fluent Bit does not send logs from my EKS custom applications
            Asked 2022-Mar-01 at 09:40

I am using AWS OpenSearch to retrieve the logs from all my Kubernetes applications. I have the following pods: kube-proxy, Fluent Bit, aws-node, aws-load-balancer-controller, and all my apps (around 10).

While Fluent Bit successfully sends all the logs from kube-proxy, Fluent Bit, aws-node and aws-load-balancer-controller, none of the logs from my applications are sent. My applications emit DEBUG, INFO and ERROR logs, and none of them are sent by Fluent Bit.

            Here is my fluent bit configuration:

            ...

            ANSWER

            Answered 2022-Feb-25 at 15:15

Have you seen this article from the official documentation? Pay attention to the Log files overview section:

"When deploying Fluent Bit to Kubernetes, there are three log files that you need to pay attention to. C:\k\kubelet.err.log"

You can also open an issue in the Fluent GitHub community to get better support from its contributors.

There is also a Slack channel for Fluent.

            Source https://stackoverflow.com/questions/71262479

            QUESTION

Writing to a file in parallel while processing in a loop in Python
            Asked 2022-Feb-23 at 19:25

I have CSV data with 65K rows. I need to do some processing for each CSV line, which generates a string at the end. I have to write/append that string to a file.

Pseudocode:

            ...

            ANSWER

            Answered 2022-Feb-23 at 19:25

            Q : " Writing to a file parallely while processing in a loop in python ... "

            A :
            Frankly speaking, the file-I/O is not your performance-related enemy.

            "With all due respect to the colleagues, Python (since ever) used GIL-lock to avoid any level of concurrent execution ( actually re-SERIAL-ising the code-execution flow into dancing among any amount of threads, lending about 100 [ms] of code-interpretation time to one-AFTER-another-AFTER-another, thus only increasing the interpreter's overhead times ( and devastating all pre-fetches into CPU-core caches on each turn ... paying the full mem-I/O costs on each next re-fetch(es) ). So threading is ANTI-pattern in python (except, I may accept, for network-(long)-transport latency masking ) – user3666197 44 mins ago "

            Given about the 65k files, listed in CSV, ought get processed ASAP, the performance-tuned orchestration is the goal, file-I/O being just a negligible ( and by-design well latency-maskable ) part thereof ( which does not mean, we can't screw it even more ( if trying to organise it in another performance-devastating ANTI-pattern ), can we? )

            Tip #1 : avoid & resist to use any low-hanging fruit SLOCs if The Performance is the goal

            If the code starts with a cheapest-ever iterator-clause,
            be it a mock-up for aRow in aCsvDataSET: ...
            or the real-code for i in range( len( queries ) ): ... - these (besides being known for ages to be awfully slow part of the python code-interpretation capabilites, the second one being even an iterator-on-range()-iterator in Py3 and even a silent RAM-killer in Py2 ecosystem for any larger sized ranges) look nice in "structured-programming" evangelisation, as they form a syntax-compliant separation of a deeper-level part of the code, yet it does so at an awfully high costs impacts due to repetitively paid overhead-costs accumulation. A finally injected need to "coordinate" unordered concurrent file-I/O operations, not necessary in principle at all, if done smart, are one such example of adverse performance impacts if such a trivial SLOC's ( and similarly poor design decisions' ) are being used.

            Better way?

• a) avoid the top-level (slow and overhead-expensive) looping
• b) "split" the 65k-row parameter space into not many more blocks than there are memory-I/O channels on your physical device (the scoring process, I guess from the posted text, is memory-I/O intensive, as some model has to go through all the texts for scoring to happen)
• c) spawn n_jobs-many worker processes that will joblib.Parallel( n_jobs = ... )( delayed( <_scoring_fun_> )( block_start, block_end, ...<_params_>... ) ) and run the scoring function over each such distributed block of the 65k-long parameter space (see the sketch after this list)
• d) having computed the scores and related outputs, each worker process can and shall write its own results into its private, exclusively owned, conflict-free output file
• e) having finished all partial blocks, the main Python process can simply join the already (concurrently created, smoothly and non-blockingly O/S-buffered, real-hardware-deposited) stored outputs, if such a need exists,
  and
  finito - we are done (knowing there is no faster way to compute the same block of tasks, which are principally embarrassingly independent, besides the need to orchestrate them collision-free with minimised add-on costs).
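
A sketch of steps (b) through (e) under stated assumptions: process() below stands in for the real per-row scoring, and the part-file names are invented; only the block-splitting, private-output-file, join-at-the-end shape comes from the answer:

```python
# Split the rows into blocks, score each block in a worker process that owns
# one private output file, then concatenate the parts in the main process.
from joblib import Parallel, delayed

def process(row):
    # Stand-in for the user's real processing that yields one output string.
    return str(row)

def score_block(rows, out_path):
    # Each worker owns one output file, so no cross-process coordination
    # (locks, queues) is needed for the file I/O.  (steps c and d)
    with open(out_path, "w") as f:
        for row in rows:
            f.write(process(row) + "\n")

def run(all_rows, n_jobs=4):
    # Split the parameter space into n_jobs contiguous blocks.  (step b)
    size = -(-len(all_rows) // n_jobs)  # ceiling division
    blocks = [all_rows[i:i + size] for i in range(0, len(all_rows), size)]
    Parallel(n_jobs=n_jobs)(
        delayed(score_block)(block, f"part-{i}.txt")
        for i, block in enumerate(blocks)
    )
    # Join the worker-private outputs in the main process.  (step e)
    with open("result.txt", "w") as out:
        for i in range(len(blocks)):
            with open(f"part-{i}.txt") as part:
                out.write(part.read())

if __name__ == "__main__":
    run(list(range(100)))
```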

If you are interested in tweaking real-system end-to-end processing performance,
start with an lstopo map,
next verify the number of physical memory-I/O channels,
and
then experiment a bit with joblib.Parallel() process instantiation, under-subscribing or over-subscribing n_jobs slightly below or above the number of physical memory-I/O channels. If the actual processing has some maskable latencies hidden from us, there may be a chance to spawn more n_jobs workers, until end-to-end processing performance keeps growing steadily, until system noise hides any further performance-tweaking effects.

            A Bonus part - why un-managed sources of latency kill The Performance

            Source https://stackoverflow.com/questions/71233138

            QUESTION

How to scrape data when the site kind of doesn't allow it?
            Asked 2022-Feb-20 at 08:45

I have been trying to scrape data from https://gov.gitcoin.co/u/owocki/summary using Python's BeautifulSoup. Image: https://i.stack.imgur.com/0EgUk.png

Inspecting the page with dev tools gives an idea, but with the following code I'm not getting the full HTML returned; it seems the site isn't allowing scraping, if I'm correct.

            ...

            ANSWER

            Answered 2022-Feb-20 at 08:45
What happens?

As mentioned in the comments, the content of the website is provided dynamically, so you won't get your information with requests on that specific resource/URL, because requests cannot render the website the way a browser would.

How to fix?

You do not need BeautifulSoup for this task, because there are resources that will give you structured JSON data:
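
A hedged sketch of that idea: Discourse forums (which gov.gitcoin.co appears to be) typically expose the same page as JSON when you append .json to the URL. The response fields used below are assumptions, not verified against this site:

```python
# Fetch the structured JSON behind the summary page instead of scraping HTML.
import requests

resp = requests.get(
    "https://gov.gitcoin.co/u/owocki/summary.json",
    headers={"User-Agent": "Mozilla/5.0"},  # some sites reject default agents
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(data.get("user_summary", {}))  # assumed top-level key
```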

            Source https://stackoverflow.com/questions/71192177

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

9 vulnerabilities reported (0 critical, 3 high, 6 medium, 0 low); see the Security section above.

            Install opensearch

            You can download it from GitHub.
You can use opensearch like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid making changes to the system.

            Support

For any new features, suggestions, or bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/dewitt/opensearch.git

          • CLI

            gh repo clone dewitt/opensearch

• SSH

            git@github.com:dewitt/opensearch.git
