elasticsearch-py | Official Python client for Elasticsearch | REST library
kandi X-RAY | elasticsearch-py Summary
Official Python client for Elasticsearch
Top functions reviewed by kandi - BETA
- Submit search results
- Perform a request
- Escape value
- Updates the query by index
- Performs a request
- Return the current stack level
- Lists jobs
- Updates a job
- Get datafeeds
- Search a template by index
- Perform search
- Performs a search query
- Update an index
- Execute a transformation
- Get ML trained models
- Create auto follow pattern
- Put a mapping
- Creates a new job
- Update a datafeed
- Creates a new data feed
- Delete documents by index
- Update index by query
- Perform a search
- Perform a search operation
elasticsearch-py Key Features
elasticsearch-py Examples and Code Snippets
import elasticsearch
import jsontableschema_es
INDEX_NAME = 'testing_index'
# Connect to Elasticsearch instance running on localhost
es = elasticsearch.Elasticsearch()
storage = jsontableschema_es.Storage(es)
# List all indexes
print(list(storage.buckets))
{
  "fields": [
    {
      "name": "my-number",
      "type": "number"
    },
    {
      "name": "my-array-of-dates",
      "type": "array",
      "es:itemType": "date"
    },
    {
      "name": "my-person-object",
      "type": "object"
    }
  ]
}
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_auth_aws_sigv4 import AWSSigV4
es_host = 'search-service-foobar.us-east-1.es.amazonaws.com'
aws_auth = AWSSigV4('es')
# use the requests connection_class and pass in our AWS SigV4 auth object
es = Elasticsearch(
    hosts=[{'host': es_host, 'port': 443}],
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)
Community Discussions
Trending Discussions on elasticsearch-py
QUESTION
Does the AsyncElasticsearch client open a new session for each async request?
AsyncElasticsearch (from elasticsearch-py) uses AIOHTTP. From what I understand, AIOHTTP recommends using a context manager for the aiohttp.ClientSession object, so as to not generate a new session for each request:
ANSWER
Answered 2022-Mar-01 at 14:20 Turns out AsyncElasticsearch was not the right client to speed up bulk ingests in this case. I used the helpers.parallel_bulk() function instead.
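For reference, here is a minimal sketch of using helpers.parallel_bulk for faster ingests; the index name and document generator are assumptions, not part of the original answer:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

def generate_actions():
    # Hypothetical documents; replace with your own source of actions.
    for i in range(10_000):
        yield {"_index": "my-index", "_id": i, "message": f"value {i}"}

# parallel_bulk returns a lazy generator of (ok, item) tuples and must be
# consumed for the requests to actually be sent.
for ok, item in helpers.parallel_bulk(es, generate_actions(), thread_count=4, chunk_size=500):
    if not ok:
        print("Failed:", item)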
QUESTION
My query is simple: add field:value to an existing doc, but it fails with a document_missing_exception error. The code below is shown without parameters to make it easy to read. I use the opensearch-py client and set the index, the doc_type (same as the index), the id of the document, and the query body, as seen in a previous post: How to update a document using elasticsearch-py?
...ANSWER
Answered 2022-Feb-17 at 12:03 What is the Elasticsearch version you are using? Please try giving:
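The rest of the answer is not captured on this page. As a hedged sketch of the usual fix for document_missing_exception (index name, document id, and field are placeholders), an update can be turned into an upsert so a missing document is created instead of raising:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# update() raises document_missing_exception when the id does not exist;
# doc_as_upsert makes Elasticsearch create the document in that case.
es.update(
    index="my-index",
    id="my-doc-id",
    body={"doc": {"new_field": "new_value"}, "doc_as_upsert": True},
)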
QUESTION
I use elasticsearch-dsl in order to query Elasticsearch in Python. I want to search documents on a text field and get all documents whose created field is less than datetime.now(). I execute the following query, but Elasticsearch raises an error.
ANSWER
Answered 2022-Feb-04 at 22:13 You can't combine multiple query types like this; use bool:
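The answer's snippet is not captured here; a minimal sketch of combining both conditions in a bool query with elasticsearch-dsl (index and field names are assumptions):

from datetime import datetime

from elasticsearch_dsl import Q, Search
from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=["localhost"])

# match on the text field and filter on the created date inside one bool query
s = Search(index="my-index").query(
    "bool",
    must=[Q("match", text="some words")],
    filter=[Q("range", created={"lt": datetime.now()})],
)
response = s.execute()
for hit in response:
    print(hit.meta.id)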
QUESTION
I'm working with an elastic API at a URL like https://something.xyzw.eg/api/search/advance (not the real URL). The API works fine in Postman, and the Python code generated by Postman also works and returns results. However, when using the elasticsearch-dsl package I keep getting:
Failed to establish a new connection: [Errno -2] Name or service not known)
Here is my code, similar to the first example in the documentation:
...ANSWER
Answered 2022-Jan-29 at 09:46 Can you try adding port=443, as in one of the examples from the doc you mentioned (https://elasticsearch-py.readthedocs.io/en/v7.16.3/#tls-ssl-and-authentication)?
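A hedged sketch of such a connection, following the TLS/SSL section of the linked docs; the hostname is the question's placeholder and the credentials are assumptions:

from elasticsearch_dsl import Search
from elasticsearch_dsl.connections import connections

connections.create_connection(
    hosts=["something.xyzw.eg"],
    port=443,
    use_ssl=True,
    http_auth=("user", "password"),
    timeout=30,
)

response = Search(index="my-index").query("match_all").execute()
print(response.hits.total)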
QUESTION
I need to come up with a strategy to process and update documents in an elasticsearch index periodically and efficiently. I do not have to look at documents that I processed before.
My setup is that I have a long-running process which continuously inserts documents into an index, say approx. 500 documents per hour (think of the common logging example).
I need to find a solution to update some number of documents periodically (via a cron job, e.g.) by running some code on a specific field (a text field, e.g.) to enhance each document with a number of new fields. I want to do this to offer more fine-grained aggregations on the index. In the logging analogy, this could be: I get the UserAgent string from a log entry (document), do some parsing on it, add some new fields back to that document, and index it.
So my approach would be:
- Get some number of documents (or even all) that I haven't looked at before. I could query them by combining must_not and exists, for instance.
- Run my code on these documents (run the parser, compute some new stuff, whatever).
- Update the documents obtained previously (probably most preferably via the bulk API).
I know there is the Update by query API, but it does not seem right here, since I need to run my own code (which, by the way, depends on external libraries) on my own server, not as a painless script, which would not support the more involved processing I need.
I am accessing elasticsearch via python.
The problem is that I don't know how to implement the above approach. E.g., what if the number of documents obtained in step 1 is larger than my index's settings.index.max_result_window?
Any ideas?
...ANSWER
Answered 2022-Jan-12 at 10:21 I considered @Jay's comment and ended up with this pattern, for the moment:
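The pattern's code is not captured on this page. As a hedged sketch of the scan-then-bulk-update approach described in the question (index name, field names, and the enrich() helper are assumptions): helpers.scan avoids the max_result_window limit by using the scroll API, and the bulk helper writes the new fields back.

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
INDEX = "logs"

# Only fetch documents that have not been enriched yet.
query = {"query": {"bool": {"must_not": [{"exists": {"field": "parsed_user_agent"}}]}}}

def enrich(hit):
    # Placeholder for the custom processing that depends on external libraries.
    return {"parsed_user_agent": hit["_source"].get("user_agent", "").split("/")[0]}

def actions():
    for hit in helpers.scan(es, index=INDEX, query=query):
        yield {
            "_op_type": "update",
            "_index": INDEX,
            "_id": hit["_id"],
            "doc": enrich(hit),
        }

helpers.bulk(es, actions())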
QUESTION
Here is an aggregation query that works as expected when I use Dev Tools in Elasticsearch:
search_query = {
"aggs": {
"SHAID": {
"terms": {
"field": "identiferid",
"order": {
"sort": "desc"
},
# "size": 100000
},
"aggs": {
"update": {
"date_histogram": {
"field": "endTime",
"calendar_interval": "1d"
},
"aggs": {
"update1": {
"sum": {
"script": {
"lang": "painless",
"source":"""
if (doc['distanceIndex.att'].size()!=0) {
return doc['distanceIndex.att'].value;
}
else {
if (doc['distanceIndex.att2'].size()!=0) {
return doc['distanceIndex.att2'].value;
}
return null;
}
"""
}
}
},
"update2": {
"sum": {
"script": {
"lang": "painless",
"source":"""
if (doc['distanceIndex.att3'].size()!=0) {
return doc['distanceIndex.att3'].value;
}
else {
if (doc['distanceIndex.att4'].size()!=0) {
return doc['distanceIndex.att4'].value;
}
return null;
}
"""
}
}
},
}
},
"sort": {
"sum": {
"field": "time2"
}
}
}
}
},
"size": 0,
"query": {
"bool": {
"filter": [
{
"match_all": {}
},
{
"range": {
"endTime": {
"gte": "2021-11-01T00:00:00Z",
"lt": "2021-11-03T00:00:00Z"
}
}
}
]
}
}
}
...ANSWER
Answered 2021-Nov-07 at 14:39 helpers.scan is a
"Simple abstraction on top of the scroll() api - a simple iterator that yields all hits as returned by underlying scroll requests."
It's meant to iterate through large result sets and comes with a default keyword argument of size=1000.
To run an aggregation, use the es_client.search() method directly, passing in your query as body and including "size": 0 in the query, and that should be fine.
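A minimal sketch of running the aggregation with search() directly; the index name is an assumption, and search_query is the dict defined in the question above:

from elasticsearch import Elasticsearch

es_client = Elasticsearch()

response = es_client.search(index="my-index", body=search_query)
for bucket in response["aggregations"]["SHAID"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])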
QUESTION
Elasticsearch 7.10.2
Python 3.8.5
elasticsearch-py 7.12.1
I'm trying to do a bulk insert of 100,000 records to Elasticsearch using the elasticsearch-py bulk helper.
Here is the Python code:
...ANSWER
Answered 2021-May-18 at 13:29 Reduce the chunk_size from 10000 to the default of 500 and I'd expect it to work. You probably also want to disable automatic retries, since those can give you duplicates.
When creating your Elasticsearch object, you specified chunk_size=10000. This means that the streaming_bulk call will try to insert chunks of 10000 elements. The connection to Elasticsearch has a configurable timeout, which by default is 10 seconds. So, if your Elasticsearch server takes more than 10 seconds to process the 10000 elements you want to insert, a timeout will happen and this will be handled as an error.
When creating your Elasticsearch object, you also specified retry_on_timeout as True, and in the streaming_bulk call you set max_retries=max_insert_retries, which is 3. This means that when such a timeout happens, the library will try reconnecting 3 times; however, when the insert still times out after that, it will give you the error you noticed. (Documentation)
Also, when the timeout happens, the library cannot know whether the documents were inserted successfully, so it has to assume that they were not. Thus, it will try to insert the same documents again. I don't know what your input lines look like, but if they do not contain an _id field, this would create duplicates in your index. You probably want to prevent this, either by adding some kind of _id or by disabling the automatic retry and handling it manually.
There are two ways you can go about this:
- Increase the timeout
- Reduce the chunk_size
streaming_bulk by default has chunk_size set to 500. Your 10000 is much higher. I don't think you can expect a big performance gain from increasing this beyond 500, so I'd advise you to just use the default of 500 here. If 500 still fails with a timeout, you may even want to reduce it further. This could happen if the documents you want to index are very complex.
You could also increase the timeout for the streaming_bulk call or, alternatively, for your es object. To change it only for the streaming_bulk call, you can provide the request_timeout keyword argument:
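The answer's code is not captured here; a hedged sketch of both suggestions (index name and document generator are placeholders): keep chunk_size at the default of 500 and raise request_timeout only for the streaming_bulk call.

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

def generate_docs():
    for i in range(100_000):
        yield {"_index": "my-index", "_id": i, "field": f"value {i}"}

for ok, item in helpers.streaming_bulk(
    es,
    generate_docs(),
    chunk_size=500,       # the default; larger chunks risk hitting the request timeout
    request_timeout=30,   # per-request timeout (seconds) for the underlying bulk calls
    raise_on_error=False,
):
    if not ok:
        print("Failed:", item)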
QUESTION
I have a FastAPI app that makes two requests, one of which takes longer (if it helps, they're Elasticsearch queries and I'm using the AsyncElasticsearch module, which already returns coroutines). This is my attempt:
...ANSWER
Answered 2021-Apr-02 at 09:30 Yes, that's correct: the coroutine won't proceed until the results are ready. You can use asyncio.gather to run the tasks concurrently:
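The original code is not shown on this page; a minimal sketch with AsyncElasticsearch inside a FastAPI route (the endpoint path, index names, and queries are assumptions):

import asyncio

from elasticsearch import AsyncElasticsearch
from fastapi import FastAPI

app = FastAPI()
es = AsyncElasticsearch()

@app.get("/search")
async def search():
    short_query = es.search(index="index-a", body={"query": {"match_all": {}}})
    long_query = es.search(index="index-b", body={"query": {"match_all": {}}})
    # Both coroutines are awaited together, so the slow query does not delay the fast one from starting.
    short_result, long_result = await asyncio.gather(short_query, long_query)
    return {"short": short_result["hits"]["total"], "long": long_result["hits"]["total"]}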
QUESTION
I want to use a remote Elasticsearch server for my website. I have used the elastic.co cloud service to create a remote Elasticsearch server. I can connect to/ping the remote Elasticsearch server using the following command (it is scrubbed of sensitive info):
curl -u username:password https://55555555555bb0c30d1cba4e9e6.us-central1.gcp.cloud.es.io:9243
After typing this command into the terminal, I receive the following response:
...ANSWER
Answered 2021-Jan-20 at 19:06 You need to connect using TLS/SSL and authentication, as described in the documentation. In your case you should use something like this:
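The answer's snippet is not captured here; a minimal sketch using the scrubbed host from the question and placeholder credentials:

from elasticsearch import Elasticsearch

es = Elasticsearch(
    ["https://55555555555bb0c30d1cba4e9e6.us-central1.gcp.cloud.es.io:9243"],
    http_auth=("username", "password"),
)
print(es.info())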
QUESTION
I see strange behavior on my ES 7.8 cluster: when inserting data using elasticsearch.helpers.streaming_bulk, it fails with this strange error:
...ANSWER
Answered 2021-Jan-18 at 14:25 Turns out it was my own mistake: I had specified the filter_path=['hits.hits._id'] parameter, and it was passed all the way down to the bulk request. Thanks for the tip @Val.
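As a hedged sketch of the fix (index name and documents are placeholders): keep filter_path off the bulk path, since streaming_bulk needs the full per-item response, and apply it only to the search calls where a trimmed response is actually wanted.

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

docs = ({"_index": "my-index", "_id": i, "value": i} for i in range(1000))

# No filter_path here: streaming_bulk inspects each item's result in the response.
for ok, item in helpers.streaming_bulk(es, docs):
    if not ok:
        print("Failed:", item)

# Use filter_path only where trimming the response is intended.
resp = es.search(index="my-index", filter_path=["hits.hits._id"])
print(resp)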
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install elasticsearch-py
You can use elasticsearch-py like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
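A minimal sketch of verifying the installation from Python, assuming the package was installed with pip (e.g. inside a virtual environment):

# Assumes: python -m venv .venv && source .venv/bin/activate && pip install elasticsearch
from importlib.metadata import version

print(version("elasticsearch"))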