elasticsearch-py | Official Python client for Elasticsearch | REST library
kandi X-RAY | elasticsearch-py Summary
Official Python client for Elasticsearch
Top functions reviewed by kandi - BETA
- Submit search results
- Perform a request
- Escape value
- Updates the query by index
- Performs a request
- Return the current stack level
- Lists jobs
- Updates a job
- Get datafeeds
- Search a template by index
- Perform search
- Performs a search query
- Update an index
- Execute a transformation
- Get ML trained models
- Create auto follow pattern
- Put a mapping
- Creates a new job
- Update a datafeed
- Creates a new data feed
- Delete documents by index
- Update index by query
- Perform a search
- Perform a search operation
elasticsearch-py Key Features
elasticsearch-py Examples and Code Snippets
import elasticsearch
import jsontableschema_es
INDEX_NAME = 'testing_index'
# Connect to Elasticsearch instance running on localhost
es = elasticsearch.Elasticsearch()
storage = jsontableschema_es.Storage(es)
# List all indexes
print(list(storage.buckets))
{
  "fields": [
    {
      "name": "my-number",
      "type": "number"
    },
    {
      "name": "my-array-of-dates",
      "type": "array",
      "es:itemType": "date"
    },
    {
      "name": "my-person-object",
      "type": "object"
    }
  ]
}
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_auth_aws_sigv4 import AWSSigV4
es_host = 'search-service-foobar.us-east-1.es.amazonaws.com'
aws_auth = AWSSigV4('es')
# use the requests connection_class and pass in our AWS SigV4 auth object
es = Elasticsearch(
    hosts=[{'host': es_host, 'port': 443}],
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)
Community Discussions
Trending Discussions on elasticsearch-py
QUESTION
Does the AsyncElasticsearch client open a new session for each async request?
AsyncElasticsearch (from elasticsearch-py) uses AIOHTTP. From what I understand, AIOHTTP recommends using a context manager for the aiohttp.ClientSession object, so as to not generate a new session for each request:
ANSWER
Answered 2022-Mar-01 at 14:20 Turns out AsyncElasticsearch was not the right client to speed up bulk ingests in this case. I used the helpers.parallel_bulk() function instead.
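For reference, here is a minimal sketch of using helpers.parallel_bulk for faster ingests; the index name and document generator are assumptions, not part of the original answer:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

def generate_actions():
    # Hypothetical documents; replace with your own source of actions.
    for i in range(10_000):
        yield {"_index": "my-index", "_id": i, "message": f"value {i}"}

# parallel_bulk returns a lazy generator of (ok, item) tuples and must be
# consumed for the requests to actually be sent.
for ok, item in helpers.parallel_bulk(es, generate_actions(), thread_count=4, chunk_size=500):
    if not ok:
        print("Failed:", item)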
QUESTION
My query is simple: add field:value to an existing doc, but it fails with a document_missing_exception error. The code below is shown without parameters to make it easy to read. I use the opensearch-py client and set the index, the doc_type (same as the index), the id of the document, and the query body, as seen in a previous post: How to update a document using elasticsearch-py?
...ANSWER
Answered 2022-Feb-17 at 12:03 What is the Elasticsearch version you are using? Please try giving:
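The rest of the answer is not captured on this page. As a hedged sketch of the usual fix for document_missing_exception (index name, document id, and field are placeholders), an update can be turned into an upsert so a missing document is created instead of raising:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# update() raises document_missing_exception when the id does not exist;
# doc_as_upsert makes Elasticsearch create the document in that case.
es.update(
    index="my-index",
    id="my-doc-id",
    body={"doc": {"new_field": "new_value"}, "doc_as_upsert": True},
)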
QUESTION
I use elasticsearch-dsl in order to query Elasticsearch in Python. I want to search documents on a text field and get all documents whose created field is less than datetime.now(). I execute the following query, but Elasticsearch raises an error.
ANSWER
Answered 2022-Feb-04 at 22:13 You can't combine multiple query types like this; use bool:
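The answer's snippet is not captured here; a minimal sketch of combining both conditions in a bool query with elasticsearch-dsl (index and field names are assumptions):

from datetime import datetime

from elasticsearch_dsl import Q, Search
from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=["localhost"])

# match on the text field and filter on the created date inside one bool query
s = Search(index="my-index").query(
    "bool",
    must=[Q("match", text="some words")],
    filter=[Q("range", created={"lt": datetime.now()})],
)
response = s.execute()
for hit in response:
    print(hit.meta.id)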
QUESTION
I'm working with an elastic API at a URL like https://something.xyzw.eg/api/search/advance (not the real URL). The API works fine in Postman, and the Python code generated by Postman also works and returns results. However, when using the elasticsearch-dsl package I keep getting:
Failed to establish a new connection: [Errno -2] Name or service not known)
Here is my code, similar to the first example in the documentation:
...ANSWER
Answered 2022-Jan-29 at 09:46 Can you try adding port=443, as in one of the examples from the doc you mentioned (https://elasticsearch-py.readthedocs.io/en/v7.16.3/#tls-ssl-and-authentication)?
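A hedged sketch of such a connection, following the TLS/SSL section of the linked docs; the hostname is the question's placeholder and the credentials are assumptions:

from elasticsearch_dsl import Search
from elasticsearch_dsl.connections import connections

connections.create_connection(
    hosts=["something.xyzw.eg"],
    port=443,
    use_ssl=True,
    http_auth=("user", "password"),
    timeout=30,
)

response = Search(index="my-index").query("match_all").execute()
print(response.hits.total)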
QUESTION
I need to come up with a strategy to process and update documents in an elasticsearch index periodically and efficiently. I do not have to look at documents that I processed before.
My setup is that I have a long-running process which continuously inserts documents into an index, say approx. 500 documents per hour (think of the common logging example).
I need to find a solution to update some number of documents periodically (via a cron job, e.g.) by running some code on a specific field (a text field, e.g.) to enhance each document with a number of new fields. I want to do this to offer more fine-grained aggregations on the index. In the logging analogy, this could be: I get the UserAgent string from a log entry (document), do some parsing on it, add some new fields back to that document, and index it.
So my approach would be:
- Get some number of documents (or even all) that I haven't looked at before. I could query them by combining must_not and exists, for instance.
- Run my code on these documents (run the parser, compute some new stuff, whatever).
- Update the documents obtained previously (probably most preferably via the bulk API).
I know there is the Update by query API, but it does not seem right here, since I need to run my own code (which, by the way, depends on external libraries) on my own server, not as a painless script, which would not support the more involved processing I need.
I am accessing elasticsearch via python.
The problem is that I don't know how to implement the above approach. E.g., what if the number of documents obtained in step 1 is larger than my index's settings.index.max_result_window?
Any ideas?
...ANSWER
Answered 2022-Jan-12 at 10:21 I considered @Jay's comment and ended up with this pattern, for the moment:
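The pattern's code is not captured on this page. As a hedged sketch of the scan-then-bulk-update approach described in the question (index name, field names, and the enrich() helper are assumptions): helpers.scan avoids the max_result_window limit by using the scroll API, and the bulk helper writes the new fields back.

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
INDEX = "logs"

# Only fetch documents that have not been enriched yet.
query = {"query": {"bool": {"must_not": [{"exists": {"field": "parsed_user_agent"}}]}}}

def enrich(hit):
    # Placeholder for the custom processing that depends on external libraries.
    return {"parsed_user_agent": hit["_source"].get("user_agent", "").split("/")[0]}

def actions():
    for hit in helpers.scan(es, index=INDEX, query=query):
        yield {
            "_op_type": "update",
            "_index": INDEX,
            "_id": hit["_id"],
            "doc": enrich(hit),
        }

helpers.bulk(es, actions())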
QUESTION
Here is an aggregation query that works as expected when I use Dev Tools in Elasticsearch:
search_query = {
"aggs": {
"SHAID": {
"terms": {
"field": "identiferid",
"order": {
"sort": "desc"
},
# "size": 100000
},
"aggs": {
"update": {
"date_histogram": {
"field": "endTime",
"calendar_interval": "1d"
},
"aggs": {
"update1": {
"sum": {
"script": {
"lang": "painless",
"source":"""
if (doc['distanceIndex.att'].size()!=0) {
return doc['distanceIndex.att'].value;
}
else {
if (doc['distanceIndex.att2'].size()!=0) {
return doc['distanceIndex.att2'].value;
}
return null;
}
"""
}
}
},
"update2": {
"sum": {
"script": {
"lang": "painless",
"source":"""
if (doc['distanceIndex.att3'].size()!=0) {
return doc['distanceIndex.att3'].value;
}
else {
if (doc['distanceIndex.att4'].size()!=0) {
return doc['distanceIndex.att4'].value;
}
return null;
}
"""
}
}
},
}
},
"sort": {
"sum": {
"field": "time2"
}
}
}
}
},
"size": 0,
"query": {
"bool": {
"filter": [
{
"match_all": {}
},
{
"range": {
"endTime": {
"gte": "2021-11-01T00:00:00Z",
"lt": "2021-11-03T00:00:00Z"
}
}
}
]
}
}
}
...ANSWER
Answered 2021-Nov-07 at 14:39 helpers.scan is a
"Simple abstraction on top of the scroll() api - a simple iterator that yields all hits as returned by underlying scroll requests."
It's meant to iterate through large result sets and comes with a default keyword argument of size=1000.
To run an aggregation, use the es_client.search() method directly, passing in your query as body and including "size": 0 in the query, and that should be fine.
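A minimal sketch of running the aggregation with search() directly; the index name is an assumption, and search_query is the dict defined in the question above:

from elasticsearch import Elasticsearch

es_client = Elasticsearch()

response = es_client.search(index="my-index", body=search_query)
for bucket in response["aggregations"]["SHAID"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])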
QUESTION
Elasticsearch 7.10.2
Python 3.8.5
elasticsearch-py 7.12.1
I'm trying to do a bulk insert of 100,000 records to Elasticsearch using the elasticsearch-py bulk helper.
Here is the Python code:
...ANSWER
Answered 2021-May-18 at 13:29 Reduce the chunk_size from 10000 to the default of 500 and I'd expect it to work. You probably also want to disable automatic retries, since those can give you duplicates.
When creating your Elasticsearch object, you specified chunk_size=10000. This means that the streaming_bulk call will try to insert chunks of 10000 elements. The connection to Elasticsearch has a configurable timeout, which by default is 10 seconds. So, if your Elasticsearch server takes more than 10 seconds to process the 10000 elements you want to insert, a timeout will happen and this will be handled as an error.
When creating your Elasticsearch object, you also specified retry_on_timeout as True, and in the streaming_bulk call you set max_retries=max_insert_retries, which is 3. This means that when such a timeout happens, the library will try reconnecting 3 times; however, when the insert still times out after that, it will give you the error you noticed. (Documentation)
Also, when the timeout happens, the library cannot know whether the documents were inserted successfully, so it has to assume that they were not. Thus, it will try to insert the same documents again. I don't know what your input lines look like, but if they do not contain an _id field, this would create duplicates in your index. You probably want to prevent this, either by adding some kind of _id or by disabling the automatic retry and handling it manually.
There are two ways you can go about this:
- Increase the timeout
- Reduce the chunk_size
streaming_bulk by default has chunk_size set to 500. Your 10000 is much higher. I don't think you can expect a big performance gain from increasing this beyond 500, so I'd advise you to just use the default of 500 here. If 500 still fails with a timeout, you may even want to reduce it further. This could happen if the documents you want to index are very complex.
You could also increase the timeout for the streaming_bulk call or, alternatively, for your es object. To change it only for the streaming_bulk call, you can provide the request_timeout keyword argument:
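The answer's code is not captured here; a hedged sketch of both suggestions (index name and document generator are placeholders): keep chunk_size at the default of 500 and raise request_timeout only for the streaming_bulk call.

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

def generate_docs():
    for i in range(100_000):
        yield {"_index": "my-index", "_id": i, "field": f"value {i}"}

for ok, item in helpers.streaming_bulk(
    es,
    generate_docs(),
    chunk_size=500,       # the default; larger chunks risk hitting the request timeout
    request_timeout=30,   # per-request timeout (seconds) for the underlying bulk calls
    raise_on_error=False,
):
    if not ok:
        print("Failed:", item)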
QUESTION
I have a FastAPI app that makes two requests, one of which takes longer (if it helps, they're Elasticsearch queries and I'm using the AsyncElasticsearch module, which already returns coroutines). This is my attempt:
...ANSWER
Answered 2021-Apr-02 at 09:30 Yes, that's correct: the coroutine won't proceed until the results are ready. You can use asyncio.gather to run the tasks concurrently:
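The original code is not shown on this page; a minimal sketch with AsyncElasticsearch inside a FastAPI route (the endpoint path, index names, and queries are assumptions):

import asyncio

from elasticsearch import AsyncElasticsearch
from fastapi import FastAPI

app = FastAPI()
es = AsyncElasticsearch()

@app.get("/search")
async def search():
    short_query = es.search(index="index-a", body={"query": {"match_all": {}}})
    long_query = es.search(index="index-b", body={"query": {"match_all": {}}})
    # Both coroutines are awaited together, so the slow query does not delay the fast one from starting.
    short_result, long_result = await asyncio.gather(short_query, long_query)
    return {"short": short_result["hits"]["total"], "long": long_result["hits"]["total"]}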
QUESTION
I want to use a remote Elasticsearch server for my website. I have used the elastic.co cloud service to create a remote Elasticsearch server. I can connect to/ping the remote Elasticsearch server using the following command (it is scrubbed of sensitive info):
curl -u username:password https://55555555555bb0c30d1cba4e9e6.us-central1.gcp.cloud.es.io:9243
After typing this command into the terminal, I receive the following response:
...ANSWER
Answered 2021-Jan-20 at 19:06 You need to connect using TLS/SSL and authentication, as described in the documentation. In your case you should use something like this:
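The answer's snippet is not captured here; a minimal sketch using the scrubbed host from the question and placeholder credentials:

from elasticsearch import Elasticsearch

es = Elasticsearch(
    ["https://55555555555bb0c30d1cba4e9e6.us-central1.gcp.cloud.es.io:9243"],
    http_auth=("username", "password"),
)
print(es.info())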
QUESTION
I see strange behavior on my ES 7.8 cluster: when inserting data using elasticsearch.helpers.streaming_bulk, it fails with this strange error:
...ANSWER
Answered 2021-Jan-18 at 14:25 Turns out it was my own mistake: I had specified the filter_path=['hits.hits._id'] parameter, and it was passed all the way down to the bulk request. Thanks for the tip @Val.
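As a hedged sketch of the fix (index name and documents are placeholders): keep filter_path off the bulk path, since streaming_bulk needs the full per-item response, and apply it only to the search calls where a trimmed response is actually wanted.

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

docs = ({"_index": "my-index", "_id": i, "value": i} for i in range(1000))

# No filter_path here: streaming_bulk inspects each item's result in the response.
for ok, item in helpers.streaming_bulk(es, docs):
    if not ok:
        print("Failed:", item)

# Use filter_path only where trimming the response is intended.
resp = es.search(index="my-index", filter_path=["hits.hits._id"])
print(resp)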
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install elasticsearch-py
You can use elasticsearch-py like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
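A minimal sketch of verifying the installation from Python, assuming the package was installed with pip (e.g. inside a virtual environment):

# Assumes: python -m venv .venv && source .venv/bin/activate && pip install elasticsearch
from importlib.metadata import version

print(version("elasticsearch"))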