address-index-api | Address Index is an application which resolves addresses | Parser library
kandi X-RAY | address-index-api Summary
Address Index is a Play Framework (2.8.8) application which matches addresses. The system works via large Elasticsearch (7.9.3) indices built primarily from AddressBase Premium data. The input can be a complete address (from any source), and the system uses advanced data science techniques to determine the most likely matching AddressBase entries with UPRNs (Unique Property Reference Numbers). Addresses can be matched one at a time or in batches. Additional functions exist for postcode searching and partial address string matching for typeaheads. Plans to deploy the application as a service available to all members of the Public Sector Mapping Agreement have been put on hold for the duration of the Census test. The support team is awaiting a decision on the future of this.
Community Discussions
Trending Discussions on address-index-api
QUESTION
Our ES is fairly slow; we have not optimized it (or the query) yet. According to this link, however, request rejection from Elastic is a form of feedback that asks the client to slow down and adapt the size of the bulk.
We built a form of back pressure in which the size of a blocking bulk (a list of individual requests sent at the same time; we do not use MSearch yet) depends on how many requests were rejected in the previous bulk. We wait for the current bulk to finish before starting a new one. All rejected requests are re-injected into the request queue (in the form of the data needed to construct the query). For example, if our Elastic can handle 500 simultaneous requests and we send 600, some of them will be rejected and the new size will be reduced to 480 (20% off).
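The sizing rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not the poster's actual code (the project uses elastic4s). `send_bulk` is a hypothetical callable that sends one blocking bulk and returns the requests that were rejected, and the 20% reduction is taken from the example above. The question does not say how the size grows back, so no growth rule is shown.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def next_bulk_size(current_size: int, rejected: int, shrink_percent: int = 20) -> int:
    """Shrink the bulk by shrink_percent when the previous bulk had rejections."""
    if rejected == 0:
        return current_size
    # Integer arithmetic avoids floating-point rounding surprises.
    return max(1, current_size * (100 - shrink_percent) // 100)


def run_with_back_pressure(
    requests: Iterable[str],
    send_bulk: Callable[[List[str]], List[str]],  # hypothetical: returns rejected items
    initial_size: int,
) -> int:
    """Send blocking bulks, re-queue rejected requests, return the final bulk size."""
    queue: Deque[str] = deque(requests)
    size = initial_size
    while queue:
        # Take up to `size` requests from the front of the queue.
        bulk = [queue.popleft() for _ in range(min(size, len(queue)))]
        rejected = send_bulk(bulk)   # blocks until the whole bulk has finished
        queue.extend(rejected)       # rejected requests go back into the queue
        size = next_bulk_size(size, len(rejected))
    return size
```

With an initial size of 600 and a backend that accepts at most 500 requests per bulk, the first bulk produces rejections and the size drops to 480, matching the example in the question.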
What we found out was that ES returns different results for the previously rejected requests. For example it may return something like the expected result, but with an offset of 2. We also have missing results where an address should have 1 result, but has none due to this bug.
If the bulk size is less than the threshold that ES can handle, everything goes as expected and we get expected results.
It doesn't look like it's the library's (elastic4s) problem.
Elastic configuration: 2 nodes with 5 shards each
Per node: 2 CPUs, 32 GB RAM, 16 GB heap. Everything else is default.
I couldn't find any information on the internet. Has anyone had this problem? What was the solution?
What we tried so far:
Thread.sleep between bulks, as the link above suggests.
Removing the cache at query level as well as removing it from the index.
Trying same index on a different (slower) hardware.
Verified that it's not a race-condition (in our code) problem.
Update:
What the query looks like.
Thread pool for search:
"search" : {
  "type" : "fixed",
  "min" : 4,
  "max" : 4,
  "queue_size" : 1000
},
2nd UPDATE:
We also tried setting a preference on our query (thinking that it was a problem with shards): .preference(Preference.Primary)
This gave no positive result (the results were even more random than before). Two consecutive runs with this setting give different "random" results, so the behaviour is not consistent.
ANSWER
Answered 2017-Feb-22 at 10:44
The reason for the inconsistent results was that Elastic replies with Success if at least 1 shard had a result. So basically if only one of our 5 shards succeeded, the request would return a successful result with only 20% of the data.
As seen here and here, this is not a bug, this is a feature. Elastic prefers to return some (albeit, inconsistent) result instead of not returning anything.
The solution to this problem is either to use only one shard or to treat more than 0 failed shards as a general request failure, using the following object that each ES response has:
"_shards" : {
  "total" : 5,
  "successful" : 5,
  "failed" : 0
},
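A check along these lines could be applied to every response. The sketch below is a minimal Python illustration that works on the parsed JSON body. `require_all_shards` and `PartialResultError` are hypothetical names and are not part of elastic4s or Elasticsearch.

```python
from typing import Any, Dict


class PartialResultError(Exception):
    """Raised when at least one shard failed, so the hits may be incomplete."""


def require_all_shards(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged, or raise if any shard reported a failure."""
    shards = response["_shards"]
    if shards["failed"] > 0:
        raise PartialResultError(
            f"{shards['failed']} of {shards['total']} shards failed"
        )
    return response
```

A request that fails this check can then be handled like a rejected one, for example by putting it back into the request queue and retrying it in a later, smaller bulk.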
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported