ann-benchmarks | Benchmarks of approximate nearest neighbor libraries | Machine Learning library
kandi X-RAY | ann-benchmarks Summary
Fast nearest-neighbor search in high-dimensional spaces is an increasingly important problem, but so far there have been few empirical attempts to compare approaches objectively. This project contains tools to benchmark various implementations of approximate nearest neighbor (ANN) search for different metrics. It provides pregenerated datasets (in HDF5 format) and a Docker container for each algorithm. There is also a test suite that makes sure every algorithm works.
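To make the idea of benchmarking an ANN implementation concrete, here is a minimal sketch that scores a search function on two common axes: recall against exact brute-force results, and queries per second. The function names (`brute_force_knn`, `recall_at_k`, `benchmark`) are hypothetical and are not taken from the ann-benchmarks code base, and the set-overlap recall used here is a simplification of whatever metric definitions the project applies.

```python
import heapq
import math
import random
import time


def brute_force_knn(query, points, k):
    """Exact k nearest neighbors of `query`, as indices into `points`."""
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(query, points[i])
    )


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def benchmark(search, points, queries, k):
    """Return (mean recall, queries per second) for a `search(query, k)` callable."""
    # Ground truth is computed outside the timed region.
    truth = [brute_force_knn(q, points, k) for q in queries]
    start = time.perf_counter()
    results = [search(q, k) for q in queries]
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    recall = sum(recall_at_k(r, t) for r, t in zip(results, truth)) / len(queries)
    return recall, len(queries) / elapsed


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
    points = [tuple(rng.random() for _ in range(8)) for _ in range(500)]
    queries = [tuple(rng.random() for _ in range(8)) for _ in range(20)]
    # Sanity check: benchmarking the exact search against itself.
    recall, qps = benchmark(lambda q, k: brute_force_knn(q, points, k), points, queries, k=10)
    print(f"recall={recall:.3f}, queries/sec={qps:.0f}")
```

Any approximate search can be dropped in as the `search` argument, which is what makes it possible to compare libraries on the same recall-versus-throughput trade-off.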
Top functions reviewed by kandi - BETA
- Generate plot
- Create a pointset from the data
- Get plot data
- Prepare the data from the data
- Run the command line interface
- Download src to dst
- Get dataset
- Run the algorithm
- Build the detail site
- Generate a plot
- Load all results from results
- Download kosarak data from Kosak
- Download and return the dataset
- Download a deep image
- Decorator to configure a query parameter
- Extract word vectors from word vectors
- Build index site
- Compute metrics for each run
- Compute the relative similarity metric
- Fit the model
- Compute the metrics for each prediction
- Establishes the network
- Fit the index to the index
- Calculate the index of the dataset
- Extracts the lastfm dataset
- Create an index from data
- Fits on the network
ann-benchmarks Examples and Code Snippets
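// reading data: load an ann-benchmarks dataset from the file `fname`
let anndata = AnnBenchmarkData::new(fname).unwrap();
// number of training vectors to index
let nb_elem = anndata.train_data.len();
// maximum number of graph connections per point
let max_nb_connection = 24;
// number of layers: ln(nb_elem) truncated to an integer, capped at 16
let nb_layer = 16.min((nb_elem as f32).ln().trunc() as usize);
// search width used while constructing the index
let ef_c = 400;
// al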
Community Discussions
Trending Discussions on ann-benchmarks
QUESTION
I'm looking for an algorithm with the fastest time per query for a problem similar to nearest-neighbor search, but with two differences:
- I need to only approximately confirm (tolerating Type I and Type II error) the existence of a neighbor within some distance k or return the approximate distance of the nearest neighbor.
- I can query many at once
I'd like better throughput than the approximate nearest neighbor libraries out there (https://github.com/erikbern/ann-benchmarks), which seem better designed for single queries. In particular, the algorithmic relaxation in the first criterion seems like it should leave room for an algorithmic shortcut, but I can't find any solutions in the literature, nor can I figure out how to design one.
Here's my current best solution, which runs at about 10k queries/sec per CPU. I'm looking for something close to an order-of-magnitude speedup if possible.
...
ANSWER
Answered 2020-Sep-21 at 04:54
I'm a bit skeptical of benchmarks such as the one you have linked. In my experience, the definition of the problem at hand matters far more than the merits of any one algorithm across a set of other (possibly similar-looking) problems.
More simply put, an algorithm being a high performer on a given benchmark does not imply it will be a high performer on the problem you care about. Even small or apparently trivial changes to the formulation of your problem can significantly change the performance of any fixed set of algorithms.
That said, given the specifics of the problem you care about I would recommend the following:
- use the cascading approach described in the paper [1]
- use SIMD operations (either SSE on Intel chips, or GPUs) to accelerate the search; nearest-neighbour search is a problem where working close to the metal and exploiting parallelism can really shine
- tune the parameters of the algorithm to maximize your objective; in particular, the algorithm of [1] has a few easy-to-tune parameters that dramatically trade performance for accuracy, so make sure you perform a grid search over them to find the sweet spot for your problem
Note: I have recommended the paper [1] because I have tried many of the algorithms listed in the benchmark you linked and found them all inferior to the approach in [1] (for the task of image reconstruction), while also being much more complicated than [1]. Both are undesirable properties. YMMV depending on your problem definition.
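The grid-search advice can be sketched as follows. This is only an illustration under stated assumptions: it does not implement the algorithm of [1], which is not described here. Instead it uses a hypothetical stand-in, `has_neighbor_sampled`, that checks only a random sample of the points and so trades accuracy for speed through a single parameter, `sample_size`. The search then keeps the fastest setting whose error rate stays within a chosen budget.

```python
import math
import random
import time


def has_neighbor_exact(query, points, radius):
    """Ground truth: is any point within `radius` of `query`?"""
    return any(math.dist(query, p) <= radius for p in points)


def has_neighbor_sampled(query, points, radius, sample_size, rng):
    """Hypothetical approximate check that inspects only a random sample.

    It can miss a real neighbor (false negative) but never invents one.
    """
    sample = rng.sample(points, min(sample_size, len(points)))
    return any(math.dist(query, p) <= radius for p in sample)


def grid_search(points, queries, radius, sample_sizes, max_error, seed=0):
    """Return (sample_size, error_rate, seconds) for the fastest setting
    whose error rate is at most `max_error`, or None if none qualifies."""
    truth = [has_neighbor_exact(q, points, radius) for q in queries]
    best = None
    for size in sample_sizes:
        rng = random.Random(seed)  # same seed for every setting, for a fair comparison
        start = time.perf_counter()
        answers = [has_neighbor_sampled(q, points, radius, size, rng) for q in queries]
        elapsed = time.perf_counter() - start
        error = sum(a != t for a, t in zip(answers, truth)) / len(queries)
        if error <= max_error and (best is None or elapsed < best[2]):
            best = (size, error, elapsed)
    return best


if __name__ == "__main__":
    data_rng = random.Random(42)
    points = [(data_rng.random(), data_rng.random()) for _ in range(2000)]
    queries = [(data_rng.random(), data_rng.random()) for _ in range(200)]
    print(grid_search(points, queries, radius=0.05,
                      sample_sizes=[50, 200, 800, 2000], max_error=0.05))
```

With more than one tunable parameter, the same loop would iterate over the Cartesian product of the candidate values. The important part is that speed and error are measured on your own data and your own objective, not on a generic benchmark.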
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install ann-benchmarks
You can use ann-benchmarks like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.