ann-benchmarks | Benchmarks of approximate nearest neighbor libraries | Machine Learning library

by erikbern | Python | Version: Current | License: MIT

kandi X-RAY | ann-benchmarks Summary

ann-benchmarks is a Python library typically used in artificial intelligence and machine learning applications. It has no reported bugs or vulnerabilities, a build file is available, it has a permissive license, and it has medium support. You can download it from GitHub.

Fast nearest-neighbor search in high-dimensional spaces is an increasingly important problem, but so far there have been few empirical attempts to compare approaches objectively. This project contains tools to benchmark various implementations of approximate nearest neighbor (ANN) search for different metrics. We have pregenerated datasets (in HDF5 format) and Docker containers for each algorithm. There’s a test suite that makes sure every algorithm works.
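To make the idea of benchmarking an approximate search concrete, here is a minimal, standard-library sketch. It is not the project's code: the random-subset search (sampled_knn) is a toy stand-in for a real ANN library, and the choice of recall against brute-force results plus queries per second as the measured quantities is an assumption, since the summary above does not say which measurements the project reports.

```python
import math
import random
import time


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(train, query, k):
    """Brute-force ground truth: indices of the k closest training points."""
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    return order[:k]


def sampled_knn(train, query, k, sample_size, rng):
    """Toy approximate search (hypothetical): scan only a random subset."""
    candidates = rng.sample(range(len(train)), min(sample_size, len(train)))
    candidates.sort(key=lambda i: euclidean(train[i], query))
    return candidates[:k]


def recall_at_k(approx, exact):
    """Fraction of the true k nearest neighbors that the search returned."""
    return len(set(approx) & set(exact)) / len(exact)


def benchmark(train, queries, k, search):
    """Return (mean recall, queries per second) for a search callable."""
    truth = [exact_knn(train, q, k) for q in queries]
    start = time.perf_counter()
    results = [search(q) for q in queries]
    elapsed = time.perf_counter() - start
    recall = sum(recall_at_k(r, t) for r, t in zip(results, truth)) / len(queries)
    qps = len(queries) / elapsed if elapsed > 0 else float("inf")
    return recall, qps


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 16
    train = [[rng.random() for _ in range(dim)] for _ in range(2000)]
    queries = [[rng.random() for _ in range(dim)] for _ in range(20)]
    for sample_size in (200, 1000, 2000):
        recall, qps = benchmark(
            train, queries, 10,
            lambda q: sampled_knn(train, q, 10, sample_size, rng),
        )
        print(f"sample_size={sample_size} recall={recall:.2f} qps={qps:.0f}")
```

Scanning a larger subset raises recall and lowers throughput, which is the kind of trade-off a benchmark of this sort is meant to expose.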

Support

ann-benchmarks has a moderately active ecosystem.

It has 3712 stars and 560 forks. There are 106 watchers for this library.

It had no major release in the last 6 months.

There are 36 open issues and 112 have been closed. On average issues are closed in 396 days. There are 9 open pull requests and 0 closed requests.

It has a neutral sentiment in the developer community.

The latest version of ann-benchmarks is current.

Quality

ann-benchmarks has 0 bugs and 0 code smells.

Security

ann-benchmarks has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

ann-benchmarks code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

ann-benchmarks is licensed under the MIT License. This license is Permissive.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

ann-benchmarks releases are not available. You will need to build from source code and install.

Build file is available. You can build the component from source.

Installation instructions are not available. Examples and code snippets are available.

Top functions reviewed by kandi - BETA

kandi has reviewed ann-benchmarks and identified the functions below as its top functions. The list is intended to give you a quick view of the functionality ann-benchmarks implements and to help you decide whether it suits your requirements.

Generate plot
Create a pointset from the data
Get plot data
Prepare the data from the data
Run the command line interface
Download src to dst
Get dataset
Run the algorithm
Build the detail site
Generate a plot
Load all results from results
Download kosarak data from Kosak
Download and return the dataset
Download a deep image
Decorator to configure a query parameter
Extract word vectors from word vectors
Build index site
Compute metrics for each run
Compute the relative similarity metric
Fit the model
Compute the metrics for each prediction
Establishes the network
Fit the index to the index
Calculate the index of the dataset
Extracts the lastfm dataset
Create an index from data
Fits on the network

ann-benchmarks Key Features

No Key Features are available at this moment for ann-benchmarks.

ann-benchmarks Examples and Code Snippets

hnsw-rs, Examples and Benchmarks

Rust

Lines of Code : 22

License : Non-SPDX (NOASSERTION)

    //  reading data
    let anndata = AnnBenchmarkData::new(fname).unwrap();
    let nb_elem = anndata.train_data.len();
    let max_nb_connection = 24;
    let nb_layer = 16.min((nb_elem as f32).ln().trunc() as usize);
    let ef_c = 400;
    // al

Community Discussions

Trending Discussions on ann-benchmarks

QUESTION

If I relax some constraints, can I get an algorithmic shortcut on Approximate Nearest Neighbors?

Asked 2020-Sep-23 at 17:37

I'm looking for an algorithm with the fastest time per query for a problem similar to nearest-neighbor search, but with two differences:

I need to only approximately confirm (tolerating Type I and Type II error) the existence of a neighbor within some distance k or return the approximate distance of the nearest neighbor.
I can query many at once
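For reference, the exact (non-approximate) version of the two relaxed queries above can be written as a brute-force batch baseline. This is only an illustrative sketch and not the asker's solution: the function names are hypothetical, and Euclidean distance is assumed because the question does not name a metric.

```python
import math


def nearest_distance(points, query):
    """Exact distance from `query` to its nearest point (Euclidean, assumed)."""
    return min(math.dist(p, query) for p in points)


def has_neighbor_within(points, queries, radius):
    """Exact batch check: for each query, is any point within `radius`?"""
    r2 = radius * radius
    results = []
    for q in queries:
        # any() stops at the first point inside the radius, so a positive
        # answer can be cheaper than a full nearest-neighbor scan.
        found = any(
            sum((a - b) ** 2 for a, b in zip(p, q)) <= r2 for p in points
        )
        results.append(found)
    return results


if __name__ == "__main__":
    points = [(0.0, 0.0), (3.0, 4.0)]
    queries = [(0.0, 1.0), (10.0, 10.0)]
    print(has_neighbor_within(points, queries, 1.5))
    print([nearest_distance(points, q) for q in queries])
```

An approximate method would aim to answer the same questions faster while tolerating some false positives and false negatives.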

I'd like better throughput than the approximate nearest neighbor libraries out there (https://github.com/erikbern/ann-benchmarks), which seem better designed for single queries. In particular, the algorithmic relaxation in the first criterion seems like it should leave room for an algorithmic shortcut, but I can't find any solutions in the literature, nor can I figure out how to design one.

Here's my current best solution, which operates at about 10k queries / sec per CPU. I'm looking for something close to an order-of-magnitude speedup if possible.

...

ANSWER

Answered 2020-Sep-21 at 04:54

I'm a bit skeptical of benchmarks such as the one you have linked, as in my experience I have found that the definition of the problem at hand far outweighs in importance the merits of any one algorithm across a set of other (possibly similar looking) problems.

More simply put, an algorithm being a high performer on a given benchmark does not imply it will be a higher performer on the problem you care about. Even small or apparently trivial changes to the formulation of your problem can significantly change the performance of any fixed set of algorithms.

That said, given the specifics of the problem you care about I would recommend the following:

use the cascading approach described in the paper [1]
use SIMD operations (either SSE on Intel chips or GPUs) to accelerate the search; the nearest neighbour problem is one where operations closer to the metal and parallelism can really shine
tune the parameters of the algorithm to maximize your objective; in particular, the algorithm of [1] has a few easy-to-tune parameters that dramatically trade performance for accuracy, so make sure you perform a grid search over these parameters to set them to the sweet spot for your problem
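The grid search in the last recommendation can be sketched as follows. This is a hedged illustration, not the algorithm of [1]: the toy search and its two parameters (n_candidates, n_dims) are hypothetical stand-ins for whatever knobs your chosen algorithm exposes, and cost is counted as coordinate comparisons per query rather than measured time.

```python
import itertools
import random


def sq_dist(a, b, n_dims):
    """Squared distance using only the first n_dims coordinates."""
    return sum((a[i] - b[i]) ** 2 for i in range(n_dims))


def approx_nearest(points, query, n_candidates, n_dims):
    """Toy approximate search: scan the first n_candidates points only."""
    return min(range(n_candidates), key=lambda i: sq_dist(points[i], query, n_dims))


def evaluate(points, queries, truth, n_candidates, n_dims):
    """Return (accuracy, cost) for one parameter setting."""
    hits = sum(
        approx_nearest(points, q, n_candidates, n_dims) == t
        for q, t in zip(queries, truth)
    )
    return hits / len(queries), n_candidates * n_dims


def grid_search(points, queries, grid, min_accuracy):
    """Pick the cheapest setting whose accuracy meets the target, or None."""
    dim = len(points[0])
    # Ground truth: full scan over all points and all coordinates.
    truth = [approx_nearest(points, q, len(points), dim) for q in queries]
    best = None
    for n_candidates, n_dims in itertools.product(grid["n_candidates"], grid["n_dims"]):
        accuracy, cost = evaluate(points, queries, truth, n_candidates, n_dims)
        if accuracy >= min_accuracy and (best is None or cost < best["cost"]):
            best = {"n_candidates": n_candidates, "n_dims": n_dims,
                    "accuracy": accuracy, "cost": cost}
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    points = [[rng.random() for _ in range(8)] for _ in range(200)]
    queries = [[rng.random() for _ in range(8)] for _ in range(10)]
    grid = {"n_candidates": [50, 100, 200], "n_dims": [2, 4, 8]}
    print(grid_search(points, queries, grid, min_accuracy=0.9))
```

Framing the objective as "cheapest setting that meets an accuracy floor" is one design choice; depending on your problem you might instead fix a time budget and maximize accuracy.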

Note: I have recommended the paper [1] because I have tried many of the algorithms listed in the benchmark you linked and found them all inferior (for the task of image reconstruction) to the approach listed in [1] while at the same time being much more complicated than [1], both undesirable properties. YMMV depending on your problem definition.

Source https://stackoverflow.com/questions/63985972

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install ann-benchmarks

You can download it from GitHub.
You can use ann-benchmarks like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for existing answers or ask on the Stack Overflow community page.
