ann-benchmarks | Benchmarks of approximate nearest neighbor libraries | Machine Learning library

by erikbern | Python | Version: Current | License: MIT

kandi X-RAY | ann-benchmarks Summary

ann-benchmarks is a Python library typically used in Artificial Intelligence and Machine Learning applications. It has no reported bugs or vulnerabilities, a build file is available, and it has a permissive license and medium support. You can download it from GitHub.

Fast nearest-neighbor search in high-dimensional spaces is an increasingly important problem, but so far there have been few empirical attempts to compare approaches objectively. This project contains tools to benchmark various implementations of approximate nearest neighbor (ANN) search for different metrics. It provides pregenerated datasets (in HDF5 format) and Docker containers for each algorithm. There is a [test suite] that makes sure every algorithm works.
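
To make the idea of benchmarking an approximate search concrete, here is a minimal sketch of a recall measurement: the approximate result is compared against exact brute-force neighbors. It does not use the project's own code. The data is random, and `sampled_knn` is a hypothetical stand-in for a real ANN index that simply scans a subset of the points.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(train, query, k):
    """Indices of the k nearest training points, found by brute force."""
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], query))
    return order[:k]


def sampled_knn(train, query, k, candidates):
    """Hypothetical 'approximate' search: scan only the given candidate indices."""
    order = sorted(candidates, key=lambda i: euclidean(train[i], query))
    return order[:k]


def recall(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Random data stands in for a real dataset (an assumption of this sketch).
rng = random.Random(0)
train = [[rng.random() for _ in range(8)] for _ in range(200)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
half = list(range(0, len(train), 2))  # the toy index only sees every other point

k = 10
recalls = [
    recall(sampled_knn(train, q, k, half), exact_knn(train, q, k))
    for q in queries
]
mean_recall = sum(recalls) / len(recalls)
print(f"mean recall@{k}: {mean_recall:.2f}")
```

A real benchmark would also record how long each query takes, so that recall can be weighed against throughput.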

            kandi-support Support

              ann-benchmarks has a medium active ecosystem.
              It has 3712 star(s) with 560 fork(s). There are 106 watchers for this library.
              It had no major release in the last 6 months.
              There are 36 open issues and 112 have been closed. On average issues are closed in 396 days. There are 9 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of ann-benchmarks is current.

            kandi-Quality Quality

              ann-benchmarks has 0 bugs and 0 code smells.

            kandi-Security Security

              ann-benchmarks has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              ann-benchmarks code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              ann-benchmarks is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              ann-benchmarks releases are not available. You will need to build from source code and install.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.

            Top functions reviewed by kandi - BETA

            kandi has reviewed ann-benchmarks and identified the functions below as its top functions. The list is intended to give you a quick view of the functionality ann-benchmarks implements and to help you decide whether it suits your requirements.
            • Generate plot
            • Create a pointset from the data
            • Get plot data
            • Prepare the data from the data
            • Run the command line interface
            • Download src to dst
            • Get dataset
            • Run the algorithm
            • Build the detail site
            • Generate a plot
            • Load all results from results
            • Download kosarak data from Kosak
            • Download and return the dataset
            • Download a deep image
            • Decorator to configure a query parameter
            • Extract word vectors from word vectors
            • Build index site
            • Compute metrics for each run
            • Compute the relative similarity metric
            • Fit the model
            • Compute the metrics for each prediction
            • Establishes the network
            • Fit the index to the index
            • Calculate the index of the dataset
            • Extracts the lastfm dataset
            • Create an index from data
            • Fits on the network

            ann-benchmarks Key Features

            No Key Features are available at this moment for ann-benchmarks.

            ann-benchmarks Examples and Code Snippets

            hnsw-rs,Examples and Benchmarks
            Rust | Lines of Code: 22 | License: Non-SPDX (NOASSERTION)
                //  reading data
                let anndata = AnnBenchmarkData::new(fname).unwrap();
                let nb_elem = anndata.train_data.len();
                let max_nb_connection = 24;
                let nb_layer = 16.min((nb_elem as f32).ln().trunc() as usize);
                let ef_c = 400;
                // al  

            Community Discussions

            QUESTION

            If I relax some constraints, can I get an algorithmic shortcut on Approximate Nearest Neighbors?
            Asked 2020-Sep-23 at 17:37

            I'm looking for an algorithm with the fastest time per query for a problem similar to nearest-neighbor search, but with two differences:

            • I need to only approximately confirm (tolerating Type I and Type II error) the existence of a neighbor within some distance k or return the approximate distance of the nearest neighbor.
            • I can query many at once

            I'd like better throughput than the approximate nearest neighbor libraries out there (https://github.com/erikbern/ann-benchmarks), which seem better designed for single queries. In particular, the algorithmic relaxation of the first criterion seems like it should leave room for an algorithmic shortcut, but I can't find any solutions in the literature, nor can I figure out how to design one.
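
To pin down the relaxed problem stated above, the following is a minimal sketch of an exact brute-force baseline for the batched existence query. It is not the asker's solution, which is elided in the source, and the names `within_distance`, `batch_within_distance`, and `radius` (standing in for the distance the question calls k) are chosen here for illustration.

```python
def within_distance(points, query, radius):
    """True if any point lies within `radius` of `query` (exact check)."""
    r2 = radius * radius
    for p in points:
        d2 = 0.0
        for x, y in zip(p, query):
            d2 += (x - y) ** 2
            if d2 > r2:  # early exit: this point is already too far away
                break
        else:
            # The inner loop finished without breaking, so p is close enough.
            return True
    return False


def batch_within_distance(points, queries, radius):
    """Answer the existence question for many queries at once."""
    return [within_distance(points, q, radius) for q in queries]


points = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
queries = [(0.5, 0.0), (6.0, 8.0), (3.0, 3.5)]
print(batch_within_distance(points, queries, radius=1.0))
# prints [True, False, True]
```

An approximate method that tolerates Type I and Type II errors would be judged against a baseline like this one, both for its error rate and for its queries per second.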

            Here's my current best solution, which operates at about 10k queries / sec per CPU. I'm looking for something close to an order-of-magnitude speedup if possible.

            ...

            ANSWER

            Answered 2020-Sep-21 at 04:54

            I'm a bit skeptical of benchmarks such as the one you have linked, as in my experience I have found that the definition of the problem at hand far outweighs in importance the merits of any one algorithm across a set of other (possibly similar looking) problems.

            More simply put, an algorithm being a high performer on a given benchmark does not imply it will be a higher performer on the problem you care about. Even small or apparently trivial changes to the formulation of your problem can significantly change the performance of any fixed set of algorithms.

            That said, given the specifics of the problem you care about I would recommend the following:

            • use the cascading approach described in the paper [1]
            • use SIMD operations (such as SSE on Intel chips) or GPUs to accelerate the search; nearest neighbour search is a problem where operations closer to the metal and parallelism can really shine
            • tune the parameters of the algorithm to maximize your objective; in particular, the algorithm of [1] has a few easy-to-tune parameters that dramatically trade performance for accuracy, so make sure you perform a grid search over these parameters to find the sweet spot for your problem
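
The grid search recommended in the last point can be sketched as follows. This is only an illustration under stated assumptions: `approx_nearest` is a hypothetical tunable search invented for this example (it is not the algorithm of [1]), its parameters `sample_step` and `early_stop` are made up, and the number of distance computations is used as a deterministic stand-in for running time.

```python
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_nearest(points, query, sample_step, early_stop):
    """Hypothetical tunable search: scan every `sample_step`-th point and stop
    as soon as a point closer than `early_stop` (squared distance) is found.
    Returns (best_index, number_of_distance_computations)."""
    best, best_d, work = None, float("inf"), 0
    for i in range(0, len(points), sample_step):
        d = sq_dist(points[i], query)
        work += 1
        if d < best_d:
            best, best_d = i, d
        if best_d < early_stop:
            break
    return best, work


def evaluate(points, queries, truth, sample_step, early_stop):
    """Accuracy (fraction of exact hits) and mean work per query."""
    hits = total_work = 0
    for q, t in zip(queries, truth):
        idx, work = approx_nearest(points, q, sample_step, early_stop)
        hits += idx == t
        total_work += work
    return hits / len(queries), total_work / len(queries)


# Random data and exact answers to score against (assumptions of this sketch).
rng = random.Random(1)
points = [[rng.random() for _ in range(4)] for _ in range(300)]
queries = [[rng.random() for _ in range(4)] for _ in range(30)]
truth = [min(range(len(points)), key=lambda i: sq_dist(points[i], q)) for q in queries]

# Try every combination of the two parameters.
grid = itertools.product([1, 2, 4], [0.0, 0.01, 0.05])
results = [(step, stop, *evaluate(points, queries, truth, step, stop)) for step, stop in grid]

target = 0.5  # assumed accuracy floor; choose one that fits your problem
ok = [r for r in results if r[2] >= target]
best = min(ok, key=lambda r: r[3])  # cheapest setting that still meets the floor
for step, stop, acc, work in results:
    print(f"step={step} stop={stop:<5} accuracy={acc:.2f} work={work:.0f}")
print("cheapest setting meeting the target:", best[:2])
```

The same loop applies to any algorithm with a few tunable knobs: fix an accuracy floor that matters for your problem, then pick the cheapest setting that still meets it.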

            Note: I have recommended the paper [1] because I have tried many of the algorithms listed in the benchmark you linked and found them all inferior (for the task of image reconstruction) to the approach listed in [1] while at the same time being much more complicated than [1], both undesirable properties. YMMV depending on your problem definition.

            Source https://stackoverflow.com/questions/63985972

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install ann-benchmarks

            You can download it from GitHub.
            You can use ann-benchmarks like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/erikbern/ann-benchmarks.git

          • CLI

            gh repo clone erikbern/ann-benchmarks

          • sshUrl

            git@github.com:erikbern/ann-benchmarks.git
