rrcf | 🌲 Implementation of the Robust Random Cut Forest algorithm | Predictive Analytics library

by kLabUM | Python | Version: 0.4.4 | License: MIT

kandi X-RAY | rrcf Summary

rrcf is a Python library typically used in analytics and predictive analytics applications. It has no reported bugs or vulnerabilities, a build file, a permissive license, and low support. You can download it from GitHub.

The Robust Random Cut Forest (RRCF) algorithm is an ensemble method for detecting outliers in streaming data. RRCF offers a number of features that many competing anomaly detection algorithms lack. This repository provides an open-source implementation of the RRCF algorithm and its core data structures to facilitate experimentation and enable future extensions of the algorithm.

            Support

              rrcf has a low active ecosystem.
              It has 441 stars and 101 forks. There are 19 watchers for this library.
              It had no major release in the last 12 months.
              There are 25 open issues and 20 have been closed. On average issues are closed in 15 days. There are 3 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of rrcf is 0.4.4

            Quality

              rrcf has 0 bugs and 0 code smells.

            Security

              rrcf has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              rrcf code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              rrcf is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              rrcf releases are available to install and integrate.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.
              rrcf saves you 304 person hours of effort in developing the same functionality from scratch.
              It has 732 lines of code, 54 functions and 8 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed rrcf and identified the functions below as its top functions. This is intended to give you quick insight into the functionality rrcf implements and to help you decide whether it suits your requirements.
            • Insert a point into the tree.
            • Remove a point from the tree.
            • Calculate the percentage of displacement between a leaf node.
            • Deserialize a dictionary.
            • Build a branch tree.
            • Yield n-sized samples from a sequence.

            Community Discussions

            QUESTION

            Isolation Forest vs Robust Random Cut Forest in outlier detection
            Asked 2020-Dec-09 at 02:13

            I am examining different methods in outlier detection. I came across sklearn's implementation of Isolation Forest and Amazon sagemaker's implementation of RRCF (Robust Random Cut Forest). Both are ensemble methods based on decision trees, aiming to isolate every single point. The more isolation steps there are, the more likely the point is to be an inlier, and the opposite is true.
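
The isolation idea described in the question can be sketched in a few lines. This is a simplified one-dimensional illustration using only the standard library, not the code of either sklearn or SageMaker; the function names and data are hypothetical.

```python
import random


def isolation_depth(values, target, rng):
    """Count the random cuts needed until `target` is alone in its partition."""
    current = list(values)
    depth = 0
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:
            break  # only duplicates remain, nothing left to cut
        cut = rng.uniform(lo, hi)
        # Keep only the side of the cut that contains the target.
        if target <= cut:
            current = [v for v in current if v <= cut]
        else:
            current = [v for v in current if v > cut]
        depth += 1
    return depth


def mean_depth(values, target, rng, trials=200):
    """Average the isolation depth over many random trees."""
    return sum(isolation_depth(values, target, rng) for _ in range(trials)) / trials


rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(200)] + [15.0]  # 15.0 is the planted outlier
typical = sorted(data)[len(data) // 2]                 # a point in the dense middle

outlier_depth = mean_depth(data, 15.0, rng)
inlier_depth = mean_depth(data, typical, rng)
print(f"mean depth, outlier: {outlier_depth:.1f}; mean depth, inlier: {inlier_depth:.1f}")
```

The planted outlier is usually separated within the first few cuts, while a point in the dense region needs many more, which is the signal both algorithms build on.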

            However, even after looking at the original papers of the algorithms, I am failing to understand exactly the difference between both algorithms. In what way do they work differently? Is one of them more efficient than the other?

            EDIT: I am adding the links to the research papers for more information, as well as some tutorials discussing the topics.

            Isolation Forest:

            Paper Tutorial

            Robust Random Cut Forest:

            Paper Tutorial

            ...

            ANSWER

            Answered 2020-Jul-28 at 12:23

            In parts of my answer I'll assume you are referring to Sklearn's Isolation Forest. I believe these are the four main differences:

            1. Code availability: Isolation Forest has a popular open-source implementation in Scikit-Learn (sklearn.ensemble.IsolationForest), while both AWS implementations of Robust Random Cut Forest (RRCF), in Amazon Kinesis and Amazon SageMaker, are closed-source. There is an interesting third-party open-source RRCF implementation on GitHub, though: https://github.com/kLabUM/rrcf; I'm unsure how popular it is yet.

            2. Training design: RRCF can work on streams, as highlighted in the paper and as exposed in the streaming analytics service Kinesis Data Analytics. On the other hand, the absence of a partial_fit method suggests to me that Sklearn's Isolation Forest is a batch-only algorithm that cannot readily work on data streams.

            3. Scalability: SageMaker RRCF is more scalable. Sklearn's Isolation Forest is single-machine code, which can nonetheless be parallelized over CPUs with the n_jobs parameter. SageMaker RRCF, on the other hand, can run on one machine or on multiple machines. It also supports SageMaker Pipe mode (streaming data via Unix pipes), which lets it learn on data much larger than what fits on disk.

            4. The way features are sampled at each recursive isolation step: RRCF gives more weight to dimensions with higher variance (according to the SageMaker docs), while I think Isolation Forest samples dimensions uniformly at random, which is one reason why RRCF is expected to perform better in high-dimensional spaces (picture from the RRCF paper).
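
The difference in point 4 can be sketched as two ways of picking the cut dimension. The RRCF paper's tree definition picks a dimension with probability proportional to its range (maximum minus minimum), so this sketch uses range as the weight; that is an assumption standing in for the "variance" wording above. All names and data are hypothetical, and the code is not taken from either library.

```python
import random


def dimension_ranges(points):
    """Return max - min for every dimension of a list of equal-length tuples."""
    dims = range(len(points[0]))
    return [max(p[d] for p in points) - min(p[d] for p in points) for d in dims]


def choose_dim_uniform(num_dims, rng):
    """Isolation-Forest-style choice: every dimension is equally likely."""
    return rng.randrange(num_dims)


def choose_dim_by_range(ranges, rng):
    """RRCF-style choice: probability proportional to the dimension's range."""
    return rng.choices(range(len(ranges)), weights=ranges, k=1)[0]


rng = random.Random(0)
# Dimension 0 spans roughly 0..100; dimension 1 is near-constant noise in 0..1.
points = [(rng.uniform(0, 100), rng.uniform(0, 1)) for _ in range(200)]
ranges = dimension_ranges(points)

draws = 5000
uniform_share = sum(choose_dim_uniform(2, rng) == 0 for _ in range(draws)) / draws
range_share = sum(choose_dim_by_range(ranges, rng) == 0 for _ in range(draws)) / draws
print(f"share of cuts on dimension 0: uniform={uniform_share:.2f}, by range={range_share:.2f}")
```

With uniform sampling about half of the cuts are spent on the near-constant dimension, while range-weighted sampling puts almost all of them on the dimension that actually spreads the data.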

            Source https://stackoverflow.com/questions/63115867

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install rrcf

            Use pip to install rrcf from PyPI (pip install rrcf). Currently, only Python 3 is supported.

            Support

            Read the docs here 📖.

            CLONE
          • HTTPS

            https://github.com/kLabUM/rrcf.git

          • CLI

            gh repo clone kLabUM/rrcf

          • sshUrl

            git@github.com:kLabUM/rrcf.git
