rrcf | 🌲 Implementation of the Robust Random Cut Forest algorithm | Predictive Analytics library
kandi X-RAY | rrcf Summary
The Robust Random Cut Forest (RRCF) algorithm is an ensemble method for detecting outliers in streaming data. RRCF offers a number of features that many competing anomaly detection algorithms lack. This repository provides an open-source implementation of the RRCF algorithm and its core data structures, to facilitate experimentation and enable future extensions of the algorithm.
Top functions reviewed by kandi - BETA
- Insert a point into the tree.
- Remove a point from the tree.
- Calculate the percentage of displacement between a leaf node.
- Deserialize a dictionary.
- Build a branch tree.
- Yield n-sized samples from a sequence.
Community Discussions
Trending Discussions on rrcf
QUESTION
I am examining different methods of outlier detection. I came across sklearn's implementation of Isolation Forest and Amazon SageMaker's implementation of RRCF (Robust Random Cut Forest). Both are ensemble methods based on decision trees that aim to isolate every single point. The more isolation steps a point needs, the more likely it is to be an inlier; the fewer steps it needs, the more likely it is to be an outlier.
However, even after looking at the original papers, I still do not understand exactly how the two algorithms differ. In what way do they work differently? Is one of them more efficient than the other?
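The isolation idea described in the question can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code of either library: the helper names `isolation_depth` and `mean_depth`, the synthetic data, and the choice of a uniformly random cut dimension are all made up for the example.

```python
import random

def isolation_depth(points, target, rng):
    """Count the random axis-aligned cuts needed to isolate `target`."""
    current = list(points)
    n_dims = len(target)
    depth = 0
    while len(current) > 1:
        # Dimensions on which the remaining points still differ.
        cuttable = [d for d in range(n_dims)
                    if min(p[d] for p in current) < max(p[d] for p in current)]
        if not cuttable:
            break  # only exact duplicates of the target remain
        dim = rng.choice(cuttable)  # assumption: uniform choice of dimension
        lo = min(p[dim] for p in current)
        hi = max(p[dim] for p in current)
        cut = rng.uniform(lo, hi)
        # Keep only the side of the cut that contains the target.
        if target[dim] <= cut:
            kept = [p for p in current if p[dim] <= cut]
        else:
            kept = [p for p in current if p[dim] > cut]
        if len(kept) < len(current):
            depth += 1  # count only cuts that actually split the set
        current = kept
    return depth

rng = random.Random(0)
# Hypothetical data: a Gaussian cluster plus one far-away point.
cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (10.0, 10.0)
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)  # point nearest the origin
data = cluster + [outlier]

def mean_depth(target, trials=100):
    """Average isolation depth of `target` over repeated random trees."""
    return sum(isolation_depth(data, target, rng) for _ in range(trials)) / trials

outlier_depth = mean_depth(outlier)
inlier_depth = mean_depth(inlier)
print("outlier:", outlier_depth, "inlier:", inlier_depth)
```

Averaged over many random trees, the far-away point should need noticeably fewer cuts than the point in the middle of the cluster, which is the signal both algorithms build their anomaly scores on.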
EDIT: I am adding the links to the research papers for more information, as well as some tutorials discussing the topics.
Isolation Forest:
Robust Random Cut Forest:
ANSWER
Answered 2020-Jul-28 at 12:23
In parts of my answer I'll assume you are referring to Sklearn's Isolation Forest. I believe these are the 4 main differences:
- Code availability: Isolation Forest has a popular open-source implementation in Scikit-Learn (`sklearn.ensemble.IsolationForest`), while both AWS implementations of Robust Random Cut Forest (RRCF) are closed-source, in Amazon Kinesis and Amazon SageMaker. There is an interesting third-party open-source RRCF implementation on GitHub though: https://github.com/kLabUM/rrcf ; I am unsure how popular it is yet.
- Training design: RRCF can work on streams, as highlighted in the paper and as exposed in the streaming analytics service Kinesis Data Analytics. On the other hand, the absence of a `partial_fit` method suggests to me that Sklearn's Isolation Forest is a batch-only algorithm that cannot readily work on data streams.
- Scalability: SageMaker RRCF is more scalable. Sklearn's Isolation Forest is single-machine code, which can nonetheless be parallelized over CPUs with the `n_jobs` parameter. On the other hand, SageMaker RRCF can be used over one machine or multiple machines. Also, it supports SageMaker Pipe mode (streaming data via unix pipes), which makes it able to learn on much bigger data than what fits on disk.
- The way features are sampled at each recursive isolation: RRCF gives more weight to dimensions with higher variance (according to the SageMaker doc), while I think Isolation Forest samples at random, which is one reason why RRCF is expected to perform better in high-dimensional space (picture from the RRCF paper).
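The last difference, how the cut dimension is chosen, can be illustrated with a small sketch. It is not taken from either implementation: the function names `uniform_dimension` and `range_weighted_dimension` and the example bounding box are hypothetical. The weighted variant uses each dimension's range (max minus min) as its weight, which is how the RRCF paper defines the choice; the answer above describes the same idea in terms of variance. The sketch assumes at least one dimension has a nonzero range.

```python
import random

def uniform_dimension(bounds, rng):
    """Pick a cut dimension uniformly at random (Isolation Forest style)."""
    return rng.randrange(len(bounds))

def range_weighted_dimension(bounds, rng):
    """Pick a cut dimension with probability proportional to its range."""
    ranges = [hi - lo for lo, hi in bounds]
    return rng.choices(range(len(bounds)), weights=ranges, k=1)[0]

rng = random.Random(0)
# Hypothetical bounding box: dimension 0 is narrow, dimension 1 is wide.
bounds = [(0.0, 1.0), (0.0, 99.0)]
draws = 10_000

# Fraction of draws that select the wide dimension under each rule.
uniform_frac = sum(uniform_dimension(bounds, rng) == 1
                   for _ in range(draws)) / draws
weighted_frac = sum(range_weighted_dimension(bounds, rng) == 1
                    for _ in range(draws)) / draws
print("uniform:", uniform_frac, "range-weighted:", weighted_frac)
```

With these bounds the uniform rule should pick the wide dimension about half the time, while the range-weighted rule should pick it about 99% of the time, so cuts concentrate on the dimensions where the data actually spreads out and near-constant dimensions are mostly ignored.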
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported