RP-DBSCAN | recent trends in big data processing
kandi X-RAY | RP-DBSCAN Summary
RP-DBSCAN is a Java library. It has no reported vulnerabilities, a permissive license, and low support. However, RP-DBSCAN has 49 bugs and its build file is not available. You can download it from GitHub.
Following the recent trends in big data processing, several parallel DBSCAN algorithms have been reported in the literature. In most such algorithms, neighboring points are assigned to the same data partition for parallel processing to facilitate calculation of the density of the neighbors. This data partitioning scheme causes a few critical problems, including load imbalance between data partitions, especially in a skewed data set. To remedy these problems, we propose a cell-based data partitioning scheme, pseudo random partitioning, that randomly distributes small cells rather than the points themselves. It achieves high load balance regardless of data skewness while retaining the data contiguity required for DBSCAN. In addition, we build and broadcast a highly compact summary of the entire data set, which we call a two-level cell dictionary, to supplement random partitions. Then, we develop a novel parallel DBSCAN algorithm, Random Partitioning-DBSCAN (RP-DBSCAN for short), that uses pseudo random partitioning together with a two-level cell dictionary. The algorithm simultaneously finds the local clusters in each data partition and then merges these local clusters to obtain a global clustering. To validate the merit of our approach, we implement RP-DBSCAN on Spark and conduct extensive experiments using various real-world data sets on 12 Microsoft Azure machines (48 cores). In RP-DBSCAN, data partitioning and cluster merging are very light, and clustering on each split is not dragged out by a specific worker. Therefore, the performance results show that RP-DBSCAN significantly outperforms the state-of-the-art algorithms by up to 180 times.
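The cell-based partitioning idea described above can be illustrated with a minimal Java sketch. This is not the library's actual code: the class `CellPartitioner`, its method names, the seeded hash, and the choice of a cell side length of eps / sqrt(d) (so that the cell diagonal equals eps) are all assumptions made for illustration. The sketch only shows that whole cells, not individual points, are scattered across partitions, so points in the same cell always stay together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and not part of RP-DBSCAN's API.
class CellPartitioner {

    // Map a point to the integer grid coordinates of the cell that contains it.
    // Assumption: the cell diagonal equals eps, so the side length is eps / sqrt(d).
    static List<Integer> cellOf(double[] point, double eps) {
        double side = eps / Math.sqrt(point.length);
        List<Integer> cell = new ArrayList<>(point.length);
        for (double x : point) {
            cell.add((int) Math.floor(x / side));
        }
        return cell;
    }

    // Choose a partition for a cell from a seeded hash of its coordinates.
    // The same cell and seed always give the same partition (pseudo random, not random).
    static int partitionOf(List<Integer> cell, int numPartitions, long seed) {
        long h = seed ^ cell.hashCode();
        // Scramble the bits so that adjacent cells are unlikely to share a partition.
        h ^= (h >>> 33);
        h *= 0xff51afd7ed558ccdL;
        h ^= (h >>> 33);
        return (int) Math.floorMod(h, (long) numPartitions);
    }

    // Group points by the partition assigned to their cell.
    static Map<Integer, List<double[]>> partition(
            List<double[]> points, double eps, int numPartitions, long seed) {
        Map<Integer, List<double[]>> parts = new HashMap<>();
        for (double[] p : points) {
            int id = partitionOf(cellOf(p, eps), numPartitions, seed);
            parts.computeIfAbsent(id, k -> new ArrayList<>()).add(p);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0.10, 0.10},
                new double[] {0.15, 0.12},
                new double[] {5.00, 5.00});
        Map<Integer, List<double[]>> parts = partition(points, 1.0, 4, 42L);
        parts.forEach((id, pts) ->
                System.out.println("partition " + id + ": " + pts.size() + " point(s)"));
    }
}
```

In this sketch the first two points fall into the same cell and therefore always land in the same partition, while the third point's cell is assigned independently. In a Spark setting, a function like the hypothetical `partitionOf` would play the role of the partitioning key; the two-level cell dictionary mentioned in the abstract is not modeled here.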
Support
RP-DBSCAN has a low active ecosystem.
It has 46 stars and 8 forks. There are 6 watchers for this library.
It had no major release in the last 6 months.
RP-DBSCAN has no issues reported. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of RP-DBSCAN is current.
Quality
RP-DBSCAN has 49 bugs (33 blocker, 1 critical, 7 major, 8 minor) and 624 code smells.
Security
RP-DBSCAN has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
RP-DBSCAN code analysis shows 0 unresolved vulnerabilities.
There are 15 security hotspots that need review.
License
RP-DBSCAN is licensed under the Apache-2.0 License. This license is Permissive.
Permissive licenses have the least restrictions, and you can use them in most projects.
Reuse
RP-DBSCAN releases are not available. You will need to build from source code and install.
RP-DBSCAN has no build file. You will need to create the build yourself to build the component from source.
Installation instructions are not available. Examples and code snippets are available.
RP-DBSCAN saves you 1549 person hours of effort in developing the same functionality from scratch.
It has 3449 lines of code, 208 functions and 28 files.
It has medium code complexity. Code complexity directly impacts maintainability of the code.
Top functions reviewed by kandi - BETA
kandi has reviewed RP-DBSCAN and identified the functions below as its top functions. This is intended to give you instant insight into the functionality RP-DBSCAN implements and to help you decide whether it suits your requirements.
- Given a set of edge splits, returns a list of edges
- Adds a new binary node
- Build minimum spanning forest
- L2 norm
- Equivalent to L2 norm
- Generate the metadata with approximate approximations
- This method is used to get the codepoint coordinates
- Calculate the state of the polynomial with a sphere
- Calculates the distance between a point and a sphere
- Finds the nearest node in the polynomial
- Calculates the nearest neighbor to the given coordinates
- Returns the set of coordinates for a lv1 id
- Build the neighbor search tree without the LVP
- Get the grid coordinates for the level 1
- Gets the index of the SVG coordinates for a given ID
- Check if the map contains a cell
- Creates a unique hash code for this vector
- Gets neighbor node
- Sets min and max values for the given partition
- Gets the counts of two partition at the same partition
- Sets the min and max coordinates
- Returns a string representation of this node
- Gets neighbor id
- Finds the nearest neighbor of this node
- Build the neighbor search tree
- Main entry point
RP-DBSCAN Key Features
No Key Features are available at this moment for RP-DBSCAN.
RP-DBSCAN Examples and Code Snippets
No Code Snippets are available at this moment for RP-DBSCAN.
Community Discussions
No Community Discussions are available at this moment for RP-DBSCAN. Refer to the Stack Overflow page for discussions.
Vulnerabilities
No vulnerabilities reported
Install RP-DBSCAN
You can download it from GitHub.
You can use RP-DBSCAN like any standard Java library. Include the jar files in your classpath. You can also use any IDE to run and debug the RP-DBSCAN component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.
Support
For new features, suggestions, and bugs, create an issue on GitHub.
If you have any questions, check for existing answers or ask on the Stack Overflow community page.