DocumentCluster | Document clustering program, in Java
kandi X-RAY | DocumentCluster Summary
DocumentCluster is a Java library. It has no reported bugs or vulnerabilities, it carries a Strong Copyleft license, and it has low support. However, no build file is available for DocumentCluster. You can download it from GitHub.
This file describes DocumentCluster, a program for clustering text documents based on similarity of word frequencies. Document words are first filtered against a specified stop-word list, then stemmed using the classic Porter stemming algorithm. The resulting data is then converted to term frequency-inverse document frequency (TF-IDF) values and normalized so that each document is a vector of length one. The document data is internally represented as a sparse matrix with collapsed word columns. The vectors are then clustered using the classic k-means algorithm, using cosine similarity as the distance measure. The number of clusters will be the number specified or half the number of files, whichever is less. Files that have no word overlap with other files after stop-word removal will be excluded from
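The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not DocumentCluster's actual code: the class and method names (`TfIdfSketch`, `tokenize`, `tfIdf`, `clusterCount`) are hypothetical, Porter stemming is omitted, the weighting is assumed to be raw count times log(N / df), and "half the number of files" is assumed to use integer division.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the described pipeline; not DocumentCluster's own classes. */
final class TfIdfSketch {

    /** Lowercases, splits on non-letters, and drops stop words. Stemming is omitted. */
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty() && !stopWords.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }

    /** Builds one sparse TF-IDF vector per document, each scaled to length one. */
    static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        // Document frequency: how many documents contain each word.
        Map<String, Integer> df = new HashMap<>();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> tf = new HashMap<>();
            for (String w : doc) {
                tf.merge(w, 1.0, Double::sum);
            }
            for (String w : tf.keySet()) {
                df.merge(w, 1, Integer::sum);
            }
            vectors.add(tf);
        }
        int n = docs.size();
        for (Map<String, Double> v : vectors) {
            double sumSquares = 0.0;
            for (Map.Entry<String, Double> e : v.entrySet()) {
                // Assumed weighting: raw count times log(N / df).
                double weight = e.getValue() * Math.log((double) n / df.get(e.getKey()));
                e.setValue(weight);
                sumSquares += weight * weight;
            }
            double norm = Math.sqrt(sumSquares);
            if (norm > 0.0) {
                for (Map.Entry<String, Double> e : v.entrySet()) {
                    e.setValue(e.getValue() / norm);
                }
            }
        }
        return vectors;
    }

    /** Cluster count rule: the requested number or half the file count, whichever is less. */
    static int clusterCount(int requested, int fileCount) {
        return Math.min(requested, fileCount / 2); // integer division is an assumption
    }
}
```

Under this weighting, a word that appears in every document gets weight zero, so a document made up only of such words ends up as an all-zero vector and is left unscaled.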
Support
DocumentCluster has a low active ecosystem.
It has 3 stars and 3 forks. There is 1 watcher for this library.
It had no major release in the last 6 months.
DocumentCluster has no issues reported. There are no pull requests.
It has a neutral sentiment in the developer community.
The latest version of DocumentCluster is current.
Quality
DocumentCluster has no bugs reported.
Security
DocumentCluster has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
License
DocumentCluster is licensed under the GPL-3.0 License. This license is Strong Copyleft.
Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.
Reuse
DocumentCluster releases are not available. You will need to build from source code and install.
DocumentCluster has no build file. You will need to create the build yourself to build the component from source.
Installation instructions are not available. Examples and code snippets are available.
Top functions reviewed by kandi - BETA
kandi has reviewed DocumentCluster and identified the functions below as its top functions. This is intended to give you quick insight into the functionality DocumentCluster implements and to help you decide whether it suits your requirements.
- Test a test program
- Returns the stem of a word
- Extract the location of the syllables in a word
- Matches a word against a list of suffixes
- Get the singleton stemmer
- Test program
- Get the next word
- Returns true if there are more words
- Inserts a document pointer into the document
- Starts a cluster
- Cluster documents
- Calculates the centroid of the specified documents
- Finds the closest cluster to a list of clusters
- Returns a String representation of the files to be displayed
- Creates a String representation of the contents
- Compares this object with another object
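To illustrate what the centroid and closest-cluster functions in the list above might involve, here is a minimal sketch of one k-means step over sparse unit vectors using cosine similarity. The class and method names (`CosineKMeansStepSketch`, `dot`, `centroid`, `closest`) and the map-based vector representation are assumptions made for illustration, not DocumentCluster's actual API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of one k-means step; not DocumentCluster's own classes. */
final class CosineKMeansStepSketch {

    /** Dot product of two sparse vectors; equals cosine similarity when both have length one. */
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map and look up in the larger one.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    /** Sums the member vectors and rescales the result to length one. */
    static Map<String, Double> centroid(List<Map<String, Double>> members) {
        Map<String, Double> sum = new HashMap<>();
        for (Map<String, Double> m : members) {
            for (Map.Entry<String, Double> e : m.entrySet()) {
                sum.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        double norm = Math.sqrt(dot(sum, sum));
        if (norm > 0.0) {
            sum.replaceAll((word, weight) -> weight / norm);
        }
        return sum;
    }

    /** Returns the index of the centroid most similar to doc, or -1 if there are none. */
    static int closest(Map<String, Double> doc, List<Map<String, Double>> centroids) {
        int best = -1;
        double bestSimilarity = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double similarity = dot(doc, centroids.get(i));
            if (similarity > bestSimilarity) {
                bestSimilarity = similarity;
                best = i;
            }
        }
        return best;
    }
}
```

A full k-means loop would alternate between assigning each document with `closest` and recomputing each cluster with `centroid` until assignments stop changing. Rescaling the centroid to unit length keeps the dot product equal to cosine similarity.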
Vulnerabilities
No vulnerabilities reported
Install DocumentCluster
You can download it from GitHub.
You can use DocumentCluster like any standard Java library. Include the jar files in your classpath. You can also run and debug DocumentCluster in any IDE as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.
Support
For new features, suggestions, and bugs, create an issue on GitHub.
If you have questions, check and ask on the Stack Overflow community page.