DocumentCluster | Document clustering program, in Java

by ezraerb | Java | Version: Current | License: GPL-3.0

kandi X-RAY | DocumentCluster Summary

DocumentCluster is a Java library. It has no reported bugs or vulnerabilities, a Strong Copyleft license, and low support. However, no build file is available for it. You can download it from GitHub.

This file describes DocumentCluster, a program for clustering text documents based on similarity of word frequencies. Document words are first filtered against a specified stop word list, then stemmed using the classic Porter stemming algorithm. The resulting data is then converted to term frequency-inverse document frequency (TF-IDF) values and normalized so each document is a vector of length one. The document data is internally represented as a sparse matrix with collapsed word columns. The vectors are then clustered using the classic k-means algorithm, using cosine similarity as the distance measure. The number of clusters will be the number specified or half the number of files, whichever is less. Files that have no word overlap with other files after stop-word removal will be excluded from ...
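
As a rough illustration of the TF-IDF and normalization steps described above, here is a minimal sketch. It is not taken from the DocumentCluster source: the class and method names are hypothetical, the weighting (raw term count times ln(N / document frequency)) is an assumed common variant, and a hash map per document stands in for the sparse matrix. The input is assumed to be already stop-word filtered and stemmed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: names are hypothetical, not DocumentCluster's API. */
class TfIdfSketch {

    /** Builds unit-length TF-IDF vectors from filtered, stemmed documents. */
    static List<Map<String, Double>> vectorize(List<List<String>> docs) {
        // Count terms per document and the number of documents containing each term.
        Map<String, Integer> docFreq = new HashMap<>();
        List<Map<String, Integer>> termCounts = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Integer> counts = new HashMap<>();
            for (String term : doc) {
                counts.merge(term, 1, Integer::sum);
            }
            for (String term : counts.keySet()) {
                docFreq.merge(term, 1, Integer::sum);
            }
            termCounts.add(counts);
        }

        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> counts : termCounts) {
            Map<String, Double> vec = new HashMap<>();
            double sumSquares = 0.0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                // Assumed weighting: raw count times ln(N / df).
                double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
                double weight = e.getValue() * idf;
                if (weight != 0.0) {
                    // Only non-zero entries are stored, keeping the vector sparse.
                    vec.put(e.getKey(), weight);
                    sumSquares += weight * weight;
                }
            }
            // Scale to length one so cosine similarity reduces to a dot product.
            double norm = Math.sqrt(sumSquares);
            if (norm > 0.0) {
                vec.replaceAll((term, weight) -> weight / norm);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    /** Cosine similarity of two unit-length sparse vectors (their dot product). */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = small == a ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            dot += e.getValue() * large.getOrDefault(e.getKey(), 0.0);
        }
        return dot;
    }
}
```

Under this weighting, two documents that share no terms have a cosine similarity of exactly zero, which is consistent with the description of files with no word overlap being treated separately.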

            Support

              DocumentCluster has a low-activity ecosystem.
              It has 3 stars and 3 forks. There is 1 watcher for this library.
              It had no major release in the last 6 months.
              DocumentCluster has no issues reported. There are no pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of DocumentCluster is current.

            Quality

              DocumentCluster has no bugs reported.

            Security

              DocumentCluster has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            License

              DocumentCluster is licensed under the GPL-3.0 License. This license is Strong Copyleft.
              Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

            Reuse

              DocumentCluster releases are not available. You will need to build from source code and install.
              DocumentCluster has no build file. You will need to create the build yourself to build the component from source.
              Installation instructions are not available. Examples and code snippets are available.

            Top functions reviewed by kandi - BETA

            kandi has reviewed DocumentCluster and identified the functions below as its top functions. The list is intended to give a quick overview of the functionality DocumentCluster implements and to help you decide whether it suits your requirements.
            • Test a test program
            • Returns the stem of a word
            • Extract the location of the syllables in a word
            • Matches a word against a list of suffixes
            • Get the singleton stemmer
            • Test program
            • Get the next word
            • Returns true if there are more words
            • Inserts a document pointer into the document
            • Starts a cluster
            • Cluster documents
            • Calculates the centroid of the specified documents
            • Finds the closest cluster to a list of clusters
            • Returns a String representation of the files to be displayed
            • Creates a String representation of the contents
            • Compares this object with another object
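
The clustering entries in the list above (starting a cluster, calculating a centroid, finding the closest cluster) match the usual steps of k-means. The sketch below shows one way those steps could fit together with cosine similarity on unit-length vectors. It is an assumption-laden illustration, not the library's code: all names are hypothetical, dense arrays are used instead of a sparse matrix for brevity, and seeding from the first k documents is an arbitrary choice.

```java
import java.util.Arrays;

/** Illustrative only: names are hypothetical, not DocumentCluster's API. */
class CosineKMeansSketch {

    /** The number specified or half the number of files, whichever is less. */
    static int clusterCount(int requested, int fileCount) {
        return Math.min(requested, fileCount / 2);
    }

    /** Dot product; equals cosine similarity when both vectors have length one. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Sum of the cluster's member vectors, rescaled to length one. */
    static double[] centroid(double[][] docs, int[] assignment, int cluster) {
        double[] sum = new double[docs[0].length];
        for (int d = 0; d < docs.length; d++) {
            if (assignment[d] == cluster) {
                for (int i = 0; i < sum.length; i++) {
                    sum[i] += docs[d][i];
                }
            }
        }
        double norm = Math.sqrt(dot(sum, sum));
        if (norm > 0.0) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] /= norm;
            }
        }
        return sum;
    }

    /** Index of the centroid most similar to the document (ties go to the lowest index). */
    static int closest(double[] doc, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (dot(doc, centroids[c]) > dot(doc, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Returns the cluster index assigned to each document. */
    static int[] cluster(double[][] docs, int requested, int maxIterations) {
        int k = clusterCount(requested, docs.length);
        if (k < 1) {
            throw new IllegalArgumentException("need at least one cluster and two documents");
        }
        // Assumed seeding: the first k documents act as the initial centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = docs[c].clone();
        }
        int[] assignment = new int[docs.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int d = 0; d < docs.length; d++) {
                int best = closest(docs[d], centroids);
                if (best != assignment[d]) {
                    assignment[d] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // Assignments are stable, so the centroids would not move.
            }
            for (int c = 0; c < k; c++) {
                centroids[c] = centroid(docs, assignment, c);
            }
        }
        return assignment;
    }
}
```

Because every vector has length one, maximizing the dot product is the same as maximizing cosine similarity, so no separate distance function is needed.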

            DocumentCluster Key Features

            No Key Features are available at this moment for DocumentCluster.

            DocumentCluster Examples and Code Snippets

            No Code Snippets are available at this moment for DocumentCluster.

            Community Discussions

            No Community Discussions are available at this moment for DocumentCluster. Refer to the Stack Overflow page for discussions.


            Vulnerabilities

            No vulnerabilities reported

            Install DocumentCluster

            You can download it from GitHub.
            You can use DocumentCluster like any standard Java library. Include the jar files in your classpath. You can also use any IDE to run and debug the DocumentCluster component as you would any other Java program. Best practice is to use a build tool that supports dependency management, such as Maven or Gradle. For Maven installation, refer to maven.apache.org. For Gradle installation, refer to gradle.org.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for existing answers or ask on the Stack Overflow community page.

            CLONE
          • HTTPS

            https://github.com/ezraerb/DocumentCluster.git

          • CLI

            gh repo clone ezraerb/DocumentCluster

          • SSH

            git@github.com:ezraerb/DocumentCluster.git

            Consider Popular Java Libraries

            CS-Notes

            by CyC2018

            JavaGuide

            by Snailclimb

            LeetCodeAnimation

            by MisterBooo

            spring-boot

            by spring-projects

            Try Top Libraries by ezraerb

            BayseanBandit

            by ezraerb | Python

            baysean-classifier

            by ezraerb | C++

            RecordValidator

            by ezraerb | Java

            nflodap

            by ezraerb | Java

            GeneExpression

            by ezraerb | Java