RTextTools | open source machine learning package for automatic text | Machine Learning library

 by   timjurka C Version: Current License: No License

kandi X-RAY | RTextTools Summary

RTextTools is an R package (kandi classifies the repository as C) typically used in Artificial Intelligence and Machine Learning applications. It has no reported bugs, no reported vulnerabilities, and low support. You can download it from GitHub.

RTextTools: Automatic Text Classification via Supervised Learning.
Description: RTextTools is a machine learning package for automatic text classification that makes it simple for novice users to get started with machine learning, while allowing experienced users to easily experiment with different settings and algorithm combinations. The package includes nine algorithms for ensemble classification (svm, slda, boosting, bagging, random forests, glmnet, decision trees, neural networks, maximum entropy), comprehensive analytics, and thorough documentation.
Version: 1.4.0
Depends: R (≥ 2.15.0), methods, SparseM, randomForest, tree, nnet, tm, e1071, ipred, caTools, maxent, glmnet, tau
Published: 2012-09-22
Authors: Timothy P. Jurka, Loren Collingwood, Amber E. Boydstun, Emiliano Grossman, Wouter van Atteveldt
Maintainer: Timothy P. Jurka
License: GPL-3
URL:

            kandi-support Support

              RTextTools has a low active ecosystem.
              It has 69 star(s) with 62 fork(s). There are 12 watchers for this library.
              It had no major release in the last 6 months.
              There are 6 open issues and 1 has been closed. On average, issues are closed in 1071 days. There are 3 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of RTextTools is current.

            kandi-Quality Quality

              RTextTools has no bugs reported.

            kandi-Security Security

              RTextTools has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

            kandi-License License

              RTextTools does not have a standard license declared.
              Check the repository for any license declaration and review the terms closely.
              Without a license, all rights are reserved, and you cannot use the library in your applications.

            kandi-Reuse Reuse

              RTextTools releases are not available. You will need to build from source code and install.
              Installation instructions are available. Examples and code snippets are not available.

            RTextTools Key Features

            No Key Features are available at this moment for RTextTools.

            RTextTools Examples and Code Snippets

            No Code Snippets are available at this moment for RTextTools.

            Community Discussions

            QUESTION

            Introducing new data against the model produces the error "test data does not match model !"; how can I overcome the problem?
            Asked 2020-May-07 at 17:04

            In the R code below, I have included the sentences in order to compare the manual classification with the lexicon dictionary results (positive, negative and neutral, in matrixdata1). The algorithms now produce different outcomes in the result tables, which is good. However, when executing..

            ...

            ANSWER

            Answered 2020-Apr-16 at 10:23

            Check the format of the training and test data. The error means that the test data is not structured like the training data, i.e. the shape the model expects is not compatible with the test data.

            If the data you have is not similar, you can try to fix it. But if the test data is similar to the training data, I recommend splitting the training data itself to derive the test data. This would help you troubleshoot the issue further and find out what is wrong.
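
            One common way such a mismatch arises is that the test matrix is built from a different vocabulary than the training matrix. The following is a minimal base-R sketch of keeping the two aligned; the helper names (tokenize, term_counts) and the toy sentences are hypothetical and are not taken from the question's code. In RTextTools itself, create_matrix() has an originalMatrix argument that serves a similar purpose.

```r
# Hypothetical helper: lowercase, strip punctuation, split on whitespace.
tokenize <- function(doc) {
  words <- strsplit(gsub("[[:punct:]]", "", tolower(doc)), "[[:space:]]+")[[1]]
  words[nzchar(words)]
}

# Hypothetical helper: count terms per document against a fixed vocabulary.
# Terms outside the vocabulary are dropped; absent terms get a zero count.
term_counts <- function(docs, vocabulary) {
  counts <- t(vapply(
    docs,
    function(doc) as.numeric(table(factor(tokenize(doc), levels = vocabulary))),
    numeric(length(vocabulary))
  ))
  dimnames(counts) <- list(NULL, vocabulary)
  counts
}

train_docs <- c("good service and good food", "bad service")
test_docs  <- c("good food but terrible parking")

# The vocabulary comes from the training documents only.
vocabulary <- sort(unique(unlist(lapply(train_docs, tokenize))))

train_matrix <- term_counts(train_docs, vocabulary)
test_matrix  <- term_counts(test_docs, vocabulary)

# Both matrices now share the same columns in the same order.
print(identical(colnames(train_matrix), colnames(test_matrix)))
```

            Words that appear only in the test data ("terrible", "parking") are dropped here, because a model trained on the original vocabulary has no weights for them.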

            Source https://stackoverflow.com/questions/61246479

            QUESTION

            Why are the results of each model algorithm (max entropy, forest, svm, etc.) exactly the same in the output tables?
            Asked 2020-Apr-16 at 23:10

            In the R code below, I am introducing train data to create models based on a series of algorithms (e.g. Max Entropy, SVM, etc).

            I am having a problem with the algorithm table of results, as each one is showing the exact same output.

            Can you please help me understand why each algorithm's table of results shows the exact same output?

            Dataset applied in the R code

            ...

            ANSWER

            Answered 2020-Apr-16 at 23:10

            In the above code I am identifying how well the lexicon performs against my manual classification.

            You can do so by simply comparing the two columns of your "dataset" (ML does not seem to be relevant), using a confusion matrix, for example:
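
            The answer's own snippet is not reproduced on this page. As an illustration only, a confusion matrix between two label columns can be sketched in base R as follows; the column names manual and lexicon and the six rows of data are hypothetical.

```r
# Hypothetical data: manual labels versus lexicon-assigned labels.
dataset <- data.frame(
  manual  = c("positive", "negative", "neutral", "positive", "negative", "neutral"),
  lexicon = c("positive", "neutral",  "neutral", "positive", "negative", "positive"),
  stringsAsFactors = FALSE
)

# Use the same level set for both columns so the table is square.
classes <- c("negative", "neutral", "positive")
confusion <- table(
  manual  = factor(dataset$manual,  levels = classes),
  lexicon = factor(dataset$lexicon, levels = classes)
)

# Share of rows where the lexicon agrees with the manual label.
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

            Rows are the manual labels and columns are the lexicon labels, so off-diagonal cells show which classes the lexicon confuses.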

            Source https://stackoverflow.com/questions/61096792

            QUESTION

            rtexttools package alternative for R version 3.5.2 or newest R version
            Asked 2019-Dec-06 at 16:00

            Is there an alternative to rtexttools, or another package for this kind of classification methodology? The package was removed, along with maxent and glmnet, and they depended on rtexttools and vice versa. Here is the script that I'm trying to apply for classification:

            ...

            ANSWER

            Answered 2019-Dec-06 at 14:56

            First, the package(s) are not on CRAN anymore but you can still use them if you want. The easiest way is to install them from the archive:
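
            The answer's own snippet is not reproduced on this page. As a hedged sketch, archived source packages on CRAN follow a predictable URL pattern, and install.packages() can install from such a URL. The helper cran_archive_url is hypothetical, and the version 1.4.0 is taken from the package description above; check the archive listing for the versions that actually exist.

```r
# Hypothetical helper: build the URL of an archived source tarball on CRAN.
cran_archive_url <- function(package, version) {
  sprintf(
    "https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
    package, package, version
  )
}

url <- cran_archive_url("RTextTools", "1.4.0")
print(url)

# Not run here: installing needs network access and build tools, and
# dependencies that were also archived (the question mentions maxent)
# have to be installed first.
# install.packages(url, repos = NULL, type = "source")
```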

            Source https://stackoverflow.com/questions/59214665

            QUESTION

            Issues to install packages in latest version of RStudio and R Versions (3.5.1 / 3.5.3)
            Asked 2019-Mar-21 at 19:48

            I am unable to install packages through the latest version of RStudio with R versions 3.5.1 and 3.5.3. Both rtexttools and its dependency maxent produce errors because they were recently removed from the CRAN repository.

            I also tried installing them manually from the .tar files, but nothing changed and the error remained.

            The error:

            install.packages("RTextTools")

            Installing package into ‘C:/Users/dramosd/Documents/R/win-library/3.5’ (as ‘lib’ is unspecified)

            Warning in install.packages : package ‘RTextTools’ is not available (for R version 3.5.3)

            ...

            ANSWER

            Answered 2019-Mar-21 at 19:12

            Some packages are not available to download via CRAN. In that case, you can use the function below to install an R package directly from its GitHub repo.

            Try this:

            Source https://stackoverflow.com/questions/55287397

            QUESTION

            Text classification - randomForest. variables in the training data missing in newdata
            Asked 2018-Sep-19 at 15:49

            I'm completely new to statistical learning etc but have a particular interest in text classification. I was following a lab I found on the topic here: https://cfss.uchicago.edu/text_classification.html#fnref1. Unfortunately the lab ends before the trained model could be used on new data, so I tried to figure out how to complete it myself.

            I have my model trained; I'm using random forest. When I try to use predict() on new data it throws an error: Error in predict.randomForest(modelFit, newdata) : variables in the training data missing in newdata

            In my mind this doesn't make sense, as the test data is literally a subset of the original data. I assume this error has something to do with how I built my model vs the data structure of the test data, but I'm honestly not competent enough to figure out how to solve the error or where it is actually stemming from (though I assume I'm making some ridiculous error).

            There are other posts with the same error but I think the source of their errors are different to mine, I've tried to find a fix for this all day!

            Complete code I'm using below:

            ...

            ANSWER

            Answered 2018-Sep-19 at 15:49

            The problem is that test is not a subset of the data that you are fitting the model with (congress_dtm). If you create a subset of congress_dtm, it does work:

            Source https://stackoverflow.com/questions/52409297

            QUESTION

            Text Mining PDFs - Convert List of Character Vectors (Strings) to Dataframe
            Asked 2017-Sep-22 at 17:26

            I'm using text mining packages to read a group of PDF documents into plaintext, and I want to export this plaintext to a dataframe/CSV/text files (to facilitate further analysis with RTextTools)

            First, I pulled PDF documents into a VCorpus using the tm package. The tm package's VCorpus object stores lists containing a "PlainTextDocument" and "TextDocument" object for metadata and plaintext. I.e. "Metadata: DocumentName1"... and the content, "The terms of X are...".

            ...

            ANSWER

            Answered 2017-Sep-22 at 17:26

            This should do the trick:
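
            The answer's own snippet is not reproduced on this page. As an illustration of the general idea only, a list of character vectors can be collapsed into a one-row-per-document data frame in base R. The list doc_lines is a hypothetical stand-in for the text extracted from the PDFs, and the column names doc_id and text are assumptions.

```r
# Hypothetical stand-in for text extracted from PDFs:
# one character vector (of lines) per document.
doc_lines <- list(
  DocumentName1 = c("The terms of X are", "binding on both parties."),
  DocumentName2 = c("This agreement", "expires in 2020.")
)

# Collapse each document's lines into a single string.
docs <- data.frame(
  doc_id = names(doc_lines),
  text   = vapply(doc_lines, paste, character(1), collapse = " ", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(docs)

# Write to CSV for later analysis (a temporary file is used here).
csv_path <- tempfile(fileext = ".csv")
write.csv(docs, csv_path, row.names = FALSE)
```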

            Source https://stackoverflow.com/questions/46368540

            QUESTION

            “RTextTools” - fix error in create_matrix on Linux Ubuntu
            Asked 2017-Jul-13 at 20:13

            The package "RTextTools" has a known error in the function create_matrix(). The following post shows how to fix the problem for a single R session. However, the post only says how to fix the error via trace("create_matrix",edit=T)

            I run R on a linux server via command line. I am wondering how to fix this problem in such a setup

            ...

            ANSWER

            Answered 2017-Jul-13 at 20:13

            It's fixed in the latest version on github according to that post. Download the RTextTools folder from github. Then do:

            Source https://stackoverflow.com/questions/45088806

            QUESTION

            How do I access the list of stop words in RTextTools?
            Asked 2017-Jul-03 at 12:26

            While there have been answers about providing custom lists of stop words to RTextTools, I would like to know about any command to access the existing/default stop word list.

            ...

            ANSWER

            Answered 2017-Jul-01 at 11:15

            Depending on the given language, it's e.g. tm::stopwords("german") or tm::stopwords("english"):

            Source https://stackoverflow.com/questions/44860189

            QUESTION

            How can I lemmatize english words (example: 'run' and 'ran') using R to bring them all to the same tense?
            Asked 2017-Mar-27 at 03:40

            I want to lemmatize english words such that all of them get converted to the same tense. For example:

            ...

            ANSWER

            Answered 2017-Mar-26 at 12:34

            QUESTION

            Impossible to see results of `RTextTools::toLower()` text in Document-Term-Matrix
            Asked 2017-Mar-22 at 14:10

            I am trying to create a matrix, and I would like to lowercase the text for it. I use this R instruction:

            ...

            ANSWER

            Answered 2017-Mar-22 at 14:10

            As @chateaur said, it does perform the toLower internally, it just doesn't expose the contents of the pipeline at arbitrary points to you. RTextTools + tm build in severe structural limitations on what you can do, where, when and in what sequence in your pipeline. It's really frustrating. Avoid that...

            I recommend you write your own pipeline, and the best open-source package I found for pipelines when I was investigating this recently was quanteda. To illustrate the point it has an overloaded toLower() method you can use on strings, corpora, tokens - wherever you like, no restrictions, before or after stopword, punctuation removal and stemming. And it has tons of other useful methods for constructing your pipeline in whatever arbitrary sequence of steps you want, unlike RTextTools + tm. (You can also measure the usefulness of a package like quanteda by looking at the number/rate of active maintainers, commits, issues, fixes, releases, hits on github, SO, google, cleanness of the code and the API...).

            Using RTextTools + tm on the frontend is sometimes painful, and often limiting. I simply found too many bugs, limitations, syntax quirks and annoyances with them - it killed my productivity and constantly drove me nuts. And it wasn't too performant either. You can still use (RTextTools +) tm for constructing and manipulating the DTM (and TF/TFIDF) matrices, and e1071 for the classifier.

            Also: an honorable mention to qdap package for similarly adding useful tools at the document/discourse-level.

            (PS: it's truly sad that R text-processing packages are so balkanized... so many people working at cross-purposes and furiously reinventing wheels... but sometimes that happens for several reasons.)

            Source https://stackoverflow.com/questions/42952476

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install RTextTools

            RTextTools requires R 2.15+, which can be downloaded at http://www.r-project.org/. To build and install RTextTools, run the following commands from the root folder of the repository:

              R CMD REMOVE RTextTools
              R CMD build RTextTools
              R CMD INSTALL RTextTools_X.X.X.tar.gz

            Replace the X's with the version number (e.g. 1.4.0).

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, search for answers or ask on the Stack Overflow community page.
            CLONE
          • HTTPS

            https://github.com/timjurka/RTextTools.git

          • CLI

            gh repo clone timjurka/RTextTools

          • sshUrl

            git@github.com:timjurka/RTextTools.git
