RTextTools | open source machine learning package for automatic text | Machine Learning library
kandi X-RAY | RTextTools Summary
RTextTools: Automatic Text Classification via Supervised Learning.
Description: RTextTools is a machine learning package for automatic text classification that makes it simple for novice users to get started with machine learning, while allowing experienced users to easily experiment with different settings and algorithm combinations. The package includes nine algorithms for ensemble classification (svm, slda, boosting, bagging, random forests, glmnet, decision trees, neural networks, maximum entropy), comprehensive analytics, and thorough documentation.
Version: 1.4.0
Depends: R (≥ 2.15.0), methods, SparseM, randomForest, tree, nnet, tm, e1071, ipred, caTools, maxent, glmnet, tau
Published: 2012-09-22
Authors: Timothy P. Jurka, Loren Collingwood, Amber E. Boydstun, Emiliano Grossman, Wouter van Atteveldt
Maintainer: Timothy P. Jurka
License: GPL-3
URL:
Community Discussions
Trending Discussions on RTextTools
QUESTION
In the R code below, I include the sentences so that the manual classification can be compared with the lexicon dictionary results (positive, negative and neutral, in matrixdata1). The algorithms produce different outcomes in the model's tables, which is good. However, when executing..
...ANSWER
Answered 2020-Apr-16 at 10:23
Check the format of the train and test data. The error means that the test data is not like the training data, i.e. the configuration of shapes in the model is not compatible with the test data.
If the data you have is not similar then you can try to fix it. But if the test data is similar to the train data then I recommend splitting the training data itself to derive the test data. This would help you to troubleshoot the issue further to find out what is wrong.
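The hold-out approach recommended above can be sketched in base R as follows. This is only an illustration: the `labelled` data frame, its column names, and the 70/30 split ratio are assumptions, not taken from the original question.

```r
# Toy labelled data standing in for the real training set (hypothetical values).
set.seed(42)
labelled <- data.frame(
  text = paste("sentence", 1:10),
  sentiment = rep(c("positive", "negative"), 5),
  stringsAsFactors = FALSE
)

# Hold out 30% of the rows as test data; the remaining rows stay as training data.
test_idx <- sample(nrow(labelled), size = round(0.3 * nrow(labelled)))
train <- labelled[-test_idx, ]
test  <- labelled[test_idx, ]

# Both pieces come from the same data frame, so their columns match by construction.
same_columns <- identical(names(train), names(test))
```

If a model trained on `train` predicts cleanly on `test` but still fails on the original test data, the mismatch is most likely in how that original test data was prepared.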
QUESTION
In the R code below, I am introducing train data to create models based on a series of algorithms (e.g. Max Entropy, SVM, etc).
I am having a problem with the algorithm table of results, as each one is showing the exact same output.
Can you please help me understand why each algorithm's table of results shows exactly the same output?
...ANSWER
Answered 2020-Apr-16 at 23:10
"In the above code I am identifying how well the lexicon performs against my manual classification."
You may do so by just comparing the two columns of your "dataset" (ML does not seem to be relevant). Using a confusion matrix, for example:
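A minimal sketch of such a comparison, using base R's `table()`. The column names `manual` and `lexicon` and the toy labels are assumptions made for illustration; the real "dataset" columns may be named differently.

```r
# Hypothetical data: one column of manual labels, one of lexicon-based labels.
dataset <- data.frame(
  manual  = c("positive", "negative", "neutral", "positive", "negative", "neutral"),
  lexicon = c("positive", "negative", "positive", "positive", "neutral", "neutral"),
  stringsAsFactors = FALSE
)

# Use the same factor levels on both sides so the table is always square.
lvls <- c("negative", "neutral", "positive")
confusion <- table(
  manual  = factor(dataset$manual, levels = lvls),
  lexicon = factor(dataset$lexicon, levels = lvls)
)

# Share of sentences where the lexicon agrees with the manual label.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(confusion)
```

Rows are the manual labels and columns the lexicon labels, so off-diagonal cells show where the lexicon disagrees with the manual classification.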
QUESTION
Is there any alternative to RTextTools, or another package for this kind of classification methodology? These packages were removed, along with maxent and glmnet, which depended on RTextTools and vice versa. Here is the script that I'm trying to apply to classify:
...ANSWER
Answered 2019-Dec-06 at 14:56
First, the package(s) are not on CRAN anymore, but you can still use them if you want. The easiest way is to install them from the archive:
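A hedged sketch of an archive install. The helper `archive_url()` is hypothetical, and the version number is taken from the package summary at the top of this page; it is an assumption that this exact tarball is present in the CRAN archive, so check the archive listing first. Archived dependencies such as maxent would need to be installed the same way before RTextTools.

```r
# Hypothetical helper: build the CRAN archive URL for a given package and version.
archive_url <- function(pkg, version) {
  sprintf(
    "https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
    pkg, pkg, version
  )
}

url <- archive_url("RTextTools", "1.4.0")

# Not run here: needs network access and a toolchain for building source packages.
if (FALSE) {
  install.packages(url, repos = NULL, type = "source")
}
```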
QUESTION
I am unable to install packages (RTextTools and its dependency maxent both throw errors because they were recently removed from the CRAN repository) using the latest version of RStudio and R versions 3.5.1 and 3.5.3.
I also tried installing them manually from the .tar file, but nothing changed and the error remained.
The error:
...install.packages("RTextTools")
Installing package into ‘C:/Users/dramosd/Documents/R/win-library/3.5’ (as ‘lib’ is unspecified)
Warning in install.packages : package ‘RTextTools’ is not available (for R version 3.5.3)
ANSWER
Answered 2019-Mar-21 at 19:12
Some packages are not available to download via CRAN. In that case you can use the function below to install the R package directly from its GitHub repo.
Try this-
QUESTION
I'm completely new to statistical learning etc but have a particular interest in text classification. I was following a lab I found on the topic here: https://cfss.uchicago.edu/text_classification.html#fnref1. Unfortunately the lab ends before the trained model could be used on new data, so I tried to figure out how to complete it myself.
I have my model trained; I'm using random forest. When I try to use predict() on new data it throws an error:
Error in predict.randomForest(modelFit, newdata) : variables in the training data missing in newdata
Which in my mind doesn't make sense, as the test data is literally a subset of the original data. I assume this error has something to do with how I built my model vs the data structure of the test data, but I'm honestly not competent enough to figure out how to solve the error or where it is actually even stemming from (though I assume I'm making some ridiculous error).
There are other posts with the same error, but I think the source of their errors is different to mine. I've tried to find a fix for this all day!
Complete code I'm using below:
...ANSWER
Answered 2018-Sep-19 at 15:49
The problem is that test is not a subset of the data that you are fitting the model with (congress_dtm). If you create a subset of congress_dtm, it does work:
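The idea can be sketched with a toy matrix standing in for congress_dtm. The terms, counts, and split sizes here are made up for illustration, and the real object in the lab may be a document-term matrix converted to a data frame rather than a plain matrix.

```r
# Toy stand-in for the document-term matrix (hypothetical terms and counts).
set.seed(1)
congress_dtm <- matrix(
  rpois(40, lambda = 1), nrow = 8,
  dimnames = list(paste0("doc", 1:8), c("bill", "tax", "health", "school", "energy"))
)

# Split by rows only: every term column stays present in both pieces.
test_rows <- sample(nrow(congress_dtm), size = 2)
train_dtm <- congress_dtm[-test_rows, , drop = FALSE]
test_dtm  <- congress_dtm[test_rows, , drop = FALSE]

# No training variable is missing from the test data.
setdiff(colnames(train_dtm), colnames(test_dtm))  # character(0)
```

Because the split happens after the matrix is built, predict() sees exactly the variables the model was trained on, which is what the randomForest error was complaining about.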
QUESTION
I'm using text mining packages to read a group of PDF documents into plaintext, and I want to export this plaintext to a dataframe/CSV/text files (to facilitate further analysis with RTextTools)
First, I pulled PDF documents into a VCorpus using the tm package. The tm package's VCorpus object stores lists containing a "PlainTextDocument" and "TextDocument" object for metadata and plaintext. I.e. "Metadata: DocumentName1"... and the content, "The terms of X are...".
...ANSWER
Answered 2017-Sep-22 at 17:26
This should do the trick:
QUESTION
The package "RTextTools" has a known error in the function create_matrix(). The following post shows how to solve the problem for a single R session with the following fix. However, the post only says how to fix the error via trace("create_matrix",edit=T).
I run R on a Linux server via the command line. I am wondering how to fix this problem in such a setup.
...ANSWER
Answered 2017-Jul-13 at 20:13
It's fixed in the latest version on GitHub according to that post. Download the RTextTools folder from GitHub. Then do:
QUESTION
While there have been answers about providing custom lists of stop words to RTextTools, I would like to know about any command to access the existing/default stop word list.
...ANSWER
Answered 2017-Jul-01 at 11:15
Depending on the given language, it's e.g. tm::stopwords("german") or tm::stopwords("english"):
QUESTION
I want to lemmatize English words so that all of them are converted to the same tense. For example:
...ANSWER
Answered 2017-Mar-26 at 12:34
Have a look at the textstem package I maintain:
QUESTION
I am trying to create a matrix, and for this I would like to convert the text to lowercase. I use this R instruction:
...ANSWER
Answered 2017-Mar-22 at 14:10
As @chateaur said, it does perform the toLower internally, it just doesn't expose the contents of the pipeline at arbitrary points to you. RTextTools + tm build in severe structural limitations on what you can do, where, when and in what sequence in your pipeline. It's really frustrating. Avoid that...
I recommend you write your own pipeline, and the best open-source package I found for pipelines when I was investigating this recently was quanteda. To illustrate the point it has an overloaded toLower() method you can use on strings, corpora, tokens - wherever you like, no restrictions, before or after stopword, punctuation removal and stemming. And it has tons of other useful methods for constructing your pipeline in whatever arbitrary sequence of steps you want, unlike RTextTools + tm. (You can also measure the usefulness of a package like quanteda by looking at the number/rate of active maintainers, commits, issues, fixes, releases, hits on github, SO, google, cleanness of the code and the API...).
Using RTextTools + tm on the frontend is sometimes painful, and often limiting. I simply found too many bugs, limitations, syntax quirks and annoyances with them - it killed my productivity and constantly drove me nuts. And it wasn't too performant either. You can still use (RTextTools +) tm for constructing and manipulating the DTM (and TF/TFIDF) matrices, and e1071 for the classifier.
Also: an honorable mention to qdap package for similarly adding useful tools at the document/discourse-level.
(PS: it's truly sad that R text-processing packages are so balkanized... so many people working at cross-purposes and furiously reinventing wheels... but sometimes that happens for several reasons.)
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported