kandi X-RAY | janeaustenr Summary
An R Package for Jane Austen's Complete Novels :orange_book:
Community Discussions
Trending Discussions on janeaustenr
QUESTION
I have the shiny app below in which I create a wordcloud. This wordcloud is based on the shiny widgets in the sidebar. The selectInput() subsets it by label, the Maximum Number of Words: is supposed to set the maximum count of words displayed in the wordcloud, and the Minimum Frequency the minimum frequency a word needs to be displayed. Those widgets are reactive and are based on the df() function, which creates the dataframe needed for the wordcloud. The problem is that when I subset using input$freq, the dataframe has fewer rows than needed to subset with input$max as well, so nothing is displayed.
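Outside shiny, the interaction between the two filters can be sketched like this; the column names word and freq, the data, and the widget values are all assumptions, not the asker's code:

```r
library(dplyr)

# hypothetical word-count data, standing in for the df() reactive
counts <- tibble::tibble(
  word = c("emma", "knightley", "harriet", "elton"),
  freq = c(20L, 12L, 5L, 2L)
)

max_words <- 3  # stand-in for input$max
min_freq  <- 4  # stand-in for input$freq

subset_counts <- counts %>%
  filter(freq >= min_freq) %>%                       # frequency floor first
  slice_max(freq, n = max_words, with_ties = FALSE)  # then cap the word count
```

Because slice_max() simply returns every remaining row when n exceeds the row count, capping with it (rather than indexing rows directly) cannot produce an out-of-range subset.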
ANSWER
Answered 2022-Jan-10 at 08:54

QUESTION
I'm trying to tokenize by word the email column of the df dataset, but I get:
ANSWER
Answered 2022-Jan-09 at 01:37
The 3rd argument to unnest_tokens is the input, i.e. the column in the dataframe which needs to be split. You have passed it as text, but there is no text column in your data.
You can do -
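A sketch of the corrected call, with a made-up one-row df; the point is that input must name a column that actually exists, here email:

```r
library(dplyr)
library(tidytext)

df <- tibble::tibble(email = "Dear team please review")

tokens <- df %>%
  unnest_tokens(output = word, input = email)  # input = email, not text
```

unnest_tokens lowercases by default, so the resulting word column holds "dear", "team", "please", "review".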
QUESTION
I tried to run the following code with the following data:
...ANSWER
Answered 2021-Nov-14 at 22:29
It is possible that count from dplyr got masked by another loaded package that has a function of the same name. So, use dplyr::count.
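A minimal sketch of the qualified call, on hypothetical data:

```r
library(dplyr)

words <- tibble::tibble(word = c("pride", "pride", "prejudice"))

# dplyr::count() cannot be shadowed by a later-loaded package (e.g. plyr)
word_counts <- words %>% dplyr::count(word, sort = TRUE)
```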
QUESTION
I have a dataframe x
that is:
ANSWER
Answered 2021-Jun-01 at 17:24
From your output, it appears that there is leading blank-space in the name. If it were just "dispoabl" with no leading/trailing blanks, I would expect
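The answer points at whitespace; a common fix is to trim the column before comparing. The data here is invented for illustration:

```r
# leading blanks make "  dispoabl" != "dispoabl"
x <- data.frame(name = c("  dispoabl", "other"), n = c(3, 1))

x$name <- trimws(x$name)  # strip leading/trailing whitespace

x[x$name == "dispoabl", ]  # now matches as expected
```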
QUESTION
I want to cluster words that are similar using R and the tidytext package.
I have created my tokens and would now like to convert them to a matrix in order to cluster them. I would like to try out a number of token techniques to see which provides the most compact clusters.
My code is as follows (taken from the docs of the widyr package). I just can't make the next step. Can anyone help?
ANSWER
Answered 2021-Feb-07 at 17:59
You can create an appropriate matrix for this via casting from tidytext. There are several cast_() functions, such as cast_sparse().
Let's use four example books, and cluster the chapters within the books:
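A reduced sketch of the cast-then-cluster idea, using tiny made-up counts in place of the four books:

```r
library(dplyr)
library(tidytext)

# toy per-document word counts (stand-ins for book chapters)
word_counts <- tibble::tibble(
  document = c("a", "a", "b", "b"),
  word     = c("cat", "dog", "cat", "fish"),
  n        = c(2L, 1L, 3L, 1L)
)

m <- cast_sparse(word_counts, document, word, n)  # sparse document-term matrix

clusters <- kmeans(as.matrix(m), centers = 2)     # cluster the rows (documents)
```

With real chapters you would count() words per chapter first, then cast and cluster the same way.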
QUESTION
I am trying to mutate row numbers after tokenizing within a group_by block and get an error:
Error: Can't recycle input of size 73422 to size 37055.
Run rlang::last_error()
to see where the error occurred.
ANSWER
Answered 2020-Aug-27 at 11:02
Just move your group_by to after the unnest_tokens statement. Like this:
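A self-contained sketch of that reordering, on made-up data; tokenizing first means mutate() sees the already-expanded rows, so the sizes agree:

```r
library(dplyr)
library(tidytext)

df <- tibble::tibble(id = c(1L, 2L), text = c("a b c", "d e"))

tokens <- df %>%
  unnest_tokens(word, text) %>%    # tokenize first
  group_by(id) %>%                 # then group
  mutate(position = row_number()) %>%  # per-group row numbers
  ungroup()
```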
QUESTION
Code example to keep the term and inverse frequency:
...ANSWER
Answered 2020-Aug-17 at 14:52
If I understand the question correctly, you want to get a tf-idf per word across your three different documents - in other words, an output data.frame that is unique by word.
The problem is that you cannot do this with tf-idf, because the "idf" part multiplies the term frequency by the log of the inverse document frequency. When you combine the three documents, then every term occurs in your single combined document, meaning it has a document frequency of 1, equal to the number of documents. So the tf-idf for every word of a combined document is zero. I've shown this below.
tf-idf is different for the same words within documents. That's why the tidytext example shows each word by book, not once for the whole corpus.
Here's how to do this in quanteda by document:
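A minimal sketch of the quanteda route, on two toy documents rather than the original data:

```r
library(quanteda)

corp <- corpus(c(doc1 = "the cat sat", doc2 = "the dog ran"))

weighted <- dfm_tfidf(dfm(tokens(corp)))
# "the" occurs in both documents, so its idf -- and hence its tf-idf -- is 0,
# while document-specific words like "cat" keep a positive weight
```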
QUESTION
library(janeaustenr)
library(tidytext)
library(tidyverse)
library(tm)
library(corpus)
text <- removeNumbers(sensesensibility)
text <- data.frame(text)
tidy_text <- text %>%
  unnest_tokens(bigram, text, token = 'ngrams', n = 2)
tidy_text %>% count(bigram, sort = TRUE)
tidy_text <- tidy_text %>%
  separate(bigram, c('word1', 'word2'), sep = ' ')
tidy_text_filtered <- tidy_text %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)
trigram_count <- tidy_text_filtered %>% count(word1, word2, sort = TRUE)
united <- trigram_count %>%
  unite(bigram, word1, word2, sep = ' ') %>%
  filter(n > 1)
united <- united %>% bind_tf_idf(bigram, n)
...ANSWER
Answered 2020-Jun-24 at 22:46
bind_tf_idf takes three arguments: 'term', 'document' and 'n'. We can create the 'document' column
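A self-contained sketch of that fix, on toy bigram counts; note that with a single document the resulting tf-idf is necessarily zero, as the previous answer explains:

```r
library(dplyr)
library(tidytext)

bigram_counts <- tibble::tibble(bigram = c("of the", "in a"), n = c(5L, 2L))

united <- bigram_counts %>%
  mutate(document = "sensesensibility") %>%  # supply the missing document column
  bind_tf_idf(bigram, document, n)
```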
QUESTION
The following code
...ANSWER
Answered 2020-Mar-25 at 20:01
There are two things you needed to change:
1. Since you did not set stringsAsFactors = FALSE when constructing the data.frame, you need to convert text to character first.
2. You do not have a column named book, which means you have to select some other column as document. Since you put a column named class into your example, I assume you want to calculate the tf-idf over this column.
Here is the code:
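A sketch along those lines, with invented class/text data standing in for the original:

```r
library(dplyr)
library(tidytext)

df <- data.frame(class = c("spam", "ham"),
                 text  = c("win money now", "meeting at noon"))

tfidf <- df %>%
  mutate(text = as.character(text)) %>%  # no-op on R >= 4.0, needed for factors
  unnest_tokens(word, text) %>%
  count(class, word) %>%
  bind_tf_idf(word, class, n)            # class serves as the document column
```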
QUESTION
By using unnest_tokens, I want to create a tidy text tibble which combines two different tokens: single words and bigrams.
The reasoning behind this is that sometimes single words are the more reasonable unit to study and sometimes it is rather higher-order n-grams.
If two words show up as a "sensible" bigram, I want to store the bigram and not the individual words. If the same words show up in a different context (i.e. not as a bigram), then I want to save them as single words.
In the stupid example below, "of the" is an important bigram. Thus, I want to remove the single words "of" and "the" if they actually appear as "of the" in the text. But if "of" and "the" show up in other combinations, I would like to keep them as single words.
...ANSWER
Answered 2020-Mar-17 at 13:17
You could do this by replacing the bigrams you're interested in with a compound in the text, before tokenisation (i.e. unnest_tokens):
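A minimal sketch of that compounding step on a toy sentence; the word tokenizer treats the underscore as word-internal, so "of_the" survives as one token:

```r
library(dplyr)
library(tidytext)

df <- tibble::tibble(text = "the end of the day in the sun")

tokens <- df %>%
  mutate(text = gsub("of the", "of_the", text, fixed = TRUE)) %>%  # compound first
  unnest_tokens(word, text)                                        # then tokenize
```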
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported