quanteda | An R package for the Quantitative Analysis of Textual Data
kandi X-RAY | quanteda Summary
An R package for managing and analyzing text, created by Kenneth Benoit. Supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS. quanteda 3.0 is a major release that improves functionality, completes the modularisation of the package begun in v2.0, further improves function consistency by removing previously deprecated functions, and enhances workflow stability and consistency by deprecating some shortcut steps built into some functions. See the changelog for a full list of the changes.
Community Discussions
Trending Discussions on quanteda
QUESTION
I am running a simple unsupervised learning model on an Arabic text corpus, and the model is running well. However, I am having an issue with the plots: they print the Arabic characters from left to right, rather than in the correct right-to-left order.
Here are the packages I am using:
...ANSWER
Answered 2022-Feb-24 at 02:07
If you're using an old version of R (3.2 or earlier), those versions do not handle Unicode properly. Try installing the latest version of R from https://cran.r-project.org/ and, if required, reinstall all packages.
QUESTION
I'd like to examine the Psychological Capital (a construct consisting of four dimensions, namely hope, optimism, efficacy and resiliency) of founders using computer-aided text analysis in R. So far I have pulled tweets from various users into R. The data frame consists of 2,130 tweets from 5 different users in different periods. The data frame is called before_failure. [Picture of original data frame]
I have then used the quanteda package to create a corpus, performed tokenization on it and removed redundant punctuation/numbers/symbols:
...ANSWER
Answered 2022-Feb-01 at 17:16
The easiest way to do this is to use tokens_lookup() with a category for tokens not matched, then to compile this into a dfm that you then convert to term proportions within each document.
To use a reproducible example from built-in quanteda objects, the process would be the following. (You can substitute your own corpus and dictionary and the code should work fine.)
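A minimal sketch of that process, using the built-in `data_corpus_inaugural` corpus; the dictionary keys and patterns here are purely illustrative stand-ins for the asker's PsyCap dictionary:

```r
library(quanteda)

# Illustrative dictionary; substitute your own PsyCap dictionary
dict <- dictionary(list(
  hope   = c("hope*", "aspir*"),
  effort = c("effort*", "work*")
))

# Tokenize a built-in corpus, look up dictionary categories,
# and send unmatched tokens to an "other" key
toks <- tokens(data_corpus_inaugural[1:3], remove_punct = TRUE)
toks_dict <- tokens_lookup(toks, dictionary = dict, nomatch = "other")

# Compile into a dfm, then convert counts to within-document proportions
dfmat <- dfm(toks_dict)
dfm_weight(dfmat, scheme = "prop")
```

The `nomatch` key is what lets the category proportions sum to 1 within each document, since unmatched tokens still contribute to the denominator.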
QUESTION
I have created a quanteda corpus called readtext_corpus with 190 texts. I would like to count the total number of tokens or words in the corpus. I tried the function ntoken, which gives the number of words per text, not the total number of words across all 190 texts.
...ANSWER
Answered 2022-Feb-01 at 00:26
You can simply use the sum() function. Here is an example:
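A one-line sketch, assuming `readtext_corpus` is the asker's corpus object:

```r
library(quanteda)

# ntoken() returns a named numeric vector with one count per document;
# summing that vector gives the corpus-wide total
total_tokens <- sum(ntoken(readtext_corpus))
total_tokens
```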
QUESTION
required_packs <- c("pdftools", "readxl", "pdfsearch", "tidyverse", "data.table", "stringr", "tidytext", "dplyr", "igraph", "NLP", "tm", "quanteda", "ggraph", "topicmodels", "lasso2", "reshape2", "FSelector")
new_packs <- required_packs[!(required_packs %in% installed.packages()[, "Package"])]
if (length(new_packs)) install.packages(new_packs)
for (i in 1:length(required_packs)) {
  sapply(required_packs[i], require, character.only = T)
}
...ANSWER
Answered 2021-Dec-27 at 20:12
I think the problem is that you used T when you meant TRUE. For example,
QUESTION
I have 6 different data frames, each holding a cosine similarity between a set of documents. I have already calculated the cosine similarities; I just need to pull out the right variable from each of the six and save it. The code to do this looks like this:
...ANSWER
Answered 2021-Dec-02 at 16:02
You can use get(object_name) to retrieve an object by name.
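A sketch of the pattern, with hypothetical object names `cos_sim_1` through `cos_sim_6` (each assumed to be a data frame with a `similarity` column; the asker's actual names and column will differ):

```r
# Build each object's name as a string, fetch it with get(),
# then pull out the variable of interest
extracted <- lapply(paste0("cos_sim_", 1:6), function(nm) {
  get(nm)$similarity
})
names(extracted) <- paste0("cos_sim_", 1:6)
```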
QUESTION
I am struggling to 'translate' a regex expression from stringi/stringr to quanteda's kwic() function.
How can I get all instances of "Jane Mayer", regardless of whether she has a middle name or not. Note that I don't have a list of all existing middle names in the data. So defining multiple patterns (one for each middle name) wouldn't be possible.
Many thanks!
...ANSWER
Answered 2021-Nov-28 at 23:11
It seems you need to pass another pattern to match exactly Jane Mayer:
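A minimal sketch of the idea (example sentences invented for illustration): with kwic()'s default glob matching, `"*"` inside `phrase()` stands for any single token, so supplying both patterns should catch the name with and without a middle name:

```r
library(quanteda)

toks <- tokens(c("I spoke with Jane Mayer today.",
                 "Jane Ridgeway Mayer wrote the piece."))

# Two phrase patterns: one for the bare name, one with a wildcard
# middle token ("*" matches exactly one token under glob matching)
kwic(toks, pattern = phrase(c("Jane Mayer", "Jane * Mayer")))
```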
QUESTION
I have a corpus object that I converted into a tokens object. I then filtered this object to remove words and unify their spelling. For my further workflow, I again need a corpus object. How can I construct this from the tokens object?
...ANSWER
Answered 2021-Oct-17 at 08:53
You could paste the tokens together to return a new corpus. (Although this may not be the best approach if your goal is to get back to a corpus so that you can use corpus_reshape().)
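A sketch of that pasting approach, using a built-in corpus in place of the asker's filtered tokens object; note that it discards the original spacing and punctuation:

```r
library(quanteda)

toks <- tokens(data_corpus_inaugural[1:2], remove_punct = TRUE)

# Paste each document's tokens back into one string per document,
# then rebuild a corpus from the resulting character vector
corp_new <- corpus(vapply(as.list(toks), paste, character(1), collapse = " "))
```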
QUESTION
I am using the quanteda suite of packages to preprocess some text data. I want to incorporate collocations as features and decided to use the textstat_collocations function. According to the documentation, and I quote:
"The tokens object . . . . While identifying collocations for tokens objects is supported, you will get better results with character or corpus objects due to relatively imperfect detection of sentence boundaries from texts already tokenized."
This makes perfect sense, so here goes:
...ANSWER
Answered 2021-Sep-04 at 09:21
The problem is that you have already compounded the elements of the collocations into a single "token" containing a space, but by supplying the phrase() wrapper in tokens_compound(), you are telling tokens_replace() to look for two sequential tokens, not one token containing a space.
The way to get what you want is to make the lemmatised replacement match the collocation.
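A toy sketch of the distinction (the collocation and replacement are invented for illustration): once the collocation is compounded into one space-containing token, the replacement pattern must also be that single token, not a phrase() of two tokens:

```r
library(quanteda)

toks <- tokens("the united states of america")

# Compound the collocation using a space as the concatenator,
# producing the single token "united states"
toks <- tokens_compound(toks, phrase("united states"), concatenator = " ")

# Replace the space-containing token directly -- no phrase() wrapper,
# since the pattern is now one token, not a two-token sequence
tokens_replace(toks, "united states", "usa")
```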
QUESTION
I am trying to measure the number of times that different words co-occur with a particular term in collections of Chinese newspaper articles from each quarter of a year. To do this, I have been using Quanteda and written several R functions to run on each group of articles. My work steps are:
- Group the articles by quarter.
- Produce a feature co-occurrence matrix (FCM) for the articles in each quarter (Function 1).
- Take the column from this matrix for the 'term' I am interested in and convert this to a data.frame (Function 2)
- Merge the data.frames for each quarter together, then produce a large csv file with a column for each quarter and a row for each co-occurring term.
This seems to work okay. But I wondered if anybody more skilled in R might be able to check what I am doing is correct, or might suggest a more efficient way of doing it?
Thanks for any help!
...ANSWER
Answered 2021-Aug-13 at 09:28
If you are interested in counting co-occurrences within a window for specific target terms, a better way is to use the window argument of tokens_select(), and then to count occurrences from a dfm built on the window-selected tokens.
QUESTION
My ultimate goal is to create a quanteda dictionary to use for topic classification on text data.
However, my topic keywords are stored in a somewhat different format: I have a column of about 4000 keywords and a second column that specifies the topic each keyword belongs to. Note that the topics do not contain equal numbers of keywords. My data looks like this:
...ANSWER
Answered 2021-Aug-12 at 15:39
If your data is in a data.frame like topics (see the data section), you can quickly get the data into the list format you want using the function split().
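A sketch with a hypothetical data.frame mirroring the question's two-column layout (keyword and topic names invented): split() groups the keyword column into a named list by topic, which dictionary() accepts directly:

```r
library(quanteda)

# Hypothetical stand-in for the asker's ~4000-row keyword table
topics <- data.frame(
  keyword = c("vote", "ballot", "tax", "budget"),
  topic   = c("elections", "elections", "economy", "economy")
)

# split() returns a named list of character vectors, one per topic,
# with no requirement that the topics be equal in size
dict <- dictionary(split(topics$keyword, topics$topic))
```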
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.