quanteda | An R package for the Quantitative Analysis of Textual Data

by quanteda | R | Version: v3.3 | License: GPL-3.0

kandi X-RAY | quanteda Summary

quanteda is an R library. According to kandi's analysis, it has no reported bugs or vulnerabilities, it is released under a strong copyleft license, and it has medium support. You can download it from GitHub.

An R package for managing and analyzing text, created by Kenneth Benoit and supported by the European Research Council grant ERC-2011-StG 283794-QUANTESS.

quanteda 3.0 is a major release that improves functionality, completes the modularisation of the package begun in v2.0, further improves function consistency by removing previously deprecated functions, and enhances workflow stability and consistency by deprecating some shortcut steps built into some functions. See the package's release notes for a full list of the changes.

            Support

              quanteda has a moderately active ecosystem.
              It has 784 stars and 184 forks. There are 55 watchers for this library.
              It had no major release in the last 12 months.
              There are 62 open issues and 1217 have been closed. On average, issues are closed in 414 days. There is 1 open pull request and there are 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of quanteda is v3.3.

            Quality

              quanteda has 0 bugs and 0 code smells.

            Security

              quanteda has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              quanteda code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              quanteda is licensed under the GPL-3.0 License. This license is Strong Copyleft.
              Strong Copyleft licenses enforce sharing, and you can use them when creating open source projects.

            Reuse

              quanteda releases are available to install and integrate.
              Installation instructions, examples and code snippets are available.
              It has 97789 lines of code, 0 functions and 461 files.
              It has low code complexity. Code complexity directly impacts maintainability of the code.

            Community Discussions

            QUESTION

            R not displaying Arabic text correctly
            Asked 2022-Feb-24 at 02:07

            I am running a simple unsupervised learning model on an Arabic text corpus, and the model runs well. However, the plots print the Arabic characters from left to right instead of in the correct right-to-left order.

            Here are the packages I am using:

            ...

            ANSWER

            Answered 2022-Feb-24 at 02:07

            If you're using an old version of R (3.2 or earlier), note that those versions do not handle Unicode properly. Try installing the latest version of R from https://cran.r-project.org/ and, if required, reinstall your packages.
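A minimal sketch of the check the answer suggests, using only base R: it tests whether the session is newer than R 3.2 and whether the current locale is UTF-8. The variable names are illustrative, and passing these checks does not by itself guarantee right-to-left rendering in a particular plotting device.

```r
# Minimal sketch: check the R version and locale before plotting Arabic text.
# The 3.3.0 threshold follows the answer ("3.2 or earlier" is too old).
r_version_ok <- getRversion() >= "3.3.0"
utf8_locale <- isTRUE(l10n_info()[["UTF-8"]])

if (!r_version_ok) {
  message("R ", getRversion(), " is 3.2 or older; install a current R from CRAN.")
}
if (!utf8_locale) {
  message("The current locale is not UTF-8: ", Sys.getlocale("LC_CTYPE"))
}

# An Arabic word (marhaba, "hello") written with Unicode escapes so the script
# file itself stays ASCII; nchar() counts characters, not bytes.
arabic_word <- "\u0645\u0631\u062d\u0628\u0627"
print(nchar(arabic_word))
```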

            Source https://stackoverflow.com/questions/70989953

            QUESTION

            Computing relative frequencies based on dictionary
            Asked 2022-Feb-01 at 17:16

            I'd like to examine the Psychological Capital (a construct consisting of four dimensions, namely hope, optimism, efficacy and resiliency) of founders using computer-aided text analysis in R. So far I have pulled tweets from various users into R. The data frame, called before_failure, consists of 2130 tweets from 5 different users in different periods.

            I then used the quanteda package to create a corpus, performed tokenization on it, and removed redundant punctuation, numbers, and symbols:

            ...

            ANSWER

            Answered 2022-Feb-01 at 17:16

            The easiest way to do this is to use tokens_lookup() with a category for tokens not matched, then to compile this into a dfm that you then convert to term proportions within document.

            To use a reproducible example from built-in quanteda objects, the process would be the following. (You can substitute your own corpus and dictionary and the code should work fine.)
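The approach can be sketched as follows. Instead of the built-in quanteda objects the answer refers to, this assumes a tiny hypothetical corpus (tweets) and a toy dictionary (psycap_dict) with made-up entries. nomatch is the tokens_lookup() argument that names the catch-all key for unmatched tokens, and dfm_weight(scheme = "prop") turns counts into proportions within each document.

```r
library(quanteda)

# Hypothetical stand-ins for the tweets and the PsyCap dictionary
tweets <- c(user1 = "I hope and believe we will succeed, I am optimistic.",
            user2 = "We failed but we recover and keep going.")
psycap_dict <- dictionary(list(
  hope       = c("hope*", "believe"),
  optimism   = c("optimis*"),
  resiliency = c("recover*", "resilien*")
))

toks <- tokens(corpus(tweets), remove_punct = TRUE,
               remove_numbers = TRUE, remove_symbols = TRUE)

# Replace matched tokens by their dictionary key; every other token becomes
# "unmatched", so it still counts toward each document's total
toks_dict <- tokens_lookup(toks, dictionary = psycap_dict, nomatch = "unmatched")

# Count keys per document, then divide by the row total (term proportions)
dfmat_prop <- dfm_weight(dfm(toks_dict), scheme = "prop")
print(dfmat_prop)
```

Because unmatched tokens are kept under their own key, each row sums to 1 and the dictionary proportions are relative to all tokens in the document, not only to the matched ones.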

            Source https://stackoverflow.com/questions/70943380

            QUESTION

            R: How to count the total number of tokens in a corpus?
            Asked 2022-Feb-01 at 00:26

            I have created a quanteda corpus called readtext_corpus with 190 texts. I would like to count the total number of tokens (words) in the corpus. I tried the function ntoken(), but it gives the number of words per text, not the total number of words for all 190 texts.

            ...

            ANSWER

            Answered 2022-Feb-01 at 00:26

            You can simply apply the sum() function to the per-document counts that ntoken() returns.
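A minimal sketch, assuming a small hypothetical corpus in place of readtext_corpus. Tokenizing explicitly with tokens() first lets you decide what counts as a token; by default punctuation is kept and therefore counted.

```r
library(quanteda)

# Hypothetical two-document corpus standing in for readtext_corpus
corp <- corpus(c(text1 = "one two three", text2 = "four five"))

# ntoken() returns one count per document ...
per_doc <- ntoken(tokens(corp))

# ... so the corpus total is just their sum
total_tokens <- sum(per_doc)
print(total_tokens)
```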

            Source https://stackoverflow.com/questions/70934308

            QUESTION

            Why does loading multiple packages in R produce warnings?
            Asked 2021-Dec-27 at 20:12
            required_packs <- c("pdftools","readxl","pdfsearch","tidyverse","data.table","stringr","tidytext","dplyr","igraph","NLP","tm", "quanteda", "ggraph", "topicmodels", "lasso2", "reshape2", "FSelector")
            new_packs <- required_packs[!(required_packs %in% installed.packages()[,"Package"])]
            if(length(new_packs)) install.packages(new_packs)
            i <- 1
            for (i in 1:length(required_packs)) {
             sapply(required_packs[i],require, character.only = T)
            }
            
            ...

            ANSWER

            Answered 2021-Dec-27 at 20:12

            I think the problem is that you used T where you meant TRUE.
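One concrete reason to prefer TRUE: in R, T is an ordinary variable that merely defaults to TRUE and can be reassigned (for example, T <- 0), whereas TRUE is a reserved word. The sketch below writes the loading loop with TRUE spelled out, loops over the package names directly, and wraps require() in suppressPackageStartupMessages() to quiet the messages packages print when attached. The package list is a hypothetical stand-in made of base packages so the sketch runs anywhere; substitute your own.

```r
# Hypothetical package list; base packages are used so the sketch runs anywhere
required_packs <- c("stats", "utils", "tools")

# Install anything that is missing (nothing to do for base packages)
new_packs <- required_packs[!(required_packs %in% installed.packages()[, "Package"])]
if (length(new_packs) > 0) install.packages(new_packs)

# Attach each package, spelling out TRUE rather than T;
# vapply() returns a named logical vector of load results
loaded <- vapply(required_packs, function(pkg) {
  suppressPackageStartupMessages(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
}, logical(1))

print(loaded)
```

Checking the returned logical vector makes it easy to see which packages, if any, failed to load.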

            Source https://stackoverflow.com/questions/70497999

            QUESTION

            How to loop through numbered dataframes in R environment. I have to loop through 22 (potentially 22*6) dataframes in R
            Asked 2021-Dec-02 at 16:02

            I have 6 different dataframes, each containing cosine similarities between a set of documents. I have already calculated the cosine similarities; I just need to pull out the right variable from each of the six and save it. The code to do this looks like this:

            ...

            ANSWER

            Answered 2021-Dec-02 at 16:02

            You can use get(object_name) to retrieve an object by its name.
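A minimal sketch, assuming hypothetical data frames df1 to df3 in place of the asker's six (or 22) objects: paste0() builds each name, get() fetches the object, and mget() retrieves several at once as a named list.

```r
# Hypothetical numbered data frames standing in for the cosine-similarity results
df1 <- data.frame(doc = c("a", "b"), cosine = c(0.10, 0.20))
df2 <- data.frame(doc = c("a", "b"), cosine = c(0.30, 0.40))
df3 <- data.frame(doc = c("a", "b"), cosine = c(0.50, 0.60))

# Loop over the numbers, build each object name, and fetch it with get()
results <- list()
for (i in 1:3) {
  obj_name <- paste0("df", i)
  current <- get(obj_name)
  results[[obj_name]] <- current$cosine[1]  # pull out the value you need
}
print(unlist(results))

# Alternatively, mget() returns all of them at once as a named list
dfs <- mget(paste0("df", 1:3))
print(names(dfs))
```

If you control how the data frames are created, storing them in a list from the start avoids looking objects up by name at all.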

            Source https://stackoverflow.com/questions/70200199

            QUESTION

            How to define optional element in regex pattern with quanteda's kwic?
            Asked 2021-Nov-28 at 23:11

            I am struggling to 'translate' a regex expression from stringi/stringr to quanteda's kwic function.

            How can I get all instances of "Jane Mayer", regardless of whether she has a middle name or not. Note that I don't have a list of all existing middle names in the data. So defining multiple patterns (one for each middle name) wouldn't be possible.

            Many thanks!

            ...

            ANSWER

            Answered 2021-Nov-28 at 23:11

            It seems you need to pass another pattern to match exactly Jane Mayer:
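A minimal sketch of that idea, assuming a made-up example text: phrase() turns each string into a multi-token pattern, and the glob "*" in the middle position matches any single token, so one pattern covers "Jane Mayer" and the other covers "Jane <middle name> Mayer". This allows exactly one middle token; a second middle name would need a further pattern such as "Jane * * Mayer".

```r
library(quanteda)

# Made-up text with and without a middle name
toks <- tokens("Jane Mayer wrote this. Jane Elizabeth Mayer said that. John Mayer sang.")

# phrase() turns each string into a sequence of token patterns; the glob "*"
# in the middle matches exactly one token of any value
kw <- kwic(toks, pattern = phrase(c("Jane Mayer", "Jane * Mayer")), window = 3)
print(kw)
```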

            Source https://stackoverflow.com/questions/70148186

            QUESTION

            How to convert a tokens object into a corpus object
            Asked 2021-Oct-17 at 08:53

            I have a corpus object that I converted into a tokens object. I then filtered this object to remove words and unify their spelling. For my further workflow, I again need a corpus object. How can I construct this from the tokens object?

            ...

            ANSWER

            Answered 2021-Oct-17 at 08:53

            You could paste the tokens together to return a new corpus. (Although this may not be the best approach if your goal is to get back to a corpus so that you can use corpus_reshape().)
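A minimal sketch of the paste approach, assuming a small made-up corpus in which "the" was removed during token filtering: as.list() gives each document's tokens as a character vector, and its names become the new document names. Pasting with spaces does not restore the original whitespace (punctuation tokens end up separated by spaces), and this sketch does not carry document variables over.

```r
library(quanteda)

# Made-up corpus; "the" is removed to mimic the filtering step
corp <- corpus(c(d1 = "The quick brown fox", d2 = "The lazy dog"))
toks <- tokens_remove(tokens(corp), pattern = "the")

# Paste each document's tokens back together with single spaces
txt_rebuilt <- vapply(as.list(toks), paste, character(1), collapse = " ")

# The names of txt_rebuilt (d1, d2) become the document names
corp_rebuilt <- corpus(txt_rebuilt)
print(as.character(corp_rebuilt))
```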

            Source https://stackoverflow.com/questions/69591928

            QUESTION

            quanteda collocations and lemmatization
            Asked 2021-Sep-04 at 09:21

            I am using the quanteda suite of packages to preprocess some text data. I want to incorporate collocations as features and decided to use the textstat_collocations function. According to the documentation:

            "The tokens object . . . . While identifying collocations for tokens objects is supported, you will get better results with character or corpus objects due to relatively imperfect detection of sentence boundaries from texts already tokenized."

            This makes perfect sense, so here goes:

            ...

            ANSWER

            Answered 2021-Sep-04 at 09:21

            The problem is that you have already compounded the elements of the collocations into a single "token" containing a space, but by supplying the phrase() wrapper in tokens_compound(), you are telling tokens_replace() to look for two sequential tokens, not the one with a space.

            The way to get what you want is by making the lemmatised replacement match the collocation.
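A minimal sketch, assuming a made-up sentence and a one-entry lemma table: after tokens_compound() joins each collocation into a single token (here with "_" as the concatenator), the replacement pattern must be that joined form, passed as a plain string rather than wrapped in phrase().

```r
library(quanteda)

toks <- tokens("Climate changes affect us. Climate change is real.")

# Compound the collocations; each becomes ONE token joined by "_"
toks_comp <- tokens_compound(toks,
                             pattern = phrase(c("climate change", "climate changes")),
                             concatenator = "_")

# The compounds are now single tokens, so the lemma table uses the same
# joined form, and the pattern is NOT wrapped in phrase()
toks_lem <- tokens_replace(toks_comp,
                           pattern = "climate_changes",
                           replacement = "climate_change",
                           valuetype = "fixed")
print(toks_lem)
```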

            Source https://stackoverflow.com/questions/69051478

            QUESTION

            Measuring co-occurrence patterns in media articles over time with Quanteda
            Asked 2021-Aug-15 at 13:43

            I am trying to measure the number of times that different words co-occur with a particular term in collections of Chinese newspaper articles from each quarter of a year. To do this, I have been using Quanteda and written several R functions to run on each group of articles. My work steps are:

            1. Group the articles by quarter.
            2. Produce a feature co-occurrence matrix (FCM) for the articles in each quarter (Function 1).
            3. Take the column from this matrix for the 'term' I am interested in and convert this to a data.frame (Function 2).
            4. Merge the data.frames for each quarter together, then produce a large csv file with a column for each quarter and a row for each co-occurring term.

            This seems to work okay. But I wondered whether anybody more skilled in R could check that what I am doing is correct, or suggest a more efficient way of doing it?

            Thanks for any help!

            ...

            ANSWER

            Answered 2021-Aug-13 at 09:28

            If you are interested in counting co-occurrences within a window for specific target terms, a better way is to use the window argument of tokens_select(), and then to count occurrences from a dfm on the window-selected tokens.
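A minimal sketch, assuming three made-up articles and a hypothetical quarter document variable: tokens_select() with window = 2 keeps each match of the target term plus up to two tokens on either side, dfm() counts what remains, and dfm_group() sums the counts by quarter. In this sketch those steps stand in for the per-quarter FCM, column extraction, and merge.

```r
library(quanteda)

# Made-up articles with a hypothetical quarter docvar
corp <- corpus(c(a1 = "Officials said China trade policy will change next year",
                 a2 = "The new trade deal lowers tariffs on steel",
                 a3 = "Trade policy talks stalled in spring"),
               docvars = data.frame(quarter = c("Q1", "Q1", "Q2")))

toks <- tokens(corp, remove_punct = TRUE)

# Keep only the target term plus up to 2 tokens on either side of it
toks_win <- tokens_select(toks, pattern = "trade", window = 2)

# Count the words in those windows and drop the target term itself
dfmat <- dfm_remove(dfm(toks_win), pattern = "trade")

# Sum the co-occurrence counts by quarter (one row per quarter)
dfmat_q <- dfm_group(dfmat, groups = docvars(dfmat, "quarter"))
print(dfmat_q)
```

The grouped dfm already has one row per quarter and one column per co-occurring word, so it can be converted and written to CSV in a single step rather than merged from separate data frames.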

            Source https://stackoverflow.com/questions/68763866

            QUESTION

            Transform Two Column Data Frame into Quanteda Dictionary Format
            Asked 2021-Aug-12 at 15:39

            My ultimate goal is to create a quanteda dictionary to use for topic classification on text data.

            However, my topic keywords are stored in a different format: I have a column of about 4000 keywords and a second column that specifies the topic each keyword belongs to. Note that the number of keywords differs across topics. My data looks like this:

            ...

            ANSWER

            Answered 2021-Aug-12 at 15:39

            If your data are in a data.frame like topics, you can quickly get them into the list format you want with the function split().
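A minimal sketch, assuming a hypothetical topics data frame with one keyword per row: split() returns a named list with one character vector per topic, so unequal numbers of keywords per topic are not a problem, and dictionary() accepts that list directly.

```r
library(quanteda)

# Hypothetical keyword table: one keyword per row plus its topic
topics <- data.frame(
  keyword = c("tax", "budget", "inflation", "election", "ballot"),
  topic   = c("economy", "economy", "economy", "politics", "politics"),
  stringsAsFactors = FALSE
)

# split() groups the keywords by topic into a named list, which is the
# list-of-character-vectors form that dictionary() accepts
topic_list <- split(topics$keyword, topics$topic)
topic_dict <- dictionary(topic_list)
print(topic_dict)
```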

            Source https://stackoverflow.com/questions/68758947

            Community Discussions and Code Snippets contain content from sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install quanteda

            The normal way is from CRAN, using your R GUI or install.packages("quanteda").

            Support

            If you like quanteda, please consider leaving feedback or a testimonial.

            Clone

            HTTPS: https://github.com/quanteda/quanteda.git
            CLI: gh repo clone quanteda/quanteda
            SSH: git@github.com:quanteda/quanteda.git