scattertext | Beautiful visualizations of how language differs | Natural Language Processing library

by JasonKessler | Python | Version: 0.2.1 | License: Apache-2.0

kandi X-RAY | scattertext Summary

scattertext is a Python library typically used in Artificial Intelligence and Natural Language Processing applications, as well as in institutional, learning, and education settings. kandi's analysis found no bugs and no vulnerabilities. A build file is available, the license is permissive, and support is rated medium. You can install it with 'pip install scattertext' or download it from GitHub or PyPI.

This is a tool intended for visualizing which words and phrases are more characteristic of a category than others. Consider the example at the top of the page. At first glance it can seem overwhelming. In fact, it's a relatively simple visualization of word use during the 2012 political conventions. Each dot corresponds to a word or phrase mentioned by Republicans or Democrats during their conventions. The closer a dot is to the top of the plot, the more frequently it was used by Democrats. The further right a dot is, the more that word or phrase was used by Republicans.

Words frequently used by both parties, like "of" and "the" and even "Mitt", tend to occur in the upper-right-hand corner. Although very low-frequency words have been hidden to preserve computing resources, a word that neither party used, like "giraffe", would be in the bottom-left-hand corner.

The interesting things happen close to the upper-left and lower-right corners. In the upper-left corner, words like "auto" (as in auto bailout) and "millionaires" are frequently used by Democrats but infrequently or never used by Republicans. Likewise, terms frequently used by Republicans and infrequently by Democrats occupy the bottom-right corner. These include "big government" and "olympics", referring to the Salt Lake City Olympics in which Gov. Romney was involved.

Terms are colored by their association. Those more associated with Democrats are blue, and those more associated with Republicans are red. Terms that are most characteristic of both sets of documents are displayed on the far right of the visualization.

The inspiration for this visualization came from Dataclysm (Rudder, 2014). Scattertext is designed to help you build these graphs and efficiently label points on them.

The documentation (including this readme) is a work in progress. Please see the tutorial below as well as the PyData 2017 Tutorial. Poking around the code and tests should give you a good idea of how things work.
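The axis layout described above can be illustrated with a small standard-library sketch. It does not use scattertext or reproduce its internals. The toy speeches, the helper names `term_counts` and `rank_scale`, and the choice of rank-based scaling are all assumptions made for illustration.

```python
from collections import Counter

# Toy stand-ins for convention speeches; made up, not the real 2012 data.
speeches = [
    ("democrat", "the auto bailout saved the auto industry"),
    ("democrat", "millionaires should pay more of the bill"),
    ("republican", "big government is the problem"),
    ("republican", "the olympics showed the value of leadership"),
]

def term_counts(docs, party):
    """Count whitespace-separated tokens in the documents of one party."""
    counts = Counter()
    for doc_party, text in docs:
        if doc_party == party:
            counts.update(text.lower().split())
    return counts

def rank_scale(counts, vocabulary):
    """Map each term to [0, 1] by the dense rank of its count (assumed scaling)."""
    distinct = sorted({counts[term] for term in vocabulary})
    denom = max(len(distinct) - 1, 1)
    return {term: distinct.index(counts[term]) / denom for term in vocabulary}

dem = term_counts(speeches, "democrat")
rep = term_counts(speeches, "republican")
vocab = sorted(set(dem) | set(rep))

x = rank_scale(rep, vocab)  # further right: used more by Republicans
y = rank_scale(dem, vocab)  # closer to the top: used more by Democrats

for term in ("the", "auto", "olympics"):
    print(f"{term:10s} x={x[term]:.2f} y={y[term]:.2f}")
```

In this toy data, "the" lands in the upper-right corner because both parties use it often, "auto" sits on the left edge above the middle, and "olympics" sits at the bottom and further right.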
The library covers some novel and effective term-importance formulas, including Scaled F-Score.
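Scaled F-Score is generally described as a harmonic mean of a term's precision and frequency for a category, after both have been rescaled with a normal CDF. The sketch below illustrates that idea using only the standard library. It is a simplified, one-sided version and not scattertext's implementation. The counts are made up, and `normal_cdf_scale` and `scaled_f_scores` are hypothetical names.

```python
from statistics import NormalDist, mean, pstdev

# Made-up counts: term -> (count in the category, count in the other category).
counts = {
    "the": (60, 60),
    "of": (40, 42),
    "auto": (45, 3),
    "millionaires": (35, 1),
    "jobs": (20, 20),
    "olympics": (2, 30),
    "government": (8, 35),
    "tax": (5, 5),
    "family": (4, 4),
    "hope": (3, 3),
    "work": (3, 3),
    "america": (5, 5),
}

def normal_cdf_scale(values):
    """Rescale values to (0, 1) with the CDF of a normal fitted to them."""
    dist = NormalDist(mean(values), pstdev(values))
    return [dist.cdf(value) for value in values]

def scaled_f_scores(term_counts, beta=1.0):
    """Harmonic mean (F-beta) of CDF-scaled precision and frequency per term."""
    category_total = sum(cat for cat, _ in term_counts.values())
    # Precision: share of a term's uses that fall in the category.
    precision = [cat / (cat + other) for cat, other in term_counts.values()]
    # Frequency: share of the category's tokens taken up by the term.
    frequency = [cat / category_total for cat, _ in term_counts.values()]
    scaled_p = normal_cdf_scale(precision)
    scaled_f = normal_cdf_scale(frequency)
    return {
        term: (1 + beta**2) * p * f / (beta**2 * p + f)
        for term, p, f in zip(term_counts, scaled_p, scaled_f)
    }

scores = scaled_f_scores(counts)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term:14s} {score:.3f}")
```

With these toy counts, a term that is both frequent and concentrated in the category ("auto") outranks a term that is frequent but evenly split ("the"), which is the behavior a term-importance score of this kind is meant to capture.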

            Support

              scattertext has a medium active ecosystem.
              It has 2072 star(s) with 278 fork(s). There are 56 watchers for this library.
              There were 2 major release(s) in the last 6 months.
              There are 18 open issues and 77 have been closed. On average, issues are closed in 5 days. There is 1 open pull request and 0 closed pull requests.
              It has a neutral sentiment in the developer community.
              The latest version of scattertext is 0.2.1

            Quality

              scattertext has 0 bugs and 0 code smells.

            Security

              scattertext has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              scattertext code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              scattertext is licensed under the Apache-2.0 License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              scattertext releases are available to install and integrate.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.
              scattertext saves you 1756 person hours of effort in developing the same functionality from scratch.
              It has 3886 lines of code, 376 functions and 73 files.
              It has medium code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed scattertext and identified the functions below as its top functions. The list is intended to give you a quick view of the functionality scattertext implements and to help you decide whether it suits your requirements.
            • Produces a scatter plot
            • Returns the metadata associated with this store
            • Inject x and y coordinates
            • Verify that coordinates are valid
            • Produce pairplot plot
            • Returns a pandas DataFrame containing the category projection
            • Get y axis
            • Get the X axis of the projection
            • Produce a frequency exporter
            • Produce a scattertext explorer
            • Compute the similarity score of a given category
            • Produce a characteristic explorer
            • Generate a ScatterTextorer
            • Produce a scattertext explorer explorer
            • Compute the optimal category projection
            • Draw the term rank score
            • Return the metadata associated with this item
            • Generate HTML to HTML
            • Removes the specified categories
            • Get the javascript for the plot
            • Produce a scattertext table
            • Produce a semicolore explorer
            • Produce a scatter plot of the given dataframe
            • Produce a 4-square explorer
            • Produces a 4-square plot
            • Extract count phrases from text or text
            • Produce a two-axis plot
            • Produce a corpus from a given corpus

            scattertext Examples and Code Snippets

            default
            Python | Lines of Code: 140 | License: No License
             >>> f = DFAFilter()
             >>> f.add("sexy")
             >>> f.filter("hello sexy baby")
             hello **** baby
            
            >>> import langid
            >>> langid.classify("This is a test")
            ('en', -54.41310358047485)
            
            from langdetect import detect
              
            A corpus of Spanish political speeches from 1937 to 2019
            HTML | Lines of Code: 12 | License: No License
            @inproceedings{alvarez-mellado-2020-corpus,
                title = "A Corpus of {S}panish Political Speeches from 1937 to 2019",
                author = "{\'A}lvarez-Mellado, Elena",
                booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
               

            Community Discussions

            QUESTION

            Assertion Error when producing ScatterText Visualisation
            Asked 2021-Jul-09 at 21:44

            I'm new to scattertext and have written the code which should produce an interactive html visualisation.

            ...

            ANSWER

            Answered 2021-Jul-09 at 21:44

            Make sure at least one of the values in the sentiment column of your data frame is the exact string "1".

            Source https://stackoverflow.com/questions/68273278
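The answer above turns on the difference between the integer 1 and the string "1". The asker's code is elided in the source, so the sketch below does not reproduce it. It only illustrates that distinction with plain Python rows standing in for a data frame. The rows and the helpers `has_category` and `as_string_labels` are hypothetical.

```python
# Hypothetical rows standing in for the asker's data frame.
rows_with_ints = [
    {"text": "great film", "sentiment": 1},
    {"text": "dull plot", "sentiment": 0},
]

def has_category(rows, column, category):
    """True if at least one row's value in `column` equals `category` exactly."""
    return any(row[column] == category for row in rows)

def as_string_labels(rows, column):
    """Return copies of the rows with the label column converted to str."""
    return [{**row, column: str(row[column])} for row in rows]

# The integer 1 is not the string "1", so the category is not found.
print(has_category(rows_with_ints, "sentiment", "1"))

# After converting the labels to strings, the category "1" is present.
rows_with_strs = as_string_labels(rows_with_ints, "sentiment")
print(has_category(rows_with_strs, "sentiment", "1"))
```

A check like this before building the visualisation would show whether the category label and the column values have the same type.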

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install scattertext

            Install Python 3.4 or higher and run 'pip install scattertext'.

            If you cannot (or don't want to) install spaCy, substitute the nlp = spacy.load('en') lines with nlp = scattertext.WhitespaceNLP.whitespace_nlp. Note that this is not compatible with word_similarity_explorer, and that tokenization and sentence boundary detection will rely on low-performance regular expressions. See demo_without_spacy.py for an example.

            It is recommended that you install jieba, spacy, empath, astropy, flashtext, gensim and umap-learn in order to take full advantage of Scattertext.

            Scattertext should mostly work with Python 2.7, but it may not. The HTML outputs look best in Chrome and Safari.

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check for existing answers or ask on Stack Overflow.
            Install
          • PyPI

            pip install scattertext

          • CLONE
          • HTTPS

            https://github.com/JasonKessler/scattertext.git

          • CLI

            gh repo clone JasonKessler/scattertext

          • SSH

            git@github.com:JasonKessler/scattertext.git

            Consider Popular Natural Language Processing Libraries

            transformers

            by huggingface

            funNLP

            by fighting41love

            bert

            by google-research

            jieba

            by fxsjy

            Python

            by geekcomputers

            Try Top Libraries by JasonKessler

            Scattertext-PyData

            by JasonKessler (HTML)

            agefromname

            by JasonKessler (Python)

            fakeout

            by JasonKessler (HTML)

            GlobalAI2018

            by JasonKessler (HTML)

            PuPPyTalk

            by JasonKessler (HTML)