BERTopic | Leveraging BERT and c-TF-IDF to create | Topic Modeling library

by MaartenGr | Python | Version: 0.16.2 | License: MIT

kandi X-RAY | BERTopic Summary

BERTopic is a Python library typically used in Artificial Intelligence, Topic Modeling, BERT, Neural Network, and Transformer applications. BERTopic has no reported vulnerabilities, a build file is available, it has a permissive license, and it has medium support. However, BERTopic has 3 bugs. You can install it using 'pip install bertopic' or download it from GitHub or PyPI.

For quick access to common functions, here is an overview of BERTopic's main methods:

| Method | Code |
|-----------------------|---|
| Fit the model | .fit(docs) |
| Fit the model and predict documents | .fit_transform(docs) |
| Predict new documents | .transform([new_doc]) |
| Access single topic | .get_topic(topic=12) |
| Access all topics | .get_topics() |
| Get topic freq | .get_topic_freq() |
| Get all topic information | .get_topic_info() |
| Get representative docs per topic | .get_representative_docs() |
| Get topics per class | .topics_per_class(docs, topics, classes) |
| Dynamic Topic Modeling | .topics_over_time(docs, topics, timestamps) |
| Update topic representation | .update_topics(docs, topics, n_gram_range=(1, 3)) |
| Reduce nr of topics | .reduce_topics(docs, topics, nr_topics=30) |
| Find topics | .find_topics("vehicle") |
| Save model | .save("my_model") |
| Load model | BERTopic.load("my_model") |
| Get parameters | .get_params() |

            Support

              BERTopic has a medium active ecosystem.
              It has 4329 star(s) with 551 fork(s). There are 47 watchers for this library.
              There were 2 major release(s) in the last 12 months.
              There are 92 open issues and 1015 have been closed. On average issues are closed in 38 days. There are 2 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of BERTopic is 0.16.2

            Quality

              BERTopic has 3 bugs (0 blocker, 0 critical, 3 major, 0 minor) and 3 code smells.

            Security

              BERTopic has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              BERTopic code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            License

              BERTopic is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            Reuse

              BERTopic releases are available to install and integrate.
              Deployable package is available in PyPI.
              Build file is available. You can build the component from source.
              Installation instructions, examples and code snippets are available.
              BERTopic saves you 403 person hours of effort in developing the same functionality from scratch.
              It has 957 lines of code, 57 functions and 13 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed BERTopic and discovered the below as its top functions. This is intended to give you an instant insight into BERTopic implemented functionality, and help decide if they suit your requirements.
            • Generate a graph showing the documents for a given topic model
            • Returns the given topic
            • Check that the topic model is fully fitted
            • Extract embeddings from documents
            • Visualize a topic model
            • Calculate the annotation for each topic
            • Returns the frequency of the topic
            • Returns the topic representations of the model
            • Embed a list of documents
            • Plot topics over time
            • Plot a Barchart histogram
            • Visualize the distribution of the distribution
            • Embeds a list of documents
            • Plot the given topics
            • Load a topic model
            • Visualize the rank of the model
            • Plot the heatmap
            • Visualize a distribution
            • Plot the hierarchy of the model
            • Visualize documents
            • Generate a figure of hierarchical documents
            • Visualize topics over time
            • Visualize the topics per class
            • Generates a heat map for a topic model
            • Generates a figure showing the term rank
            • Generate a pandas dataframe showing the given documents

            BERTopic Examples and Code Snippets

            How to change the function parameters (visualize_topics_over_time) in BERTopic?
            Python | Lines of Code: 122 | License: Strong Copyleft (CC BY-SA 4.0)
            import pandas as pd
            from typing import List
            import plotly.graph_objects as go
            from sklearn.preprocessing import normalize
            
            
            def visualize_topics_over_time(topic_model,
                                           topics_over_time: pd.DataFrame,
                    

            Community Discussions

            QUESTION

            scatter plot color bar does not look right
            Asked 2022-Mar-24 at 22:20

            I have written my code to create a scatter plot with a color bar on the right. But the color bar does not look right, in the sense that the color is too light to be mapped to the actual color used in the plot. I am not sure what is missing or wrong here. But I am hoping to get something similar to what's shown here: https://medium.com/@juliansteam/what-bert-topic-modelling-reveal-about-the-2021-unrest-in-south-africa-d0d15629a9b4 (about in the middle of the page)

            ...

            ANSWER

            Answered 2022-Mar-24 at 22:20

            The colorbar uses the given alpha=.3. In the scatterplot, many dots with the same color are superimposed, causing them to look brighter than a single dot.

            One way to tackle this is to create a ScalarMappable object to be used by the colorbar, taking the colormap and the norm of the scatter plot (but not its alpha). Note that simply changing the alpha of the scatter object (scatter.set_alpha(1)) would also change the plot itself.
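
The ScalarMappable approach described in this answer might look roughly like the following. This is a minimal sketch, not the code from the original answer: the data is synthetic, and the colormap name, variable names, and headless backend are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500)
values = rng.uniform(0.0, 10.0, size=500)

fig, ax = plt.subplots()

# The scatter plot itself stays translucent.
scatter = ax.scatter(x, y, c=values, cmap="viridis", alpha=0.3)

# Build a separate mappable that reuses the scatter's colormap and norm
# but carries no alpha, so the colorbar is drawn with opaque colors.
mappable = ScalarMappable(norm=scatter.norm, cmap=scatter.cmap)
colorbar = fig.colorbar(mappable, ax=ax, label="value")
```

Calling `fig.savefig(...)` or `plt.show()` afterwards renders the figure; the points keep alpha=0.3 while the colorbar shows the full-strength colormap.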

            Source https://stackoverflow.com/questions/71608280

            QUESTION

            How to change the function parameters (visualize_topics_over_time) in BERTopic?
            Asked 2022-Feb-08 at 08:25

            I am using BERTopic to perform topic modelling, and everything works perfectly fine. However, I am forcing the algorithm to give me 10 topics using nr_topics=10, and when I visualize the topics over time using topic_model.visualize_topics_over_time(topics_over_time, top_n_topics=10, width=1250, height=450), some colors are repeated across topics because only 7 colors are defined in the function visualize_topics_over_time. I tried executing the same function in my Python notebook with additional color values, but it gives me the following error:

            Can someone please help me update the function with additional four colors?

            ...

            ANSWER

            Answered 2022-Feb-08 at 08:25

            To add colors to the function, you will indeed have to copy the function and change it to include more colors:
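
The modified function from the answer is not reproduced on this page. As a hedged illustration of the underlying idea (having at least as many colors as topics), the sketch below generates a palette of any size by spacing hues evenly. The helper `distinct_hex_colors` and the variables around it are hypothetical names, not part of BERTopic, and the saturation and brightness defaults are arbitrary choices.

```python
import colorsys
from typing import Dict, List


def distinct_hex_colors(n: int, saturation: float = 0.65, value: float = 0.85) -> List[str]:
    """Return n hex colors with evenly spaced hues (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    colors = []
    for i in range(n):
        hue = i / n  # evenly spaced around the color wheel
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors.append("#{:02X}{:02X}{:02X}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


# One color per topic, so no color has to be reused for 10 topics.
topic_ids = list(range(10))
palette = distinct_hex_colors(len(topic_ids))
color_by_topic: Dict[int, str] = dict(zip(topic_ids, palette))
```

A list like `palette` could then replace the hard-coded color list in a copied version of the plotting function, which is the change the answer describes.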

            Source https://stackoverflow.com/questions/70952260

            QUESTION

            Removal of Stop Words and Stemming/Lemmatization for BERTopic
            Asked 2021-Jun-25 at 08:49

            For Topic Modelling, I'm trying out BERTopic.

            I'm a little confused here. I am trying out BERTopic on my custom dataset.
            Since BERT was trained in such a way that it holds the semantic meaning of the text/document, should I be removing the stop words and stemming/lemmatizing my documents before passing them on to BERTopic? I'm afraid that these stop words might land in my topics as salient terms, which they are not.

            Suggestions and advice, please!

            ...

            ANSWER

            Answered 2021-Jun-25 at 08:49

            A good way to know whether this is needed is to check the examples and tutorials given by the link you provided, such as the Topic Modeling example. As you can see, it does not seem to do any preprocessing before calling the model.

            It therefore seems that preprocessing is neither needed nor recommended by the authors of the model.

            However, removing stop words can make the whole process faster, and by their nature they often do not contain salient information about the topic. For certain tasks, such as Sentiment Analysis, it is sometimes recommended not to remove them, as you can read in these links:

            Why is removing stopwords not always a good idea ?

            DataStack discussion over stopwords

            As for lemmatization or stemming, this link provides good insights on the subject for a Topic Modeling task, saying that it should be implemented for improved results.

            In conclusion, BERTopic does not need lemmatization/stemming or stop-word removal to work, but these steps can be implemented to improve both processing time and results. In the end, it always depends on your needs and resources. Trying both approaches and comparing the results against what you want is always a good way to understand the pros and cons of these tools.
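
To try both approaches as the answer suggests, one could prepare a raw and a stop-word-filtered version of the same documents and feed each to the model in turn. The following is a minimal standard-library sketch under stated assumptions: the stop-word list is a tiny illustrative set, the tokenizer is a simple regular expression, and the function names are hypothetical. It does not call BERTopic itself.

```python
import re
from typing import Iterable, List, Set

# Tiny illustrative stop-word list (an assumption); real projects would use a fuller one.
STOP_WORDS: Set[str] = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Keep only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


docs = [
    "The engine of the car is on the left",
    "Topics are clusters of documents",
]

raw = [tokenize(doc) for doc in docs]
filtered = [remove_stop_words(tokens) for tokens in raw]

# Re-join the tokens to get two document sets that can be compared side by side.
raw_docs = [" ".join(tokens) for tokens in raw]
filtered_docs = [" ".join(tokens) for tokens in filtered]
```

Fitting one model on `raw_docs` and another on `filtered_docs` and then comparing the resulting topics is one way to see whether the extra preprocessing helps for a given dataset.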

            Source https://stackoverflow.com/questions/68127754

            QUESTION

            zsh: no matches found: bertopic[visualization]
            Asked 2021-Jan-08 at 05:37

            I am trying to install bertopic[visualization] on my MacBook Pro using

            ...

            ANSWER

            Answered 2021-Jan-08 at 05:37

            zsh uses square brackets for pattern matching, which means that if you need to pass literal square brackets as an argument to a command, you need to either escape them or quote the argument. So try using:

            pip3 install 'bertopic[visualization]'
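
When the install command is built from a script rather than typed by hand, the same quoting concern applies. The sketch below shows two options using only the Python standard library: `shlex.quote` to produce a shell-safe string, and an argument list that bypasses the shell entirely. The variable names are illustrative, and the command is only constructed here, not executed.

```python
import shlex
import sys

requirement = "bertopic[visualization]"

# Option 1: quote the argument so a shell such as zsh treats the brackets literally.
quoted = shlex.quote(requirement)
shell_command = f"pip3 install {quoted}"

# Option 2: pass an argument list (for example to subprocess.run) so that no shell
# pattern matching happens at all.
argv = [sys.executable, "-m", "pip", "install", requirement]

print(shell_command)
```

Passing `argv` to `subprocess.run(argv, check=True)` would perform the installation without involving shell globbing.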

            Source https://stackoverflow.com/questions/65623487

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network

            Vulnerabilities

            No vulnerabilities reported

            Install BERTopic

            Installation, with sentence-transformers, can be done using PyPI.
            For an in-depth overview of the features of BERTopic, you can check the full documentation or follow along with one of the examples.
            We start by extracting topics from the well-known 20 newsgroups dataset containing English documents.
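
The code for this step is not included on this page. A minimal sketch of what it might look like is given below, assuming that `bertopic` and `scikit-learn` are installed and that the dataset can be downloaded. The wrapper function `extract_newsgroup_topics` is a hypothetical name introduced here so that the heavy dependencies are only loaded when it is called.

```python
def extract_newsgroup_topics():
    """Fit BERTopic on the 20 newsgroups dataset and return the model and its outputs."""
    # Imported inside the function because these are heavy third-party dependencies.
    from bertopic import BERTopic
    from sklearn.datasets import fetch_20newsgroups

    # Downloads the dataset on first use; headers, footers and quotes are stripped.
    docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

    topic_model = BERTopic()
    topics, probs = topic_model.fit_transform(docs)
    return topic_model, topics, probs
```

After `topic_model, topics, probs = extract_newsgroup_topics()`, methods from the overview table such as `topic_model.get_topic_info()` or `topic_model.get_topic(0)` can be used to inspect the result. Fitting on the full dataset can take a while and downloads an embedding model on first use.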

            Support

            For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check the Stack Overflow community page and ask there.

            Install
          • PyPI

            pip install bertopic

          • CLONE
          • HTTPS

            https://github.com/MaartenGr/BERTopic.git

          • CLI

            gh repo clone MaartenGr/BERTopic

          • SSH

            git@github.com:MaartenGr/BERTopic.git

            Consider Popular Topic Modeling Libraries

            gensim

            by RaRe-Technologies

            Familia

            by baidu

            BERTopic

            by MaartenGr

            Top2Vec

            by ddangelov

            lda

            by lda-project

            Try Top Libraries by MaartenGr

            KeyBERT

            by MaartenGr (Python)

            PolyFuzz

            by MaartenGr (Python)

            Concept

            by MaartenGr (Python)

            soan

            by MaartenGr (Python)

            cTFIDF

            by MaartenGr (Python)