Disinformation: Starter Kit - Terms of Service Labeling and Readability


by kandikits | Updated: Jan 24, 2022


Solution Kit

Time is valuable to everyone in this era of globalization. Sifting through many documents is difficult and time-consuming: without an abstract or summary, it can take minutes just to figure out what a paper or document covers. A summarizer is an algorithm that extracts sentences from a text document, determines which are the most important, and returns them in a readable, structured form.

In this challenge, we invite you to build a summarization solution for documents such as terms and conditions that preserves all of the document's essential points. You can parse the document into sections based on their headings, create a summary for each section, and finally show a merged summary that captures the essence of the whole document.

Below is a sample solution kit to jumpstart your work on a simple summarizer application. To use this kit to build your own solution, scroll down to the Kit Deployment Instructions and Instruction to Run sections.

Complexity: Medium

This sample solution kit performs extractive summarization on the given document.
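The section-by-section approach described above can be sketched with the Python standard library alone. This is a minimal illustration, not the kit's own implementation: the heading heuristic (a short line without sentence-ending punctuation), the small stop-word list, and the word-frequency scoring are assumptions chosen for brevity, and all function names are hypothetical. NLTK or spaCy would handle sentence splitting and stop words more robustly.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list (NLTK and spaCy ship fuller ones).
STOP_WORDS = {
    "a", "an", "and", "any", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "may", "of", "on", "or", "our", "that", "the", "this",
    "to", "we", "will", "with", "you", "your",
}


def is_heading(line: str) -> bool:
    # Assumption: a heading is a short line (1-8 words) that does not end like a sentence.
    words = line.split()
    return 0 < len(words) <= 8 and not line.rstrip().endswith((".", "!", "?", ":", ";"))


def split_sections(text: str) -> list[tuple[str, str]]:
    # Group body lines under the most recent heading; drop sections with no body.
    sections: list[tuple[str, list[str]]] = [("Preamble", [])]
    for line in text.splitlines():
        if not line.strip():
            continue
        if is_heading(line):
            sections.append((line.strip(), []))
        else:
            sections[-1][1].append(line.strip())
    return [(title, " ".join(body)) for title, body in sections if body]


def split_sentences(text: str) -> list[str]:
    # Naive split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def content_words(text: str) -> list[str]:
    # Lowercased word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, k: int = 1) -> list[str]:
    # Extractive step: rank sentences by the average section-level frequency
    # of their content words and keep the top k, in original order.
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def summarize_document(text: str, k: int = 1) -> str:
    # Summarize each section separately, then merge the per-section summaries.
    return "\n".join(
        f"{title}: {' '.join(summarize(body, k))}" for title, body in split_sections(text)
    )
```

For a document with the headings "Acceptance of Terms" and "Termination", `summarize_document(text)` returns one line per heading, each followed by that section's highest-scoring sentence. Raising `k` keeps more sentences per section.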

Development Environment

VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, while VSCode gives developers a typical IDE experience.

notebook by jupyter
Jupyter Notebook | Stars: 10204 | Version: v7.0.0b4 | License: Permissive (BSD-3-Clause)
Jupyter Interactive Notebook

vscode by microsoft
TypeScript | Stars: 147328 | Version: 1.79.2 | License: Permissive (MIT)
Visual Studio Code

Exploratory Data Analysis

These libraries are used for extensive data analysis and exploration and for working with arrays. They also support scientific computation and data manipulation.

numpy by numpy
Python | Stars: 23755 | Version: v1.25.0rc1 | License: Permissive (BSD-3-Clause)
The fundamental package for scientific computing with Python.

pandas by pandas-dev
Python | Stars: 38689 | Version: v2.0.2 | License: Permissive (BSD-3-Clause)
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.

Text Mining

Libraries in this group are used to analyze and process unstructured natural-language text.

nltk by nltk
Python | Stars: 12020 | Version: Current | License: Permissive (Apache-2.0)
NLTK Source

spaCy by explosion
Python | Stars: 26383 | Version: v3.2.6 | License: Permissive (MIT)
💫 Industrial-strength Natural Language Processing (NLP) in Python

sentencepiece by google
C++ | Stars: 7616 | Version: v0.1.99 | License: Permissive (Apache-2.0)
Unsupervised text tokenizer for Neural Network-based text generation.

Machine Learning & Natural Language Processing

The libraries in this group provide machine learning tools and state-of-the-art pre-trained models for Natural Language Processing (NLP).

scikit-learn by scikit-learn
Python | Stars: 54584 | Version: 1.2.2 | License: Permissive (BSD-3-Clause)
scikit-learn: machine learning in Python

pytorch by pytorch
Python | Stars: 67874 | Version: v2.0.1 | License: Others (Non-SPDX)
Tensors and Dynamic neural networks in Python with strong GPU acceleration

transformers by huggingface
Python | Stars: 104111 | Version: v4.30.2 | License: Permissive (Apache-2.0)
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

sentence-transformers by UKPLab
Python | Stars: 10938 | Version: v2.2.2 | License: Permissive (Apache-2.0)
Multilingual Sentence & Image Embeddings with BERT
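Scoring sentences by TF-IDF weight is a common alternative to raw word counts. The sketch below computes, without dependencies, the weighting that scikit-learn's `TfidfVectorizer` applies by default (lowercased tokens of two or more word characters, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 row normalization), treating each sentence as a document and scoring it by the sum of its weights. The scoring rule and the function names are illustrative assumptions; with scikit-learn installed, `TfidfVectorizer().fit_transform(sentences).sum(axis=1)` should give equivalent scores.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Mirrors TfidfVectorizer's defaults: lowercase, tokens of 2+ word characters.
    return re.findall(r"\b\w\w+\b", sentence.lower())


def tfidf_scores(sentences: list[str]) -> list[float]:
    # Treat each sentence as one "document"; score = sum of its L2-normalized TF-IDF weights.
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # sentences containing each term
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}  # smooth_idf=True
    scores = []
    for doc in docs:
        weights = [count * idf[t] for t, count in doc.items()]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0  # guard empty sentences
        scores.append(sum(weights) / norm)
    return scores


def top_sentences(sentences: list[str], k: int = 2) -> list[str]:
    # Keep the k highest-scoring sentences, restored to document order.
    scores = tfidf_scores(sentences)
    best = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Because each row is L2-normalized, this score grows with the number of distinct, informative terms in a sentence, so longer sentences tend to rank higher; dividing by sentence length is one possible adjustment.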

Utilities

The tqdm library can be used to show a progress bar for any long-running step in the code.

tqdm by tqdm
Python | Stars: 25025 | Version: v4.65.0 | License: Others (Non-SPDX)
A Fast, Extensible Progress Bar for Python and CLI
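When a long document is summarized section by section, tqdm can report progress through that loop. A minimal sketch: `summarize_sections` and its `summarize` callable parameter are hypothetical names, and the `try`/`except` fallback is an assumption added so the example still runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:  # assumption: fall back to a no-op wrapper so the sketch runs without tqdm
    def tqdm(iterable, **kwargs):
        return iterable


def summarize_sections(sections, summarize):
    # Hypothetical helper: apply summarize(body) to each (title, body) pair,
    # advancing the progress bar by one tick per section.
    results = {}
    for title, body in tqdm(sections, desc="Summarizing sections", unit="section"):
        results[title] = summarize(body)
    return results
```

Any callable that maps a section body to its summary can be passed as `summarize`.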

Testing

The libraries listed here can be used for unit testing as well as integration testing.

pytest by pytest-dev

Python | Stars: 10300 | Version: 7.3.2
License: Permissive (MIT)

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing.

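As a minimal illustration of the kind of small test pytest is suited to, the sketch below tests a hypothetical split_sentences helper (not part of the kit) that splits text on sentence-ending punctuation followed by whitespace. pytest discovers functions whose names start with test_ in files named test_*.py and reports any plain assert statement that fails.

```python
import re


# Hypothetical helper, for illustration only: split text into sentences
# wherever ".", "!" or "?" is followed by whitespace.
def split_sentences(text):
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]  # drop the empty string left by blank input


# pytest collects functions named test_* and reports failing asserts.
def test_splits_on_sentence_punctuation():
    text = "We collect data. We share it! Do you agree?"
    assert split_sentences(text) == ["We collect data.", "We share it!", "Do you agree?"]


def test_blank_text_gives_no_sentences():
    assert split_sentences("   ") == []


def test_single_sentence_is_kept_whole():
    assert split_sentences("Terms apply.") == ["Terms apply."]
```

Saved as, for example, test_sentences.py, these tests run with the command pytest from the same directory.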

Kit Solution Source

bert-extractive-summarizer by dmmiller612

Python | Stars: 1206 | Version: 0.10.1
License: Permissive (MIT)

Easy to use extractive text summarization with BERT.
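Extractive summarization selects whole sentences from the source instead of generating new text. The sketch below illustrates that idea with a deliberately simple word-frequency score rather than BERT: each sentence is scored by the average frequency of its non-stop-words, the top-scoring sentences are kept, and they are returned in their original order. This is a swapped-in technique for illustration only; according to its documentation, bert-extractive-summarizer instead embeds sentences with BERT and clusters the embeddings. The function name extractive_summary and the stop-word list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list so common function words do not dominate scores.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "we", "you", "your",
    "is", "are", "for", "on", "with", "by", "this", "that", "it",
}


def extractive_summary(text, num_sentences=2):
    # Naive sentence split on ".", "!" or "?" followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOP_WORDS]
        # Average (not total) frequency, so long sentences are not favoured
        # merely for being long.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best, then restore source order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)
```

Returning the chosen sentences in their original order keeps the summary readable, which matters for documents such as terms of service where clauses build on each other.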

Deployment Information

The entire solution is available as a package to download from the source code repository. Please add your kit solution or prototype source repository in this section.

For Windows OS, download and extract the kit, then double-click the kit_installer file to install it. Note: make sure to extract the zip file before running the installer. Installation may take 2 to 10 minutes, depending on bandwidth.
1. When prompted during installation, press Y to launch the app automatically, then execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
2. To run the app manually, press N when prompted and locate the zip file Text_Summarizer.zip.
3. Extract the zip file and navigate to the directory bert-extractive-summarizer-master.
4. Open a command prompt in the extracted directory bert-extractive-summarizer-master and run the command jupyter notebook.

For other operating systems:
1. Install Python.
2. Download the repository.
3. Extract the zip file and navigate to the directory bert-extractive-summarizer-master.
4. Open a terminal in the extracted directory bert-extractive-summarizer-master.
5. Install the dependencies by running the command pip install -r requirements.txt.
6. Run the command jupyter notebook.
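The manual steps of extracting the zip file and navigating to bert-extractive-summarizer-master can also be scripted. The sketch below is a hypothetical helper, extract_kit, that extracts an archive with Python's standard zipfile module and returns the project directory. It assumes the archive contains a bert-extractive-summarizer-master folder holding requirements.txt at some depth; the exact archive layout is not documented here, so the helper searches for the folder rather than assuming its position.

```python
import zipfile
from pathlib import Path


def extract_kit(zip_path, dest_dir, project_dir="bert-extractive-summarizer-master"):
    """Hypothetical helper: extract the kit archive and return the project path."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)  # creates dest and any nested folders

    # The archive may wrap the project in an extra top-level folder
    # (an assumption), so search every depth for a directory with that name.
    matches = [p for p in dest.rglob(project_dir) if p.is_dir()]
    if not matches:
        raise FileNotFoundError(f"{project_dir} not found after extracting {zip_path}")
    project = min(matches, key=lambda p: len(p.parts))  # shallowest match wins

    # The non-Windows steps install dependencies from this file.
    if not (project / "requirements.txt").exists():
        raise FileNotFoundError(f"requirements.txt missing in {project}")
    return project
```

From the returned directory you would then run pip install -r requirements.txt and jupyter notebook, as in the steps above.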

Instruction to Run

Follow the instructions below to run the solution.
1. Locate and open the Terms-of-service-summarizer.ipynb notebook from the Jupyter Notebook browser window.
2. Execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
3. The output file summarized_text.txt is saved in the bert-extractive-summarizer-master directory under the kit_installer.bat location.

To summarize your own text:
1. Open the input file sample.txt in the bert-extractive-summarizer-master directory under the kit_installer.bat location.
2. Update it with the text that you want to summarize.
3. Execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
4. The output file summarized_text.txt is saved in the bert-extractive-summarizer-master directory under the kit_installer.bat location.

Input file: sample.txt contains the content to be summarized.
Output file: summarized_text.txt contains the summarized content.
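The run instructions describe a simple file contract: the text is read from sample.txt and the summary is written to summarized_text.txt in the same directory. The sketch below shows only that contract. lead_summary is a hypothetical placeholder that keeps the first few sentences; it stands in for the BERT model the notebook uses and is not the kit's method. run_summarizer is likewise a hypothetical name.

```python
import re
from pathlib import Path


def lead_summary(text, num_sentences=3):
    # Placeholder summarizer: keep the first num_sentences sentences.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(sentences[:num_sentences])


def run_summarizer(project_dir, num_sentences=3):
    project = Path(project_dir)
    # Input file named in the instructions above.
    text = (project / "sample.txt").read_text(encoding="utf-8")
    summary = lead_summary(text, num_sentences)
    # Output file named in the instructions above, written alongside the input.
    out_path = project / "summarized_text.txt"
    out_path.write_text(summary, encoding="utf-8")
    return out_path
```

Keeping file handling separate from the summarizing function makes it straightforward to swap the placeholder for a real model without touching the input and output paths.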

Input Parameters

1. text specifies the input text that you want to summarize.
2. minimum_length is the minimum length a sentence must have to be accepted for summarizing.
3. maximum_length is the maximum length a sentence may have to be accepted for summarizing.
4. sentences specifies the number of sentences in the summarized text.

You can additionally build interfaces and other enhancements for an additional score.
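To show how the length and count parameters might interact, the sketch below first keeps only sentences whose length lies strictly between minimum_length and maximum_length, then returns at most the number of sentences given by sentences. It assumes lengths are counted in characters, uses arbitrary example defaults, and simply takes the first candidates as a placeholder for model-based ranking. The bert-extractive-summarizer model call accepts similarly named min_length, max_length and num_sentences arguments; whether the notebook passes its variables straight through to them is an assumption. The function names are hypothetical.

```python
import re


def select_candidates(text, minimum_length=20, maximum_length=600):
    # Split into sentences, then keep those whose character length lies
    # strictly between the two bounds (assumed semantics).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    return [s for s in sentences if minimum_length < len(s) < maximum_length]


def summarize_with_params(text, minimum_length=20, maximum_length=600, sentences=2):
    candidates = select_candidates(text, minimum_length, maximum_length)
    # Placeholder ranking: keep the first `sentences` candidates in order.
    return candidates[:sentences]
```

Raising minimum_length drops short fragments such as headings or "OK.", while lowering maximum_length excludes long run-on clauses, which are common in terms of service.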

Troubleshooting

1. If you encounter a Windows protection alert while running the batch file, select More info --> Run anyway.
2. If you encounter a Windows security alert during kit installation, click Allow.

Support

For any support, you can direct message us at #help-with-kandi-kits.