Disinformation: Starter Kit - Terms of Service Labeling and Readability
by kandikits Updated: Jan 24, 2022
Solution Kit
Time is valuable for everyone in this era of globalization. Sifting through large numbers of documents is difficult and time-consuming. Without an abstract or summary, it can take minutes just to figure out what a paper or document covers. A summarizer is an algorithm that extracts sentences from a text document, determines which are most important, and returns them in a readable, structured way.
In this challenge, we invite you to build a summarization solution for documents such as terms and conditions that preserves all the essential points of the document. You can parse the document into sections based on their headings, create a summary for each section, and finally show a merged summary that captures the essence of the document.
Below is a sample solution kit to jumpstart your solution by creating a simple summarizer application. To use this kit to build your own solution, scroll down to the sections Deployment Information and Instruction to Run.
Complexity: Medium
This sample solution kit performs extractive summarization on the given document.
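The section-wise approach described above (split on headings, summarize each section, merge the results) can be sketched roughly as follows. This is a minimal illustration, not the kit's method: it swaps in a simple word-frequency scorer for the BERT-based model, and the heading rule (a short line without closing punctuation) and all function names are assumptions made for the example.

```python
import re
from collections import Counter


def split_sections(document):
    """Split a document into (heading, body) pairs.

    Assumption: a heading is a line of at most six words that does not
    end with sentence punctuation. Text before the first heading is
    filed under a hypothetical "Preamble" heading.
    """
    sections = []
    heading, body = "Preamble", []
    for line in document.splitlines():
        line = line.strip()
        if not line:
            continue
        is_heading = len(line.split()) <= 6 and not line.endswith((".", "!", "?"))
        if is_heading:
            if body:
                sections.append((heading, " ".join(body)))
            heading, body = line, []
        else:
            body.append(line)
    if body:
        sections.append((heading, " ".join(body)))
    return sections


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, num_sentences=1):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    words = re.findall(r"[a-z']+", text.lower())
    # Crude stop-word filter: ignore words of three letters or fewer.
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


def summarize_document(document, num_sentences=1):
    """Summarize each section and merge the results, one line per section."""
    return "\n".join(
        f"{heading}: {summarize(body, num_sentences)}"
        for heading, body in split_sections(document)
    )


SAMPLE_TERMS = """Privacy
We collect your email address. Your email address is used to send email updates. Cookies exist.
Termination
We may close accounts. Accounts that break the rules are closed without notice."""

merged_summary = summarize_document(SAMPLE_TERMS)
print(merged_summary)
```

Keeping the selected sentences in their original order helps the merged summary read naturally; replacing `summarize` with a model-based summarizer would leave the splitting and merging steps unchanged.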
Development Environment
VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode provides a typical IDE experience for developers.
notebook by jupyter
Jupyter Interactive Notebook
Jupyter Notebook 10204 Version: v7.0.0b4 License: Permissive (BSD-3-Clause)
Exploratory Data Analysis
These libraries are used for extensive analysis and exploration of data, for working with arrays, and for scientific computation and data manipulation.
numpy by numpy
The fundamental package for scientific computing with Python.
Python 23755 Version: v1.25.0rc1 License: Permissive (BSD-3-Clause)
pandas by pandas-dev
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Python 38689 Version: v2.0.2 License: Permissive (BSD-3-Clause)
Text Mining
Libraries in this group are used for the analysis and processing of unstructured natural language text.
spaCy by explosion
💫 Industrial-strength Natural Language Processing (NLP) in Python
Python 26383 Version: v3.2.6 License: Permissive (MIT)
sentencepiece by google
Unsupervised text tokenizer for Neural Network-based text generation.
C++ 7616 Version: v0.1.99 License: Permissive (Apache-2.0)
Machine Learning & Natural Language Processing
The libraries in this group provide machine learning tooling and state-of-the-art pre-trained models for Natural Language Processing (NLP).
scikit-learn by scikit-learn
scikit-learn: machine learning in Python
Python 54584 Version: 1.2.2 License: Permissive (BSD-3-Clause)
pytorch by pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Python 67874 Version: v2.0.1 License: Others (Non-SPDX)
transformers by huggingface
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python 104111 Version: v4.30.2 License: Permissive (Apache-2.0)
sentence-transformers by UKPLab
Multilingual Sentence & Image Embeddings with BERT
Python 10938 Version: v2.2.2 License: Permissive (Apache-2.0)
Utilities
The tqdm library can be used to show a progress bar for any long-running step in the code.
tqdm by tqdm
A Fast, Extensible Progress Bar for Python and CLI
Python 25025 Version: v4.65.0 License: Others (Non-SPDX)
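As a rough illustration of how tqdm might wrap a long-running step, the sketch below shows a progress bar over a loop that summarizes sections one by one. The function names and the stand-in summarizer are hypothetical, and the fallback wrapper is only there so the example still runs where tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback (assumption for this sketch): a no-op wrapper when tqdm is missing.
    def tqdm(iterable, **kwargs):
        return iterable


def first_sentence(text):
    """Hypothetical stand-in summarizer: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def process_sections(sections, summarize_fn):
    """Summarize each section while showing a progress bar."""
    summaries = []
    for section in tqdm(sections, desc="Summarizing sections"):
        summaries.append(summarize_fn(section))
    return summaries


results = process_sections(["A b. C d.", "E f. G h."], first_sentence)
print(results)
```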
Testing
The libraries listed here can be used for unit testing as well as integration testing.
pytest by pytest-dev
The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
Python 10300 Version: 7.3.2 License: Permissive (MIT)
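A few pytest-style unit tests for a summarizer could look like the sketch below. pytest tests are plain functions whose names start with `test_` and that use bare `assert` statements. The function `first_n_sentences` is a hypothetical stand-in, not part of the kit; in practice the tests would call the kit's own summarization code instead.

```python
def first_n_sentences(text, n):
    """Hypothetical stand-in for a summarizer: keep the first n sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:n]) + "." if sentences else ""


def test_summary_is_shorter_than_input():
    text = "One. Two. Three."
    assert len(first_n_sentences(text, 2)) < len(text)


def test_summary_respects_sentence_count():
    assert first_n_sentences("One. Two. Three.", 2) == "One. Two."


def test_empty_input_gives_empty_summary():
    assert first_n_sentences("", 3) == ""
```

Saved in a file such as test_summarizer.py (the file name is an assumption), these tests would be collected and run by the `pytest` command.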
Kit Solution Source
bert-extractive-summarizer by dmmiller612
Easy to use extractive text summarization with BERT
Python 1206 Version: 0.10.1 License: Permissive (MIT)
Deployment Information
For Windows OS, download, extract, and double-click the kit_installer file to install the kit.
Note: Be sure to extract the zip file before running it. The installation may take from 2 to 10 minutes depending on bandwidth.
1. When you're prompted during the installation of the kit, press Y to launch the app automatically, then execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
2. To run the app manually, press N when you're prompted and locate the zip file Text_Summarizer.zip.
3. Extract the zip file and navigate to the directory bert-extractive-summarizer-master.
4. Open a command prompt in the extracted directory bert-extractive-summarizer-master and run the command jupyter notebook.
For other operating systems,
1. Install Python.
2. Download the repository.
3. Extract the zip file and navigate to the directory bert-extractive-summarizer-master.
4. Open a terminal in the extracted directory bert-extractive-summarizer-master.
5. Install dependencies by executing the command pip install -r requirements.txt.
6. Run the command jupyter notebook.
Instruction to Run
Follow the instructions below to run the solution.
1. Locate and open the Terms-of-service-summarizer.ipynb notebook from the Jupyter Notebook browser window.
2. Execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
3. The output file summarized_text.txt will be saved in the bert-extractive-summarizer-master directory, relative to the kit_installer.bat location.
To summarize your own text,
1. Open the input file sample.txt in the bert-extractive-summarizer-master directory, relative to the kit_installer.bat location.
2. Replace its contents with the text that you want to summarize.
3. Execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
4. The output file summarized_text.txt will be saved in the bert-extractive-summarizer-master directory, relative to the kit_installer.bat location.
Input file: sample.txt contains the content to be summarized.
Output file: summarized_text.txt contains the summarized content.
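The input and output flow described above (read sample.txt, summarize, write summarized_text.txt) might look roughly like this in code. It is a sketch only: the function names are hypothetical, the summarizer is a trivial stand-in, and a temporary directory is used so the example does not touch any real files.

```python
import tempfile
from pathlib import Path


def run_summarizer(input_path, output_path, summarize_fn):
    """Read the input file, summarize its text, and save the result."""
    text = Path(input_path).read_text(encoding="utf-8")
    summary = summarize_fn(text)
    Path(output_path).write_text(summary, encoding="utf-8")
    return summary


def keep_first_line(text):
    """Hypothetical stand-in summarizer: return the first line of the text."""
    lines = text.splitlines()
    return lines[0] if lines else ""


with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "sample.txt"
    sample.write_text("First line of terms.\nSecond line.", encoding="utf-8")
    output = Path(workdir) / "summarized_text.txt"
    run_summarizer(sample, output, keep_first_line)
    saved = output.read_text(encoding="utf-8")

print(saved)
```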
Input Parameters
1. The text variable specifies the input text that you want to summarize.
2. minimum_length refers to the minimum length to accept as a sentence for summarizing.
3. maximum_length refers to the maximum length to accept as a sentence for summarizing.
4. sentences specifies the number of sentences in the summarized text.
You can additionally build interfaces and other enhancements for an additional score.
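To make the effect of these parameters more concrete, here is a hedged sketch. The first function illustrates how minimum_length and maximum_length could filter candidate sentences; it assumes length is measured in characters and uses a naive regex splitter rather than the kit's own sentence handling. The second function assumes the notebook forwards the values to the bert-extractive-summarizer `Summarizer` call as `min_length`, `max_length`, and `num_sentences`; that mapping and the default values shown are assumptions, not taken from the notebook.

```python
import re


def candidate_sentences(text, minimum_length=40, maximum_length=600):
    """Keep sentences whose character length lies strictly between the bounds.

    Assumption: length is counted in characters; the regex splitter is a
    simplification used only for this illustration.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if minimum_length < len(s) < maximum_length]


def summarize_with_bert(text, minimum_length=40, maximum_length=600, sentences=3):
    """Assumed mapping of the kit's parameters onto the library call."""
    # Imported lazily so the helper above runs without the model installed.
    from summarizer import Summarizer

    model = Summarizer()
    return model(
        text,
        min_length=minimum_length,
        max_length=maximum_length,
        num_sentences=sentences,
    )


example_text = "Too short. This sentence is long enough to be kept as a candidate. Ok."
kept = candidate_sentences(example_text, minimum_length=20, maximum_length=100)
print(kept)
```

Only the middle sentence survives the filter in this example, which shows why very short fragments such as headings or list markers never reach the summary when minimum_length is set sensibly.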
Troubleshooting
1. While running the batch file, if you encounter a Windows protection alert, select More info --> Run anyway.
2. During kit installation, if you encounter a Windows security alert, click Allow.
Support
For any support, you can direct message us at #help-with-kandi-kits