Disinformation: Starter Kit - Terms of Service Labeling and Readability
by kandikits Updated: Jan 24, 2022
Time is scarce, and sifting through long documents is slow and tedious. Without an abstract or summary, it can take several minutes just to work out what a paper or document covers. An extractive summarizer is an algorithm that splits a text document into sentences, determines which are most important, and returns them in a readable, structured form.
In this challenge, we invite you to build a solution that summarizes documents such as terms and conditions while preserving all of their essential points. You can parse the document into sections based on their headings, create a summary for each section, and finally merge these into a single summary that shows the essence of the document.
Below is a sample solution kit to jumpstart your work on a simple summarizer application. To use this kit to build your own solution, scroll down to the sections Kit Deployment Instructions and Instruction to Run.
Complexity: Medium
This sample solution kit performs extractive summarization on the given document.
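The section-by-section approach described above can be sketched with the Python standard library alone. This is a minimal illustration and not the kit's method: it scores sentences by plain word frequency rather than BERT embeddings, and it assumes that a heading is a short line without closing punctuation. The function names (`is_heading`, `split_sections`, `summarize`, `summarize_document`) are hypothetical.

```python
import re
from collections import Counter


def is_heading(line, max_words=8):
    # Assumption: a heading is a short line that does not end in punctuation.
    words = line.split()
    return 0 < len(words) <= max_words and line.rstrip()[-1] not in ".!?;:,"


def split_sections(text):
    """Return a list of (heading, body) pairs in document order."""
    sections = []
    heading, body = "Preamble", []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if is_heading(line):
            if body:
                sections.append((heading, " ".join(body)))
            heading, body = line, []
        else:
            body.append(line)
    if body:
        sections.append((heading, " ".join(body)))
    return sections


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, num_sentences=1):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average frequency of the words in the sentence.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


def summarize_document(text, num_sentences=1):
    """Summarize each section, then merge the results under their headings."""
    return "\n".join(
        f"{heading}: {summarize(body, num_sentences)}"
        for heading, body in split_sections(text)
    )
```

Because every output sentence is copied verbatim from the input, the result stays extractive; swapping the `score` function for an embedding-based one would change the ranking without changing the surrounding structure.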
VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode gives developers a typical IDE experience.
Jupyter Interactive Notebook
Jupyter Notebook 10114 Version:v7.0.0b2 License: Permissive (BSD-3-Clause)
Exploratory Data Analysis
These libraries are used for extensive data analysis and exploration, for working with arrays, and for scientific computation and data manipulation.
The fundamental package for scientific computing with Python.
Python 23587 Version:v1.24.3 License: Permissive (BSD-3-Clause)
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Python 38499 Version:v2.0.2 License: Permissive (BSD-3-Clause)
Libraries in this group are used for the analysis and processing of unstructured natural language.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Python 26205 Version:v3.5.3 License: Permissive (MIT)
Unsupervised text tokenizer for Neural Network-based text generation.
C++ 7456 Version:v0.1.99 License: Permissive (Apache-2.0)
Machine Learning & Natural Language Processing
The library offers state-of-the-art pre-trained models for Natural Language Processing (NLP).
scikit-learn: machine learning in Python
Python 54399 Version:1.2.2 License: Permissive (BSD-3-Clause)
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Python 67319 Version:v2.0.1 License: Others (Non-SPDX)
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python 102076 Version:v4.29.2 License: Permissive (Apache-2.0)
Multilingual Sentence & Image Embeddings with BERT
Python 10727 Version:v2.2.2 License: Permissive (Apache-2.0)
The tqdm library can be used to show a progress bar for any long-running step in the code.
A Fast, Extensible Progress Bar for Python and CLI
Python 24859 Version:v4.65.0 License: Others (Non-SPDX)
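As an illustration, a loop over document sections can be wrapped in `tqdm` to display progress. The sketch below assumes a hypothetical per-section step, `summarize_section`, and falls back to a plain loop when `tqdm` is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: no progress bar is shown.
    def tqdm(iterable, **kwargs):
        return iterable


def summarize_section(text):
    # Hypothetical stand-in for a slow step: keep only the first sentence.
    return text.split(". ")[0].rstrip(".") + "."


sections = [
    "You must be 18 or older. Accounts are personal.",
    "We may change these terms. Notice will be posted.",
]

# desc sets the label shown to the left of the progress bar.
summaries = [summarize_section(s) for s in tqdm(sections, desc="Summarizing")]
```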
The libraries listed here can be used for unit testing as well as integration testing.
The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
Python 10223 Version:7.3.1 License: Permissive (MIT)
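For instance, a test for an extractive summarizer can check that every returned sentence occurs verbatim in the input and that the requested sentence count is respected. The sketch below tests a hypothetical stand-in function, `first_n_sentences`, rather than the kit's model. pytest collects functions whose names start with `test_` and reports any failing `assert`.

```python
import re


def first_n_sentences(text, n):
    # Hypothetical stand-in for a summarizer: returns the first n sentences.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return sentences[:n]


SAMPLE = "We store your email. We never sell it. You can delete it anytime."


def test_summary_is_extractive():
    # Every sentence in the summary must appear verbatim in the source text.
    for sentence in first_n_sentences(SAMPLE, 2):
        assert sentence in SAMPLE


def test_summary_respects_sentence_count():
    assert len(first_n_sentences(SAMPLE, 2)) == 2
```

Saved in a file such as test_summarizer.py (a hypothetical name), these tests run with the command pytest from the same directory.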
Kit Solution Source
Easy to use extractive text summarization with BERT
Python 1132 Version:0.10.1 License: Permissive (MIT)
Kit Deployment Instructions
For Windows OS, download, extract, and double-click the kit_installer file to install the kit. Note: Be sure to extract the zip file before running it. The installation may take from 2 to 10 minutes, depending on bandwidth.
1. When prompted during the installation of the kit, press Y to launch the app automatically, then execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
2. To run the app manually, press N when prompted and locate the zip file Text_Summarizer.zip.
3. Extract the zip file and navigate to the directory bert-extractive-summarizer-master.
4. Open a command prompt in the extracted directory bert-extractive-summarizer-master and run the command jupyter notebook.
For other operating systems:
1. Install Python.
2. Download the repository.
3. Extract the zip file and navigate to the directory bert-extractive-summarizer-master.
4. Open a terminal in the extracted directory bert-extractive-summarizer-master.
5. Install the dependencies by executing the command pip install -r requirements.txt.
6. Run the command jupyter notebook.
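After the dependencies are installed, a quick import check can confirm that the environment is ready before the notebook is launched. This is a minimal sketch: the module list is an assumption based on the libraries named on this page (`summarizer` is the import name of bert-extractive-summarizer, and `sklearn` that of scikit-learn) and may not match the kit's requirements.txt exactly.

```python
from importlib.util import find_spec

# Assumed top-level import names; adjust to match requirements.txt.
REQUIRED_MODULES = [
    "numpy", "pandas", "spacy", "sklearn",
    "torch", "transformers", "summarizer", "tqdm",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```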
Instruction to Run
Follow the instructions below to run the solution.
1. Locate and open the Terms-of-service-summarizer.ipynb notebook from the Jupyter Notebook browser window.
2. Execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
3. The output file summarized_text.txt will be saved in the bert-extractive-summarizer-master directory from the kit_installer.bat location.
To summarize your own text:
1. Open the input file sample.txt in the bert-extractive-summarizer-master directory from the kit_installer.bat location.
2. Replace its contents with the text that you want to summarize.
3. Execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
4. The output file summarized_text.txt will be saved in the bert-extractive-summarizer-master directory from the kit_installer.bat location.
Input file: sample.txt contains the content to be summarized.
Output file: summarized_text.txt contains the summarized content.
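The file flow described above (read sample.txt, summarize it, write summarized_text.txt) can be sketched as follows. The function `summarize_file` and its `summarize` argument are hypothetical; the notebook's actual code may be organized differently.

```python
from pathlib import Path


def summarize_file(summarize, input_path="sample.txt",
                   output_path="summarized_text.txt"):
    """Read the input file, apply `summarize`, and write the result."""
    text = Path(input_path).read_text(encoding="utf-8")
    summary = summarize(text)  # any callable that maps str -> str
    Path(output_path).write_text(summary, encoding="utf-8")
    return summary
```

Passing the summarizer in as a callable keeps the file handling independent of the model, so the same function works with a BERT-based summarizer or a simple stand-in during testing.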
Input Parameters:
1. The text variable specifies the input text that you want to summarize.
2. minimum_length is the minimum length a sentence must have to be considered for the summary.
3. maximum_length is the maximum length a sentence may have to be considered for the summary.
4. sentences specifies the number of sentences in the summarized text.
You can additionally build interfaces and other enhancements for an additional score.
For any support, you can direct message us at #help-with-kandi-kits.
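The input parameters listed above map naturally onto the call interface of bert-extractive-summarizer, whose `Summarizer` object accepts `min_length`, `max_length`, and `num_sentences` keyword arguments. The sketch below is an assumption about how the notebook wires them together, not a copy of it: `build_kwargs` and `run_summarizer` are hypothetical helpers, the default values are illustrative, and the lengths are understood here to be measured in characters. The library import sits inside the function, so the mapping can be inspected without loading the model.

```python
def build_kwargs(minimum_length=40, maximum_length=600, sentences=3):
    """Translate the notebook's parameter names into library keyword arguments."""
    if minimum_length > maximum_length:
        raise ValueError("minimum_length must not exceed maximum_length")
    return {
        "min_length": minimum_length,  # shortest sentence considered
        "max_length": maximum_length,  # longest sentence considered
        "num_sentences": sentences,    # number of sentences in the summary
    }


def run_summarizer(text, **params):
    # Imported lazily: constructing Summarizer loads a pretrained BERT model.
    from summarizer import Summarizer

    model = Summarizer()
    return model(text, **build_kwargs(**params))
```

A call such as `run_summarizer(text, minimum_length=60, sentences=5)` would then return a five-sentence summary, provided the library and its model weights are available.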
1. While running the batch file, if you encounter a Windows protection alert, select More info --> Run anyway.
2. During kit installation, if you encounter a Windows security alert, click Allow.