by kandikits Updated: Jan 24, 2022
Time matters to everyone. Sifting through long documents is difficult and time consuming: without an abstract or summary, it can take minutes just to work out what a paper or document covers. A summarizer is an algorithm that extracts sentences from a text document, determines which are most important, and returns them in a readable, structured form.
In this challenge, we invite you to build a summarization solution for documents such as terms and conditions that preserves all the essential points of the document. You can parse the document into sections based on their headings, create a summary for each section, and finally show a merged summary that captures the essence of the whole document.
Below is a sample solution kit to jumpstart your work on a simple summarizer application. To use this kit to build your own solution, scroll down to the sections Kit Deployment Instructions and Instruction to Run.
Complexity: Medium
This sample solution kit performs extractive summarization on the given document.
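The section-by-section approach described above can be sketched roughly as follows. This is a minimal illustration, not the kit's implementation: it substitutes a simple word-frequency score for the kit's model, and the function names (`split_sections`, `top_sentences`, `summarize_document`) and the heading rule (a short line with no closing punctuation) are assumptions made for this example.

```python
import re
from collections import Counter


def split_sections(document):
    """Split a document into (heading, body) pairs.

    Assumption: a heading is a line of at most 8 words that does not end
    with punctuation. A heading with no body text after it is dropped.
    """
    sections = []
    heading, body = "Preamble", []
    for line in document.splitlines():
        line = line.strip()
        if not line:
            continue
        is_heading = len(line.split()) <= 8 and line[-1] not in ".!?:;,"
        if is_heading:
            if body:
                sections.append((heading, " ".join(body)))
            heading, body = line, []
        else:
            body.append(line)
    if body:
        sections.append((heading, " ".join(body)))
    return sections


def top_sentences(text, count=1):
    """Return the `count` highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    # Crude stop-word filter: ignore words of three letters or fewer.
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]


def summarize_document(document, per_section=1):
    """Summarize each section separately, then merge into one summary."""
    parts = []
    for heading, body in split_sections(document):
        parts.append(heading + ": " + " ".join(top_sentences(body, per_section)))
    return "\n".join(parts)


sample = """Payments
You agree to pay all fees on time. Fees are billed monthly. Late fees apply to unpaid fees.

Termination
We may terminate your account at any time. Termination takes effect immediately.
"""
print(summarize_document(sample))
```

Summarizing per section keeps at least one sentence from every heading, so a short section is not crowded out by a long one in the merged result.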
Development Environment
VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode provides a typical IDE experience for developers.
Jupyter Notebook 9702 Version:v7.0.0a11 License: Others (Non-SPDX)
Exploratory Data Analysis
These libraries are used for extensive data analysis and exploration, for working with arrays, and for scientific computation and data manipulation.
Python 22522 Version:1.24.1 License: Permissive (BSD-3-Clause)
Python 36647 Version:1.5.2 License: Permissive (BSD-3-Clause)
Text Mining
Libraries in this group are used for the analysis and processing of unstructured natural language.
Python 25086 Version:3.4.4 License: Permissive (MIT)
C++ 6471 Version:0.1.97 License: Permissive (Apache-2.0)
Machine Learning & Natural Language Processing
The libraries in this group offer state-of-the-art pre-trained models for Natural Language Processing (NLP).
Python 52681 Version:1.2.0 License: Permissive (BSD-3-Clause)
C++ 62094 Version:v1.13.1 License: Others (Non-SPDX)
Python 78856 Version:4.25.1 License: Permissive (Apache-2.0)
Python 9198 Version:2.2.2 License: Permissive (Apache-2.0)
Utilities
The tqdm library can be used to show a progress bar for any long-running step in the code.
Python 23836 Version:v4.64.1 License: Others (Non-SPDX)
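As a small illustration of the progress bar, the sketch below wraps a loop in `tqdm`. The function `count_words_per_section` and the sample data are hypothetical and only stand in for a long-running step; the fallback branch is there so the example still runs, without a progress bar, if tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: no progress bar is shown.
    def tqdm(iterable, **kwargs):
        return iterable


def count_words_per_section(sections):
    """Hypothetical per-section step, wrapped in a progress bar."""
    counts = {}
    for heading, body in tqdm(sections, desc="Processing sections"):
        counts[heading] = len(body.split())
    return counts


sections = [
    ("Payments", "Fees are billed monthly."),
    ("Termination", "We may terminate your account."),
]
word_counts = count_words_per_section(sections)
print(word_counts)
```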
Testing
The libraries listed here can be used for unit testing as well as integration testing.
Python 9720 Version:7.2.0 License: Permissive (MIT)
Python 1115 Version:0.10.1 License: Permissive (MIT)
Instruction to Run
Follow the instructions below to run the solution.
1. Locate and open the Terms-of-service-summarizer.ipynb notebook from the Jupyter Notebook browser window.
2. Execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
3. The output file summarized_text.txt will be saved in the bert-extractive-summarizer-master directory under the kit_installer.bat location.
To summarize your own text:
1. Open the input file sample.txt in the bert-extractive-summarizer-master directory under the kit_installer.bat location.
2. Replace its contents with the text that you want to summarize.
3. Execute the cells in the notebook by selecting Cell --> Run All from the menu bar.
4. The output file summarized_text.txt will be saved in the bert-extractive-summarizer-master directory under the kit_installer.bat location.
Input file: sample.txt contains the content to be summarized.
Output file: summarized_text.txt contains the summarized content.
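The input-to-output file flow in these steps could look roughly like the sketch below. It is not the notebook's code: `summarize_file` and the stand-in summarizer `first_sentence` are hypothetical names introduced here, and the demo writes into a temporary directory rather than the kit's bert-extractive-summarizer-master directory.

```python
import tempfile
from pathlib import Path


def summarize_file(input_path, output_path, summarize):
    """Read the input file, summarize its text, and save the result."""
    text = Path(input_path).read_text(encoding="utf-8")
    summary = summarize(text)
    Path(output_path).write_text(summary, encoding="utf-8")
    return summary


def first_sentence(text):
    """Stand-in summarizer (not the kit's model): keep the first sentence."""
    head, sep, _ = text.strip().partition(". ")
    return head + "." if sep else head


# Demo in a temporary directory, using the kit's file names.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "sample.txt"
    target = Path(workdir) / "summarized_text.txt"
    source.write_text("You must pay fees monthly. Late fees may apply.", encoding="utf-8")
    result = summarize_file(source, target, first_sentence)
    saved = target.read_text(encoding="utf-8")

print(saved)
```

Passing the summarizer in as a function keeps the file handling separate from the model, so the stand-in can be swapped for a real summarizer without touching the I/O code.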
Input Parameters
1. The text variable specifies the input text that you want to summarize.
2. minimum_length is the minimum length a sentence must have to be considered for the summary.
3. maximum_length is the maximum length a sentence may have to be considered for the summary.
4. sentences specifies the number of sentences in the summarized text.
You can also build interfaces and other enhancements for an additional score.
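To show how the four parameters might interact, here is a minimal sketch. It does not call the kit's BERT-based summarizer; it substitutes a word-frequency score, and it assumes that sentence length is measured in characters. The function name `summarize` and the default values are illustrative only.

```python
import re
from collections import Counter


def summarize(text, minimum_length=40, maximum_length=600, sentences=3):
    """Pick up to `sentences` sentences whose length is within the given range.

    Assumption: length is counted in characters. Scoring is a simple
    word-frequency average, not the kit's model.
    """
    candidates = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Keep only sentences whose length falls inside the accepted range.
    candidates = [s for s in candidates if minimum_length <= len(s) <= maximum_length]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(candidates)), key=lambda i: score(candidates[i]), reverse=True)
    # Restore document order so the summary reads naturally.
    chosen = sorted(ranked[:sentences])
    return " ".join(candidates[i] for i in chosen)


text = (
    "Agreed. You must pay the subscription fees every month. "
    "We may suspend accounts with unpaid fees. Contact support for help."
)
print(summarize(text, minimum_length=20, maximum_length=200, sentences=2))
```

Raising minimum_length filters out short fragments such as "Agreed." before ranking, while sentences caps how many of the remaining candidates reach the summary.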
Troubleshooting
1. If you encounter a Windows protection alert while running the batch file, select More info --> Run anyway.
2. If you encounter a Windows security alert during kit installation, click Allow.
Support
For any support, you can direct message us at #help-with-kandi-kits