Build Text Summarizer in Python
by kandikits Updated: Oct 20, 2022
An NLP text summarizer is a Python package that condenses a text by extracting its most important sentences. Text summarizers are commonly used on news websites to shorten long articles. The goal is a shorter text that preserves the essential points of the original. This kit uses spaCy, NLTK, and NumPy to do the job. It can summarize texts with extractive and abstractive techniques, extract key sentences, and compute their TF-IDF values. You can use this package in your own projects.
Extraction-based summarization selects sentences from the original document and arranges them into a cohesive summary. Abstraction-based summarization, in contrast, generates new sentences that paraphrase the source rather than copying sentences from it.
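The extractive approach described above can be sketched with the standard library alone. This is a minimal illustration, not the kit's implementation: the function names (`split_sentences`, `tokenize`, `tfidf_sentence_scores`, `summarize`) are hypothetical, the sentence splitter is a naive regex, and each sentence is treated as its own "document" when computing TF-IDF.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase word tokens; this sketch does no stop-word removal."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of its words' TF-IDF weights.

    Term frequency is normalized by sentence length, and inverse document
    frequency is computed across sentences.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences each word appears in.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        scores.append(
            sum((count / len(tokens)) * math.log(n / df[word])
                for word, count in tf.items())
        )
    return scores


def summarize(text, num_sentences=2):
    """Keep the top-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize(article_text, num_sentences=3)` would return the three highest-scoring sentences joined into one string. A real pipeline would likely swap in spaCy or NLTK for sentence splitting and tokenization, and would filter stop words before scoring.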
spaCy is a library for Natural Language Processing (NLP). It provides functions for tokenization, part-of-speech tagging, and parsing, and it ships pre-trained models for a number of languages. NLTK (Natural Language Toolkit) is another popular NLP toolkit, widely used in research and teaching.
Please find the kit solution in this group.
- Download, extract and double-click the kit installer file to install the kit.
- After the successful installation of the kit, press 'Y' to run the kit.
- To run the kit manually, press 'N' and locate the zip file 'Text_Summarizer.zip'
- Extract the zip file and navigate to the directory 'bert-extractive-summarizer-master'
- Open command prompt in the extracted directory 'bert-extractive-summarizer-master' and run the command 'jupyter notebook'
- Locate and open the 'Text_Summarizer.ipynb' notebook from the Jupyter Notebook browser window.
- Execute the cells in the notebook.
Click on the button below to download the solution and follow the deployment instructions to begin set-up. This 1-click kit has all the required dependencies and resources you may need to build your Text Summarizer in Python.
For a detailed tutorial on installing and executing the solution, as well as learning resources including training and certification opportunities, please visit the OpenWeaver Community.
VSCode and Jupyter Notebook are used for development and debugging. Jupyter Notebook is a web-based interactive environment often used for experiments, whereas VSCode offers developers a typical IDE experience.
Jupyter Interactive Notebook
Jupyter Notebook 9830 Version:v7.0.0a15 License: Permissive (BSD-3-Clause)
Exploratory Data Analysis
For extensive analysis and exploration of data, and to deal with arrays, these libraries are used. They are also used for performing scientific computation and data manipulation.
The fundamental package for scientific computing with Python.
Python 22957 Version:v1.24.2 License: Permissive (BSD-3-Clause)
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Python 37275 Version:v2.0.0rc1 License: Permissive (BSD-3-Clause)
Libraries in this group are used for analysis and processing of unstructured natural language.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Python 25506 Version:v3.5.1 License: Permissive (MIT)
Unsupervised text tokenizer for Neural Network-based text generation.
C++ 6842 Version:v0.1.97 License: Permissive (Apache-2.0)
Machine Learning & Natural Language Processing
The library offers state-of-the-art pre-trained models for Natural Language Processing (NLP).
scikit-learn: machine learning in Python
Python 53431 Version:1.2.2 License: Permissive (BSD-3-Clause)
Tensors and Dynamic neural networks in Python with strong GPU acceleration
C++ 64166 Version:v2.0.0 License: Others (Non-SPDX)
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python 86578 Version:v4.27.2 License: Permissive (Apache-2.0)
Multilingual Sentence & Image Embeddings with BERT
Python 9678 Version:v2.2.2 License: Permissive (Apache-2.0)
The tqdm library can be used to show a progress bar for any long-running step in the code.
A Fast, Extensible Progress Bar for Python and CLI
Python 24277 Version:v4.65.0 License: Others (Non-SPDX)
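As a small sketch of how tqdm might wrap a long-running step: `tqdm(iterable, desc=...)` is the library's standard usage, while `count_words` and the fallback wrapper are hypothetical and exist only so the example also runs when tqdm is not installed.

```python
try:
    from tqdm import tqdm  # pip install tqdm
except ImportError:
    # Fallback (assumption for this sketch): no bar, same iteration.
    def tqdm(iterable, **kwargs):
        return iterable


def count_words(documents):
    """Return the word count of each document, with a progress bar."""
    counts = []
    for doc in tqdm(documents, desc="Processing documents"):
        counts.append(len(doc.split()))
    return counts
```

In a summarization pipeline, the loop body would be the expensive step, for example summarizing each article in a batch.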
The libraries listed here can be used for unit testing as well as integration testing.
The pytest framework makes it easy to write small tests, yet scales to support complex functional testing
Python 9927 Version:7.2.2 License: Permissive (MIT)
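A pytest test is a plain function whose name starts with `test_` and that uses bare `assert` statements. The sketch below tests a hypothetical `split_sentences` helper of the kind a summarizer might need; the helper is an assumption for illustration and is not taken from the kit.

```python
import re


def split_sentences(text):
    """Hypothetical helper: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def test_split_sentences_basic():
    assert split_sentences("One. Two! Three?") == ["One.", "Two!", "Three?"]


def test_split_sentences_empty_input():
    assert split_sentences("   ") == []
```

Saved in a file such as `test_sentences.py`, these tests would be collected and run by invoking `pytest` in the same directory.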
Kit Solution Source
Easy to use extractive text summarization with BERT
Python 1132 Version:0.10.1 License: Permissive (MIT)
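For orientation, a minimal sketch of calling the bert-extractive-summarizer package directly, outside the notebook. It assumes the package is installed (`pip install bert-extractive-summarizer`) and that the pretrained model weights can be downloaded on first use. The wrapper name `summarize_with_bert` is hypothetical, and the kit's notebook may call the library differently.

```python
def summarize_with_bert(text, num_sentences=3):
    """Return an extractive summary of `text` using a BERT-based model.

    The import is deferred so that merely defining this function does not
    require the package or trigger a model download.
    """
    from summarizer import Summarizer  # provided by bert-extractive-summarizer

    model = Summarizer()
    # The model picks sentences from the input; it does not write new ones.
    return model(text, num_sentences=num_sentences)
```

A call such as `summarize_with_bert(article_text, num_sentences=2)` would return a string containing sentences selected from `article_text`.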