KeyBERT | Minimal keyword extraction with BERT | Natural Language Processing library
kandi X-RAY | KeyBERT Summary
Although there are already many methods available for keyword generation (e.g., Rake, YAKE!, TF-IDF), I wanted to create a very basic but powerful method for extracting keywords and keyphrases. This is where KeyBERT comes in: it uses BERT embeddings and simple cosine similarity to find the sub-phrases in a document that are most similar to the document itself. First, a document embedding is extracted with BERT to get a document-level representation. Then, embeddings are extracted for the candidate N-gram words and phrases. Finally, cosine similarity is used to find the words and phrases that are most similar to the document. The most similar candidates can then be taken as the ones that best describe the entire document. KeyBERT is by no means unique; it was created as a quick and easy method for generating keywords and keyphrases. Although there are many great papers and solutions out there that use BERT embeddings (e.g., 1, 2, 3), I could not find a BERT-based solution that did not have to be trained from scratch and that beginners could use (correct me if I'm wrong!). Thus, the goal was a pip install keybert and at most 3 lines of code in usage.
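The three steps described above (embed the document, embed the candidate N-grams, rank by cosine similarity) can be sketched in plain Python. This is only an illustration of the idea, not KeyBERT's actual code: toy_embed is a bag-of-words stand-in for a real BERT encoder, and every function name here (tokenize, candidate_ngrams, toy_embed, cosine, rank_candidates) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_ngrams(tokens, ngram_range=(1, 2)):
    """Collect every distinct n-gram whose length lies in ngram_range."""
    shortest, longest = ngram_range
    candidates = set()
    for n in range(shortest, longest + 1):
        for i in range(len(tokens) - n + 1):
            candidates.add(" ".join(tokens[i:i + n]))
    return sorted(candidates)


def toy_embed(text):
    """ASSUMPTION: stand-in for a BERT encoder. Returns a sparse
    bag-of-words count vector instead of a dense embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(doc, embed=toy_embed, ngram_range=(1, 2), top_n=3):
    """Rank candidate n-grams by similarity to the whole document."""
    doc_vec = embed(doc)                       # step 1: document embedding
    candidates = candidate_ngrams(tokenize(doc), ngram_range)
    scored = [(c, cosine(embed(c), doc_vec))   # steps 2 and 3
              for c in candidates]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    text = "keyword extraction finds keyword phrases"
    for phrase, score in rank_candidates(text):
        print(f"{phrase}: {score:.3f}")
```

Passing a different embed callable is where a real sentence encoder would plug in; with the toy vectors the ranking mostly reflects word counts, so the scores only show the mechanics.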
Top functions reviewed by kandi - BETA
- Embed documents
- Embed a document using the tokenizer
KeyBERT Examples and Code Snippets
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Nat
Community Discussions
Trending Discussions on KeyBERT
QUESTION
I'm using KeyBERT on Google Colab to extract keywords from the text.
...ANSWER
Answered 2021-Jun-24 at 03:46
I couldn't reproduce this issue with the code you've provided, but from the error message I believe you're just missing an 's' in the model name. Make sure that the model name is:
distilbert-base-nli-mean-tokens
and not
distilbert-base-nli-mean-token
Also refer to this link for all models available for use.
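A one-character typo like this can be caught before any model is loaded. The following is a minimal sketch, assuming a short hand-maintained list of names; KNOWN_MODELS and check_model_name are hypothetical helpers and are not part of KeyBERT or sentence-transformers.

```python
import difflib

# ASSUMPTION: a small hand-maintained list for illustration only. The full
# set of available models is much longer than this.
KNOWN_MODELS = [
    "distilbert-base-nli-mean-tokens",
    "all-MiniLM-L6-v2",
]


def check_model_name(name, known=KNOWN_MODELS):
    """Return name unchanged if it is known; otherwise raise a ValueError
    that suggests the closest known name, if one is similar enough."""
    if name in known:
        return name
    suggestions = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    hint = f" Did you mean '{suggestions[0]}'?" if suggestions else ""
    raise ValueError(f"Unknown model name '{name}'.{hint}")


if __name__ == "__main__":
    try:
        check_model_name("distilbert-base-nli-mean-token")
    except ValueError as err:
        print(err)
```

The validated string would then be handed to KeyBERT as the model name, for example KeyBERT(model=check_model_name(name)), assuming a KeyBERT version whose constructor accepts a model name string.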
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported