bert-sense | Source code accompanying the KONVENS 2019 paper | Natural Language Processing library
kandi X-RAY | bert-sense Summary
Source code accompanying the KONVENS 2019 paper "Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings"
Top functions reviewed by kandi - BETA
- Compute the accuracy of the trained model
- Train Embeddings
- Return a list of SemCor Sentence objects
- Collects a list of the words and their senses
- Parse Sentence
- Create the word embedding map
- Compute tensorflow embeddings
- Given a sentence return a list of tokens
- Loads the word sense embedding
- Opens an XML file
- Applies the BERT tokenizer
Community Discussions
Trending Discussions on bert-sense
QUESTION
I am working on a word-level classification task on multilingual data using XLM-R. I know that XLM-R uses SentencePiece
as its tokenizer, which sometimes splits words into sub-words.
For example, the sentence "deception master" is tokenized as
de
ception
master
so the word "deception" has been split into two sub-words.
How can I get the embedding of "deception"?
I can take the mean of the sub-word embeddings to get the embedding of the word, as done here (a sketch of that idea is shown after this question). But I have to implement my code in TensorFlow, and the TensorFlow computational graph doesn't support NumPy.
I could take the mean of the sub-word embeddings, store the final hidden embeddings in a NumPy array, and give this array as input to the model, but I want to fine-tune the transformer.
How can I get the word embeddings from the sub-word embeddings given by the transformer?
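For reference, this is roughly what the averaging idea from the question looks like when done inside a TensorFlow graph rather than in NumPy; the tensors below are hypothetical and not from the original post:

```python
import tensorflow as tf

# Hypothetical sub-word embeddings for ["de", "ception", "master"], shape [num_subwords, hidden_dim]
subword_embeddings = tf.constant([[1.0, 2.0],
                                  [3.0, 4.0],
                                  [5.0, 6.0]])

# Word index of each sub-word: "de" and "ception" belong to word 0, "master" to word 1
word_ids = tf.constant([0, 0, 1])

# Mean-pool sub-words into words entirely inside the TF graph (no NumPy needed)
word_embeddings = tf.math.segment_mean(subword_embeddings, word_ids)
print(word_embeddings)  # [[2. 3.], [5. 6.]] -> one vector per word
```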
ANSWER
Answered 2021-Mar-30 at 08:16
Joining subword embeddings into words for word labeling is not how this problem is usually approached. The usual approach is the opposite: keep the subwords as they are, but adjust the labels to respect the tokenization of the pre-trained model.
One of the reasons is that the data typically comes in batches. When merging subwords into words, every sentence in the batch would end up with a different length, which would require processing each sentence independently and padding the batch again – this would be slow. Also, if you do not average the neighboring embeddings, you get more fine-grained information from the loss function, which tells you explicitly which subword is responsible for an error.
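The following is not from the original answer, but sketches the label-adjustment idea using the HuggingFace fast tokenizer for XLM-R (the checkpoint name and label values are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

words = ["deception", "master"]   # pre-split words
word_labels = [1, 0]              # one (hypothetical) label per word

encoding = tokenizer(words, is_split_into_words=True)

# word_ids() maps every sub-word position back to its source word (None for special tokens),
# so the word-level labels can be expanded to match the sub-word tokenization
# instead of merging sub-word embeddings.
token_labels = [
    -100 if word_id is None else word_labels[word_id]
    for word_id in encoding.word_ids()
]
# -100 is conventionally ignored by the loss, so special tokens do not contribute to it
```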
When tokenizing using SentencePiece, you can get the indices in the original string:
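The code snippet that originally followed this sentence is not reproduced on this page. A minimal sketch of the same idea, assuming the HuggingFace fast tokenizer (which wraps the SentencePiece model and exposes character offsets), might look like this:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

text = "deception master"
encoding = tokenizer(text, return_offsets_mapping=True, add_special_tokens=False)

# Each sub-word carries its (start, end) character span in the original string,
# so sub-words whose spans touch can be grouped back into the same word.
for token, (start, end) in zip(encoding.tokens(), encoding["offset_mapping"]):
    print(token, (start, end), repr(text[start:end]))
```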
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install bert-sense
You can use bert-sense like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that your pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.