bert-sense | Source code accompanying the KONVENS 2019 paper | Natural Language Processing library

by uhh-lt | Python | Version: Current | License: MIT

kandi X-RAY | bert-sense Summary

bert-sense is a Python library typically used in Artificial Intelligence, Natural Language Processing, and BERT applications. bert-sense has no reported bugs or vulnerabilities, carries a permissive license, and has low support. However, no build file is available. You can download it from GitHub.

Source code accompanying the KONVENS 2019 paper "Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings"

Support

bert-sense has a low-activity ecosystem.
It has 39 stars, 9 forks, and 10 watchers.
It had no major release in the last 6 months.
There is 1 open issue and 2 have been closed. On average, issues are closed in 41 days. There are 2 open pull requests and 0 closed ones.
It has a neutral sentiment in the developer community.
The latest version of bert-sense is current.

Quality

              bert-sense has 0 bugs and 16 code smells.

Security

              bert-sense has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              bert-sense code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

License

bert-sense is licensed under the MIT License, which is a permissive license.
Permissive licenses carry the fewest restrictions, and you can use them in most projects.

Reuse

bert-sense releases are not available, so you will need to build from source and install it yourself.
bert-sense has no build file, so you will need to create the build for the component yourself.
bert-sense saves you 281 person-hours of effort in developing the same functionality from scratch.
It has 679 lines of code, 23 functions, and 3 files.
It has high code complexity, which directly impacts maintainability.

            Top functions reviewed by kandi - BETA

kandi has reviewed bert-sense and discovered the functions below as its top functions. This is intended to give you an instant insight into the functionality bert-sense implements and to help you decide whether it suits your requirements; a minimal sketch of the shared core idea follows the list.
            • Compute the accuracy of the trained model
            • Train Embeddings
• Return a list of SemCor Sentence objects
            • Collects a list of the words and their senses
            • Parse Sentence
            • Create the word embedding map
            • Compute tensorflow embeddings
            • Given a sentence return a list of tokens
            • Loads the word sense embedding
            • Opens an XML file
• Applies the BERT tokenizer
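
Most of these functions revolve around one core operation: computing a contextualized embedding for a word by pooling the hidden states of its subword pieces. Below is a minimal sketch of that idea written against the current Hugging Face transformers API; the model choice and variable names are illustrative assumptions, not the repository's actual code.

import torch
from transformers import BertModel, BertTokenizerFast

# Illustrative sketch, not bert-sense's own code: average BERT's subword
# hidden states to obtain one contextualized vector for a target word.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

words = ["The", "bank", "raised", "its", "interest", "rates", "."]
target_index = 1  # we want an embedding for "bank"

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]  # (num_subwords, 768)

# word_ids() maps each subword position back to its source word
# (None for special tokens such as [CLS] and [SEP]).
positions = [i for i, w in enumerate(enc.word_ids()) if w == target_index]
word_vector = hidden[positions].mean(dim=0)  # average over subword pieces
print(word_vector.shape)  # torch.Size([768])

In the accompanying paper, word vectors of this kind are paired with sense-annotated SemCor occurrences (compare the SemCor and word-sense functions listed above), and a word is disambiguated by nearest-neighbour lookup.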

            bert-sense Key Features

            No Key Features are available at this moment for bert-sense.

            bert-sense Examples and Code Snippets

            No Code Snippets are available at this moment for bert-sense.

            Community Discussions

            QUESTION

            How to get word embeddings from the pretrained transformers
            Asked 2021-Mar-30 at 08:16

I am working on a word-level classification task on multilingual data using XLM-R. I know that XLM-R uses SentencePiece as its tokenizer, which sometimes splits words into subwords.

For example, the phrase "deception master" is tokenized as "de ception master": the word "deception" has been split into two subwords.

How can I get the embedding of "deception"? I can take the mean of the subwords to get the embedding of the word, as done here. But I have to implement my code in TensorFlow, and the TensorFlow computational graph doesn't support NumPy.

I could take the mean of the subwords, store the resulting hidden embeddings in a NumPy array, and feed that array to the model, but I want to fine-tune the transformer.

How can I get word embeddings from the subword embeddings produced by the transformer?

            ...

            ANSWER

            Answered 2021-Mar-30 at 08:16

            Joining subword embeddings into words for word labeling is not how this problem is usually approached. The usual approach is the opposite: keep the subwords as they are, but adjust the labels to respect the tokenization of the pre-trained model.

One of the reasons is that data is typically processed in batches. When merging subwords into words, every sentence in the batch would end up with a different length, which would require processing each sentence independently and re-padding the batch; this would be slow. Also, if you do not average neighboring embeddings, the loss function gives you more fine-grained information, telling you explicitly which subword is responsible for an error.
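
To make that label adjustment concrete, here is a short sketch (with assumed model and variable names, not code from the original answer): each word-level label is copied to all of that word's subword pieces, and special tokens get an ignore index so they do not contribute to the loss.

from transformers import XLMRobertaTokenizerFast

# Sketch: align word-level labels with XLM-R's subword tokenization.
tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-base")

words = ["deception", "master"]
word_labels = [1, 0]  # one label per word

enc = tokenizer(words, is_split_into_words=True)
aligned = [
    -100 if wid is None else word_labels[wid]  # -100: conventional ignore index
    for wid in enc.word_ids()
]
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
print(aligned)  # one label per subword, -100 on the special tokens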

            When tokenizing using SentencePiece, you can get the indices in the original string:
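
A sketch of one way to do this with Hugging Face's fast tokenizers (illustrative, not the answer's original snippet): return_offsets_mapping yields (start, end) character offsets into the original string for every subword.

from transformers import XLMRobertaTokenizerFast

tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-base")

sentence = "deception master"
enc = tokenizer(sentence, return_offsets_mapping=True)

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"])
for token, (start, end) in zip(tokens, enc["offset_mapping"]):
    # special tokens such as <s> and </s> map to the empty span (0, 0)
    print(token, repr(sentence[start:end]))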

            Source https://stackoverflow.com/questions/66820943

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install bert-sense

            You can download it from GitHub.
You can use bert-sense like any standard Python library. You will need a development environment consisting of a Python distribution (including header files), a compiler, pip, and git. Make sure that pip, setuptools, and wheel are up to date. When using pip, it is generally recommended to install packages in a virtual environment to avoid changes to the system.

            Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have questions, check and ask on Stack Overflow.

CLONE

• HTTPS: https://github.com/uhh-lt/bert-sense.git
• CLI: gh repo clone uhh-lt/bert-sense
• SSH: git@github.com:uhh-lt/bert-sense.git


Consider Popular Natural Language Processing Libraries

• transformers by huggingface
• funNLP by fighting41love
• bert by google-research
• jieba by fxsjy
• Python by geekcomputers

Try Top Libraries by uhh-lt

• sensegram by uhh-lt (Python)
• kaldi-tuda-de by uhh-lt (Shell)
• newsleak by uhh-lt (Java)
• taxi by uhh-lt (Jupyter Notebook)