InferSent | Supervised Learning of Universal Sentence Representations | Machine Learning library
kandi X-RAY | InferSent Summary
The repo is an implementation of the paper "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data" (a.k.a. InferSent) by Alexis Conneau et al.
Top functions reviewed by kandi - BETA
- Calculate accuracy
- Fetch next batch of data
- Shuffle the dataframe
- Builds the graph
- Builds a BiLSTM as encoder
- Builds a vocabulary from data
- Read vectors from file
- Applies a function to the data
- Get next batch of data
- Build embedding matrix
- Replace tokenized data
- Pads the given rep
- Find a list of items that match a dictionary
Community Discussions
Trending Discussions on InferSent
QUESTION
I am trying to embed a sentence with InferSent, which uses fastText vectors for its word embeddings. The fastText vector file is close to 5 GiB.
Keeping the fastText vector file in the code repository makes the repository huge and makes the code difficult to share and deploy (even building a Docker container is a problem).
Is there a way to avoid keeping the vector file in the repository while still reusing it to embed new sentences?
ANSWER
Answered 2019-Mar-05 at 21:52
What kind of sentences are you embedding? Are they from the same domain as the data on which the fastText embeddings were trained?
Build a token-level representation of your data: either the set of all tokens, or a set of the most common tokens, that appear in the sentences you want to embed with fastText.
Compute the overlap between your tokens and the tokens in the fastText file, then remove from the fastText file every entry that does not appear in your token set.
I did this recently and went from a 1.4 GB file of pre-trained word embeddings to 200 MB, mainly because the overlap with my corpus was only around 10%.
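The pruning step described in this answer can be sketched roughly as follows. This is a minimal illustration, not code from the InferSent repository: the helper names `corpus_vocab` and `filter_vectors` are hypothetical, tokens are assumed to be whitespace-separated (use the same tokenizer you apply at embedding time), and the vector file is assumed to be in the plain-text format with one word followed by its values per line, optionally preceded by a "count dimension" header line.

```python
def corpus_vocab(sentences):
    """Collect the set of whitespace-separated tokens seen in the sentences.

    Assumption: whitespace tokenization; swap in your real tokenizer.
    """
    vocab = set()
    for sentence in sentences:
        vocab.update(sentence.split())
    return vocab


def filter_vectors(src_path, dst_path, vocab):
    """Copy only the vectors whose word is in vocab; return how many were kept.

    Assumption: text format, "word v1 v2 ... vN" per line, with an optional
    "count dim" header on the first line. The header is rewritten with the
    new count only if the source file had one.
    """
    kept = []
    has_header = False
    dim = 0
    with open(src_path, encoding="utf-8") as src:
        for i, line in enumerate(src):
            fields = line.rstrip().split(" ")
            if i == 0 and len(fields) == 2 and all(f.isdigit() for f in fields):
                has_header = True
                dim = int(fields[1])
                continue
            if fields[0] in vocab:
                # Keep the original line untouched so values are not reformatted.
                kept.append(line if line.endswith("\n") else line + "\n")
    with open(dst_path, "w", encoding="utf-8") as dst:
        if has_header:
            dst.write(f"{len(kept)} {dim}\n")
        dst.writelines(kept)
    return len(kept)
```

A call such as `filter_vectors("full.vec", "small.vec", corpus_vocab(my_sentences))` (file names are placeholders) would then produce the reduced file, which can be shipped with the code in place of the full one. Note that the kept lines are buffered in memory, which is fine when the overlap is small; words outside the pruned vocabulary will have no vector afterwards.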
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install InferSent
You can use InferSent like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.