kneser-ney | Kneser-Ney implementation in Python | Build Tool library
kandi X-RAY | kneser-ney Summary
Kneser-Ney implementation in Python
Top functions reviewed by kandi - BETA
- Train the model
- Calculate the adjacency counts
- Calculate discounts for the given counts (see the sketch after this list)
- Calculate backoff probabilities
- Interpolate the given orders
- Return the discount for a given count
- Calculate the probability of the given unigrams
- Generate a sentence
- Generate next word
- Return the context of the sentence
- Compute the highest-order probability of the model
- Compute the log probability of a sentence
- Return the log probability of an n-gram
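As an illustration of the discount step flagged in the list above, here is a minimal sketch of the modified Kneser-Ney discounts of Chen & Goodman (1998), computed from the counts-of-counts n1..n4. This is the standard formula, not this library's actual code, and the toy counts are invented for the example:

```python
from collections import Counter

def kn_discounts(ngram_counts):
    """Modified Kneser-Ney discounts D1, D2, D3+ (Chen & Goodman, 1998),
    derived from the counts-of-counts of the given n-gram counts."""
    n = Counter(ngram_counts.values())  # n[k] = number of n-grams seen exactly k times
    y = n[1] / (n[1] + 2 * n[2])
    return (1 - 2 * y * n[2] / n[1],   # D1: discount applied to count == 1
            2 - 3 * y * n[3] / n[2],   # D2: discount applied to count == 2
            3 - 4 * y * n[4] / n[3])   # D3+: discount applied to count >= 3

# Toy trigram counts, just to exercise the formula:
counts = {('a', 'b', 'c'): 1, ('b', 'c', 'd'): 1, ('c', 'd', 'e'): 2,
          ('d', 'e', 'f'): 2, ('e', 'f', 'g'): 3, ('f', 'g', 'h'): 4}
print(kn_discounts(counts))  # (0.333..., 1.5, 1.666...)
```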
Community Discussions
Trending Discussions on kneser-ney
QUESTION
I want to train a language model using NLTK in Python, but I ran into several problems. First of all, I don't know why my words turn into single characters when I write something like this:
...
ANSWER
Answered 2019-Mar-02 at 20:35
The padded_everygram_pipeline function expects a list of tokenized sentences, i.e. a list of lists of tokens. If you pass it a flat list of words, each word is treated as a "sentence" and iterated character by character, which is why your words turn into characters. You should fix your first code snippet as follows. Also, Python generators are lazy sequences, so you cannot iterate over them more than once.
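A minimal sketch of the corrected usage, assuming NLTK 3.4+ (nltk.lm); the two-sentence corpus is a stand-in, since the original snippet is not reproduced on this page:

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# padded_everygram_pipeline expects a list of tokenized sentences
# (a list of lists of tokens), not a flat list of words.
tokenized_text = [['this', 'is', 'one', 'sentence'],
                  ['this', 'is', 'another', 'one']]

n = 2
train_data, padded_vocab = padded_everygram_pipeline(n, tokenized_text)

model = MLE(n)
model.fit(train_data, padded_vocab)

# The returned generators are lazy: once consumed by fit(), they are
# exhausted and must be regenerated for any further pass.
print(model.counts[['this']]['is'])  # count of the bigram ('this', 'is')
```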
QUESTION
I have the frequency distribution of my trigrams, on which I trained a Kneser-Ney model. When I check kneser_ney.prob for a trigram that is not in list_of_trigrams, I get zero! What am I doing wrong?
ANSWER
Answered 2019-Jul-15 at 11:07
I think what you are observing is perfectly normal.
From the Wikipedia page (method section) for Kneser-Ney smoothing:
Please note that p_KN is a proper distribution, as the values defined in the above way are non-negative and sum to one.
and the probability is 0 when the n-gram did not occur in the corpus.
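For reference, the bigram form of the smoothed distribution defined in that Wikipedia section is:

```latex
% Bigram Kneser-Ney, with absolute discount \delta:
p_{KN}(w_i \mid w_{i-1})
  = \frac{\max\bigl(c(w_{i-1}, w_i) - \delta,\, 0\bigr)}{\sum_{w'} c(w_{i-1}, w')}
  + \lambda(w_{i-1})\, p_{KN}(w_i)

% where the normalizing constant \lambda releases the discounted mass:
\lambda(w_{i-1})
  = \frac{\delta}{\sum_{w'} c(w_{i-1}, w')}
    \bigl|\{ w' : c(w_{i-1}, w') > 0 \}\bigr|

% and the unigram term counts distinct left contexts:
p_{KN}(w_i)
  = \frac{\bigl|\{ w' : 0 < c(w', w_i) \}\bigr|}{\bigl|\{ (w', w'') : 0 < c(w', w'') \}\bigr|}
```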
Quoting from the answer you cite:
This is the whole point of smoothing, to reallocate some probability mass from the ngrams appearing in the corpus to those that don't so that you don't end up with a bunch of 0 probability ngrams.
The above sentence does not mean that with Kneser-Ney smoothing you will have a non-zero probability for any n-gram you pick. It means that, given a corpus, it assigns probability to the n-grams that do occur in such a way that some probability mass is left over for other n-grams in later analyses. That spare probability is something you have to assign to non-occurring n-grams yourself; it is not inherent to Kneser-Ney smoothing.
EDIT: Just for the sake of completeness, I report the code to observe this behavior (largely taken from here, and adapted to Python 3):
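The referenced code is not reproduced on this page; the following is a minimal sketch of the same behavior, using NLTK's KneserNeyProbDist on a toy corpus (a stand-in for the asker's data):

```python
import nltk

corpus = ['the cat sat on the mat'.split(),
          'the dog sat on the log'.split()]

# Build the trigram frequency distribution and train Kneser-Ney on it.
trigrams = [t for sentence in corpus for t in nltk.ngrams(sentence, 3)]
freq_dist = nltk.FreqDist(trigrams)
kneser_ney = nltk.KneserNeyProbDist(freq_dist)

# A trigram present in the frequency distribution gets a discounted,
# non-zero probability...
print(kneser_ney.prob(('the', 'cat', 'sat')))
# ...while one absent from it gets exactly 0, as described above.
print(kneser_ney.prob(('the', 'cat', 'flew')))
```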
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install kneser-ney
You can use kneser-ney like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.
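A hedged sketch of a typical install flow following the advice above; the repository URL is hypothetical, since this page does not give the package's source:

```sh
# Create and activate an isolated environment.
python -m venv .venv
source .venv/bin/activate

# Keep the packaging toolchain current.
python -m pip install --upgrade pip setuptools wheel

# Install from source; <owner> is a placeholder, not a real account.
pip install git+https://github.com/<owner>/kneser-ney.git
```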