tensorflow-ctc-speech-recognition | Connectionist Temporal Classification | Speech library
kandi X-RAY | tensorflow-ctc-speech-recognition Summary
Application of Connectionist Temporal Classification (CTC) for Speech Recognition (TensorFlow 1.0, but compatible with 2.0).
Top functions reviewed by kandi - BETA
- Run ctc
- Generate next batch
- Converts inputs to CTC format
- Convert a sequence of sequences into a sparse matrix
- Decode a batch of data
- Write array to file
- Write line to file
- Generate audio
- Argument parser
- Returns a random speaker list
- Get a list of all available speaker names
tensorflow-ctc-speech-recognition Key Features
tensorflow-ctc-speech-recognition Examples and Code Snippets
Community Discussions
Trending Discussions on tensorflow-ctc-speech-recognition
QUESTION
I'm trying to understand how the CTC implementation works in TensorFlow. I've written a quick example just to test the CTC function, but for some reason I'm getting inf for some target/input values, and I'm not sure why that is happening!?
Code:
...

ANSWER
Answered 2018-Oct-02 at 09:21
Look closely at your input texts (rand_target); I'm sure you'll see a simple pattern which correlates with the inf loss value ;-)
A short explanation of what is happening: CTC encodes text by allowing each character to be repeated and it also allows a non-character marker (called "CTC blank label") to be inserted between characters. Undoing this encoding (or decoding) then simply means throwing away repeated characters and then throwing away all blanks. To give some examples ("..." corresponds to text, '...' to encodings and '-' to the blank label):
- "to" -> 'tttooo', or 't-o' or 't-oo', or 'to', and so on ...
- "too" -> 'to-o', or 'tttoo---oo', or '---t-o-o--', but NOT 'too' (think about how the decoded 'too' would look like)
Now we know enough to see why some of your samples fail:
- the length of your input text is 2
- the length of the encodings is 2
- if the input character is repeated (e.g. '11', or as a Python list: [1, 1]), then the only way to encode it would be to place a blank in between (think about decoding '11' versus '1-1'). But then the encoding would have a length of 3.
- so there is no way to encode a text of length 2 with a repeated character into a length-2 encoding, and therefore the TF loss implementation returns inf
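You can reproduce this directly. The following is a minimal sketch using the TF 2.x tf.nn.ctc_loss signature; the shapes, class count, and label values are assumptions for illustration, not the asker's original code:

import numpy as np
import tensorflow as tf

num_classes = 4  # labels 0..2 plus the blank (assumed)
# logits: [time=2, batch=1, classes] -- only 2 time steps available
logits = tf.constant(np.random.randn(2, 1, num_classes), dtype=tf.float32)

ok_labels  = tf.constant([[1, 2]], dtype=tf.int32)  # distinct labels: fits in 2 steps
bad_labels = tf.constant([[1, 1]], dtype=tf.int32)  # repeated label: needs '1-1', i.e. 3 steps

for labels in (ok_labels, bad_labels):
    loss = tf.nn.ctc_loss(labels=labels, logits=logits,
                          label_length=tf.constant([2]),
                          logit_length=tf.constant([2]),
                          logits_time_major=True,
                          blank_index=num_classes - 1)
    print(loss.numpy())  # finite for [1, 2]; inf for [1, 1]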
You can also imagine the encoding as a state machine: the text "11" is represented by all possible paths from a start state to a final state, and the shortest possible path is '1-1'.
To conclude, you have to account for at least one additional blank to be inserted for each repeated character in the input text. Maybe this article helps in understanding CTC: https://towardsdatascience.com/3797e43a86c
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install tensorflow-ctc-speech-recognition
A very small subset of the VCTK Corpus composed of only one speaker: p225.
Only 5 sentences of this speaker, denoted as: 001, 002, 003, 004 and 005.
One LSTM layer (rnn.LSTMCell) with 100 units, followed by a softmax.
Batch size of 1.
Momentum Optimizer with a learning rate of 0.005 and a momentum of 0.9.
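Wired together, that setup looks roughly like the following. This is a hedged sketch in the TF 1.x graph style suggested by rnn.LSTMCell; the feature dimension, number of classes, and placeholder names are assumptions, not the repository's actual code:

import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # TF 1.x graph-mode style

num_hidden = 100   # one LSTM layer with 100 units
num_classes = 28   # e.g. 26 letters + space + CTC blank (assumed)
num_features = 13  # e.g. MFCC coefficients (assumed)

inputs  = tf.compat.v1.placeholder(tf.float32, [1, None, num_features])  # batch size of 1
seq_len = tf.compat.v1.placeholder(tf.int32, [1])
targets = tf.compat.v1.sparse_placeholder(tf.int32)

cell = tf.compat.v1.nn.rnn_cell.LSTMCell(num_hidden)
outputs, _ = tf.compat.v1.nn.dynamic_rnn(cell, inputs, seq_len, dtype=tf.float32)

logits = tf.compat.v1.layers.dense(outputs, num_classes)  # the softmax projection
logits = tf.transpose(logits, [1, 0, 2])                  # ctc_loss expects time-major input

loss = tf.reduce_mean(tf.compat.v1.nn.ctc_loss(targets, logits, seq_len))
train_op = tf.compat.v1.train.MomentumOptimizer(learning_rate=0.005,
                                                momentum=0.9).minimize(loss)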