pykaldi | A Python wrapper for Kaldi | Speech library

by pykaldi | Python | Version: v0.2.2 | License: Apache-2.0

kandi X-RAY | pykaldi Summary

pykaldi is a Python library typically used in Artificial Intelligence, Speech, and PyTorch applications. pykaldi has no reported vulnerabilities, a build file is available, it has a permissive license, and it has medium support. However, pykaldi has 4 bugs. You can download it from GitHub.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for the C++ code in the Kaldi and OpenFst libraries. You can use PyKaldi to write Python code for things that would otherwise require writing C++ code, such as calling low-level Kaldi functions, manipulating Kaldi and OpenFst objects in code, or implementing new Kaldi tools. You can think of Kaldi as a large box of Legos that you can mix and match to build custom speech recognition solutions. The best way to think of PyKaldi is as a supplement, a sidekick if you will, to Kaldi. In fact, PyKaldi is at its best when it is used alongside Kaldi. To that end, replicating the functionality of the myriad command-line tools, utility scripts and shell-level recipes provided by Kaldi is a non-goal for the PyKaldi project.

Support

pykaldi has a moderately active ecosystem.

It has 936 stars and 243 forks. There are 41 watchers for this library.

It had no major release in the last 12 months.

There are 62 open issues and 206 have been closed. On average, issues are closed in 308 days. There are 6 open pull requests and 0 closed pull requests.

It has a neutral sentiment in the developer community.

The latest version of pykaldi is v0.2.2.

Quality

pykaldi has 4 bugs (0 blocker, 0 critical, 3 major, 1 minor) and 280 code smells.

Security

pykaldi has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

pykaldi code analysis shows 0 unresolved vulnerabilities.

There are 12 security hotspots that need review.

License

pykaldi is licensed under the Apache-2.0 License, which is a permissive license.

Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

pykaldi releases are available to install and integrate.

A build file is available, so you can build the component from source.

Installation instructions, examples and code snippets are available.

By kandi's estimate, pykaldi saves you 4450 person-hours of effort compared with developing the same functionality from scratch.

It has 9419 lines of code, 705 functions and 132 files.

It has medium code complexity. Code complexity directly affects the maintainability of the code.

Top functions reviewed by kandi - BETA

kandi has reviewed pykaldi and identified the functions below as its top functions. This is intended to give you instant insight into the functionality pykaldi implements and to help you decide whether it suits your requirements.

Extract segments from a WAV file
Open an input file
Fast decoding function
Read a FST format file
Read from stream
Return the type of the object
Generates a SAD graph
Compile the grammar
Compute the VAD
Apply ARMA to a matrix
Determine the Python library
Compute MFCC features from wav specification
Aligns the input text
Run CMake
Get the partial output
Segment a single frame
Determines iffst
Compute CMVN stats for two channels
Determinize the lattice of a given fst
Create a TransitionModel from the given files
Creates a field for a field
Calculate the shortest path
SVD decomposition
Element-wise eigenvalue operator
Eigenvectors of the matrix
Read a TransitionModel from files
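Several of the functions listed above (CMVN stats, MFCC, VAD) are standard feature-processing steps. As an illustration only, here is a minimal pure-Python sketch of per-utterance cepstral mean and variance normalization (CMVN). It assumes the stats layout Kaldi uses for CMVN: a 2 x (dim + 1) matrix whose first row holds per-dimension sums followed by the frame count, and whose second row holds per-dimension sums of squares followed by 0. The function names `compute_cmvn_stats` and `apply_cmvn` are hypothetical and are not PyKaldi's API, and the two-channel variant listed above is not covered.

```python
import math


def compute_cmvn_stats(frames):
    """Accumulate CMVN stats in a 2 x (dim + 1) layout.

    Row 0: per-dimension sums, then the frame count.
    Row 1: per-dimension sums of squares, then 0.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    sums = [0.0] * dim
    sq_sums = [0.0] * dim
    for frame in frames:
        for d, x in enumerate(frame):
            sums[d] += x
            sq_sums[d] += x * x
    return [sums + [float(len(frames))], sq_sums + [0.0]]


def apply_cmvn(frames, stats, norm_vars=False, var_floor=1e-10):
    """Subtract the per-dimension mean; optionally divide by the std dev."""
    dim = len(stats[0]) - 1
    count = stats[0][dim]
    means = [stats[0][d] / count for d in range(dim)]
    if norm_vars:
        # Var[x] = E[x^2] - E[x]^2, floored (arbitrary floor) to avoid dividing by 0.
        stds = [math.sqrt(max(stats[1][d] / count - means[d] ** 2, var_floor))
                for d in range(dim)]
    else:
        stds = [1.0] * dim
    return [[(x - means[d]) / stds[d] for d, x in enumerate(frame)]
            for frame in frames]


frames = [[1.0, 2.0], [3.0, 6.0]]
stats = compute_cmvn_stats(frames)
print(stats)  # [[4.0, 8.0, 2.0], [10.0, 40.0, 0.0]]
print(apply_cmvn(frames, stats, norm_vars=True))  # [[-1.0, -1.0], [1.0, 1.0]]
```

Keeping the raw sums (rather than means) in the stats matrix means stats from several utterances or speakers can simply be added together before normalization.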

pykaldi Key Features

No Key Features are available at this moment for pykaldi.

pykaldi Examples and Code Snippets

No Code Snippets are available at this moment for pykaldi.

Community Discussions

Trending Discussions on pykaldi

Does Kaldi return any recognition confidence parameter, similar to Google Speech-To-Text API?

QUESTION

Asked 2019-Oct-23 at 11:28

I am working on a speech recognition task. So far, I have been using the Google Cloud Speech Recognition API (in Python) with good results. The API returns a confidence value along with every chunk of transcribed text. The confidence is a number between 0 and 1, as stated in the docs, but I did not find any deeper explanation of how Google's API derives this number, so I assume it somehow comes from the neural network that does the recognition.

The next step I want to take is to build my own (offline) automatic speech recognition program, and I found that pyKaldi should be up to the task. I have not started programming it yet, but I want to know beforehand (for research purposes): can Kaldi return a similar confidence value, as the Google Speech-to-Text API does? And what exactly is this "confidence", and how is it computed?

...

ANSWER

Answered 2019-Oct-23 at 11:28

Yes, pyKaldi supports confidence values (word confidence scores), calculated with minimum Bayes risk (MBR). You will find all the necessary information in the documentation. Here is the link to the description of the module:

https://pykaldi.github.io/api/kaldi.lat.html?highlight=mbr#module-kaldi.lat.sausages

As the name says, it is a confidence value, but it does not express how "probable" it is that the text output for a word, derived (or, in a probabilistic setting, given) from a sequence of audio chunks, is correct. In my opinion its expressiveness or meaningfulness is a bit fuzzy and depends on the quality of the model and the training data (noise, reverb, etc.). It is meaningful for comparing alternatives: the one with the higher value is more likely to be the correct one. This in turn raises the question of how large a difference has to be to count as significant. A single confidence value on its own tells you nothing, nor can you compare two different recognizer models solely on the basis of their confidence values. Microsoft puts it this way: "Instead, confidence scores provide a mechanism for comparing the relative accuracy of multiple recognition alternates for a given input. This facilitates returning the most accurate recognition result."

Source https://stackoverflow.com/questions/58397321
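To make the answer's point concrete, here is a minimal sketch of one common way word confidences are derived: turn the total scores of competing hypotheses into posteriors, then give each word of the best hypothesis the summed posterior of all hypotheses that agree with it. This is not PyKaldi's MBR implementation. Real MBR decoding (the kaldi.lat.sausages module linked above) aligns lattice paths into a confusion network ("sausage"); here the hypotheses are simply assumed to align word by word by position, and the n-best list, its scores and the `scale` parameter are made-up assumptions for illustration.

```python
import math


def hypothesis_posteriors(nbest, scale=1.0):
    """Softmax over scaled total log-scores -> posterior of each hypothesis."""
    scaled = [scale * score for _, score in nbest]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def word_confidences(nbest, scale=1.0):
    """Confidence of each word in the best hypothesis: the summed posterior
    of all hypotheses that have the same word at the same position.

    Position-based alignment is a crude stand-in for the confusion-network
    alignment used by MBR decoding.
    """
    posts = hypothesis_posteriors(nbest, scale)
    best = max(range(len(nbest)), key=lambda i: posts[i])
    result = []
    for pos, word in enumerate(nbest[best][0]):
        conf = sum(p for (words, _), p in zip(nbest, posts)
                   if pos < len(words) and words[pos] == word)
        result.append((word, conf))
    return result


# Hypothetical n-best list: (words, total log-score), higher is better.
nbest = [
    (["the", "cat", "sat"], -10.0),
    (["the", "cap", "sat"], -11.0),
    (["a", "cat", "sat"], -12.0),
]
for word, conf in word_confidences(nbest):
    print(f"{word}: {conf:.3f}")
```

Calling `word_confidences(nbest, scale=0.5)` flattens the posterior distribution: the best hypothesis stays the same and "sat" stays at 1.0, but the confidences for "the" and "cat" drop. This mirrors the answer's point that a confidence value is only meaningful relative to its alternatives and to the scoring setup that produced it.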

Community Discussions and Code Snippets contain content from sources that include the Stack Exchange Network.

Vulnerabilities

No vulnerabilities reported

Install pykaldi

Like Kaldi, PyKaldi is primarily intended for speech recognition researchers and professionals. It is jam-packed with goodies that one would need to build Python software that takes advantage of the vast collection of utilities, algorithms and data structures provided by the Kaldi and OpenFst libraries.

If you are not familiar with FST-based speech recognition, or have no interest in accessing the guts of Kaldi and OpenFst in Python but only want to run a pre-trained Kaldi system as part of your Python application, do not fret. PyKaldi includes a number of high-level, application-oriented modules, such as asr, alignment and segmentation, that should be accessible to most Python programmers.

If you are interested in using PyKaldi for research or for building advanced ASR applications, you are in luck. PyKaldi comes with everything you need to read, write, inspect, manipulate or visualize Kaldi and OpenFst objects in Python. It includes Python wrappers for most functions and methods that are part of the public APIs of the Kaldi and OpenFst C++ libraries.

If you want to read/write files that are produced/consumed by Kaldi tools, check out the I/O and table utilities in the util package.
If you want to work with Kaldi matrices and vectors, e.g. convert them to NumPy ndarrays and vice versa, check out the matrix package.
If you want to use Kaldi for feature extraction and transformation, check out the feat, ivector and transform packages.
If you want to work with lattices or other FST structures produced/consumed by Kaldi tools, check out the fstext, lat and kws packages.
If you want low-level access to Gaussian mixture models, hidden Markov models or phonetic decision trees in Kaldi, check out the gmm, sgmm2, hmm and tree packages.
If you want low-level access to Kaldi neural network models, check out the nnet3, cudamatrix and chain packages.
If you want to use the decoders and language modeling utilities in Kaldi, check out the decoder, lm, rnnlm, tfrnnlm and online2 packages.

Kaldi Docs: Read these to learn more about Kaldi.
PyKaldi Docs: Consult these to learn more about the PyKaldi API.
PyKaldi Examples: Check these out to see PyKaldi in action.
PyKaldi Paper: Read this to learn more about the design of PyKaldi.
If you are using a relatively recent Linux or macOS, such as Ubuntu >= 16.04, CentOS >= 7 or macOS >= 10.13, you should be able to install PyKaldi without too much trouble. Otherwise, you will likely need to tweak the installation scripts.
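The util package mentioned above handles Kaldi's archive formats for you (for example through PyKaldi's table readers in kaldi.util.table, such as SequentialMatrixReader). To show what a text-mode archive looks like, as written by Kaldi tools given an `ark,t:` wspecifier, here is a minimal pure-Python parser for matrix entries. `read_text_matrix_archive` is a hypothetical helper for illustration, not part of PyKaldi; it does not handle binary archives or script (`scp`) files, and it treats a one-line vector entry as a one-row matrix.

```python
def read_text_matrix_archive(text):
    """Parse a Kaldi text-format matrix archive into {key: list of rows}.

    Each entry looks like:
        utt1  [
          1.0 2.0 3.0
          4.0 5.0 6.0 ]
    """
    result = {}
    key = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if key is None:
            # A new entry starts with "<key> [", optionally followed by data.
            if len(tokens) < 2 or tokens[1] != "[":
                raise ValueError(f"expected '<key> [', got: {line!r}")
            key, tokens = tokens[0], tokens[2:]
            rows = []
            if not tokens:
                continue
        closed = tokens[-1] == "]"  # "]" terminates the current matrix
        if closed:
            tokens = tokens[:-1]
        if tokens:
            rows.append([float(t) for t in tokens])
        if closed:
            result[key] = rows
            key = None
    if key is not None:
        raise ValueError(f"unterminated matrix for key {key!r}")
    return result


sample = "utt1  [\n  1.0 2.0 3.0\n  4.0 5.0 6.0 ]\nutt2  [\n  0.5 0.5 0.5 ]\n"
for utt, matrix in read_text_matrix_archive(sample).items():
    print(utt, len(matrix), "x", len(matrix[0]))
```

Text archives are convenient for inspection and debugging; for real pipelines the binary format and PyKaldi's own readers are faster and handle all entry types.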

Support

The PyKaldi tfrnnlm package is built automatically along with the rest of PyKaldi if the kaldi-tensorflow-rnnlm library can be found among the Kaldi libraries. After building Kaldi, go to the KALDI_DIR/src/tfrnnlm/ directory and follow the instructions given in the Makefile. Make sure the symbolic link for the kaldi-tensorflow-rnnlm library is added to the KALDI_DIR/src/lib/ directory.
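Before building PyKaldi, it can help to confirm that the symbolic link is actually in place. Below is a minimal sketch of such a check. It assumes the link name starts with `libkaldi-tensorflow-rnnlm` (following Kaldi's usual `libkaldi-<name>` pattern; the exact name and extension may differ, so check the Makefile instructions) and that KALDI_DIR is passed in as an environment variable. Both assumptions, and the helper names, are hypothetical and not part of PyKaldi's build scripts.

```python
import os
from pathlib import Path

# Assumption: the symlink follows Kaldi's usual "libkaldi-<name>" naming;
# the exact file name and extension (.so, .dylib) may differ on your system.
LIB_PREFIX = "libkaldi-tensorflow-rnnlm"


def pick_tfrnnlm_lib(names):
    """Return the first file name (in sorted order) starting with LIB_PREFIX, or None."""
    for name in sorted(names):
        if name.startswith(LIB_PREFIX):
            return name
    return None


def find_tfrnnlm_lib(kaldi_dir):
    """Look for the tfrnnlm library link in KALDI_DIR/src/lib; return its Path or None."""
    lib_dir = Path(kaldi_dir) / "src" / "lib"
    if not lib_dir.is_dir():
        return None
    name = pick_tfrnnlm_lib(entry.name for entry in lib_dir.iterdir())
    if name is None:
        return None
    path = lib_dir / name
    # Path.exists() follows symlinks, so a dangling link counts as missing.
    return path if path.exists() else None


if __name__ == "__main__":
    # Reading KALDI_DIR from the environment is an assumed convention here.
    kaldi_dir = os.environ.get("KALDI_DIR", ".")
    found = find_tfrnnlm_lib(kaldi_dir)
    if found:
        print(f"found {found}; the tfrnnlm package should be picked up by the build")
    else:
        print("kaldi-tensorflow-rnnlm not found in KALDI_DIR/src/lib; tfrnnlm will likely be skipped")
```

Splitting the name matching into `pick_tfrnnlm_lib` keeps it independent of the file system, so the naming assumption can be adjusted in one place.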
