python_speech_features | library provides common speech features for ASR
kandi X-RAY | python_speech_features Summary
This library provides common speech features for ASR including MFCCs and filterbank energies.
Top functions reviewed by kandi - BETA
- Compute the filter bank energies for a signal
- Generate the mel filterbanks
- Convert a frequency in hz to mel
- Convert a mel number to hz
- Compute the MFCC of a given signal
- Compute the log filter bank energies
- Calculate the FFT size for a given window length
- Lift the cepstra
- Calculate the log power spectrum
- Compute the magnitude spectrum of a sequence of frames
- Compute the power spectrum of a sequence of frames
- Generate frames from a signal
- Make a rolling window view of an array
- Deframe a framed signal by overlap-add
- Compute the log spectrum of a wav signal
- Calculate the delta features
python_speech_features Key Features
python_speech_features Examples and Code Snippets
@InProceedings{Nagrani17,
author = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
title = "VoxCeleb: a large-scale speaker identification dataset",
booktitle = "INTERSPEECH",
year = "2017",
}
Community Discussions
Trending Discussions on python_speech_features
QUESTION
I am working on a .py module, which requires me to use the python_speech_features package. I wrote the following command in the Anaconda Prompt:

conda install -c contango python_speech_features

But I am getting the following error:
ANSWER
Answered 2021-Sep-19 at 18:37
Since it is pure Python and no one is actively maintaining a Conda build, feel free to install from PyPI:
QUESTION
how to calculate the timeline of an audio file after extracting MFCC features using python_speech_features
The idea is to get the timeline of the MFCC samples
ANSWER
Answered 2020-Jun-21 at 06:04
python_speech_features.mfcc(...) takes several additional arguments. One of them is winstep, which specifies the time step between successive feature frames, i.e. between MFCC vectors. The default value is 0.01 s = 10 ms. In other contexts, e.g. librosa, this is also known as hop_length, which is then specified in samples.
To find your timeline, you have to figure out the number of features and the feature rate. With winstep=0.01, your feature (frame) rate is 100 Hz. The number of frames you have is len(mfcc_feat).
So you'd end up with:
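A minimal sketch of that timeline computation; mfcc_feat below is a placeholder array standing in for the output of python_speech_features.mfcc, so the example runs without the library installed:

```python
import numpy as np

# Placeholder for the matrix returned by python_speech_features.mfcc:
# shape (num_frames, numcep). A real call would be e.g.
#   mfcc_feat = python_speech_features.mfcc(signal, samplerate, winstep=0.01)
mfcc_feat = np.zeros((500, 13))

winstep = 0.01                           # seconds between frames (the default)
num_frames = len(mfcc_feat)              # number of feature frames
times = np.arange(num_frames) * winstep  # start time of each frame, in seconds

print(times[:3])  # first frames start at 0.00, 0.01, 0.02 s
```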
QUESTION
I want to train my model using 96 MFCC features. I used librosa and I didn't get a promising result. I then tried python_speech_features, but I can get no more than 26 features. Why? This is the shape for the same audio file using librosa:
ANSWER
Answered 2020-Apr-15 at 11:53
The implementations of librosa and python_speech_features differ from each other, structure-wise and even theory-wise. Based on the docs:
- https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html (also https://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html)
- https://python-speech-features.readthedocs.io/en/latest/#python_speech_features.base.mfcc

You will notice that the outputs are shaped differently: librosa's MFCC output has shape (n_mfcc, t), whereas python_speech_features returns (num_frames, numcep), so you need to transpose one of the two. You will also notice that any numcep value above 26 in python_speech_features does not change the returned MFCCs; that is because you are limited by the number of mel filters (nfilt, 26 by default), so you have to increase that as well. Moreover, you need to make sure the framing uses equivalent values: librosa counts in samples while python_speech_features uses durations in seconds, so you will have to convert between them. Finally, python_speech_features accepts the int16 values returned by scipy's read function, but librosa requires float32, so you have to convert the array or use librosa.load(). Here is a small snippet that includes the previous changes:
QUESTION
I'm trying to extract MFCC features from audio (a .wav file) and I have tried python_speech_features and librosa, but they are giving completely different results:
ANSWER
Answered 2020-Mar-02 at 18:16
There are at least two factors at play here that explain why you get different results:
- There is no single definition of the mel scale. librosa implements two of them: Slaney and HTK. Other packages may (and will) use different definitions, leading to different results. That said, the overall picture should be similar. That leads us to the second issue...
- python_speech_features by default puts energy as the first (index zero) coefficient (appendEnergy is True by default), meaning that when you ask for e.g. 13 MFCCs, you effectively get 12 + 1.

In other words, you were not comparing 13 librosa coefficients against 13 python_speech_features coefficients, but rather 13 against 12. The energy can be of a different magnitude and can therefore produce a quite different picture due to the different colour scale.
I will now demonstrate how both modules can produce similar results:
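The first point can be illustrated without either library: the two functions below reimplement the HTK and Slaney mel formulas (the Slaney constants follow librosa's published implementation: linear up to 1 kHz, then logarithmic with step log(6.4)/27):

```python
import numpy as np

def hz_to_mel_htk(f):
    """HTK mel scale, used by python_speech_features (and librosa with htk=True)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def hz_to_mel_slaney(f):
    """Slaney mel scale, librosa's default: linear below 1 kHz, log above."""
    if f < 1000.0:
        return f * 3.0 / 200.0  # linear region
    return 15.0 + np.log(f / 1000.0) / (np.log(6.4) / 27.0)

# The scales use different units and diverge, so identical filterbank code
# built on top of them yields different mel spectrograms and MFCCs:
for f in (440.0, 1000.0, 4000.0):
    print(f, hz_to_mel_htk(f), hz_to_mel_slaney(f))
```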
Community Discussions and Code Snippets contain sources from the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install python_speech_features
You can use python_speech_features like any standard Python library. Since the package is pure Python, you only need a Python distribution with pip installed; no compiler or header files are required. Make sure that your pip, setuptools, and wheel are up to date, and when using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system Python.