STFT | STFT, ISTFT, mel-filterbank modules | Video Utils library

by kooBH C++ Version: Current License: Non-SPDX

kandi X-RAY | STFT Summary

STFT is a C++ library typically used in Video and Video Utils applications. It has no reported bugs or vulnerabilities, and it has low support. However, STFT has a Non-SPDX license. You can download it from GitHub.

I'm currently using Ooura's FFT, since it is the fastest FFT in a single header file. Occasionally, though, the Ooura FFT output differs from the MATLAB FFT output. If you need output that exactly matches MATLAB, you will have to use another FFT library.
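The source does not say how large the MATLAB/Ooura differences are. Assuming they are small floating-point discrepancies, a common way to check two FFT implementations against each other is to compare them within a tolerance rather than for exact equality. The sketch below does this for a naive DFT and a textbook radix-2 FFT; both functions, the test signal, and the tolerance are illustrative assumptions and are not taken from the STFT library.

```python
import cmath
import math


def naive_dft(x):
    """Direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def radix2_fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = radix2_fft(x[0::2])
    odd = radix2_fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def max_abs_diff(a, b):
    """Largest element-wise magnitude difference between two spectra."""
    return max(abs(p - q) for p, q in zip(a, b))


# Hypothetical test signal: two sinusoids sampled at 64 points.
signal = [
    math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.cos(2 * math.pi * 12 * t / 64)
    for t in range(64)
]
diff = max_abs_diff(naive_dft(signal), radix2_fft(signal))
agree_within_tolerance = diff < 1e-9  # tolerance chosen for illustration
```

If the two outputs agree only within a tolerance like this, the implementations are numerically equivalent for most purposes even though a bit-for-bit comparison would fail.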

Support

STFT has a low active ecosystem.

It has 8 stars and 6 forks. There are 2 watchers for this library.

It had no major release in the last 6 months.

There are 3 open issues and 2 have been closed. On average issues are closed in 88 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of STFT is current.

Quality

STFT has no bugs reported.

Security

STFT has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

STFT has a Non-SPDX License.

A Non-SPDX license may be an open-source license that is not SPDX-compliant, or it may not be an open-source license at all, so you need to review it closely before use.

Reuse

STFT releases are not available. You will need to build from source code and install.

Installation instructions are not available. Examples and code snippets are available.

STFT Key Features

No Key Features are available at this moment for STFT.

STFT Examples and Code Snippets

Linear-to-mel weight matrix.

python

Lines of Code : 125

License : Non-SPDX (Apache License 2.0)

def linear_to_mel_weight_matrix(num_mel_bins=20,
                                num_spectrogram_bins=129,
                                sample_rate=8000,
                                lower_edge_hertz=125.0,
                                upper

Inverse short-time Fourier transform.

python

Lines of Code : 118

License : Non-SPDX (Apache License 2.0)

def inverse_stft(stfts,
                 frame_length,
                 frame_step,
                 fft_length=None,
                 window_fn=window_ops.hann_window,
                 name=None):
  """Computes the inverse [Short-time Fourier Transf

Compute MFCCs from log-mel spectrograms.

python

Lines of Code : 81

License : Non-SPDX (Apache License 2.0)

def mfccs_from_log_mel_spectrograms(log_mel_spectrograms, name=None):
  """Computes [MFCCs][mfcc] of `log_mel_spectrograms`.

  Implemented with GPU-compatible ops and supports gradients.

  [Mel-Frequency Cepstral Coefficient (MFCC)][mfcc] calculati

Community Discussions

Trending Discussions on STFT

cannot reshape array of size 486 into shape (1,1)

Audio recognition and fingerprint using sklean & librosa

Python TypeError: reduce_noise() got an unexpected keyword

Plot Fourier in Frequency domain of Voice in Python

Keras custom Layer: "input_shape" is not suscriptable

Getting error while unit testing my machine learning model on Audio files

matplotlib line2d set data is very slow when displayed over an image

Librosa (Python) to Meyda (Node.js) conversion

Path for saving files (Python)

pytorch dataloader: to concatenate batch along one dimensions of the dataloader output

QUESTION

cannot reshape array of size 486 into shape (1,1)

Asked 2022-Mar-18 at 18:41

I've created a model to predict emotion from speech. When I try to extract features from the voice, I get the error

...

ANSWER

Answered 2022-Mar-18 at 18:41

IIUC, your error comes from the shape of the features; maybe this helps.

For example you have features like below:

Source https://stackoverflow.com/questions/71531613

QUESTION

Audio recognition and fingerprint using sklean & librosa

Asked 2022-Jan-02 at 16:23

I want to create a model that can predict who has spoken, using different words.

In this case I try to use feature

...

ANSWER

Answered 2022-Jan-02 at 14:17

For the sound processing and feature extraction part, librosa is definitely going to provide you all you need.

For the machine learning part however, speaker identification (also called "voice recognition") is a relatively complex task. You probably will get more success using techniques from deep learning. You can certainly try to use random forests if you like, but you'll probably get a lower accuracy and will have to spend more time doing feature engineering. In fact, it will be a good exercise for you to compare the results you can get with the various techniques.

For an example tutorial on speaker identification using Keras, see e.g. this article.

Source https://stackoverflow.com/questions/70556124

QUESTION

Python TypeError: reduce_noise() got an unexpected keyword

Asked 2021-Dec-13 at 15:46

Hi guys, I'm trying to do audio classification using Python. I installed a package, and when I tried to use its functions, it said "TypeError: reduce_noise() got an unexpected keyword argument 'audio_clip'". Here is the code of the function.

import librosa
import numpy as np
import noisereduce as nr

def save_STFT(file, name, activity, subject):
    # read audio data
    audio_data, sample_rate = librosa.load(file)
    print(file)

...

ANSWER

Answered 2021-Sep-23 at 14:48

Answer to your question is in the error message.
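An "unexpected keyword argument" error means the function being called does not define a parameter with that name. One way to see which names a function does accept is to inspect its signature, as in this minimal sketch. The stub `reduce_noise_stub` and its parameters are hypothetical stand-ins chosen for illustration; they are not claimed to be the real noisereduce signature, so check the installed package's own documentation.

```python
import inspect


def reduce_noise_stub(y, sr, stationary=False):
    """Hypothetical stand-in for a library function; returns its input unchanged."""
    return y


def accepted_keywords(func):
    """List the parameter names that func accepts, in declaration order."""
    return list(inspect.signature(func).parameters)


def call_with_checked_kwargs(func, **kwargs):
    """Call func, but report unknown keyword names together with the accepted ones."""
    accepted = accepted_keywords(func)
    unknown = sorted(set(kwargs) - set(accepted))
    if unknown:
        raise TypeError(
            f"{func.__name__}() does not accept {unknown}; accepted names: {accepted}"
        )
    return func(**kwargs)


names = accepted_keywords(reduce_noise_stub)
```

Running `accepted_keywords` on the real function from the installed package shows which keyword should replace `audio_clip`.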

Source https://stackoverflow.com/questions/69299518

QUESTION

Plot Fourier in Frequency domain of Voice in Python

Asked 2021-Dec-09 at 18:40

I am facing a very strange problem with my plots. My code records my voice from the microphone and then makes three plots: the voice in the time domain, the voice in the frequency domain, and a spectrogram. The problem is that the frequency-domain plot does not seem to be correct. For example, have a look at my plots.

In this recording I am saying 'one, two, three, four' or something like that. The time-domain plot makes sense. The spectrogram also makes sense to me, because the largest Fourier magnitudes are at normal human voice frequencies (~100 Hz).

The problem is that my frequency-domain plot of the short-time Fourier transform seems to show very high frequencies with very high magnitude, while the human voice frequencies (1-1000) have zero value.

So what might be going wrong? My code is below.

...

ANSWER

Answered 2021-Dec-09 at 18:40

With the 2D array voice (most likely Nx1, for mono recording), scipy.fft.fft ends up computing a batch of N 1D FFTs of length 1. Since the FFT of a sequence of 1 value is an identity, what you see in your 2nd plot is the absolute value of the first half of your time domain signal.

Try computing the FFT on a 1D array (a single channel), with e.g. :
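The answer's original snippet is not reproduced here. As a minimal sketch of the behaviour it describes, the example below uses NumPy's FFT in place of scipy.fft (both transform along the last axis by default). The 200 Hz test tone and the 8000 Hz sample rate are assumptions chosen for illustration.

```python
import numpy as np

sample_rate = 8000  # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate

# Hypothetical mono recording stored as an N x 1 array, as in the question.
voice = np.sin(2 * np.pi * 200 * t).reshape(-1, 1)

# FFT along the last axis (length 1): N separate length-1 FFTs, i.e. the identity.
batch = np.fft.fft(voice)

# Select the single channel first so that one FFT of length N is computed.
mono = voice[:, 0]
spectrum = np.abs(np.fft.rfft(mono))
freqs = np.fft.rfftfreq(mono.size, d=1 / sample_rate)
peak_hz = freqs[np.argmax(spectrum)]
```

Here `batch` simply reproduces the time-domain samples, while `spectrum` has its peak at the tone frequency.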

Source https://stackoverflow.com/questions/70294656

QUESTION

Keras custom Layer: "input_shape" is not suscriptable

Asked 2021-Dec-02 at 14:57

Hi i'm trying to get a custom spectrogram layer going and I can't

...

ANSWER

Answered 2021-Dec-02 at 14:57

TensorFlow can't compute the output shape of your layer. As Conv2D requires a specific shape (4 dimensions), it will fail if the output shape of the previous layer is not known (None).

To fix that, you need to specify which axis you want to squeeze in your call function.

Here, I specify that it is the last axis that needs to be squeezed (the channel axis).

Source https://stackoverflow.com/questions/70183877

QUESTION

Getting error while unit testing my machine learning model on Audio files

Asked 2021-Nov-15 at 21:11

I am getting errors when training my machine learning model, which is for checking what a person is feeling while saying something. I am working with librosa, soundfile & MLPClassifier from sklearn. This is my code:

...

ANSWER

Answered 2021-Nov-15 at 21:11

Your call to os.path.basename("data/what.wav") returns 'what.wav'

You then split that using "-" as the splitter, which returns ['what.wav'], a list of one element.

But you then try to reference the third element of the list with [2], which throws an exception.
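The failure described above can be reproduced with the standard library alone. The helper `third_dash_field` and the dash-separated file name are hypothetical, added only to show a guarded lookup.

```python
import os


def third_dash_field(path, default=None):
    """Return the third dash-separated field of the file name, or default if absent."""
    parts = os.path.basename(path).split("-")
    return parts[2] if len(parts) > 2 else default


# The file name from the answer has no dashes, so the split yields one element.
parts_for_what = os.path.basename("data/what.wav").split("-")

# Indexing parts_for_what[2] would raise IndexError; the guarded helper does not.
missing = third_dash_field("data/what.wav")

# Hypothetical dash-separated name where a third field exists.
present = third_dash_field("data/03-01-05-01.wav")
```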

Source https://stackoverflow.com/questions/69980821

QUESTION

matplotlib line2d set data is very slow when displayed over an image

Asked 2021-Aug-22 at 15:49

I have been trying to figure this out for more than a week. I'm creating an interactive acoustic labeling application using matplotlib, and I want to enable users to click on a line presented on top of a spectrogram and drag it left/right using line.set_xdata(). It basically works, but it is VERY slow: 2-4 updated locations per second. When a spectrogram is not displayed, it works reasonably well. A random matrix is added to simulate the effect.

Python==3.8.1 Matplotlib==3.4.3

I tried:

interactive mode on/off

canvas.draw_idle() instead of draw

canvas.flush_events()

And still no luck. Anybody? Thanks in advance!

Example to reproduce:

...

ANSWER

Answered 2021-Aug-22 at 15:49

If someone encounters this problem in the future: rendering with pcolorfast is significantly faster than with pcolormesh.

Source https://stackoverflow.com/questions/68880680

QUESTION

Librosa (Python) to Meyda (Node.js) conversion

Asked 2021-Aug-21 at 13:11

I am converting a Python program to Node.js, the program follows these steps:

Microphone listens with callbacks
Callbacks do a Librosa "log_mel_S" extraction
The "log_mel_S" is inferenced by an AI model
Sound is labeled

I have managed to translate all of the steps and their relatives from Python to Node.js, except for the Librosa extraction. This would be an example for the audio shape and type required:

...

ANSWER

Answered 2021-Aug-21 at 13:00

TL;DR: The Amplitude Spectrum is basically the FFT of the signal, and the Power Spectrum is the squared value of the Amplitude Spectrum, which is also sometimes referred to as energy. Here is one of the examples from the Meyda website that calculates the Amplitude Spectrum: https://github.com/catalli/audiotrainer-server/blob/df41322906c88cd6f899e8f9b9661ebb949f72e1/index.js#L17

Long answer:

Now, let's look at your code sample line by line and figure out what it is doing and how to implement it in JavaScript.

S = numpy.abs(librosa.stft(y=audio_sample, n_fft=1024, hop_length=500)) ** 2

This calculates the squared magnitudes of a 1024-point FFT of audio_sample, which is basically the Power Spectrum, i.e. the Amplitude Spectrum squared. Please note that the abs of a complex number is its vector length: sqrt(real_part^2 + imag_part^2)

mel_S = numpy.dot(librosa.filters.mel(sr=44100, n_fft=1024, n_mels=64), S).T

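This is a mel-spectrogram calculation, which is basically the product of predefined mel filter banks and the squared FFT magnitudes. (Strictly speaking, MFCCs would additionally require a DCT of the log-mel values.)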

log_mel_S = librosa.power_to_db(mel_S, ref=1.0, amin=1e-10, top_db=None)

this last one will convert the result to decibel (dB) units (10 * log10(S / ref))

i will extend this answer with js code-sample later, submitting it now because i think it will be helpful already as it is
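The three steps walked through above (power spectrum, mel filter bank, decibel conversion) can be sketched with NumPy alone. This is a rough illustration under stated assumptions, not librosa's or Meyda's implementation: all helper names are hypothetical, the mel scale uses the HTK-style formula, frames are not padded, and the window is NumPy's symmetric Hann, so the numbers should not be expected to match librosa's output exactly.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other mel scales exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1), unnormalized."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    weights = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))
    return weights


def power_spectrogram(y, n_fft, hop_length):
    """Squared STFT magnitudes, shape (n_fft // 2 + 1, n_frames), no padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length:i * hop_length + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2


def power_to_db(s, ref=1.0, amin=1e-10):
    """10 * log10(s / ref), with s floored at amin to avoid log of zero."""
    return 10.0 * np.log10(np.maximum(amin, s)) - 10.0 * np.log10(max(amin, ref))


sr = 44100
audio_sample = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # hypothetical 1 s tone

S = power_spectrogram(audio_sample, n_fft=1024, hop_length=500)
mel_S = np.dot(mel_filterbank(sr, n_fft=1024, n_mels=64), S).T
log_mel_S = power_to_db(mel_S, ref=1.0, amin=1e-10)
```

With these parameters each frame yields 513 frequency bins, which the filter bank reduces to 64 mel bands before the decibel conversion. The same structure (window, FFT, matrix product, logarithm) should carry over to a JavaScript port.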

Source https://stackoverflow.com/questions/68794186

QUESTION

Path for saving files (Python)

Asked 2021-Aug-13 at 09:53

I'm trying to take files from 'D:\Study\Progs\test\samples' and, after transforming each .wav to a .png, save it to 'D:\Study\Progs\test\"input value"'. But after "name = os.path.abspath(file)" the program takes the wrong path, "D:\Study\Progs\test\file.wav" instead of "D:\Study\Progs\test\samples\file.wav". What can I do about it? Here's my debug output and console output.

...

ANSWER

Answered 2021-Aug-13 at 06:35

If you don't mind using pathlib as @Andrew suggests, I think what you're trying to do could be accomplished by using the current working directory and the stem of each .wav file to construct the filename for your .png.
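The wrong path in the question arises because os.path.abspath resolves a bare file name against the current working directory, not against the folder the file was listed from. A minimal sketch of the pathlib approach described above follows; the helper name `png_path_for`, its parameters, and the example paths are hypothetical.

```python
from pathlib import Path


def png_path_for(wav_path, out_dir_name, base_dir=None):
    """Build <base_dir>/<out_dir_name>/<stem>.png for a given .wav file.

    base_dir defaults to the current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / out_dir_name / (Path(wav_path).stem + ".png")


# Hypothetical example: only the stem "file" is taken from the source path.
target = png_path_for("samples/file.wav", "spectrograms", base_dir="/tmp/test")
```

Because only the stem of the .wav file is reused, the source folder does not leak into the output path.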

Source https://stackoverflow.com/questions/68762246

QUESTION

pytorch dataloader: to concatenate batch along one dimensions of the dataloader output

Asked 2021-Jun-22 at 22:48

My dataset's __getitem__ function returns a torch.stft() M x N x D tensor, where N depends on the audio input series, which has variable length. Each item is read inside the __getitem__ function. I would like to have batches concatenated along the second dimension (N), so that by iterating the dataloader I would get data shaped as M x (N x batch_size) x D. Is there a possible solution to this problem?

...

ANSWER

Answered 2021-Jun-22 at 17:56

You can do this with a custom collate function, passed to the DataLoader:

Source https://stackoverflow.com/questions/68087353

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install STFT

You can download it from GitHub.

Support

For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check the Stack Overflow community page and ask there.