vadnet | Real-time Voice Activity Detection in Noisy Environments | Machine Learning library
kandi X-RAY | vadnet Summary
VadNet is a real-time voice activity detector for noisy environments. It implements an end-to-end learning approach based on Deep Neural Networks. In the extended version, gender and laughter detection are added. To see a demonstration click on the images below.
Top functions reviewed by kandi - BETA
- Sample from youtube
- Convert a timestamp to a number of milliseconds
- Write a voice activity file
- Sample from an audio file
- Download files
- Train the model
- Convert audio data into frames
- Load audio from file
- Generate annotation labels from a csv file
- Return a list of all URLs in the given path
- Parse a list of tables
- Plot filter weights
- Download filenames
- Print checkpoint
- Get all files in root directory
- Get a weight from a checkpoint
- Create an instance from a module
- Update a variable from a checkpoint
- Parse a list of table entries
- Get audio
- Performs dynamic rnn layer
- Sample the next URL
- Write a transcription to an annotation file
- Initialize the audio set
- Predict from a checkpoint
- Argument parser
- Generate a sample
- Sample from file
Community Discussions
Trending Discussions on vadnet
QUESTION
I am currently working on building a CNN for sound classification. The problem is relatively simple: I need my model to detect whether there is human speech on an audio record. I made a train / test set containing records of 3 seconds on which there is human speech (speech) or not (no_speech). From these 3 seconds fragments I get a mel-spectrogram of dimension 128 x 128 that is used to feed the model.
Since it is a simple binary problem, I thought a CNN would easily detect human speech, but I may have been too cocky. After 1 or 2 epochs the model stops learning: the loss no longer decreases, as if the weights were not being updated, and the number of correct predictions stays roughly the same. I tried tuning the hyperparameters, but the problem persists. I tried learning rates from 0.1 and 0.01 down to 1e-7. I also tried a more complex model, but the same thing occurs.
Then I thought it could be due to the script itself, but I cannot find anything wrong: the loss is computed, the gradients are then computed with backward(), and the weights should be updated. I would be glad if you could have a quick look at the script and let me know what could be going wrong! If you have other ideas about why this problem may occur, I would also be glad to receive some advice on how to best train my CNN.
I based the script on the LunaTrainingApp from “Deep learning in PyTorch” by Stevens, as I found the script to be elegant. Of course, I modified it to match my problem: I added a way to compute the precision and recall and some other custom metrics, such as the percentage of correct predictions.
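For reference, precision, recall and the share of correct predictions can be computed from binary labels roughly as follows. This is a minimal sketch, not the code from the script in question: the function name `binary_metrics`, the label convention (1 = speech, 0 = no_speech) and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall and accuracy for binary labels (assumed: 1 = speech)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # speech, predicted speech
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no speech, predicted speech
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # speech, predicted no speech
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no speech, predicted no speech

    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Toy labels, made up for the example.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)

# A model that always answers "no_speech" on a balanced set.
collapsed = binary_metrics([1, 0, 1, 0], [0, 0, 0, 0])
```

The second call shows why accuracy alone can hide a stalled model: a network that collapses to a single class on a balanced set keeps 50% correct predictions while recall falls to zero.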
Here is the script:
...
ANSWER
Answered 2021-Jun-02 at 12:50
Read it once more and let it sink in.
Do you understand now what the problem is?
A convolution layer learns static (fixed) local patterns and tries to match them everywhere in the input. This is very handy for images, where you want to be equivariant to translation and where all pixels have the same "meaning".
However, in spectrograms, different locations have different meanings: pixels in the upper part of the spectrogram represent high frequencies, while those in the lower part represent low frequencies. Therefore, a local pattern matched to a region of the spectrogram may mean a completely different thing depending on whether it is matched in the upper or the lower part. You need a different kind of model to process spectrograms. Maybe convert the spectrogram to a 1D signal with 128 channels (frequencies) and apply 1D convolutions to it?
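The suggestion of treating the 128 frequency bins as channels could be sketched in PyTorch along the following lines. This is only an illustration under stated assumptions, not the asker's model or a tuned architecture: the class name `FreqAsChannelsCNN`, the layer widths, the kernel sizes and the image-style input layout `(batch, 1, 128, 128)` are all hypothetical.

```python
import torch
import torch.nn as nn


class FreqAsChannelsCNN(nn.Module):
    """Hypothetical 1D CNN: mel bins are input channels, filters slide over time only."""

    def __init__(self, n_mels=128, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            # Each filter sees all n_mels frequencies at once, so a weight is
            # tied to a specific frequency bin instead of being shared across them.
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # average over the remaining time steps
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, spec):
        # spec is assumed to be image-style: (batch, 1, n_mels, time)
        x = spec.squeeze(1)               # (batch, n_mels, time)
        x = self.features(x).squeeze(-1)  # (batch, 64)
        return self.classifier(x)         # (batch, n_classes) logits


model = FreqAsChannelsCNN()
batch = torch.randn(4, 1, 128, 128)  # random stand-in for four mel-spectrograms
logits = model(batch)
print(logits.shape)  # torch.Size([4, 2])
```

With this layout the convolution stays translation-equivariant along time, where a shifted pattern keeps its meaning, but not along frequency.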
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported