vadnet | Real-time Voice Activity Detection in Noisy Environments | Machine Learning library

by hcmlab | Language: Python | Version: Current | License: LGPL-3.0

kandi X-RAY | vadnet Summary

vadnet is a Python library typically used in Artificial Intelligence, Machine Learning, Deep Learning, PyTorch, TensorFlow, and Keras applications. vadnet has no bugs and no vulnerabilities, it has a Weak Copyleft License, and it has low support. However, no build file is available for vadnet. You can download it from GitHub.

VadNet is a real-time voice activity detector for noisy environments. It implements an end-to-end learning approach based on Deep Neural Networks. The extended version adds gender and laughter detection.

Support

vadnet has a low-activity ecosystem.

It has 359 star(s) with 72 fork(s). There are 18 watchers for this library.

It had no major release in the last 6 months.

There are 20 open issues and 11 have been closed. On average issues are closed in 70 days. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of vadnet is current.

Quality

vadnet has 0 bugs and 0 code smells.

Security

vadnet has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

vadnet code analysis shows 0 unresolved vulnerabilities.

There are 0 security hotspots that need review.

License

vadnet is licensed under the LGPL-3.0 License. This license is Weak Copyleft.

Weak Copyleft licenses impose some restrictions, but software released under them can be used in commercial projects.

Reuse

vadnet releases are not available. You will need to build from source code and install.

vadnet has no build file. You will need to create the build yourself to build the component from source.

Installation instructions, examples and code snippets are available.

vadnet saves you 765 person hours of effort in developing the same functionality from scratch.

It has 1762 lines of code, 144 functions and 35 files.

It has medium code complexity. Code complexity directly impacts maintainability of the code.

Top functions reviewed by kandi - BETA

kandi has reviewed vadnet and discovered the below as its top functions. This is intended to give you an instant insight into vadnet implemented functionality, and help decide if they suit your requirements.

Sample from youtube
Convert a timestamp to a number of milliseconds
Write a voice activity file
Sample from an audio file
Download files
Train the model
Convert audio data into frames
Load audio from file
Generate annotation labels from a csv file
Return a list of all URLs in the given path
Parse a list of tables
Plot filter weights
Download filenames
Print checkpoint
Get all files in root directory
Get a weight from a checkpoint
Create an instance from a module
Update a variable from a checkpoint
Parse a list of table entries
Get audio
Performs dynamic rnn layer
Sample the next URL
Write a transcription to an annotation file
Initialize the audio set
Predict from a checkpoint
Argument parser
Generate a sample
Sample from file
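
Two of the listed utilities, timestamp conversion and audio framing, are generic enough to sketch. The following is an illustration only, not vadnet's actual code: the function names, the 'HH:MM:SS[.mmm]' timestamp format, and the frame and hop parameters are all assumptions.

```python
# Hypothetical helpers; the names and signatures are NOT taken from vadnet.

def timestamp_to_ms(timestamp):
    """Convert 'HH:MM:SS' or 'HH:MM:SS.mmm' (assumed format) to milliseconds."""
    hours, minutes, seconds = timestamp.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return round(total_seconds * 1000)


def audio_to_frames(samples, frame_len, hop_len):
    """Split a sample sequence into fixed-length frames.

    Consecutive frames start hop_len samples apart, so they overlap when
    hop_len < frame_len. An incomplete frame at the end is dropped.
    """
    last_start = len(samples) - frame_len
    return [samples[start:start + frame_len]
            for start in range(0, last_start + 1, hop_len)]


print(timestamp_to_ms("00:01:02.500"))          # 62500
print(audio_to_frames(list(range(10)), 4, 2))   # 4 frames of 4 samples each
```

Dropping the incomplete tail keeps every frame the same length, which is convenient when frames are later stacked into a batch; padding the tail would be an equally reasonable choice.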

vadnet Key Features

No Key Features are available at this moment for vadnet.

vadnet Examples and Code Snippets

No Code Snippets are available at this moment for vadnet.

Community Discussions

Trending Discussions on vadnet

QUESTION

Why does my convolutional model does not learn?

Asked 2021-Jun-02 at 12:50

I am currently working on building a CNN for sound classification. The problem is relatively simple: I need my model to detect whether there is human speech in an audio recording. I made a train / test set containing 3-second recordings in which there is human speech (speech) or not (no_speech). From each of these 3-second fragments I get a mel-spectrogram of dimension 128 x 128 that is fed to the model.

Since it is a simple binary problem I thought a CNN would easily detect human speech, but I may have been too cocky. It seems that after 1 or 2 epochs the model doesn’t learn anymore, i.e. the loss doesn’t decrease, as if the weights do not update, and the number of correct predictions stays roughly the same. I tried to play with the hyperparameters but the problem is still the same. I tried learning rates from 0.1 and 0.01 down to 1e-7. I also tried a more complex model but the same thing occurs.

Then I thought it could be due to the script itself but I cannot find anything wrong: the loss is computed, the gradients are then computed with backward() and the weights should be updated. I would be glad if you could have a quick look at the script and let me know what could be going wrong! If you have other ideas about why this problem may occur, I would also be glad to receive some advice on how to best train my CNN.

I based the script on the LunaTrainingApp from “Deep Learning with PyTorch” by Stevens, as I found the script to be elegant. Of course I modified it to match my problem: I added a way to compute the precision and recall and some other custom metrics such as the % of correct predictions.

Here is the script:

...

ANSWER

Answered 2021-Jun-02 at 12:50

You are applying 2D 3x3 convolutions to spectrograms.

Read it once more and let it sink in.
Do you understand now what the problem is?

A convolution layer learns a static/fixed local pattern and tries to match it everywhere in the input. This is very cool and handy for images where you want to be equivariant to translation and where all pixels have the same "meaning".
However, in spectrograms, different locations have different meanings: pixels in the top part of the spectrogram represent high frequencies while those in the lower part represent low frequencies. Therefore, a local pattern matched in the upper part of the spectrogram may mean something completely different from the same pattern matched in the lower part. You need a different kind of model to process spectrograms. Maybe convert the spectrogram to a 1D signal with 128 channels (frequencies) and apply 1D convolutions to it?
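
The suggestion to treat frequency bins as channels can be sketched as follows. This is a minimal pure-Python illustration, not the asker's script: the function name and the toy sizes are assumptions. Each output channel has separate weights for every frequency bin, so a pattern is tied to its position in frequency while still being matched at every position in time. In PyTorch the same operation corresponds to torch.nn.Conv1d with in_channels=128 applied to a tensor of shape (batch, 128, time).

```python
import random


def conv1d_over_time(spec, weights, bias):
    """Valid 1D convolution along time with frequency bins as input channels.

    spec:    n_freq rows, each a list of n_frames values
    weights: weights[o][f][k] for output channel o, frequency bin f, tap k
    bias:    one value per output channel
    Returns n_out rows, each with n_frames - kernel_size + 1 values.
    As in deep learning frameworks, the kernel is not flipped
    (strictly speaking this is a cross-correlation).
    """
    n_freq, n_frames = len(spec), len(spec[0])
    kernel_size = len(weights[0][0])
    out = []
    for w_o, b_o in zip(weights, bias):
        row = []
        for t in range(n_frames - kernel_size + 1):
            acc = b_o
            for f in range(n_freq):          # every bin has its own weights
                for k in range(kernel_size):
                    acc += w_o[f][k] * spec[f][t + k]
            row.append(acc)
        out.append(row)
    return out


# Toy sizes keep the loops fast; the question uses 128 bins x 128 frames.
N_FREQ, N_FRAMES, N_OUT, KERNEL = 8, 16, 4, 3
rng = random.Random(0)
spec = [[rng.random() for _ in range(N_FRAMES)] for _ in range(N_FREQ)]
weights = [[[rng.uniform(-1, 1) for _ in range(KERNEL)]
            for _ in range(N_FREQ)] for _ in range(N_OUT)]
bias = [0.0] * N_OUT

features = conv1d_over_time(spec, weights, bias)
print(len(features), len(features[0]))  # 4 14
```

Compared with a 2D 3x3 kernel, which reuses the same nine weights at every frequency, this layer has n_freq x kernel_size weights per output channel and shares them only along the time axis.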

Source https://stackoverflow.com/questions/67804707

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install vadnet

do_bin.cmd - Installs embedded Python and downloads the SSI interpreter. During installation the script tries to detect whether a GPU is available and, if so, installs the GPU version of TensorFlow. This requires that an NVIDIA graphics card is detected and that CUDA has been installed. VadNet also runs fine on a CPU.

Support

VadNet is implemented using the Social Signal Interpretation (SSI) framework. The processing pipeline is defined in vad[ex].pipeline and can be configured by editing vad[ex].pipeline-config. If the option send:do is turned on, an XML string with the detection results is streamed to a socket (see send:url). You can change the format of the XML string by editing vad.xml. To run SSI in the background, click on the tray icon and select 'Hide windows'. For more information about SSI pipelines, please consult the SSI documentation.