dataset_loader | python library to load partitioned data
kandi X-RAY | dataset_loader Summary
kandi X-RAY | dataset_loader Summary
A python library to load partitioned data (like in spark data frames).
Support
Quality
Security
License
Reuse
Top functions reviewed by kandi - BETA
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of dataset_loader
dataset_loader Key Features
dataset_loader Examples and Code Snippets
Community Discussions
Trending Discussions on dataset_loader
QUESTION
I am currently working on building a CNN for sound classification. The problem is relatively simple: I need my model to detect whether there is human speech on an audio record. I made a train / test set containing records of 3 seconds on which there is human speech (speech) or not (no_speech). From these 3 seconds fragments I get a mel-spectrogram of dimension 128 x 128 that is used to feed the model.
Since it is a simple binary problem I thought the a CNN would easily detect human speech but I may have been too cocky. However, it seems that after 1 or 2 epoch the model doesn’t learn anymore, i.e. the loss doesn’t decrease as if the weights do not update and the number of correct prediction stays roughly the same. I tried to play with the hyperparameters but the problem is still the same. I tried a learning rate of 0.1, 0.01 … until 1e-7. I also tried to use a more complex model but the same occur.
Then I thought it could be due to the script itself but I cannot find anything wrong: the loss is computed, the gradients are then computed with backward()
and the weights should be updated. I would be glad you could have a quick look at the script and let me know what could go wrong! If you have other ideas of why this problem may occur I would also be glad to receive some advice on how to best train my CNN.
I based the script on the LunaTrainingApp from “Deep learning in PyTorch” by Stevens as I found the script to be elegant. Of course I modified it to match my problem, I added a way to compute the precision and recall and some other custom metrics such as the % of correct predictions.
Here is the script:
...ANSWER
Answered 2021-Jun-02 at 12:50Read it once more and let it sink.
Do you understand now what is the problem?
A convolution layer learns a static/fixed local patterns and tries to match it everywhere in the input. This is very cool and handy for images where you want to be equivariant to translation and where all pixels have the same "meaning".
However, in spectrograms, different locations have different meanings - pixels at the top part of the spectrograms mean high frequencies while the lower indicates low frequencies. Therefore, if you have matched some local pattern to a local region in the spectrogram, it may mean a completely different thing if it is matched to the upper or lower part of the spectrogram. You need a different kind of model to process spectrograms. Maybe convert the spectrogram to a 1D signal with 128 channels (frequencies) and apply 1D convolutions to it?
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported
Install dataset_loader
Support
Reuse Trending Solutions
Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items
Find more librariesStay Updated
Subscribe to our newsletter for trending solutions and developer bootcamps
Share this Page