Ai Hulks App Kit
by ddmasterdon Updated: Nov 2, 2021
Solution Kit
Speaker Counting
Speaker counting enhances understanding through automatic speech recognition and is beneficial for real-world applications such as call-center transcription and meeting transcription analytics. Speaker diarization is a developing field of study, with new approaches published frequently.

The Problem
Few studies have addressed estimating a large number of speakers. Diarization becomes extremely difficult when the number of speakers is large, so providing the number of speakers to the diarization system can be advantageous.

Complete Solution Architecture
Machine learning model: predicts the number of speakers and the timestamps of each speaker.
Web app: the frontend through which the user accesses the feature.
Flask API middleware: connects the frontend and the ML model.

We have built a web app that lets a user interact with our machine learning model and benefit from it. Because the model and the web app are built on different platforms, we used a REST API as middleware to connect the frontend and the model.
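The middleware role described above can be illustrated with a minimal Flask sketch. This is not the kit's actual API: the `/predict` route, the `audio` form field, and the `count_speakers` helper are all hypothetical names chosen for illustration, and the helper returns a fixed placeholder instead of calling a trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def count_speakers(audio_bytes: bytes) -> dict:
    """Hypothetical stand-in for the trained model.

    A real implementation would decode the audio, run the ML pipeline,
    and return the predicted speaker count with timestamps.
    """
    return {"num_speakers": 0, "segments": []}  # placeholder result


@app.route("/predict", methods=["POST"])
def predict():
    # The frontend is assumed to upload the recording as multipart form data.
    upload = request.files.get("audio")
    if upload is None:
        return jsonify(error="missing 'audio' file field"), 400
    return jsonify(count_speakers(upload.read()))


if __name__ == "__main__":
    app.run(port=5000)
```

With a layout like this, the frontend only needs to know the route and the JSON shape, so the model can change without touching the web app.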
ML Model Solution Process
Our Web UI is built with Node as the backend and VueJs as the front end. The ML model solution follows three steps:
1. Preprocessing: Denoising -> Speech separation
2. Embedding Extraction: YAMNet sound classification model
3. Speaker Counting: Machine learning model selection -> Model training -> Model prediction
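The three steps can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not the kit's implementation: `preprocess` only peak-normalizes (it stands in for denoising and speech separation), mean-pooling the frame embeddings is an assumed way to get one vector per clip, the `classifier` argument is a hypothetical callable, and the TF Hub URL is the public YAMNet model rather than anything named in the kit.

```python
import numpy as np

SAMPLE_RATE = 16_000  # YAMNet expects 16 kHz mono float32 samples in [-1.0, 1.0]


def load_yamnet_embedder():
    """Return a function mapping a waveform to per-frame YAMNet embeddings.

    TensorFlow Hub is imported lazily so the rest of the sketch runs without it.
    """
    import tensorflow_hub as hub

    model = hub.load("https://tfhub.dev/google/yamnet/1")

    def embed(waveform):
        # YAMNet returns (scores, embeddings, log-mel spectrogram).
        _scores, embeddings, _spectrogram = model(waveform.astype(np.float32))
        return embeddings.numpy()  # shape: (frames, 1024)

    return embed


def preprocess(waveform):
    """Step 1 placeholder: peak-normalize only (no denoising or separation)."""
    waveform = np.asarray(waveform, dtype=np.float32)
    peak = np.max(np.abs(waveform)) if waveform.size else 0.0
    return waveform / peak if peak > 0 else waveform


def count_speakers(waveform, embed, classifier):
    """Run preprocessing, embedding extraction, and speaker counting."""
    clean = preprocess(waveform)              # step 1
    embeddings = embed(clean)                 # step 2: (frames, dim)
    clip_vector = embeddings.mean(axis=0)     # assumed pooling over time
    return int(classifier(clip_vector))       # step 3
```

Passing `embed` and `classifier` as arguments keeps each stage replaceable, for example swapping in `load_yamnet_embedder()` for the embedding step once TensorFlow Hub is available.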
Data Preprocessing
Technologies used for preprocessing the audio data.
numpy by numpy
The fundamental package for scientific computing with Python.
Python
23036
Version:v1.24.2
License: Permissive (BSD-3-Clause)
matplotlib by matplotlib
matplotlib: plotting with Python
Python
17111
Version:v3.7.1
License: No License
pandas by pandas-dev
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Python
37439
Version:v2.0.0rc1
License: Permissive (BSD-3-Clause)
Audio Preprocessing
These additional libraries are used to process the audio that is fed into the classifier model.
py-webrtcvad by wiseman
Python interface to the WebRTC Voice Activity Detector
C
1608
Version:Current
License: Others (Non-SPDX)
Resemblyzer by resemble-ai
A python package to analyze and compare voices with deep learning
Python
2197
Version:0.1.1-dev
License: Permissive (Apache-2.0)
python-soundfile by bastibe
SoundFile is an audio library based on libsndfile, CFFI, and NumPy
Python
528
Version:0.12.1
License: Permissive (BSD-3-Clause)
SoundFile by bastibe
SoundFile is an audio library based on libsndfile, CFFI, and NumPy
Python
316
Version:0.10.3post1
License: Permissive (BSD-3-Clause)
pydub by jiaaro
Manipulate audio with a simple and easy high level interface
Python
6964
Version:v0.25.1
License: Permissive (MIT)
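As an illustration of how audio might be prepared for voice activity detection with py-webrtcvad, the sketch below converts a float waveform to 16-bit PCM and splits it into fixed-length frames. The WebRTC VAD accepts 16-bit mono PCM at 8, 16, 32, or 48 kHz in frames of 10, 20, or 30 ms. The helper names and default values are assumptions for this example, not code from the kit.

```python
import numpy as np


def float_to_pcm16(waveform):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = np.clip(np.asarray(waveform, dtype=np.float32), -1.0, 1.0)
    return (clipped * 32767).astype("<i2").tobytes()


def split_frames(pcm_bytes, sample_rate=16000, frame_ms=30):
    """Split PCM bytes into whole frames; a trailing partial frame is dropped."""
    bytes_per_frame = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    usable = len(pcm_bytes) - len(pcm_bytes) % bytes_per_frame
    return [pcm_bytes[i:i + bytes_per_frame] for i in range(0, usable, bytes_per_frame)]


def speech_flags(pcm_bytes, sample_rate=16000, frame_ms=30, aggressiveness=2):
    """Return one True/False speech decision per frame using py-webrtcvad."""
    import webrtcvad  # imported lazily; only needed for this function

    vad = webrtcvad.Vad(aggressiveness)  # 0 (least) to 3 (most aggressive)
    return [vad.is_speech(frame, sample_rate)
            for frame in split_frames(pcm_bytes, sample_rate, frame_ms)]
```

Frames flagged as non-speech could then be dropped before embedding extraction, which keeps silence from diluting the speaker features.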
Model Training
These libraries are used to create the two classifier models, which are then combined into one.
tensorflow by tensorflow
An Open Source Machine Learning Framework for Everyone
C++
172599
Version:v2.12.0
License: Permissive (Apache-2.0)
hub by tensorflow
A library for transfer learning by reusing parts of TensorFlow models.
Python
3284
Version:v0.13.0
License: Permissive (Apache-2.0)
librosa by librosa
Python library for audio and music analysis
Python
5772
Version:0.10.0.post2
License: Permissive (ISC)
tqdm by tqdm
A Fast, Extensible Progress Bar for Python and CLI
Python
24341
Version:v4.65.0
License: Others (Non-SPDX)
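The kit does not say how its two classifier models are combined. One common option is soft voting, where the class probabilities of both models are averaged before picking the most likely class. The sketch below shows that technique with NumPy only; the function name, the weighting parameter, and the idea that each class index maps to a speaker count are assumptions made for illustration.

```python
import numpy as np


def combine_predictions(probs_a, probs_b, weight_a=0.5):
    """Blend two classifiers' class probabilities by weighted averaging.

    probs_a, probs_b: arrays of shape (n_clips, n_classes) from the two models.
    Returns (predicted class index per clip, blended probabilities).
    """
    probs_a = np.asarray(probs_a, dtype=np.float64)
    probs_b = np.asarray(probs_b, dtype=np.float64)
    if probs_a.shape != probs_b.shape:
        raise ValueError("both models must output the same shape")
    blended = weight_a * probs_a + (1.0 - weight_a) * probs_b
    return blended.argmax(axis=-1), blended


# Example: the two models disagree, and the more confident one wins.
labels, blended = combine_predictions([[0.6, 0.4]], [[0.1, 0.9]])
```

If one model is known to be more reliable, `weight_a` can be moved away from 0.5 to favor it.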