It enhances understanding through automatic speech recognition
Beneficial for real-world applications like call-center transcription and meeting transcription analytics
Speaker diarization is a developing field of study, with new approaches published frequently.
Few studies have addressed estimating a large number of speakers.
Diarization becomes extremely difficult when the number of speakers is large.
Providing the number of speakers to the diarization system can be advantageous
Complete Solution Architecture:
Machine Learning Model - Predicts the number of speakers and the timestamps of each speaker.
Web App - Frontend through which the user accesses the feature.
Middleware Flask API - Connects the frontend and the ML model.
We built a web app through which users can interact with our machine learning model. Because the model and the web app are built on different platforms, we used a REST API as middleware to connect the frontend and the model.
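The middleware role described above can be illustrated with a minimal Flask sketch. This is not the project's actual API: the `/predict` route, the `audio` form field, the `predict_speakers` helper, and the response shape are all hypothetical names chosen for illustration, and the returned values are placeholders rather than real model output.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_speakers(audio_bytes: bytes) -> dict:
    """Hypothetical stand-in for the call into the ML model.

    A real implementation would run the trained model on the audio.
    The values below are made-up placeholders, not real predictions.
    """
    return {
        "num_speakers": 2,
        "segments": [
            {"speaker": 0, "start": 0.0, "end": 1.5},
            {"speaker": 1, "start": 1.5, "end": 3.0},
        ],
    }


@app.route("/predict", methods=["POST"])
def predict():
    # Assumption: the frontend uploads the recording as multipart form data
    # under the field name "audio".
    uploaded = request.files.get("audio")
    if uploaded is None:
        return jsonify({"error": "missing 'audio' file"}), 400
    return jsonify(predict_speakers(uploaded.read()))


if __name__ == "__main__":
    app.run(port=5000)
```

With a layout like this, the frontend only needs to know the HTTP endpoint and the JSON shape, so the web app and the model can stay on separate platforms.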
ML Model Solution Process
The web UI is built with Node.js as the backend and Vue.js as the frontend.
1. Preprocessing: Denoising -> Speech separation
2. Embedding Extraction: YAMNet sound classification model
3. Speaker Counting: Machine learning model selection -> Model training -> Model prediction
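The three steps above can be sketched as a simple chain of functions. This is only an outline of the data flow, under stated assumptions: each stage is passed in as a callable, the type aliases and the `count_speakers` name are hypothetical, and speech separation is assumed to return a single cleaned waveform. The actual denoiser, YAMNet call, and trained counting model would be plugged in where the toy lambdas appear.

```python
from typing import Callable, List

Audio = List[float]      # mono waveform samples (assumed representation)
Embedding = List[float]  # one feature vector per audio frame


def count_speakers(
    waveform: Audio,
    denoise: Callable[[Audio], Audio],
    separate: Callable[[Audio], Audio],
    embed: Callable[[Audio], List[Embedding]],
    predict_count: Callable[[List[Embedding]], int],
) -> int:
    """Run the three stages in order and return the estimated speaker count."""
    # 1. Preprocessing: denoising followed by speech separation.
    cleaned = separate(denoise(waveform))
    # 2. Embedding extraction (YAMNet in the text; any callable here).
    embeddings = embed(cleaned)
    # 3. Speaker counting with an already-trained model.
    return predict_count(embeddings)


if __name__ == "__main__":
    # Toy stages that only demonstrate the wiring, not real signal processing.
    estimate = count_speakers(
        [0.0, 0.5, -0.5, 0.25],
        denoise=lambda x: x,
        separate=lambda x: x,
        embed=lambda x: [[sample] for sample in x],
        predict_count=lambda vectors: 1,
    )
    print(estimate)
```

Keeping the stages as separate callables makes it straightforward to swap a denoiser or try a different counting model without touching the rest of the pipeline.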
Technologies used for preprocessing the audio data.