transfusion-asr | Transcribing Speech with Multinomial Diffusion

by RF5 | Python | Version: v1.0 | License: No License

kandi X-RAY | transfusion-asr Summary

transfusion-asr is a Python library. It has no reported bugs or vulnerabilities, has a build file available, and has low support. You can download it from GitHub.

Transcribing Speech with Multinomial Diffusion, training code and models.

Support

transfusion-asr has a low-activity ecosystem.

It has 54 stars and 4 forks. There are 8 watchers for this library.

It had no major release in the last 12 months.

There are 0 open issues and 2 have been closed. On average, issues are closed in 1 day. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of transfusion-asr is v1.0.

Quality

transfusion-asr has no bugs reported.

Security

transfusion-asr has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

transfusion-asr does not have a standard license declared.

Check the repository for any license declaration and review the terms closely.

Without a license, all rights are reserved, and you cannot use the library in your applications.

Reuse

transfusion-asr releases are available to install and integrate.

A build file is available. You can build the component from source.

Installation instructions, examples and code snippets are available.

transfusion-asr Key Features

No Key Features are available at this moment for transfusion-asr.

transfusion-asr Examples and Code Snippets

No Code Snippets are available at this moment for transfusion-asr.

Community Discussions

No Community Discussions are available at this moment for transfusion-asr. Refer to the Stack Overflow page for discussions.

Vulnerabilities

No vulnerabilities reported

Install transfusion-asr

We use torch hub to make model loading very easy, so no cloning of the repo is needed. The steps to perform ASR inference with the trained checkpoint are simple:
1. Install pip dependencies: ensure torch, torchaudio, numpy, omegaconf, fairseq, fastprogress, jiwer, and pandas are installed (for the full training dependencies, see requirements.txt). Make sure you are using Python 3.10 or above, since this repo uses certain new features of Python 3.10.
2. Load models: load the trained TransFusion model and the frozen WavLM encoder.
3. Compute WavLM features: load a 16 kHz waveform and compute its WavLM features.
4. Predict transcript: perform multinomial diffusion using all the additional techniques from the paper.
That's it. You can modify the diffusion parameters using the DSH class in transfusion/score.py and in the diffuser config. By default, the optimal settings found in the paper are used.
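The dependency, Python version, and 16 kHz input requirements listed in the steps above can be checked before any model is loaded. The following is a minimal sketch that uses only the Python standard library; the helper names (python_ok, missing_packages, wav_sample_rate, is_16khz) are hypothetical and are not part of transfusion-asr, and the sample-rate check assumes an uncompressed PCM WAV file, which is all the standard wave module can read.

```python
import importlib.util
import sys
import wave

# Package names as listed in the install step above.
REQUIRED = ["torch", "torchaudio", "numpy", "omegaconf",
            "fairseq", "fastprogress", "jiwer", "pandas"]


def python_ok(version=None, minimum=(3, 10)):
    """Return True if the given (or running) Python is at least `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def wav_sample_rate(path):
    """Read the sample rate from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path):
    """Return True if the WAV file at `path` is sampled at 16 kHz."""
    return wav_sample_rate(path) == 16000


if __name__ == "__main__":
    # Hypothetical preflight report; it only prints and changes nothing.
    print("Python 3.10 or above:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```

To check an input file, call is_16khz on its path; a file that fails the check would need resampling to 16 kHz before the feature extraction step.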

Support

For new features, suggestions, and bugs, create an issue on GitHub. If you have any questions, search for or ask them on the Stack Overflow community page.