tacotron | TensorFlow implementation of Google 's Tacotron speech | Speech library

 by   keithito Python Version: v0.2.0 License: MIT

kandi X-RAY | tacotron Summary

kandi X-RAY | tacotron Summary

tacotron is a Python library typically used in Institutions, Learning, Education, Artificial Intelligence, Speech, Tensorflow applications. tacotron has no bugs, it has no vulnerabilities, it has build file available, it has a Permissive License and it has medium support. You can download it from GitHub.

An implementation of Tacotron speech synthesis in TensorFlow.

            kandi-support Support

              tacotron has a medium active ecosystem.
              It has 2787 star(s) with 968 fork(s). There are 147 watchers for this library.
              It had no major release in the last 12 months.
              There are 129 open issues and 195 have been closed. On average issues are closed in 150 days. There are 8 open pull requests and 0 closed requests.
              It has a neutral sentiment in the developer community.
              The latest version of tacotron is v0.2.0

            kandi-Quality Quality

              tacotron has 0 bugs and 0 code smells.

            kandi-Security Security

              tacotron has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.
              tacotron code analysis shows 0 unresolved vulnerabilities.
              There are 0 security hotspots that need review.

            kandi-License License

              tacotron is licensed under the MIT License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

            kandi-Reuse Reuse

              tacotron releases are available to install and integrate.
              Build file is available. You can build the component from source.
              Installation instructions are not available. Examples and code snippets are available.
              tacotron saves you 544 person hours of effort in developing the same functionality from scratch.
              It has 1274 lines of code, 142 functions and 28 files.
              It has high code complexity. Code complexity directly impacts maintainability of the code.

            Top functions reviewed by kandi - BETA

            kandi has reviewed tacotron and discovered the below as its top functions. This is intended to give you an instant insight into tacotron implemented functionality, and help decide if they suit your requirements.
            • Train the model
            • Create a model
            • Run the loop
            • Convert a sequence of symbols to text
            • Encoder with cbhg
            • 1D convolution layer
            • Compute cbhg
            • Parse CMU file
            • Get the pronunciation of a string
            • Synthesize sentences
            • Loads the model
            • Compute a training example from a training example
            • Parse start and end coordinates
            • Preprocess LJ speech data
            • Inverse of a spectrogram
            • Calculate the melpectrogram
            • Handles GET request
            • Cleans the text
            • Preprocess blurb files
            • Transliteration for transliteration
            • Find the endpoint of a WAV
            • Inverse of invocations
            • Convert text to English cleaners
            • Post cbhg
            • Build utterance from a path
            • Load the model
            • Computes the spectrogram of samples
            Get all kandi verified functions for this library.

            tacotron Key Features

            No Key Features are available at this moment for tacotron.

            tacotron Examples and Code Snippets

            Repository Structure:
            Pythondot img1Lines of Code : 46dot img1License : Permissive (MIT)
            copy iconCopy
            ├── datasets
            ├── en_UK		(0)
            │   └── by_book
            │       └── female
            ├── en_US		(0)
            │   └── by_book
            │       ├── female
            │       └── male
            ├── LJSpeech-1.1	(0)
            │   └── wavs
            ├── logs-Tacotr  
            Repository Structure:
            Pythondot img2Lines of Code : 23dot img2License : Permissive (MIT)
            copy iconCopy
            ├── datasets
            ├── LJSpeech-1.1	(0)
            │   └── wavs
            ├── logs-Tacotron	(2)
            │   ├── mel-spectrograms
            │   ├── plots
            │   ├── pretrained
            │   └── wavs
            ├── papers
            ├── tacotron
            │   ├── models
            Repository Structure:
            Pythondot img3Lines of Code : 23dot img3License : Permissive (MIT)
            copy iconCopy
            ├── datasets
            ├── LJSpeech-1.1	(0)
            │   └── wavs
            ├── logs-Tacotron	(2)
            │   ├── mel-spectrograms
            │   ├── plots
            │   ├── pretrained
            │   └── wavs
            ├── papers
            ├── tacotron
            │   ├── models

            Community Discussions


            About the usage of vocoders
            Asked 2022-Feb-01 at 23:05

            I'm quite new to AI and I'm currently developing a model for non-parallel voice conversions. One confusing problem that I have is the use of vocoders.

            So my model needs Mel spectrograms as the input and the current model that I'm working on is using the MelGAN vocoder (Github link) which can generate 22050Hz Mel spectrograms from raw wav files (which is what I need) and back. I recently tried WaveGlow Vocoder (PyPI link) which can also generate Mel spectrograms from raw wav files and back.

            But, in other models such as, WaveRNN , VocGAN , WaveGrad There's no clear explanation about wav to Mel spectrograms generation. Do most of these models don't require the wav to Mel spectrograms feature because they largely cater to TTS models like Tacotron? or is it possible that all of these have that feature and I'm just not aware of it?

            A clarification would be highly appreciated.



            Answered 2022-Feb-01 at 23:05
            How neural vocoders handle audio -> mel

            Check e.g. this part of the MelGAN code: https://github.com/descriptinc/melgan-neurips/blob/master/mel2wav/modules.py#L26

            Specifically, the Audio2Mel module simply uses standard methods to create log-magnitude mel spectrograms like this:

            • Compute the STFT by applying the Fourier transform to windows of the input audio,
            • Take the magnitude of the resulting complex spectrogram,
            • Multiply the magnitude spectrogram by a mel filter matrix. Note that they actually get this matrix from librosa!
            • Take the logarithm of the resulting mel spectrogram.
            Regarding the confusion

            Your confusion might stem from the fact that, usually, authors of Deep Learning papers only mean their mel-to-audio "decoder" when they talk about "vocoders" -- the audio-to-mel part is always more or less the same. I say this might be confusing since, to my understanding, the classical meaning of the term "vocoder" includes both an encoder and a decoder.

            Unfortunately, these methods will not always work exactly in the same manner as there are e.g. different methods to create the mel filter matrix, different padding conventions etc.

            For example, librosa.stft has a center argument that will pad the audio before applying the STFT, while tensorflow.signal.stft does not have this (it would require manual padding beforehand).

            An example for the different methods to create mel filters would be the htk argument in librosa.filters.mel, which switches between the "HTK" method and "Slaney". Again taking Tensorflow as an example, tf.signal.linear_to_mel_weight_matrix does not support this argument and always uses the HTK method. Unfortunately, I am not familiar with torchaudio, so I don't know if you need to be careful there, as well.

            Finally, there are of course many parameters such as the STFT window size, hop length, the frequencies covered by the mel filters etc, and changing these relative to what a reference implementation used may impact your results. Since different code repositories likely use slightly different parameters, I suppose the answer to your question "will every method do the operation(to create a mel spectrogram) in the same manner?" is "not really". At the end of the day, you will have to settle for one set of parameters either way...

            Bonus: Why are these all only decoders and the encoder is always the same?

            The direction Mel -> Audio is hard. Not even Mel -> ("normal") spectrogram is well-defined since the conversion to mel spectrum is lossy and cannot be inverted. Finally, converting a spectrogram to audio is difficult since the phase needs to be estimated. You may be familiar with methods like Griffin-Lim (again, librosa has it so you can try it out). These produce noisy, low-quality audio. So the research focuses on improving this process using powerful models.

            On the other hand, Audio -> Mel is simple, well-defined and fast. There is no need to define "custom encoders".

            Now, a whole different question is whether mel spectrograms are a "good" encoding. Using methods like variational autoencoders, you could perhaps find better (e.g. more compact, less lossy) audio encodings. These would include custom encoders and decoders and you would not get away with standard librosa functions...

            Source https://stackoverflow.com/questions/70942123


            How is sampling rate of audio related to hop length, filter length, window length of an audio and how does downsampling affect the audio parameters?
            Asked 2021-Apr-19 at 10:49

            I have audio data of around 20K files with a sampling rate of 44100Khz. I'm using the data for training the Text-to-Speech Tacotron model. However, the parameters configured for successful training are as below: Hence I need to downsample the data to 22.5Khz.



            Answered 2021-Apr-19 at 10:49

            It looks like your model requires a Mel spectrogram as input, which has been generated with the given parameters. I.e. sr=22050, hop_length=... etc. These parameters have nothing to do with downsampling.

            To create a suitable spectrogram, do something like this:

            Source https://stackoverflow.com/questions/67156469


            unable to evaluate symlinks in Dockerfile path: lstat no such file or directory
            Asked 2020-Aug-16 at 16:52

            I'm trying to run tacotron2 on docker within Ubuntu WSL2 (v.20.04) on Win10 2004 build. Docker is installed and running and I can run hello world successfully.

            (There's a nearly identical question here, but nobody has answered it.)

            When I try to run docker build -t tacotron-2_image docker/ I get the error:

            unable to prepare context: unable to evaluate symlinks in Dockerfile path: lstat /home/nate/docker/Dockerfile: no such file or directory

            So then I navigated in bash to where docker is installed (/var/lib/docker) and tried to run it there, and got the same error. In both cases I created a docker directory, but kept getting that error in all cases.

            How can I get this to work?



            Answered 2020-Aug-16 at 16:52

            As mentioned here, the error might have nothing to do with "symlinks", and everything with the lack of Dockerfile, which should be in the Tacotron-2/docker folder.

            docker build does mention:

            The docker build command builds Docker images from a Dockerfile and a “context”.
            A build’s context is the set of files located in the specified PATH or URL.

            In your case, docker build -t tacotron-2_image docker/ is supposed to be executed in the path you have cloned the Rayhane-mamah/Tacotron-2 repository.

            To be sure, you could specify said Dockerfile, but that should not be needed:

            Source https://stackoverflow.com/questions/63331959


            Running into a basic issue about navigating code in github rep as clicking on a function-reference doesn't hightlight it or show definition
            Asked 2020-Jun-24 at 06:09

            I was earlier able to browse the github repo at https://github.com/r9y9/Tacotron-2/blob/master/wavenet_vocoder/models/wavenet.py easily in browser, so that when I put cursor on top of jResidualConv1dGLU at Line84, it'd highlight and let me click on "Definition" and "References" of class ResidualConv1dGLU.

            But I used the same repo in the same browser today, and it doesn't do anything. It doesn't highlight ResidualConv1dGLU or show links for Definition/References of it. It's as if it doesn't know that it's a class.

            Is there some default setting needed to enable that? What am I missing?

            PS: (It was working a few days ago, so I am not sure what changed in just a few days)



            Answered 2020-Jun-24 at 06:09

            What might have changed yesteraday (June 23, 2020) is "Design updates to repositories and GitHub UI"

            Try and make sure to clear the cache of your browser and reload everything.

            That being said, when clicking on "Jump to", I see:

            "Code navigation not available for this commit", which is expected for a fork.
            But I see the same issue on the original repository Rayhane-mamah/Tacotron-2.

            Those repositories needs to be re-scanned by GitHub, as I mentioned here.

            Source https://stackoverflow.com/questions/62544829


            Run localhost server in Google Colab notebook
            Asked 2020-Mar-09 at 02:50

            I am trying to implement Tacotron speech synthesis with Tensorflow in Google Colab using this code form a repo in Github, below is my code and working good till the step of using localhost server, how I can to run a localhost server in a notebook in Google Colab?

            My code:



            Answered 2020-Mar-09 at 02:50

            You can do this by using tools like ngrok or remote.it

            They give you a URL that you can access from any browser to access your web server running on 8888

            Example 1: Tunneling tensorboard running on

            Source https://stackoverflow.com/questions/60571301

            Community Discussions, Code Snippets contain sources that include Stack Exchange Network


            No vulnerabilities reported

            Install tacotron

            You can download it from GitHub.
            You can use tacotron like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.


            For any new features, suggestions and bugs create an issue on GitHub. If you have any questions check and ask questions on community page Stack Overflow .
            Find more information at:

            Find, review, and download reusable Libraries, Code Snippets, Cloud APIs from over 650 million Knowledge Items

            Find more libraries
          • HTTPS


          • CLI

            gh repo clone keithito/tacotron

          • sshUrl


          • Stay Updated

            Subscribe to our newsletter for trending solutions and developer bootcamps

            Agree to Sign up and Terms & Conditions

            Share this Page

            share link