tacotron | Audio samples accompanying publications related to Tacotron | Speech library

by google HTML Version: Current License: Non-SPDX

X-Ray Key Features Code Snippets Community Discussions(10)Vulnerabilities Install Support

kandi X-RAY | tacotron Summary

tacotron is a HTML library typically used in Artificial Intelligence, Speech, Pytorch applications. tacotron has no bugs, it has no vulnerabilities and it has low support. However tacotron has a Non-SPDX License. You can download it from GitHub.

This repository contains audio samples accompanying publications related to Tacotron, an end-to-end speech synthesis model from the Sound Understanding and Brain teams at Google. This is not an official Google product.

Support

Quality

Security

License

Reuse

Support

tacotron has a low active ecosystem.

It has 522 star(s) with 85 fork(s). There are 76 watchers for this library.

It had no major release in the last 6 months.

tacotron has no issues reported. There are no pull requests.

It has a neutral sentiment in the developer community.

The latest version of tacotron is current.

Quality

tacotron has no bugs reported.

Security

tacotron has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

tacotron has a Non-SPDX License.

Non-SPDX licenses can be open source with a non SPDX compliant license, or non open source licenses, and you need to review them closely before use.

Reuse

tacotron releases are not available. You will need to build from source code and install.

Top functions reviewed by kandi - BETA

kandi's functional review helps you automatically verify the functionalities of the libraries and avoid rework.
Currently covering the most popular Java, JavaScript and Python libraries. See a Sample of tacotron

Get all kandi verified functions for this library.

tacotron Key Features

No Key Features are available at this moment for tacotron.

tacotron Examples and Code Snippets

No Code Snippets are available at this moment for tacotron.

Community Discussions

Trending Discussions on tacotron

How is sampling rate of audio related to hop length, filter length, window length of an audio and how does downsampling affect the audio parameters?

unable to evaluate symlinks in Dockerfile path: lstat no such file or directory

Running into a basic issue about navigating code in github rep as clicking on a function-reference doesn't hightlight it or show definition

Run localhost server in Google Colab notebook

Reproduce sox spectrogram in scipy

'ModuleNotFoundError' when trying to import script from imported script

PyTorch and Chainer implementations of the Linear layer- are they equivalent?

Datasets like "The LJ Speech Dataset"

Fixing error output from seq2seq model

slow training with GPU google cloud ML engine

QUESTION

How is sampling rate of audio related to hop length, filter length, window length of an audio and how does downsampling affect the audio parameters?

Asked 2021-Apr-19 at 10:49

I have audio data of around 20K files with a sampling rate of 44100Khz. I'm using the data for training the Text-to-Speech Tacotron model. However, the parameters configured for successful training are as below: Hence I need to downsample the data to 22.5Khz.

...

ANSWER

Answered 2021-Apr-19 at 10:49

It looks like your model requires a Mel spectrogram as input, which has been generated with the given parameters. I.e. sr=22050, hop_length=... etc. These parameters have nothing to do with downsampling.

To create a suitable spectrogram, do something like this:

Source https://stackoverflow.com/questions/67156469

QUESTION

unable to evaluate symlinks in Dockerfile path: lstat no such file or directory

Asked 2020-Aug-16 at 16:52

I'm trying to run tacotron2 on docker within Ubuntu WSL2 (v.20.04) on Win10 2004 build. Docker is installed and running and I can run hello world successfully.

(There's a nearly identical question here, but nobody has answered it.)

When I try to run docker build -t tacotron-2_image docker/ I get the error:

unable to prepare context: unable to evaluate symlinks in Dockerfile path: lstat /home/nate/docker/Dockerfile: no such file or directory

So then I navigated in bash to where docker is installed (/var/lib/docker) and tried to run it there, and got the same error. In both cases I created a docker directory, but kept getting that error in all cases.

How can I get this to work?

...

ANSWER

Answered 2020-Aug-16 at 16:52

As mentioned here, the error might have nothing to do with "symlinks", and everything with the lack of Dockerfile, which should be in the Tacotron-2/docker folder.

docker build does mention:

The docker build command builds Docker images from a Dockerfile and a “context”.
A build’s context is the set of files located in the specified PATH or URL.

In your case, docker build -t tacotron-2_image docker/ is supposed to be executed in the path you have cloned the Rayhane-mamah/Tacotron-2 repository.

To be sure, you could specify said Dockerfile, but that should not be needed:

Source https://stackoverflow.com/questions/63331959

QUESTION

Running into a basic issue about navigating code in github rep as clicking on a function-reference doesn't hightlight it or show definition

Asked 2020-Jun-24 at 06:09

I was earlier able to browse the github repo at https://github.com/r9y9/Tacotron-2/blob/master/wavenet_vocoder/models/wavenet.py easily in browser, so that when I put cursor on top of jResidualConv1dGLU at Line84, it'd highlight and let me click on "Definition" and "References" of class ResidualConv1dGLU.

But I used the same repo in the same browser today, and it doesn't do anything. It doesn't highlight ResidualConv1dGLU or show links for Definition/References of it. It's as if it doesn't know that it's a class.

Is there some default setting needed to enable that? What am I missing?

PS: (It was working a few days ago, so I am not sure what changed in just a few days)

...

ANSWER

Answered 2020-Jun-24 at 06:09

What might have changed yesteraday (June 23, 2020) is "Design updates to repositories and GitHub UI"

Try and make sure to clear the cache of your browser and reload everything.

That being said, when clicking on "Jump to", I see:

"Code navigation not available for this commit", which is expected for a fork.
But I see the same issue on the original repository Rayhane-mamah/Tacotron-2.

Those repositories needs to be re-scanned by GitHub, as I mentioned here.

Source https://stackoverflow.com/questions/62544829

QUESTION

Run localhost server in Google Colab notebook

Asked 2020-Mar-09 at 02:50

I am trying to implement Tacotron speech synthesis with Tensorflow in Google Colab using this code form a repo in Github, below is my code and working good till the step of using localhost server, how I can to run a localhost server in a notebook in Google Colab?

My code:

...

ANSWER

Answered 2020-Mar-09 at 02:50

You can do this by using tools like ngrok or remote.it

They give you a URL that you can access from any browser to access your web server running on 8888

Example 1: Tunneling tensorboard running on

Source https://stackoverflow.com/questions/60571301

QUESTION

Reproduce sox spectrogram in scipy

Asked 2019-Jun-05 at 10:43

For example I have a wav file with speech.

I can create nice spectrogram visualization with sox:

...

ANSWER

Answered 2019-Jun-05 at 10:43

Notice the scale of the color bar in the plot generated by sox. The units are dBFS: decibels relative to full scale. To reproduce the plot with SciPy and Matplotlib, you'll need to scale the values so that the maximum is 1, and then take a logarithm of the values to convert to dB.

Here's a modified version of your script that includes an assortment of tweaks to the arguments of spectrogram and pcolormesh that creates a plot similar to the sox output.

Source https://stackoverflow.com/questions/56456419

QUESTION

'ModuleNotFoundError' when trying to import script from imported script

Asked 2019-Jun-03 at 00:47

My folder structure:

...

ANSWER

Answered 2019-Jun-03 at 00:47

This is because, when running ttsTacotron.py, Python looks up all non-relative imported modules in the directory containing ttsTacotron.py (and in the system module directories, which isn't relevant here), yet hparams.py is in the Tacotron-2 directory. The simplest fix is probably to add Tacotron-2 to the list of directories in which modules are looked up; this also eliminates the need to use importlib.

Source https://stackoverflow.com/questions/56420007

QUESTION

PyTorch and Chainer implementations of the Linear layer- are they equivalent?

Asked 2019-May-08 at 10:08

I want to use a Linear, Fully-Connected Layer as one of the input layers in my network. The input has shape (batch_size, in_channels, num_samples). It is based on the Tacotron paper: https://arxiv.org/pdf/1703.10135.pdf, the Enocder prenet part. It feels to me as if Chainer and PyTorch have different implementations of the Linear layer - are they really performing the same operations or am I misunderstanding something?

In PyTorch, behavior of the Linear layer follows the documentations: https://pytorch.org/docs/0.3.1/nn.html#torch.nn.Linear according to which, the shape of the input and output data are as follows:

Input: (N,∗,in_features) where * means any number of additional dimensions

Output: (N,∗,out_features) where all but the last dimension are the same shape as the input.

Now, let's try creating a linear layer in pytorch and performing the operation. I want an output with 8 channels, and the input data will have 3 channels.

...

ANSWER

Answered 2019-May-08 at 10:08

Chainer Linear layer (a bit frustratingly) does not apply the transformation to the last axis. Chainer flattens the rest of the axes. Instead you need to provide how many batch axes there are, documentation which is 2 in your case:

Source https://stackoverflow.com/questions/56033418

QUESTION

Datasets like "The LJ Speech Dataset"

Asked 2019-Mar-22 at 10:24

I am trying to find databases like the LJ Speech Dataset made by Keith Ito. I need to use these datasets in TacoTron 2 (Link), so I think datasets need to be structured in a certain way. the LJ database is linked directly into the tacotron 2 github page, so I think it's safe to assume it's made to work with it. So I think Databases should have the same structure as the LJ. I downloaded the Dataset and I found out that it's structured like this:

...

ANSWER

Answered 2019-Mar-22 at 10:24

There a few resources:

The main ones I would look at are Festvox (aka CMU artic) http://www.festvox.org/dbs/index.html and LibriVoc https://librivox.org/

these guys seem to be maintaining a list https://github.com/candlewill/Speech-Corpus-Collection

And I am part of a project that is collecting more (shameless self plug): https://github.com/Idlak/Living-Audio-Dataset

Source https://stackoverflow.com/questions/51123147

QUESTION

Fixing error output from seq2seq model

Asked 2018-Jun-21 at 08:09

I want to ask you how we can effectively re-train a trained seq2seq model to remove/mitigate a specific observed error output. I'm going to give an example about Speech Synthesis, but any idea from different domains, such as Machine Translation and Speech Recognition, using seq2seq model will be appreciated.

I learned the basics of seq2seq with attention model, especially for Speech Synthesis such as Tacotron-2. Using a distributed well-trained model showed me how naturally our computer could speak with the seq2seq (end-to-end) model (you can listen to some audio samples here). But still, the model fails to read some words properly, e.g., it fails to read "obey [əˈbā]" in multiple ways like [əˈbī] and [əˈbē].

The reason is obvious because the word "obey" appears too little, only three times out of 225,715 words, in our dataset (LJ Speech), and the model had no luck.

So, how can we re-train the model to overcome the error? Adding extra audio clips containing the "obey" pronunciation sounds impractical, but reusing the three audio clips has the danger of overfitting. And also, I suppose we use a well-trained model and "simply training more" is not an effective solution.

Now, this is one of the drawbacks of seq2seq model, which is not talked much. The model successfully simplified the pipelines of the traditional models, e.g., for Speech Synthesis, it replaced an acoustic model and a text analysis frontend etc by a single neural network. But we lost the controllability of our model at all. It's impossible to make the system read in a specific way.

Again, if you use a seq2seq model in any field and get an undesirable output, how do you fix that? Is there a data-scientific workaround to this problem, or maybe a cutting-edge Neural Network mechanism to gain more controllability in seq2seq model?

Thanks.

...

ANSWER

Answered 2018-Jun-21 at 08:09

I found an answer to my own question in Section 3.2 of the paper (Deep Voice 3). So, they trained both of phoneme-based model and character-based model, using phoneme inputs mainly except that character-based model is used if words cannot be converted to their phoneme representations.

Source https://stackoverflow.com/questions/50657546

QUESTION

slow training with GPU google cloud ML engine

Asked 2018-Mar-08 at 14:16

sorry if my question is so dump but i spent a lot of time trying to understand the reason of the problem but i couldn't so here it is

i'm training tacotron model on google cloud ML i have trained it before on floyd hub and it was pretty fast so i configured my project to be able to run on google ML

this is the major changes that i made to my project

original

...

ANSWER

Answered 2018-Mar-08 at 14:16

well i figured it out i should have put tensorflow-gpu==1.4 in required packages not tensorflow==1.4 ^^

Source https://stackoverflow.com/questions/49169295

Community Discussions, Code Snippets contain sources that include Stack Exchange Network

Vulnerabilities

No vulnerabilities reported

Install tacotron

You can download it from GitHub.

Support

For any new features, suggestions and bugs create an issue on GitHub. If you have any questions check and ask questions on community page Stack Overflow .

Find more information at: