vq-vae | A Tensorflow Implementation of VQ-VAE Speaker Conversion | Speech library
kandi X-RAY | vq-vae Summary
A Tensorflow Implementation of VQ-VAE Speaker Conversion
Top functions reviewed by kandi - BETA
- Get a batch of data
- Load audio files
- Return the id of a speaker
- Transformer decoder
- Build a 1D convolution layer
- Apply a residual block
- Save audio data to file
- Load a wav file
- Compute the mu-law encoding of the audio (see the sketch after this list)
- Transformer encoder
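Of these helpers, the mu-law one is worth sketching: mu-law companding is the standard preprocessing for WaveNet-style decoders. The repository's exact signatures are not shown on this page, so the following NumPy sketch is illustrative only (function names are hypothetical):

```python
import numpy as np

def mu_law_encode(audio, mu=255):
    """Compand a waveform in [-1, 1] and quantize it to integers in [0, mu]."""
    audio = np.clip(audio, -1.0, 1.0)
    compressed = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    # Map [-1, 1] to {0, ..., mu} for a categorical (softmax) output.
    return ((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int32)

def mu_law_decode(quantized, mu=255):
    """Invert the quantization and the companding back to [-1, 1]."""
    signal = 2.0 * (quantized.astype(np.float32) / mu) - 1.0
    return np.sign(signal) * np.expm1(np.abs(signal) * np.log1p(mu)) / mu
```

With mu = 255 this yields the 256-way categorical targets typically used by WaveNet-style decoders.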
Community Discussions
Trending Discussions on vq-vae
QUESTION
I am trying to build a two-stage VQ-VAE-2 + PixelCNN as shown in the paper "Generating Diverse High-Fidelity Images with VQ-VAE-2" (https://arxiv.org/pdf/1906.00446.pdf). I have three implementation questions:
- The paper mentions:
We allow each level in the hierarchy to separately depend on pixels.
I understand the second latent space in VQ-VAE-2 must be conditioned on a concatenation of the first latent space and a downsampled version of the image. Is that correct?
- The paper "Conditional Image Generation with PixelCNN Decoders" (https://papers.nips.cc/paper/6527-conditional-image-generation-with-pixelcnn-decoders.pdf) says:
If h is a one-hot encoding that specifies a class, this is equivalent to adding a class-dependent bias at every layer.
As I understand it, the condition is entered as a 1D tensor that is injected into the bias through a convolution. Now, for a two-stage conditional PixelCNN, one needs to condition not only on the class vector but also on the latent code of the previous stage. A possibility I see is to concatenate them and feed a 3D tensor; see the sketch after this question. Does anyone see another way to do this?
- The loss and optimization are unchanged across the two stages: one simply sums the loss of each stage into a final loss that is optimized. Is that correct?
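For context on the second question: the conditioning in the PixelCNN decoders paper is additive rather than concatenative. A non-spatial vector h contributes a learned per-channel bias at each layer, while a spatial signal (such as latents from the previous stage) is mapped through a 1x1 convolution, so the two conditions can simply be added. A minimal TensorFlow sketch of one conditioned layer, with all shapes and names hypothetical:

```python
from tensorflow.keras import layers

channels = 128

x = layers.Input(shape=(32, 32, channels))  # feature map inside one PixelCNN layer
h = layers.Input(shape=(10,))               # one-hot class vector (10 classes assumed)
z = layers.Input(shape=(32, 32, 64))        # spatial latents from the previous stage

# Non-spatial condition: project h to a per-channel bias and broadcast it
# over every spatial position (the "class-dependent bias at every layer").
class_bias = layers.Dense(channels, use_bias=False)(h)      # [B, channels]
out = x + layers.Reshape((1, 1, channels))(class_bias)      # broadcasts over H and W

# Spatial condition: a 1x1 convolution preserves per-position information,
# so both signals can be added instead of concatenated.
out = out + layers.Conv2D(channels, 1)(z)
out = layers.Activation("relu")(out)
```

Adding the two projected conditions keeps the layer's input depth fixed, which avoids the growth that comes from concatenating the conditioning tensors.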
ANSWER
Answered 2020-Apr-01 at 15:29
After discussing with one of the authors of the paper, I received answers to all of these questions and share them below.
Question 1
This is correct, but the downsampling of the image is implemented with a strided convolution rather than a non-parametric resize. It can be absorbed into the encoder architecture, something like the sketch below (the number after each variable indicates its spatial dimension, so for example h64 is [B, 64, 64, D], and so on).
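The code block from the original answer is not reproduced on this page; purely as an illustration, here is a Keras-style sketch of the idea (filter counts, kernel sizes, and variable names are assumptions, not the author's code):

```python
from tensorflow.keras import layers

# The suffix of each variable is its spatial dim, e.g. h64 is [B, 64, 64, D].
image256 = layers.Input(shape=(256, 256, 3))
h128 = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(image256)
h64 = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(h128)
# The bottom-level latents are quantized from h64; the top level keeps
# downsampling with another strided convolution rather than resizing the image.
h32 = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(h64)
```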
Community Discussions and Code Snippets include sources from the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install vq-vae
You can use vq-vae like any standard Python library. You will need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, and git installed. Make sure that your pip, setuptools, and wheel are up to date. When using pip it is generally recommended to install packages in a virtual environment to avoid changes to the system.