Implementing a Generative Adversarial Network (GAN) in PyTorch

Updated: Aug 1, 2023

Solution Kit

Most deep learning frameworks prioritize either usability or performance, but PyTorch demonstrates that these two objectives can coexist. PyTorch is a Python-based machine learning framework. It supports an imperative, Pythonic programming style in which models are ordinary code, which makes debugging easier while remaining efficient. It also supports hardware accelerators such as GPUs and TPUs.
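As a minimal illustration of the imperative style, the sketch below defines a model as a plain Python class and runs it eagerly, on a GPU when one is available. The TinyNet class and its debug flag are hypothetical, chosen only for illustration.

```python
import torch
from torch import nn

# Pick a GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyNet(nn.Module):
    """Hypothetical toy model: its forward pass is ordinary Python code."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 1)

    def forward(self, x, debug=False):
        h = torch.relu(self.fc1(x))
        if debug:
            # Eager execution: intermediate tensors can be inspected with print or a debugger.
            print("hidden shape:", tuple(h.shape))
        return self.fc2(h)


model = TinyNet().to(device)
out = model(torch.randn(2, 4, device=device), debug=True)
print(out.shape)  # torch.Size([2, 1])
```

Because the forward pass is ordinary Python, you can set breakpoints or add print statements inside it just as in any other function.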


By training on various image datasets, the DCGAN authors showed convincing evidence that the deep convolutional adversarial pair learns a hierarchy of representations, from object parts to scenes, in both the generator and the discriminator. Implementing the DCGAN architecture is therefore a good next step. The DCGAN paper uses a batch size of 128, and the spatial size of the images used for training is another key setting. Careful choices while building the DCGAN model, such as its weight initialization, help make training more stable. Loading the dataset is simple with the PyTorch data loader.
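A minimal sketch of the initialization described in the DCGAN paper, where weights are drawn from a zero-centered normal distribution with standard deviation 0.02. The PyTorch DCGAN tutorial applies the same scheme, with BatchNorm scales centered at 1 and biases set to 0. The small network here is hypothetical and only shows how the function is applied with Module.apply.

```python
import torch
from torch import nn

BATCH_SIZE = 128  # batch size used in the DCGAN paper


def weights_init(m: nn.Module) -> None:
    """DCGAN-style init: conv weights ~ N(0, 0.02); BatchNorm weights ~ N(1, 0.02), biases 0."""
    classname = m.__class__.__name__
    if "Conv" in classname:  # matches Conv2d and ConvTranspose2d
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    elif "BatchNorm" in classname:
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.constant_(m.bias, 0.0)


# Hypothetical small network, used only to show how the initializer is applied.
net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2, inplace=True),
)
net.apply(weights_init)  # .apply() visits every submodule recursively
```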


We can also build a full-scale DCGAN with convolutional and convolutional-transpose layers that takes in images and generates fake, photorealistic ones. For details, refer to the DCGAN tutorial in the PyTorch documentation. An unconditional GAN trained on the MNIST dataset generates random digits, whereas a conditional MNIST GAN lets you specify which digit the GAN will generate.
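To make the conditional idea concrete, here is a minimal sketch of a conditional MNIST generator, assuming a simple fully connected design rather than a DCGAN: the requested digit label is embedded with nn.Embedding and concatenated with the noise vector. The class name and layer sizes are illustrative assumptions.

```python
import torch
from torch import nn


class ConditionalGenerator(nn.Module):
    """Hypothetical conditional MNIST generator: the label embedding is concatenated with the noise."""

    def __init__(self, latent_dim: int = 100, n_classes: int = 10, img_size: int = 28):
        super().__init__()
        self.img_size = img_size
        self.label_emb = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + n_classes, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, img_size * img_size),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        x = torch.cat([z, self.label_emb(labels)], dim=1)
        return self.net(x).view(-1, 1, self.img_size, self.img_size)


gen = ConditionalGenerator()
z = torch.randn(4, 100)
labels = torch.tensor([7, 7, 7, 7])  # request four images of the digit 7
fake = gen(z, labels)
print(fake.shape)  # torch.Size([4, 1, 28, 28])
```

The labels only control the output after the generator has been trained adversarially against a discriminator that also sees the labels; an untrained model like this one produces noise of the right shape.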


Supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. For instance, we can create a dummy network whose first layer is convolutional. The original GAN paper (Goodfellow et al., 2014) proposed a framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. The training procedure for G is to maximize the probability of D making a mistake.
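The adversarial process described above is usually written as a two-player minimax game over a value function V(D, G), where D(x) is the probability that D assigns to x being real and z is noise drawn from a prior p_z:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to maximize V, while G is trained to minimize it, which amounts to making D misclassify the samples G(z).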


A DCGAN is a direct extension of the GAN that uses convolutional layers in the discriminator and convolutional-transpose layers in the generator. Fractionally-strided convolution, also known as transposed convolution, performs the opposite spatial transformation of a convolution: where a strided convolution downsamples its input, a transposed convolution upsamples it (it is not, however, an exact inverse). During training, the generator constantly tries to outsmart the discriminator by generating better and better fakes.
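The sketch below shows the shape behavior of a strided Conv2d and a ConvTranspose2d that use the same kernel size, stride, and padding; the tensor sizes are arbitrary choices for illustration.

```python
import torch
from torch import nn

x = torch.randn(1, 16, 32, 32)  # (batch, channels, height, width)

# A strided convolution halves the spatial size: 32x32 -> 16x16.
down = nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1)
# A transposed ("fractionally-strided") convolution with the same settings doubles it back.
up = nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1)

h = down(x)
y = up(h)
print(h.shape)  # torch.Size([1, 32, 16, 16])
print(y.shape)  # torch.Size([1, 16, 32, 32])
```

Only the shape is restored: y has the same size as x but not the same values, because a transposed convolution is not the inverse of a convolution.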


In contrast, the discriminator works to become a better detective and correctly classify real and fake images. It analyzes the data provided by the generator and tries to identify whether it is fake (generated) or real. Training runs for a number of epochs, and each epoch iterates over all batches of the dataset.
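A minimal sketch of such a detective, modeled on the discriminator in the PyTorch DCGAN tutorial and assuming 3x64x64 input images: strided convolutions downsample the image, BatchNorm and LeakyReLU follow the hidden layers, and a final sigmoid turns the result into the probability that the image is real.

```python
import torch
from torch import nn


class Discriminator(nn.Module):
    """DCGAN-style discriminator for 3x64x64 images; outputs P(image is real)."""

    def __init__(self, nc: int = 3, ndf: int = 64):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),           # -> ndf x 32 x 32
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),      # -> (ndf*2) x 16 x 16
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),  # -> (ndf*4) x 8 x 8
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),  # -> (ndf*8) x 4 x 4
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),        # -> 1 x 1 x 1
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.main(x).view(-1)  # one probability per image


netD = Discriminator()
probs = netD(torch.randn(2, 3, 64, 64))
print(probs.shape)  # torch.Size([2])
```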


This architecture can be extended with extra layers if the problem requires it, but the strided convolutions, BatchNorm, and LeakyReLUs are significant to its design and should be kept.

The Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but it can still sometimes generate only low-quality samples or fail to converge. DRAGAN enables faster training and achieves improved stability with fewer mode collapses.

When training the discriminator, we compute its loss on a batch of real images and on a batch of fake images, add the two losses, and take one optimizer step, so the discriminator is updated based on the total loss over both image sets. If the dataset size is not divisible by the batch size, we may discard the last incomplete batch to keep the handling simple. Note that the dataset class (for example, torchvision's ImageFolder) requires subdirectories in the dataset root folder.
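The training procedure described above can be sketched as follows, using hypothetical toy data and tiny fully connected models in place of a real DCGAN and image dataset. The discriminator's losses on the real and fake batches are added before a single optimizer step, drop_last=True discards the incomplete final batch, and the Adam settings (learning rate 0.0002, beta1 = 0.5) are the ones reported in the DCGAN paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
latent_dim, data_dim, batch_size = 8, 2, 32

# Hypothetical toy models and data, standing in for a real DCGAN and image dataset.
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1), nn.Sigmoid())
real_data = torch.randn(1000, data_dim) + 3.0
# 1000 is not divisible by 32, so drop_last=True discards the final 8-sample batch.
loader = DataLoader(TensorDataset(real_data), batch_size=batch_size, shuffle=True, drop_last=True)

criterion = nn.BCELoss()
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

num_epochs = 2
for epoch in range(num_epochs):
    for (real,) in loader:
        n = real.size(0)
        ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

        # Discriminator: add the losses on the real and the fake batch, then take one step.
        fake = G(torch.randn(n, latent_dim))
        loss_d = criterion(D(real), ones) + criterion(D(fake.detach()), zeros)
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

        # Generator: try to make D label the fakes as real.
        loss_g = criterion(D(fake), ones)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
    print(f"epoch {epoch}: loss_D={loss_d.item():.3f} loss_G={loss_g.item():.3f}")
```

Detaching fake in the discriminator step keeps that update from back-propagating into the generator.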


The boundary-seeking objective of boundary-seeking GANs (BGAN) also extends to continuous data, where it can improve the stability of training. For the discriminator's output, one linear layer with a sigmoid activation function is enough. Modifying the original hyperparameters can allow training on larger batches and faster convergence. Beyond these, we can explore different training objectives, network architectures, and methods.
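As an illustration, here is a minimal sketch of the boundary-seeking generator loss for continuous data, in the form used by common PyTorch BGAN implementations: it pushes the discriminator's sigmoid output on generated samples toward 0.5, the decision boundary. The function name and the clamping constant eps are assumptions of this sketch.

```python
import torch


def boundary_seeking_loss(d_out: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """BGAN generator loss: 0.5 * E[(log D(G(z)) - log(1 - D(G(z))))^2].

    d_out holds the discriminator's sigmoid probabilities for generated samples.
    The loss is minimal (0) when D(G(z)) = 0.5, i.e. on the decision boundary.
    """
    d_out = d_out.clamp(eps, 1 - eps)  # avoid log(0)
    return 0.5 * torch.mean((torch.log(d_out) - torch.log(1 - d_out)) ** 2)


print(boundary_seeking_loss(torch.tensor([0.5, 0.5])))  # tensor(0.)
print(boundary_seeking_loss(torch.tensor([0.9, 0.1])))  # larger: samples far from the boundary
```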


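In this solution, the code is written with Keras on TensorFlow rather than PyTorch. It builds a convolutional autoencoder as the generator and clones that autoencoder as a reconstruction-based discriminator, which judges images by how well it can reconstruct them.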

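# Assumes TensorFlow 2.x with Keras 2 (e.g. TF <= 2.15); the symbolic add_loss pattern below is not supported in Keras 3.
import numpy as np
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import LeakyReLU, Conv2D, Conv2DTranspose, BatchNormalization, Flatten, Dense, Reshape
from tensorflow.keras.models import Model, clone_model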


# Build autoencoder to be the generator

img_shape = (152, 232, 1)
latent_dim = 16

inputs = keras.Input(shape=img_shape)
x = Conv2D(16, 3, padding='same', strides=(2,2), activation='relu')(inputs)
x = BatchNormalization()(x)
x = Conv2D(32, 3, padding='same', strides=(2,2), activation='relu')(x)
x = BatchNormalization()(x)
shape = K.int_shape(x)
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)
x = Dense(shape[1] * shape[2] * shape[3])(latent)
x = Reshape((shape[1], shape[2], shape[3]))(x)
x = Conv2DTranspose(32, 3, padding='same')(x)
x = LeakyReLU()(x)
x = BatchNormalization()(x)
x = Conv2DTranspose(16, 3, padding='same', strides=(2,2))(x)
x = LeakyReLU()(x)
x = BatchNormalization()(x)
outputs = Conv2DTranspose(1, 3, padding='same', activation='tanh', strides=(2,2))(x)

generator = Model(inputs, outputs)

# Clone the generator's architecture (with freshly initialized weights) to act as the autoencoder discriminator
ae_disc = clone_model(generator)

inputs_1 = keras.Input(shape=img_shape)
inputs_2 = keras.Input(shape=img_shape)
dis_outputs_1 = ae_disc(inputs_1)
dis_outputs_2 = ae_disc(inputs_2)

# Build discriminator
discriminator = Model([inputs_1, inputs_2], [dis_outputs_1, dis_outputs_2])

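# Define loss function for discriminator: reconstruct real images well, generated images badly
loss_d = K.sum(K.abs(inputs_1 - dis_outputs_1)) - K.sum(K.abs(inputs_2 - dis_outputs_2))
discriminator.add_loss(loss_d)

# Compile discriminator (the loss comes from add_loss, so no loss argument is passed).
# learning_rate replaces the deprecated lr; the negligible decay=1e-8 is dropped because TF >= 2.11 optimizers reject it.
discriminator_optimizer = keras.optimizers.RMSprop(learning_rate=0.0008, clipvalue=1.0)
discriminator.compile(optimizer=discriminator_optimizer)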

# Freeze discriminator
discriminator.trainable = False 

gan_inputs = keras.Input(shape=img_shape)
dis_input_1 = keras.activations.linear(gan_inputs)
dis_input_2 = generator(gan_inputs)
[gan_outputs_1, gan_outputs_2] = discriminator([dis_input_1, dis_input_2])

# Build gan
gan = Model(gan_inputs, [gan_outputs_1, gan_outputs_2]) 

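# Define gan loss: the generator's output should stay close to its input and be reconstructed well by the discriminator
loss_g = K.sum(K.abs(gan_inputs - dis_input_2)) + K.sum(K.abs(dis_input_2 - gan_outputs_2))
gan.add_loss(loss_g)

# Compile gan (the discriminator stays frozen inside it)
gan_optimizer = keras.optimizers.RMSprop(learning_rate=0.0008, clipvalue=1.0)
gan.compile(optimizer=gan_optimizer)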

# Train model 

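# Hypothetical placeholder data: replace with your own uint8 images of shape (N, 152, 232, 1)
train_imgs = np.random.randint(0, 256, size=(100, 152, 232, 1), dtype=np.uint8)

# Scale pixel values into [-1, 1] to match the 'tanh' activation of the autoencoder output
x_train = train_imgs.astype('float32') / 255. * 2 - 1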

batch_size = 20

start = 0
for step in range(1000):
    stop = start + batch_size
    images = x_train[start: stop]
    generated_images = generator.predict(images)

    d_loss = discriminator.train_on_batch([images, generated_images], None)    
    g_loss = gan.train_on_batch(images, None)

    start += batch_size
    if start > len(x_train) - batch_size:
        start = 0

    # Print losses
    if step % 10 == 0:
        # Print metrics
        print('discriminator loss at step %s: %s' % (step, d_loss))
        print('generator loss at step %s: %s' % (step, g_loss))  
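  1. Install TensorFlow and import Keras from it (from tensorflow import keras) instead of the standalone keras package.
  2. Copy the code using the "Copy" button above, and paste it in a Python file in your IDE.
  3. Modify the values (for example, img_shape, latent_dim, batch_size, and the training images) to fit your data.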
  4. Run the file and check the output.

I hope you found this useful. I have added links to the dependent libraries and version information in the following sections.

Dependent Libraries

pytorch by pytorch

Python | Stars: 67874 | Version: v2.0.1
License: Others (Non-SPDX)

Tensors and Dynamic neural networks in Python with strong GPU acceleration



Environment Tested

I tested this solution with the following versions. Be mindful of changes when working with other versions.

1. The solution was created in Python 3.11.
2. The solution was tested on torch 2.0.0.


1. For any support on kandi solution kits, please use the chat.
2. For further learning resources, visit the Open Weaver Community learning page.


1. What is a deep convolutional adversarial pair, and how does it work?

DCGAN, or Deep Convolutional GAN, is a generative adversarial network architecture. It follows these guidelines:

• Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
• Use batchnorm in both the generator and the discriminator.
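Applied to the generator, these guidelines give an architecture like the one in the PyTorch DCGAN tutorial. The sketch below assumes a 100-dimensional latent vector and 3x64x64 output images; the block helper is a hypothetical convenience for this example.

```python
import torch
from torch import nn


class Generator(nn.Module):
    """DCGAN-style generator: latent vector (nz x 1 x 1) -> nc x 64 x 64 image in [-1, 1]."""

    def __init__(self, nz: int = 100, ngf: int = 64, nc: int = 3):
        super().__init__()

        def block(c_in, c_out, stride, padding):
            # Fractionally-strided convolution + batchnorm + ReLU (no pooling anywhere).
            return [nn.ConvTranspose2d(c_in, c_out, 4, stride, padding, bias=False),
                    nn.BatchNorm2d(c_out), nn.ReLU(True)]

        self.main = nn.Sequential(
            *block(nz, ngf * 8, 1, 0),        # -> (ngf*8) x 4 x 4
            *block(ngf * 8, ngf * 4, 2, 1),   # -> (ngf*4) x 8 x 8
            *block(ngf * 4, ngf * 2, 2, 1),   # -> (ngf*2) x 16 x 16
            *block(ngf * 2, ngf, 2, 1),       # -> ngf x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),  # -> nc x 64 x 64
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.main(z)


netG = Generator()
images = netG(torch.randn(2, 100, 1, 1))
print(images.shape)  # torch.Size([2, 3, 64, 64])
```

The final Tanh maps outputs to [-1, 1], so training images should be scaled to the same range.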

2. How do inference networks help with the GAN PyTorch model?

ALI (Adversarially Learned Inference) augments a GAN's generator with an extra inference network. This network receives a data sample as input and produces a synthetic latent code z as output, so the model can also map data back to the latent space.
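A minimal sketch of the ALI setup, assuming small fully connected networks and random stand-in data (all sizes and names are hypothetical): besides the generator G_x, an inference network G_z encodes data into latent codes, and the discriminator scores (x, z) pairs instead of x alone.

```python
import torch
from torch import nn

x_dim, z_dim, n = 784, 16, 8  # hypothetical sizes (e.g. flattened 28x28 images)

# Generator G_x: z -> x, and the extra inference network (encoder) G_z: x -> z.
G_x = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim), nn.Tanh())
G_z = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(), nn.Linear(128, z_dim))
# ALI's discriminator scores joint (x, z) pairs.
D = nn.Sequential(nn.Linear(x_dim + z_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())

x_real = torch.rand(n, x_dim) * 2 - 1  # stand-in for real data in [-1, 1]
z_prior = torch.randn(n, z_dim)

# Encoder pairs (x, G_z(x)) are labeled 1; decoder pairs (G_x(z), z) are labeled 0.
d_enc = D(torch.cat([x_real, G_z(x_real)], dim=1))
d_dec = D(torch.cat([G_x(z_prior), z_prior], dim=1))

bce = nn.BCELoss()
loss_d = bce(d_enc, torch.ones(n, 1)) + bce(d_dec, torch.zeros(n, 1))
# Generator and encoder are trained jointly with the labels swapped.
loss_gz = bce(d_enc, torch.zeros(n, 1)) + bce(d_dec, torch.ones(n, 1))
```

In a full training loop, the discriminator would be updated on loss_d and the generator and encoder on loss_gz, each with its own optimizer and forward pass.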


3. What is DCGAN and its applications in GAN PyTorch?

A DCGAN is a direct extension of the GAN. It uses convolutional and convolutional-transpose layers in the discriminator and generator, respectively. It was first described by Radford et al. in the paper "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks". One application is generating anime characters, which animators otherwise draw with computer software or sometimes on paper.


4. How can I use the PyTorch data loader to integrate data into my GAN model?

The DataLoader combines a dataset and a sampler and provides an iterable over the given dataset. It supports map-style and iterable-style datasets with single- or multi-process loading, a customizable loading order, optional automatic batching (collation), and memory pinning.
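A minimal sketch, using a random TensorDataset as a hypothetical stand-in for real images. With a folder of images on disk you would typically use torchvision.datasets.ImageFolder instead, which expects the images to sit in subdirectories of the dataset root.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for an image dataset: 1000 grayscale 28x28 images scaled to [-1, 1].
images = torch.rand(1000, 1, 28, 28) * 2 - 1
dataset = TensorDataset(images)  # map-style dataset

loader = DataLoader(
    dataset,
    batch_size=128,   # automatic batching (collation) into tensors of shape (128, 1, 28, 28)
    shuffle=True,     # a random sampler reorders samples each epoch
    drop_last=True,   # 1000 % 128 = 104, so the last incomplete batch is discarded
    num_workers=0,    # single-process loading; raise this for multi-process loading
    pin_memory=torch.cuda.is_available(),  # page-locked memory speeds up host-to-GPU copies
)

print(len(loader))  # 7 full batches (1000 // 128)
for (batch,) in loader:
    print(batch.shape)  # torch.Size([128, 1, 28, 28])
    break
```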


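5. How does Wasserstein GAN differ from Convolutional Neural Networks (CNNs)?

The two are not directly comparable: a CNN is a network architecture, while the Wasserstein GAN (WGAN) is an extension of the generative adversarial network that changes how it is trained, and its generator and critic are often CNNs themselves. WGAN improves stability when training the model and provides a loss function that correlates with the quality of generated images.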
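To make the difference in training concrete, here is a minimal sketch of one critic update in the original WGAN formulation, with weight clipping and the RMSprop learning rate of 0.00005 and clipping value of 0.01 reported in the WGAN paper. The toy critic and the random real and fake batches are hypothetical stand-ins.

```python
import torch
from torch import nn

# Hypothetical toy critic: no sigmoid, it outputs an unbounded score.
critic = nn.Sequential(nn.Linear(2, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
clip_value = 0.01

real = torch.randn(64, 2) + 2.0  # stand-in for real samples
fake = torch.randn(64, 2)        # stand-in for G(z), already detached from the generator

# Critic objective: maximize E[critic(real)] - E[critic(fake)], written as a loss to minimize.
loss_c = -(critic(real).mean() - critic(fake).mean())
opt_c.zero_grad()
loss_c.backward()
opt_c.step()

# Weight clipping keeps the critic (roughly) Lipschitz, as in the original WGAN.
with torch.no_grad():
    for p in critic.parameters():
        p.clamp_(-clip_value, clip_value)

# The generator loss would then be -critic(G(z)).mean().
```

The negated critic loss estimates the Wasserstein distance between the real and generated distributions, which is why it tracks sample quality. Later variants such as WGAN-GP replace weight clipping with a gradient penalty.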
