Introduction

ImageGAN is an innovative application that merges art and technology to create unique visuals. Utilizing GAN (Generative Adversarial Network) algorithms, the app offers a range of features:

  • AI Landscapes: Explore AI-generated landscapes that capture the beauty of nature in a novel way.
  • Dynamic Cloudscapes: Experience ever-changing cloud patterns generated entirely by artificial intelligence.
  • AI Masterpieces: Witness artistry as AI channels the essence of legendary painters to create new masterworks.

For those interested in the technical aspects, this blog post will walk you through the process of implementing ImageGAN, focusing on data preparation and model training.

Preparing Training Data

Data Collection: Landscapes

To train our GAN model, we needed a dataset that was both high-quality and relevant to our application’s focus on landscapes. We used web scraping techniques to collect free and commercially usable natural landscape photos. If you’re interested, you can download the dataset from this Google Drive link.
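
As a rough illustration, the collection step boils down to a small download script like the sketch below; the URL list and directory name are placeholders rather than our actual scraper, and any real crawl should respect each site's license terms.

```python
# Minimal download sketch (placeholder URLs, not our production scraper)
import os
import requests

urls = [
    # e.g. "https://example.com/photos/landscape_001.jpg",
]

os.makedirs("landscapes_raw", exist_ok=True)
for i, url in enumerate(urls):
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()  # raise on HTTP errors
    with open(os.path.join("landscapes_raw", f"{i:06d}.jpg"), "wb") as f:
        f.write(resp.content)
```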

Data Preprocessing

The collected images were preprocessed to fit the requirements of GAN training. Specifically, we resized the images to a uniform 512x512 pixel resolution. This step is crucial for the stability and effectiveness of the GAN model.
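
A minimal version of this step using Pillow might look like the sketch below; the directory names are placeholders, and depending on your source aspect ratios, a center crop before resizing can help avoid distortion.

```python
# Resize all collected images to a uniform 512x512 resolution
# (directory names are placeholders for illustration)
import os
from PIL import Image

src_dir, dst_dir = "landscapes_raw", "landscapes_512"
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    img = img.resize((512, 512), Image.LANCZOS)
    img.save(os.path.join(dst_dir, os.path.splitext(name)[0] + ".png"))
```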

Training the Projected GAN Model

Choosing an Open-Source Project

For the model training, we chose the open-source project Projected GAN as our base code. Projected GAN is an exceptional work that allows for rapid training convergence. It was presented in a NeurIPS 2021 paper titled “Projected GANs Converge Faster” by Axel Sauer, Kashyap Chitta, Jens Müller, and Andreas Geiger. The repository also provides a quick start Colab notebook for those interested in trying it out.

Model Customization for Mobile Deployment

To ensure that the trained model could be efficiently deployed on mobile devices, we made some modifications to the model architecture. Specifically, we opted for the fastgan_lite model, which is a relatively lightweight version of the original model.
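
For reference, training the fastgan_lite configuration uses the repository's standard train.py entry point. The command below is adapted from the Projected GAN README with our dataset path substituted in; exact flags may differ between repo versions, so check the README for your checkout.

```bash
# Train the lightweight fastgan_lite configuration
# (dataset path is ours; flags adapted from the Projected GAN README)
python train.py --outdir=./training-runs/ --cfg=fastgan_lite \
  --data=./data/landscapes512.zip --gpus=1 --batch=64 \
  --mirror=1 --kimg=10000
```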

Code Modifications

We modified the FastganSynthesis class in the generator to adjust the ngf parameter from 128 to 64. This change effectively halved the size of the trained model without compromising the quality of generated images in our application.

Here’s a snippet of the modified code:

```python
import torch.nn as nn

class FastganSynthesis(nn.Module):
    def __init__(self, ngf=64, z_dim=256, nc=3, img_resolution=256, lite=False):
        super().__init__()
        self.img_resolution = img_resolution
        self.z_dim = z_dim
        # ... (rest of the code remains the same)
```

By making these adjustments, we were able to train a model that not only converges quickly but is also optimized for mobile deployment.

Exporting the Trained Model to ONNX and Optimizing It

After the model training is complete, the next step is to export the model to ONNX (Open Neural Network Exchange) format so that it can run on different platforms and environments. This section will detail how to go about this process, with special emphasis on two key steps: model simplification and model compression.

Exporting to ONNX Format

First, we use PyTorch’s torch.onnx.export function to export the trained GAN model to ONNX format. Let’s assume the exported model file is named gan.onnx.

```python
import torch
import torch.onnx

# Initialize the trained generator (YourTrainedGANModel is a placeholder)
model = YourTrainedGANModel()
model.eval()

# The generator takes a latent noise vector as input
# (z_dim=256, matching our FastganSynthesis configuration)
batch_size = 1
z = torch.randn(batch_size, 256)
torch_out = model(z)

# Export the model
torch.onnx.export(model,               # model being run
                  z,                   # model input
                  "gan.onnx",          # where to save the model
                  export_params=True)  # store the trained parameter weights inside the model file
```

Model Simplification: Using onnxsim

Once the model is exported, we use the onnxsim tool (onnx-simplifier) to simplify it. This step typically folds constants and removes redundant nodes, which speeds up inference and reduces the number of operator types that must be supported on the device.

```bash
pip install onnxsim
onnxsim gan.onnx gan.sim.onnx
```
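
If you prefer to stay in Python, onnxsim also exposes a simplify() function; a minimal equivalent sketch:

```python
import onnx
from onnxsim import simplify

# Load the exported model, simplify it, and save the result
model = onnx.load("gan.onnx")
model_simp, check = simplify(model)
assert check, "Simplified ONNX model could not be validated"
onnx.save(model_simp, "gan.sim.onnx")
```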

Model Compression: Quantization

Using INT8 Precision

The original FP32 model is close to 100MB, which is too large for an on-device application. We therefore compress the model with INT8 precision: storing 8 bits per weight instead of 32 reduces the model size to roughly a quarter of the original, around 25MB.

K-means Quantization

In GAN models, conventional quantization schemes such as symmetric (scale-only) quantization or asymmetric (scale/zero-point) quantization can cause significant loss of model accuracy. To address this, we quantize with the K-means algorithm instead.

Specifically, for each trained weight matrix in the network, we use the K-means algorithm to cluster its values into 256 classes, then record the class centers and the class ID corresponding to each weight. Because there are 256 centers, each class ID fits in a single 8-bit index, which is what gives this scheme its INT8-level compression.

In our tests, this K-means quantization almost completely preserves the original network's generated image quality: the cosine similarity between its outputs and those of the original FP32 model is above 0.999.

```python
# K-means quantization of a single weight matrix
from sklearn.cluster import KMeans

def kmeans_quantize(weight_matrix):
    # weight_matrix: a NumPy array of trained weights
    original_shape = weight_matrix.shape
    flat = weight_matrix.reshape(-1, 1)
    kmeans = KMeans(n_clusters=256, random_state=0).fit(flat)
    centers = kmeans.cluster_centers_  # 256 representative values
    labels = kmeans.labels_            # one class ID per weight
    # Replace each original weight with its class center
    quantized_matrix = centers[labels].reshape(original_shape)
    return quantized_matrix, centers, labels
```
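
As a quick usage sketch (the weight tensor here is a random stand-in, not a real trained layer), you can quantize a single matrix and check how closely it tracks the original; the same cosine-similarity check can be applied to the generator's output images:

```python
import numpy as np

# Random stand-in for one trained convolution's weights
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_q, centers, labels = kmeans_quantize(w)

# Cosine similarity between original and quantized weights
cos = np.dot(w.ravel(), w_q.ravel()) / (
    np.linalg.norm(w.ravel()) * np.linalg.norm(w_q.ravel()))
print(f"cosine similarity: {cos:.6f}")

# On device we only need to store `centers` (256 floats) and
# `labels` as one byte per weight: labels.astype(np.uint8)
```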

Conclusion

Implementing ImageGAN involved a series of carefully planned steps, from data collection to model training and optimization. The end result is a mobile application that leverages advanced GAN algorithms to create unique and captivating visuals. Stay tuned for future posts where we will discuss the deployment and performance optimization of ImageGAN on mobile devices.