GAN Image Colorization with Controlled Parameters

Based on


Given a grayscale (black-and-white) image and a set of parameters as input, automatically generate as output a colorized image that respects these parameters.

[Zhang, Isola, Efros]

Color Space

For the colorization process, the \(Lab\) color space is used: its lightness channel \(L\) is exactly the grayscale input, so the model only has to predict the two chrominance channels (\(a\) and \(b\)). Working in \(Lab\) also prevents sudden jumps in both color and brightness between nearby values, unlike RGB.
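Since the grayscale input corresponds directly to the \(L\) channel, preparing a training pair amounts to converting RGB to \(Lab\) and splitting the channels. Below is a minimal NumPy sketch of that split, using the standard sRGB-to-Lab conversion with a D65 white point (real code would likely call a library routine instead):

```python
import numpy as np

# sRGB -> XYZ (D65) conversion matrix.
M = np.array([[0.4124, 0.3576, 0.1805],
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])
WHITE = M.sum(axis=1)  # XYZ coordinates of the D65 white point

def rgb_to_lab(rgb):
    """Convert an (..., 3) sRGB array with values in [0, 1] to CIE Lab."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma, then map to normalized XYZ.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = lin @ M.T / WHITE
    # Piecewise cube-root nonlinearity used by the Lab definition.
    eps = (6 / 29) ** 3
    f = np.where(xyz > eps, np.cbrt(xyz), xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

img = np.array([[[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]])  # white and black pixels
lab = rgb_to_lab(img)
L_channel = lab[..., :1]   # the network input (grayscale)
ab_target = lab[..., 1:]   # the chrominance channels the model must predict
```

For white the conversion gives \(L = 100, a = b = 0\), and for black \(L = 0\); only `ab_target` carries color information.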



Generator

The generator is symmetric with skip connections (a U-Net). The encoder consists of \(4 \times 4\) convolution layers with stride \(2\), each followed by batch normalization and LeakyReLU with slope \(0.2\). The decoder consists of \(4 \times 4\) transposed convolution layers with stride \(2\), each concatenated with the activation map of its mirrored encoder layer and followed by batch normalization and ReLU. The last layer is a \(1 \times 1\) convolution with a \(\tanh\) activation function.
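A minimal PyTorch sketch of this architecture follows. It is not the authors' exact code: the depth (three levels) and channel widths are illustrative, and the final decoder stage is shown without a skip for brevity.

```python
import torch
import torch.nn as nn

class Down(nn.Module):
    """Encoder block: 4x4 convolution, stride 2, batch norm, LeakyReLU(0.2)."""
    def __init__(self, cin, cout):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(cin, cout, 4, stride=2, padding=1),
            nn.BatchNorm2d(cout),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.net(x)

class Up(nn.Module):
    """Decoder block: 4x4 transposed convolution (stride 2), concatenation
    with the mirrored encoder activation, then batch norm and ReLU."""
    def __init__(self, cin, cout):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(cout * 2)  # channels double after the concat

    def forward(self, x, skip):
        x = torch.cat([self.deconv(x), skip], dim=1)
        return torch.relu(self.bn(x))

class Generator(nn.Module):
    """Three-level U-Net: L channel in, (a, b) chrominance channels out."""
    def __init__(self):
        super().__init__()
        self.d1, self.d2, self.d3 = Down(1, 64), Down(64, 128), Down(128, 256)
        self.u1 = Up(256, 128)                  # concat with d2 output -> 256 ch
        self.u2 = Up(256, 64)                   # concat with d1 output -> 128 ch
        self.u3 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        self.out = nn.Conv2d(64, 2, 1)          # 1x1 conv to (a, b), then tanh

    def forward(self, x):
        e1 = self.d1(x)
        e2 = self.d2(e1)
        e3 = self.d3(e2)
        y = self.u1(e3, e2)
        y = self.u2(y, e1)
        y = torch.relu(self.u3(y))
        return torch.tanh(self.out(y))
```

For a \(32 \times 32\) input the output is a \(2\)-channel map of the same size, with values in \([-1, 1]\) from the \(\tanh\).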


Discriminator

The discriminator is formed by a series of \(4 \times 4\) convolution layers with stride \(2\), each followed by batch normalization and LeakyReLU with slope \(0.2\). After the last layer, a convolution maps the features to a \(1\)-dimensional output, followed by a sigmoid function that returns the probability that the input is real.
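A corresponding PyTorch sketch, again with illustrative depth and widths (the input is assumed to be the full \(Lab\) image, i.e. the conditioning \(L\) channel stacked with the chrominance channels):

```python
import torch
import torch.nn as nn

def d_block(cin, cout):
    """4x4 convolution, stride 2, batch norm, LeakyReLU(0.2)."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, 4, stride=2, padding=1),
        nn.BatchNorm2d(cout),
        nn.LeakyReLU(0.2),
    )

class Discriminator(nn.Module):
    """Scores a 3-channel image; returns the probability that it is real."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            d_block(3, 64), d_block(64, 128), d_block(128, 256))
        # Final convolution maps the 4x4 feature map to a single logit.
        self.head = nn.Conv2d(256, 1, 4)

    def forward(self, x):
        h = self.head(self.features(x))       # (N, 1, 1, 1) for 32x32 input
        return torch.sigmoid(h.flatten(1))    # (N, 1) probability
```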

Changes for Better Performance

Cost Functions

The cost functions defined by [Nazeri, Kamyar and Ng, Eric and Ebrahimi, Mehran] are as follows:

\[\underset{\theta_G}{\min} J^{\left(G\right)} \left(\theta_D, \theta_G\right) = \underset{\theta_G}{\min} -\mathbb{E}_z \left[ \log \left( D \left( G \left( 0_z \mid x \right) \mid x \right) \right) \right] + \lambda \left\lVert G\left( 0_z \mid x \right) - y \right\rVert_1\]

\[\underset{\theta_D}{\max} J^{\left(D\right)} \left(\theta_D, \theta_G\right) = \underset{\theta_D}{\max} \left( \mathbb{E}_y\left[ \log \left(D\left(y \mid x\right)\right) \right] + \mathbb{E}_z\left[ \log \left( 1 - D\left(G\left( 0_z \mid x \right) \mid x \right) \right) \right] \right)\]
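These two objectives can be written directly as PyTorch losses; this is a sketch, with the discriminator's maximization negated for use with a standard minimizing optimizer, the L1 term averaged over pixels, and the weight \(\lambda = 100\) chosen purely for illustration:

```python
import torch

def generator_loss(d_fake, fake_ab, real_ab, lam=100.0):
    """min_G: -E_z[log D(G(0_z|x) | x)] + lambda * ||G(0_z|x) - y||_1.

    d_fake holds discriminator scores for generated images, already in (0, 1);
    a small epsilon keeps the logarithm finite."""
    adversarial = -torch.log(d_fake + 1e-8).mean()
    l1 = torch.abs(fake_ab - real_ab).mean()
    return adversarial + lam * l1

def discriminator_loss(d_real, d_fake):
    """max_D: E_y[log D(y|x)] + E_z[log(1 - D(G(0_z|x) | x))],
    negated so it can be minimized."""
    return -(torch.log(d_real + 1e-8) + torch.log(1.0 - d_fake + 1e-8)).mean()
```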

In order to introduce control over the parameters, new terms are added to the generator's cost function, one per controlled parameter: here \(S\) denotes saturation, \(H\) hue, and \(R, G, B\) the color channels; \(\overline{\,\cdot\,}\) denotes the mean over the generated image and \(\sigma\) the desired value:

\[\cdots + \lambda_S \left\lvert \overline{G\left( 0_z \mid x \right)_S} - \sigma_S \right\rvert\]

\[\cdots + \lambda_H \left\lvert \overline{G\left( 0_z \mid x \right)_H} - \sigma_H \right\rvert\]

\[\cdots + \lambda_R \left\lvert \overline{G\left( 0_z \mid x \right)_R} - \sigma_R \right\rvert\]

\[\cdots + \lambda_G \left\lvert \overline{G\left( 0_z \mid x \right)_G} - \sigma_G \right\rvert\]

\[\cdots + \lambda_B \left\lvert \overline{G\left( 0_z \mid x \right)_B} - \sigma_B \right\rvert\]

\[\vdots\]
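As a concrete illustration, the saturation term could be implemented as follows. This is a sketch: the per-pixel saturation formula (the HSV definition, computed over RGB channels) is an assumption, since the text does not fix one.

```python
import torch

def saturation_penalty(fake_rgb, sigma_s, lambda_s):
    """lambda_S * | mean saturation of G(0_z|x) - sigma_S |.

    fake_rgb: generated images as an (N, 3, H, W) tensor with values in [0, 1].
    Saturation per pixel, as in HSV: (max - min) / max over the color channels.
    """
    mx = fake_rgb.max(dim=1).values
    mn = fake_rgb.min(dim=1).values
    sat = (mx - mn) / (mx + 1e-8)   # 0 for gray pixels, 1 for pure colors
    return lambda_s * (sat.mean() - sigma_s).abs()
```

A fully gray image has mean saturation \(0\), so with \(\sigma_S = 0.5\) and \(\lambda_S = 20\) the penalty is \(10\); the term is differentiable, so it pushes the generator toward the desired mean saturation during training.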

Environment Setup


Download the code here.

To train the model under the CIFAR-10 dataset, use the following command:

python \
    --seed 100 \
    --dataset cifar10 \
    --dataset-path ./dataset/cifar10 \
    --checkpoints-path ./checkpoints \
    --batch-size 128 \
    --epochs 200 \
    --lr 3e-4 \
    --lr-decay-steps 1e4 \
    --augment True

Other flags can be appended at the end, such as --desired-saturation 1, --saturation-weight 20, or --desired-hue 0.6.
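The flags above could be wired up with `argparse` roughly as follows. This is a sketch: only --desired-saturation, --saturation-weight, and --desired-hue appear in the text, and the remaining control flags (e.g. --hue-weight) are assumed by analogy with the \(\lambda\)/\(\sigma\) pairs in the cost function.

```python
import argparse

def str2bool(s):
    # argparse's type=bool would treat the string "False" as truthy,
    # so parse boolean flags like "--augment True" explicitly.
    return s.lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser(description="GAN colorization training")
parser.add_argument("--seed", type=int, default=100)
parser.add_argument("--dataset", default="cifar10")
parser.add_argument("--dataset-path", default="./dataset/cifar10")
parser.add_argument("--checkpoints-path", default="./checkpoints")
parser.add_argument("--batch-size", type=int, default=128)
parser.add_argument("--epochs", type=int, default=200)
parser.add_argument("--lr", type=float, default=3e-4)
parser.add_argument("--lr-decay-steps", type=float, default=1e4)
parser.add_argument("--augment", type=str2bool, default=True)
# Control flags: sigma (desired value) and lambda (weight) per parameter.
parser.add_argument("--desired-saturation", type=float, default=None)
parser.add_argument("--saturation-weight", type=float, default=0.0)
parser.add_argument("--desired-hue", type=float, default=None)
parser.add_argument("--hue-weight", type=float, default=0.0)  # assumed flag

args = parser.parse_args(["--desired-saturation", "1",
                          "--saturation-weight", "20"])
```

A weight of \(0\) disables the corresponding term, so the defaults reproduce plain colorization with no parameter control.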


Previous Results

From [Nazeri, Kamyar and Ng, Eric and Ebrahimi, Mehran]

Grayscale / Original / U-Net / GAN

\(\sigma_S = 1\)

(70000 steps = 179 epochs)

\(\sigma_S = 0.75\)

(83000 steps = 212 epochs)

\(\sigma_S = 0.5\)

(82000 steps = 209 epochs)

\(\sigma_S = 0.25\)

(85000 steps = 217 epochs)

\(\sigma_S = 0\)

(81000 steps = 207 epochs)

\(\sigma_H = 0\)

(78000 steps = 199 epochs)

\(\sigma_H = 0.25\)

(78000 steps = 199 epochs)

\(\sigma_H = 0.6\)

(78000 steps = 199 epochs)

\(\sigma_H = 0.85\)

(78000 steps = 199 epochs)

\(\sigma_S = 0.75, \sigma_H = 0.6\)

(73000 steps = 186 epochs)

Further Work