Colorizing Manga using Neural Networks (P1)

As an avid manga reader, I often see digital artists coloring black-and-white pages and posting them on social platforms. The results look great, so I wanted to try doing the same with neural networks.

Before I could jump straight into building and training models, there was the dataset part to take care of.

Dataset

First comes data collection. Neural networks are data hungry, i.e. they need loads and loads of data. I could not find any ready-to-use dataset online, so I had to scrape the pages from the web and download them myself.

Loads of data means loads of manga chapters. In this case, only One Piece could save us: it has 1000+ chapters, and since it has a huge fan base, artists have colored them and uploaded them online. Each chapter is colorized by a dedicated team of people, and it takes them a lot of time.

The data collection was very time consuming and came with its own set of problems: non-uniform channel counts, mixed image formats and many more. Cleaning the dataset also took a lot of time; ads, translator credits and Oda's Q&A pages had to be removed from the dataset.
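The exact cleaning pipeline is not shown here, but below is a minimal sketch of the kind of preprocessing the raw pages needed: forcing every downloaded page to a fixed size and a consistent 3-channel RGB format. The folder names and target resolution are assumptions for illustration.

import os
from PIL import Image

IMG_SIZE = (256, 256)  # assumed training resolution

def clean_image(src_path, dst_path):
    # Convert any downloaded page (grayscale, RGBA, palette, ...) to uniform RGB and size.
    img = Image.open(src_path).convert("RGB")   # fixes non-uniform channel counts
    img = img.resize(IMG_SIZE, Image.BICUBIC)   # fixes non-uniform page sizes
    img.save(dst_path, format="JPEG", quality=95)

os.makedirs("clean_pages", exist_ok=True)       # hypothetical output folder
for name in os.listdir("raw_pages"):            # hypothetical folder of scraped pages
    if name.lower().endswith((".png", ".jpg", ".jpeg", ".webp")):
        clean_image(os.path.join("raw_pages", name),
                    os.path.join("clean_pages", os.path.splitext(name)[0] + ".jpg"))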

A sample of an original One Piece page

UNET

The first implementation I tried was a basic UNET architecture: 4 downsampling blocks, a bridge connection and 4 upsampling blocks. I experimented with different activation functions, with MobileNet, DenseNet and MobileNetV2 backbones, and with different training times for each network.
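As a rough sketch of what that baseline looked like, assuming a 256x256 grayscale page as input and an RGB page as output (the filter counts here are my assumptions, not the original configuration):

import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(256, 256, 1)):           # grayscale manga page in
    inputs = tf.keras.Input(shape=input_shape)
    skips, x = [], inputs
    for f in [64, 128, 256, 512]:                    # 4 downsampling blocks
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D()(x)
    x = conv_block(x, 1024)                          # bridge connection
    for f, skip in zip([512, 256, 128, 64], reversed(skips)):   # 4 upsampling blocks
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, f)
    outputs = layers.Conv2D(3, 1, activation="sigmoid")(x)      # RGB colorized page out
    return tf.keras.Model(inputs, outputs)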

Outputs of different models (a,b,c)

The output was not the expected colored page; instead it looked more like a segmentation map. The right one (c) is the worst output, obtained while playing with the architecture and making the latent space extremely small. The middle one (b) was obtained using ReLU activation with a MobileNet backbone, and the first one (a) using the Swish activation with plain convolutions.

GAN (Generative Adversarial Network)

“Generative Adversarial Networks is the most interesting idea in the last 10 years in Machine Learning” — Yann LeCun.

GANs are an exciting and rapidly evolving field, delivering on the promise of generative models with their ability to generate realistic examples across a range of problem domains.
Most notably, they excel at image-to-image translation tasks such as translating photos of summer to winter or day to night, and at generating photorealistic photos of objects, scenes, and people that even humans cannot tell are fake.

The 3 basic parts of a GAN are:

  1.) Generator: creates the outputs. It is usually fed random noise; in our case, we instead use a kernel initializer of random normal noise in place of the default glorot uniform.
  2.) Discriminator: the real (target) image and the generator's fake output are both passed, concatenated with the input, to the discriminator, which checks whether each is real or fake. It acts as a critic (a minimal sketch follows this list).
  3.) Loss function: a loss function is built from the generator output and the discriminator output.
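Below is a minimal sketch of how the discriminator can see its inputs, assuming the conditional setup where the black-and-white page is stacked channel-wise with either the real colour page or the generated one. The layer sizes are illustrative; Leaky ReLU and Dropout reflect the choices described later.

import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator():
    bw_page = tf.keras.Input(shape=(256, 256, 1), name="input_page")        # conditioning input
    color_page = tf.keras.Input(shape=(256, 256, 3), name="real_or_fake")   # target or generator output
    x = layers.Concatenate()([bw_page, color_page])                         # channel-wise concatenation
    for f in [64, 128, 256, 512]:
        x = layers.Conv2D(f, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Dropout(0.3)(x)
    logits = layers.Conv2D(1, 4, padding="same")(x)                         # per-patch real/fake logits
    return tf.keras.Model([bw_page, color_page], logits)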

For this task, I decided to use a Conditional GAN (refer to the paper). Various experiments with the GAN were done on a dataset of 1k images (why? I will explain later):

1.) Activation functions: ReLU, Leaky ReLU and Swish

2.) Normalization ranges: [0, 255] and [-1, 1]

3.) Final activation layer: softmax and tanh

4.) Generator backbone: none vs. MobileNetV2 bottleneck layers

5.) Increasing/decreasing the latent space

The best result was obtained with the following parameters (sketched in code after the list):

  • ReLU in the generator, images normalized to [-1, 1], and a tanh activation in the output layer.
  • Leaky ReLU and Dropout in the discriminator.
  • Kernel initializer: random_normal_initializer(0., 0.01)
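Putting those choices together, here is a minimal sketch of the input normalization, the tanh output layer and the kernel initializer. The layer sizes and filter counts are my own assumptions for illustration, not the exact generator.

import tensorflow as tf
from tensorflow.keras import layers

# Random-normal kernel initializer used instead of the default glorot uniform.
initializer = tf.random_normal_initializer(0., 0.01)

def normalize(image):
    # Scale pixels from [0, 255] to [-1, 1] so they match the tanh output range.
    return (tf.cast(image, tf.float32) / 127.5) - 1.0

def generator_head(features):
    # Illustrative final generator layers: ReLU inside, tanh on the RGB output.
    x = layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                               kernel_initializer=initializer, activation="relu")(features)
    return layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                  kernel_initializer=initializer, activation="tanh")(x)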

Loss function

In this architecture, we have two losses: the generator loss and the discriminator loss.

The generator loss is calculated as the sigmoid cross-entropy between the discriminator's output on the generated image and an array of ones. Also, for the output to be structurally similar to the target image, an L1 loss is added alongside it. The total loss is calculated as in the paper, using a lambda of 100.

The discriminator loss is calculated similarly using sigmoid cross-entropy. The total loss is the sum of the loss on the fake generated image and the loss on the real target image.

import tensorflow as tf

loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=True)

LAMBDA = 100  # weight of the L1 term, as in the paper

def generator_loss(disc_generated_output, gen_output, target):
    # Adversarial loss: the generator wants the discriminator to output ones for its fakes.
    gan_loss = loss_object(tf.ones_like(disc_generated_output), disc_generated_output)
    # L1 loss keeps the output structurally close to the target colorization.
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    total_gen_loss = gan_loss + (LAMBDA * l1_loss)
    return total_gen_loss, gan_loss, l1_loss

def discriminator_loss(disc_real_output, disc_generated_output):
    # Real target images should be classified as ones, generated images as zeros.
    real_loss = loss_object(tf.ones_like(disc_real_output), disc_real_output)
    generated_loss = loss_object(tf.zeros_like(disc_generated_output), disc_generated_output)
    total_disc_loss = real_loss + generated_loss
    return total_disc_loss

As for the optimizer, Adam was used.
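For completeness, a typical setup uses a separate Adam optimizer for each network. The learning rate and beta_1 values below are my assumptions, not necessarily the exact ones used.

import tensorflow as tf

# Separate Adam optimizers for the generator and the discriminator (hyperparameters assumed).
generator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)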

Results

Considering that the dataset had only 1k images, the results are decent but not outstanding.