Deep Neural Network for Multiclass Classification Using Keras.

by Pritish Jadhav, Mrunal Jadhav - Wed, 19 Jun 2019

Tags: #Python #Keras #image classification #multiclass #deep learning

CIFAR-10 Image Classification using Keras

With the increasing adoption of Deep Neural Nets for various machine learning tasks, acquaintance with different frameworks and tools for modeling complex machine learning problems is a must.
In this series, we shall focus on building various DNN architectures using different frameworks (TensorFlow/Keras, PyTorch, etc).
The idea behind these articles is to get familiar with the syntax and good practices for building DNN models. Let's kick off the series with a Convolutional Neural Network model for classifying images using Keras.

About the Dataset -

To limit the training times, we shall be working with the CIFAR-10 dataset, which consists of 32 x 32 color images across 10 categories.
These categories are birds, airplanes, cars, cats, deer, dogs, frogs, horses, ships, and trucks.
Each category consists of 6000 images.
For more details about the dataset, check out the official website.

Let's start by importing the libraries needed for training the model.

In [63]:

import IPython
import os
import glob
from operator import itemgetter
import scipy.io 

import pandas as pd
import numpy as np

import keras.backend as K
from keras.layers import (Dense, Conv2D, Activation, Dropout, 
                Input, MaxPooling2D, Flatten, BatchNormalization, LeakyReLU)
from keras.models import Model
from keras import optimizers
from keras.utils import plot_model
from keras.utils.vis_utils import model_to_dot
from keras import callbacks
from keras import regularizers
from keras.datasets import mnist

from sklearn.utils import shuffle
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
%matplotlib inline

from IPython.core.display import display,HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML("<style>.container { width:100% !important; }</style>"))

Before we jump into the implementation bits, it would be helpful to chalk out the roadmap for successfully building and training a Deep Neural Network.

1. Loading the data-

First we shall implement the code for successfully reading the cifar-10 dataset.

2. Visualizing data-

Visualizing different aspects of our data often helps us to understand the problem statement and the underlying dataset better.
This visualization step can be as detailed as one would like it to be.
It can include exploratory analysis, outlier analysis, image visualization, class distribution checks, etc.
Data visualization and pre-processing steps are often underrated, but the analysis directly influences the choice of certain hyperparameters in your ML model.
For instance, consider a dataset with severe class imbalance, say a 99:1 ratio. That is, for every 99 positive data points, we have 1 negative data point. Choosing accuracy as our evaluation metric will give misleading results, since a model that always predicts the positive class already scores 99% accuracy. In such cases, a quick visualization of the class distribution will help us choose better evaluation metrics, such as recall, F1 score or the more general F-beta score.
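
To make the 99:1 example concrete, here is a minimal sketch using made-up labels and scikit-learn's metric functions. The always-positive "model" is an assumption for illustration only and is not something trained in this article.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Hypothetical imbalanced labels: 99 positives (1) for every negative (0).
y_true = np.array([1] * 99 + [0] * 1)

# A degenerate "model" that always predicts the majority (positive) class.
y_pred = np.ones_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
# Recall computed for the minority (negative) class.
minority_recall = recall_score(y_true, y_pred, pos_label=0)
# Balanced accuracy is the mean of the per-class recalls.
balanced_accuracy = balanced_accuracy_score(y_true, y_pred)

print("accuracy          : %.2f" % accuracy)
print("minority recall   : %.2f" % minority_recall)
print("balanced accuracy : %.2f" % balanced_accuracy)
```

Accuracy looks excellent here even though the minority class is never detected, which is exactly what the recall and balanced accuracy figures expose.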

3. Preparing Data for Training, Validation and Prediction -

Often, loading the entire dataset into memory is not efficient.
Different Deep Learning frameworks offer various wrappers for loading datasets using generators.
To gain better insights into how data batches should be generated, we shall implement a custom data generator.

4. Defining / Building the Model -

In this section, we shall build the actual model.
The focus of this section would be to familiarize ourselves with Keras syntax for adding complex layers.
In this section, we shall also highlight good practices that can help our model to learn faster.

5. Visualizing the Model -

Before we train the model, it often helps to visualize the neural network that has been implemented.
This step also helps in debugging issues that may have unknowingly crept into our model and would otherwise surface as code errors.

6. Training the model -

Once the model has been defined and the data is ready to flow through it, the next logical step is to train the model.
We shall train the model using the training and validation data.

7. Predicting -

DNNs are known to overfit easily.
It is often helpful to test the model on held-out data points that were not used during training. This will help us understand how well the model generalizes to unseen data.

8. Trouble Shooting / FAQs

In this last section, we shall focus on a few issues that one may encounter while training DNNs.
I will keep updating this list to ensure that all the possible pitfalls are documented.

Let's kick off the implementation by writing some basic functions for loading the dataset.

In [64]:

### reading Labels file 

def unpickle(file_name):
    '''
    Unpickles a pickled file. 
    
    Args:
        file_name : absolute path of the file that needs to be unpickled.
    Returns:
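        dict_val: python dictionary containing data and labels. eg - {'data': np.ndarray, 'labels': list}
    '''
    import pickle
    import sys
    with open(file_name, 'rb') as fo:
        if sys.version_info[0] >= 3:
            # The CIFAR-10 batches were pickled under Python 2; 'latin1' lets Python 3 decode them.
            dict_val = pickle.load(fo, encoding='latin1')
        else:
            dict_val = pickle.load(fo)
    return dict_val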

In [65]:

## read data 

def read_cifar_data(parent_directory_wildcard):
    '''
    The CIFAR-10 dataset is split across several part files, so this function unpickles every file
    matching the wildcard and concatenates their contents.
    
    Args:
        parent_directory_wildcard: glob pattern matching all the training batch files.
    Return:
        raw_X: list of flattened image arrays available for training.
        raw_labels: list of the corresponding integer labels.
        
    '''
    
    raw_X = []
    raw_labels = []
    for filename in glob.glob(parent_directory_wildcard):
        dict_val = unpickle(filename)
        raw_X.extend(dict_val['data'])
        raw_labels.extend(dict_val['labels'])
    
    return raw_X, raw_labels

Now that we have our helper functions ready, let's load the entire dataset into a pandas dataframe.

In [66]:

raw_X, raw_labels = read_cifar_data('./data/cifar10/cifar-10-python/cifar-10-batches-py/data_batch*')
cifar_raw_data = pd.DataFrame(list(zip(raw_X, raw_labels)), columns = ['np_images', 'labels'])

print(cifar_raw_data.head())

                                           np_images  labels
0  [255, 252, 253, 250, 238, 233, 245, 241, 232, ...       1
1  [127, 126, 127, 127, 128, 128, 128, 128, 129, ...       8
2  [116, 64, 19, 29, 36, 40, 57, 143, 173, 83, 39...       5
3  [205, 213, 235, 232, 112, 98, 95, 80, 98, 224,...       1
4  [189, 184, 181, 186, 191, 177, 186, 167, 147, ...       5
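
Each entry of np_images is a flat vector of 3072 values. Per the CIFAR-10 documentation, the first 1024 values hold the red channel, the next 1024 the green channel and the last 1024 the blue channel, each stored row by row. The sketch below uses a stand-in vector (not real image data) to show how the reshape-and-transpose used later in this article turns that layout into a 32 x 32 x 3 image.

```python
import numpy as np

# Stand-in for one row of `np_images`: the value at each position is its own index,
# which makes it easy to see where every element ends up.
flat_image = np.arange(3 * 32 * 32)

# (3072,) -> (channel, row, col) -> (row, col, channel), i.e. a channels-last image.
image = flat_image.reshape(3, 32, 32).transpose(1, 2, 0)

row, col, channel = 5, 7, 2
print(image.shape)
# The pixel at (row, col) of a given channel comes from this offset in the flat vector.
print(image[row, col, channel] == channel * 1024 + row * 32 + col)
```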

Label Binarizer -

It can be seen that the label for each training image is an integer.
Before we train our classifier, we need to transform these categorical labels into one-hot encoded vectors.
We shall achieve this using sklearn's LabelBinarizer.
To build intuition for LabelBinarizer, check out the toy examples on the sklearn documentation page.
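
As a quick illustration of what the binarizer does, here is a minimal sketch on four made-up classes. The names toy_binarizer, encoded and decoded are only used in this example.

```python
from sklearn.preprocessing import LabelBinarizer

# Fit on a toy label set with four classes.
toy_binarizer = LabelBinarizer()
toy_binarizer.fit([0, 1, 2, 3])

# Each integer label becomes a row with a single 1 in the column of its class.
encoded = toy_binarizer.transform([2, 0])
print(encoded)

# inverse_transform maps one-hot rows back to the original labels.
decoded = toy_binarizer.inverse_transform(encoded)
print(decoded)
```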

In [67]:

label_binarizer = LabelBinarizer()
label_binarizer.fit(cifar_raw_data['labels'].unique())

Out[67]:

LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)

Now that we have fitted our LabelBinarizer, let's quickly check out the unique classes encoded by the Binarizer. This is just a sanity check and can be skipped.

In [68]:

print("Successfully trained a Binarizer with %d unique classes"%len(label_binarizer.classes_))

Successfully trained a Binarizer with 10 unique classes

Awesome !!

We are now ready to move on to step 2 of our blueprint - Data Visualization

In [69]:

def get_shuffled_data(grouped_data, n = 5):
    shuffled_data = shuffle(grouped_data).head(n)
    return shuffled_data

In [70]:

sampled_data = cifar_raw_data.groupby('labels', as_index = False).apply(get_shuffled_data, 3).reset_index(drop = True)

In [71]:

def generate_visualizations(vis_df):
    ## fix the height and width of the image to be displayed
    height, width =20, 5

    # decide the number of images to be displayed from each category
    columns = 3


    # Find the number of unique categories in the training dataset
    rows = vis_df['labels'].nunique()
    fig, axes = plt.subplots(nrows=rows, ncols=columns, figsize=(width,height), sharex = True, sharey = True);

    labels = vis_df['labels'].unique()
    for index in range((columns*rows)):
        ax = fig.add_subplot(rows,columns,index+1)
        ax.axis('off')
        ax.imshow(vis_df['np_images'].iloc[index].reshape(3, 32, 32).transpose(1,2,0)/255., interpolation='nearest')


    for ax, row in zip(axes[:, 0], labels):
        ax.set_ylabel(row)
        ax.set_yticks([])

    for ax, row in zip(axes[0, :], labels):
        ax.set_xticks([])

In [72]:

generate_visualizations(sampled_data)

Some Observations -

It can be seen from the above visualizations that the image quality is below par, which is expected since each image is only 32 x 32 pixels.
By design, the categories are mutually exclusive. However, things can get tricky for datasets with poor quality images and categories that are visually similar.

On that note, let's move on to Step 3 - Preparing data for Training and Validation

Since there is no class imbalance, let's keep the logic for splitting the dataset into training, validation and test sets fairly simple.
We shall piggyback on sklearn's train_test_split function for accomplishing the task.
Note that this function doesn't split data three ways, so we shall call it twice: once to carve out the test set and once more to split the remaining data into training and validation sets.

For more information on sklearn's train_test_split function, check out the official documentation.

In [74]:

#### split raw_data into train, test and validation sets

complete_train_data, test_data = train_test_split(cifar_raw_data, test_size = 0.05)
train_data, validation_data = train_test_split(complete_train_data, test_size = 0.08)

train_data = train_data.reset_index(drop = True)
validation_data = validation_data.reset_index(drop = True)
test_data = test_data.reset_index(drop = True)
print("Training data is of size %d"%len(train_data))
print("Validation data is of size %d"%len(validation_data))
print("Test data is of size %d"%len(test_data))

Training data is of size 43700
Validation data is of size 3800
Test data is of size 2500
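
The sizes above follow directly from the two fractions. A minimal sketch on a stand-in array of 50,000 row ids reproduces them. The random_state argument is an addition made here for reproducibility and is not part of the original cell.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 50,000 rows loaded from the five training batch files.
all_rows = np.arange(50000)

# First call: hold out 5% of everything as the test set.
rest, test_rows = train_test_split(all_rows, test_size=0.05, random_state=0)

# Second call: hold out 8% of what remains as the validation set.
train_rows, validation_rows = train_test_split(rest, test_size=0.08, random_state=0)

print(len(train_rows), len(validation_rows), len(test_rows))
```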

Now that we have our training, validation and test data set cut out, we would like our training process to be memory efficient. To implement mini-batch training, we shall leverage the concept of generators in python. To learn more about the difference between a generator and iterator in python, check out this blog post.

For generating mini-batches of our data, we shall implement a generator function involving the following steps -

Args: Pass the original dataframe, the trained Label Binarizer model and batch_size as input arguments. We could easily make the function more self-contained by loading the Label Binarizer model from a pickle file. However, I would like to keep it as simple as possible for the sake of this article.

Initialize a counter variable.
While True:
--> Compute the batch indices based on batch_size (reset the counter and reshuffle once the data is exhausted).
--> Slice the dataset.
--> Yield the batch.

In [99]:

def prep_training_generators(train_df, label_binarizer_model, batch_size):
    
    indices = np.arange(len(train_df))  # an array (not a range) so that np.random.shuffle can reorder it in place
    indx_iterator = 0 
    while True:
        if (indx_iterator +1) * batch_size > len(indices):
            indx_iterator = 0 
            np.random.shuffle(indices)
        
        batch_indices = indices[indx_iterator* batch_size: (indx_iterator+1)*batch_size]

        batch_x = np.stack(train_df.loc[batch_indices, 'np_images'].values, axis = 0)
        reshaped_batch_x = (batch_x.reshape(batch_size, 3, 32, 32).transpose(0, 2, 3, 1))/255.
        
        raw_y = train_df.loc[batch_indices, 'labels'].values
        batch_y = label_binarizer_model.transform(raw_y)
        indx_iterator += 1
        yield reshaped_batch_x, batch_y
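
Before plugging a generator into training, it is worth pulling a single batch and checking its shapes. The sketch below does that on a tiny random dataframe. toy_batch_generator is a simplified stand-in that follows the same slicing logic as the function above, and the toy data is made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

def toy_batch_generator(df, binarizer, batch_size):
    """Simplified stand-in for the training generator (same slicing logic)."""
    indices = np.arange(len(df))
    step = 0
    while True:
        # Start a new pass (and reshuffle) when a full batch is no longer available.
        if (step + 1) * batch_size > len(indices):
            step = 0
            np.random.shuffle(indices)
        batch_indices = indices[step * batch_size:(step + 1) * batch_size]
        batch_x = np.stack(df.loc[batch_indices, 'np_images'].values, axis=0)
        batch_x = batch_x.reshape(batch_size, 3, 32, 32).transpose(0, 2, 3, 1) / 255.
        batch_y = binarizer.transform(df.loc[batch_indices, 'labels'].values)
        step += 1
        yield batch_x, batch_y

# Ten random "images" in the flat 3072-value layout, one per class.
rng = np.random.RandomState(0)
toy_df = pd.DataFrame({
    'np_images': [rng.randint(0, 256, size=3072) for _ in range(10)],
    'labels': np.arange(10),
})
toy_binarizer = LabelBinarizer().fit(np.arange(10))

toy_gen = toy_batch_generator(toy_df, toy_binarizer, batch_size=4)
batch_x, batch_y = next(toy_gen)
print(batch_x.shape, batch_y.shape)

# Only full batches are emitted, so one pass over 10 rows yields 10 // 4 batches.
batches_per_pass = len(toy_df) // 4
print(batches_per_pass)
```

Because a trailing partial batch is skipped, the number of batches in one pass over the data is the floor of the dataset size divided by the batch size.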

We shall combine steps 4 and 5 in the next section -

In this article, we will be building a model using Keras's Functional API.
I prefer the Functional API since it allows us to define and build more flexible models with Keras.
For more information regarding the difference and advantages of Functional API over Sequential API, refer to this blog post.

Each convolutional block of our neural network shall consist of the following elements (the first block skips the pooling step) -

Convolution2D layer.
BatchNormalization.
Activation Layer.
MaxPooling2D
Dropout Layer.

The output layer shall have as many neurons as there are unique classes.
Note that, since this is a multiclass classification problem, categorical cross-entropy will be our choice of loss function. For binary classification, one can use binary_crossentropy.
Also, note the activation function in the output layer. Since we would like a probability vector at the output layer, we shall be using the softmax function.
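
To see how softmax and categorical cross-entropy fit together, here is a minimal NumPy sketch. It is a hand-written illustration with made-up scores, not the Keras implementation, and the helper names softmax and categorical_crossentropy are defined only for this example.

```python
import numpy as np

def softmax(scores):
    # Subtracting the row maximum keeps np.exp from overflowing without changing the result.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

def categorical_crossentropy(one_hot, probs, eps=1e-7):
    # Clipping avoids log(0); the loss is the negative log-probability of the true class.
    return -np.sum(one_hot * np.log(np.clip(probs, eps, 1.0)), axis=-1)

# Made-up raw scores for one sample and three classes; the true class is the first one.
scores = np.array([[2.0, 1.0, 0.1]])
one_hot = np.array([[1.0, 0.0, 0.0]])

probs = softmax(scores)
loss = categorical_crossentropy(one_hot, probs)

print(probs, probs.sum())
print(loss)
```

The softmax output is non-negative and sums to 1, and the loss shrinks as the probability assigned to the true class grows.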

With these basic building blocks in mind, feel free to experiment with the network architecture.

In [119]:

def keras_image_functional_model(image_shape, n_classes):
    """
    More info on losses - https://keras.io/losses/
    """
    weight_decay = 1e-4
    img_input = Input(shape = image_shape)
    
    img_emd = Conv2D(32, (3, 3), padding = 'same', strides=1, kernel_regularizer=regularizers.l2(weight_decay))(img_input)
    img_emd = BatchNormalization()(img_emd)
    img_emd = LeakyReLU(alpha = 0.2)(img_emd)
    img_emd = Dropout(0.2)(img_emd)
    
    img_emd = Conv2D(32, (3, 3), padding = 'same', kernel_regularizer=regularizers.l2(weight_decay))(img_emd)
    img_emd = BatchNormalization()(img_emd)
    img_emd = LeakyReLU(alpha = 0.2)(img_emd)
    img_emd = MaxPooling2D()(img_emd)
    img_emd = Dropout(0.2)(img_emd)
    
    img_emd = Conv2D(64, (3, 3), padding = 'same', kernel_regularizer=regularizers.l2(weight_decay))(img_emd)
    img_emd = BatchNormalization()(img_emd)
    img_emd = LeakyReLU(alpha = 0.2)(img_emd)
    img_emd = MaxPooling2D()(img_emd)
    img_emd = Dropout(0.2)(img_emd)
    
    img_emd = Flatten()(img_emd)
    img_emd = Dense(512, kernel_regularizer=regularizers.l2(weight_decay))(img_emd)
    img_emd = BatchNormalization()(img_emd)
    img_emd = Activation('relu')(img_emd)
    
    img_emd = Dense(128, kernel_regularizer=regularizers.l2(weight_decay))(img_emd)
    img_emd = BatchNormalization()(img_emd)
    img_emd = Activation('relu')(img_emd)
    
    out_logits = Dense(n_classes, activation = 'softmax')(img_emd)
    
    model = Model(inputs = img_input, outputs = out_logits, name = "image_model")
    
    model.summary(line_length=200)
    model.compile(optimizer= 'adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

In [120]:

def plot_keras_model(model_name, show_shapes_bool = True):
    return IPython.display.SVG(model_to_dot(model_name, show_shapes= show_shapes_bool).create(prog='dot', format='svg'))

In [121]:

BATCH_SIZE = 32
tboard = callbacks.TensorBoard(log_dir='./logs', histogram_freq=0, batch_size=32, write_graph=True, write_grads=False,
                                       write_images=False, embeddings_freq=0)

train_data_gen = prep_training_generators(train_data, label_binarizer, BATCH_SIZE)
validation_data_gen = prep_training_generators(validation_data, label_binarizer, BATCH_SIZE)

image_only_model = keras_image_functional_model(image_shape = (32, 32, 3) , n_classes=10)

________________________________________________________________________________________________________________________________________________________________________________________________________
Layer (type)                                                                              Output Shape                                                                    Param #                       
========================================================================================================================================================================================================
input_19 (InputLayer)                                                                     (None, 32, 32, 3)                                                               0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
conv2d_74 (Conv2D)                                                                        (None, 32, 32, 32)                                                              896                           
________________________________________________________________________________________________________________________________________________________________________________________________________
batch_normalization_88 (BatchNormalization)                                               (None, 32, 32, 32)                                                              128                           
________________________________________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_51 (LeakyReLU)                                                                (None, 32, 32, 32)                                                              0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
dropout_74 (Dropout)                                                                      (None, 32, 32, 32)                                                              0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
conv2d_75 (Conv2D)                                                                        (None, 32, 32, 32)                                                              9248                          
________________________________________________________________________________________________________________________________________________________________________________________________________
batch_normalization_89 (BatchNormalization)                                               (None, 32, 32, 32)                                                              128                           
________________________________________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_52 (LeakyReLU)                                                                (None, 32, 32, 32)                                                              0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
max_pooling2d_38 (MaxPooling2D)                                                           (None, 16, 16, 32)                                                              0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
dropout_75 (Dropout)                                                                      (None, 16, 16, 32)                                                              0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
conv2d_76 (Conv2D)                                                                        (None, 16, 16, 64)                                                              18496                         
________________________________________________________________________________________________________________________________________________________________________________________________________
batch_normalization_90 (BatchNormalization)                                               (None, 16, 16, 64)                                                              256                           
________________________________________________________________________________________________________________________________________________________________________________________________________
leaky_re_lu_53 (LeakyReLU)                                                                (None, 16, 16, 64)                                                              0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
max_pooling2d_39 (MaxPooling2D)                                                           (None, 8, 8, 64)                                                                0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
dropout_76 (Dropout)                                                                      (None, 8, 8, 64)                                                                0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
flatten_19 (Flatten)                                                                      (None, 4096)                                                                    0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
dense_37 (Dense)                                                                          (None, 512)                                                                     2097664                       
________________________________________________________________________________________________________________________________________________________________________________________________________
batch_normalization_91 (BatchNormalization)                                               (None, 512)                                                                     2048                          
________________________________________________________________________________________________________________________________________________________________________________________________________
activation_38 (Activation)                                                                (None, 512)                                                                     0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
dense_38 (Dense)                                                                          (None, 128)                                                                     65664                         
________________________________________________________________________________________________________________________________________________________________________________________________________
batch_normalization_92 (BatchNormalization)                                               (None, 128)                                                                     512                           
________________________________________________________________________________________________________________________________________________________________________________________________________
activation_39 (Activation)                                                                (None, 128)                                                                     0                             
________________________________________________________________________________________________________________________________________________________________________________________________________
dense_39 (Dense)                                                                          (None, 10)                                                                      1290                          
========================================================================================================================================================================================================
Total params: 2,196,330
Trainable params: 2,194,794
Non-trainable params: 1,536
________________________________________________________________________________________________________________________________________________________________________________________________________
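
The Param # column in the summary can be reproduced by hand, which is a handy check that the architecture is what we intended. The sketch below assumes the usual parameter counts: a bias per filter or unit for Conv2D and Dense, and four values per channel for BatchNormalization. The helper names are defined only for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel position and input channel, plus one bias, for every filter.
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # One weight per input unit, plus one bias, for every output unit.
    return (in_units + 1) * out_units

def batchnorm_params(channels):
    # gamma and beta (trainable) plus moving mean and moving variance (non-trainable).
    return 4 * channels

layer_params = [
    conv2d_params(3, 3, 3, 32), batchnorm_params(32),
    conv2d_params(3, 3, 32, 32), batchnorm_params(32),
    conv2d_params(3, 3, 32, 64), batchnorm_params(64),
    dense_params(8 * 8 * 64, 512), batchnorm_params(512),
    dense_params(512, 128), batchnorm_params(128),
    dense_params(128, 10),
]

total_params = sum(layer_params)
# Half of every BatchNormalization layer's parameters are the non-trainable moving statistics.
non_trainable_params = 2 * (32 + 32 + 64 + 512 + 128)

print(layer_params)
print(total_params, total_params - non_trainable_params, non_trainable_params)
```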

In [122]:

# plot_keras_model(image_only_model, to_file='keras_cnn_cifar_model.png')
plot_model(image_only_model, to_file='keras_cnn_cifar_model.png', show_shapes=True, show_layer_names=True)

Step 6 - Let's train the model without much ado !!

a. Pay special attention to the parameters steps_per_epoch and validation_steps.
b. steps_per_epoch is the number of batches drawn from the training generator before Keras declares one epoch finished and starts the next one.
c. validation_steps plays the same role for the validation generator. The two are separate parameters because the training and validation sets differ in size.
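
Since the generators in this article only emit full batches, the step counts are the floor of the dataset size divided by the batch size. A small sketch of the arithmetic, using the split sizes printed earlier:

```python
train_size, validation_size, batch_size = 43700, 3800, 32

# Number of full batches available in one pass over each split.
steps_per_epoch = train_size // batch_size
validation_steps = validation_size // batch_size

# Rows left over in each pass because they do not fill a complete batch.
train_leftover = train_size - steps_per_epoch * batch_size
validation_leftover = validation_size - validation_steps * batch_size

print(steps_per_epoch, validation_steps)
print(train_leftover, validation_leftover)
```

The training figure matches the 1365 steps per epoch shown in the training log.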

In [123]:

# The generators only yield full batches, so use floor division for the step counts.
history = image_only_model.fit_generator(train_data_gen,
                    steps_per_epoch=len(train_data) // BATCH_SIZE,
                    epochs=15,
                    validation_data = validation_data_gen, 
                    validation_steps = len(validation_data) // BATCH_SIZE,
                    verbose=1, callbacks=[tboard])

Epoch 1/15
1365/1365 [==============================] - 329s 241ms/step - loss: 1.4505 - acc: 0.5424 - val_loss: 1.3004 - val_acc: 0.6025
Epoch 2/15
1365/1365 [==============================] - 317s 232ms/step - loss: 1.1478 - acc: 0.6706 - val_loss: 1.1542 - val_acc: 0.6806
Epoch 3/15
1365/1365 [==============================] - 327s 240ms/step - loss: 1.0745 - acc: 0.7112 - val_loss: 1.0869 - val_acc: 0.7198
Epoch 4/15
1365/1365 [==============================] - 298s 218ms/step - loss: 1.0374 - acc: 0.7400 - val_loss: 1.0513 - val_acc: 0.7431
Epoch 5/15
1365/1365 [==============================] - 301s 220ms/step - loss: 1.0069 - acc: 0.7630 - val_loss: 1.1248 - val_acc: 0.7256
Epoch 6/15
1365/1365 [==============================] - 302s 221ms/step - loss: 0.9877 - acc: 0.7821 - val_loss: 1.2158 - val_acc: 0.7209
Epoch 7/15
1365/1365 [==============================] - 312s 228ms/step - loss: 0.9749 - acc: 0.7941 - val_loss: 1.3047 - val_acc: 0.6894
Epoch 8/15
1365/1365 [==============================] - 313s 229ms/step - loss: 0.9510 - acc: 0.8091 - val_loss: 1.0938 - val_acc: 0.7701
Epoch 9/15
1365/1365 [==============================] - 294s 216ms/step - loss: 0.9410 - acc: 0.8221 - val_loss: 1.2910 - val_acc: 0.7068
Epoch 10/15
1365/1365 [==============================] - 314s 230ms/step - loss: 0.9319 - acc: 0.8294 - val_loss: 1.1774 - val_acc: 0.7656
Epoch 11/15
1365/1365 [==============================] - 291s 213ms/step - loss: 0.9203 - acc: 0.8389 - val_loss: 1.1822 - val_acc: 0.7635
Epoch 12/15
1365/1365 [==============================] - 300s 220ms/step - loss: 0.9125 - acc: 0.8470 - val_loss: 1.2179 - val_acc: 0.7624
Epoch 13/15
1365/1365 [==============================] - 334s 245ms/step - loss: 0.8985 - acc: 0.8563 - val_loss: 1.1793 - val_acc: 0.7786
Epoch 14/15
1365/1365 [==============================] - 303s 222ms/step - loss: 0.8975 - acc: 0.8614 - val_loss: 1.2055 - val_acc: 0.7669
Epoch 15/15
1365/1365 [==============================] - 299s 219ms/step - loss: 0.8908 - acc: 0.8644 - val_loss: 1.3411 - val_acc: 0.7331
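
A quick way to read a log like this is to look for the epoch with the best validation numbers. The sketch below copies the validation columns from the output above into plain lists. In a live session the same values should be available from history.history, for example under the 'val_loss' key.

```python
import numpy as np

# Validation loss and accuracy per epoch, copied from the training log above.
val_loss = [1.3004, 1.1542, 1.0869, 1.0513, 1.1248, 1.2158, 1.3047, 1.0938,
            1.2910, 1.1774, 1.1822, 1.2179, 1.1793, 1.2055, 1.3411]
val_acc = [0.6025, 0.6806, 0.7198, 0.7431, 0.7256, 0.7209, 0.6894, 0.7701,
           0.7068, 0.7656, 0.7635, 0.7624, 0.7786, 0.7669, 0.7331]

# Epochs are 1-indexed in the log, list positions are 0-indexed.
best_loss_epoch = int(np.argmin(val_loss)) + 1
best_acc_epoch = int(np.argmax(val_acc)) + 1

print("Lowest validation loss %.4f at epoch %d" % (min(val_loss), best_loss_epoch))
print("Highest validation accuracy %.4f at epoch %d" % (max(val_acc), best_acc_epoch))
```

The training loss keeps falling through all 15 epochs while the validation loss bottoms out early, which is worth keeping in mind when judging how well the model generalizes.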

In [124]:

## Prediction 
def prep_test_generators(train_df, batch_size):
    
    indices = range(len(train_df))
    indx_iterator = 0 
    while True:
        if (indx_iterator +1) * batch_size > len(indices):
            indx_iterator = 0
        
        batch_indices = indices[indx_iterator* batch_size: (indx_iterator+1)*batch_size]

        batch_x = np.stack(train_df.loc[batch_indices, 'np_images'].values, axis = 0)
        reshaped_batch_x = batch_x.reshape(batch_size, 3, 32, 32).transpose(0, 2, 3, 1)/255.
        
        indx_iterator += 1
        yield reshaped_batch_x

In [125]:

test_pred_gen = prep_test_generators(test_data, batch_size=BATCH_SIZE)

In [126]:

# prep_test_generators only yields full batches, so predict on len(test_data) // BATCH_SIZE of them.
test_prediction_logits = image_only_model.predict_generator(test_pred_gen, 
                                                            steps=len(test_data) // BATCH_SIZE)

In [127]:

predicted_values = np.argmax(test_prediction_logits, axis = -1)

In [128]:

# Only full batches were predicted, so compare against the first len(predicted_values) labels.
test_accuracy = float(np.sum(np.equal(predicted_values, test_data['labels'][:len(predicted_values)].values)))/len(test_data)
print("Test Accuracy is %f"%(test_accuracy*100))

Test Accuracy is 74.520000
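
A single accuracy figure does not show which categories get confused with one another. One common way to dig deeper is a confusion matrix. The sketch below uses made-up labels for three classes. With the real data, predicted_values and the matching test labels would take the place of the toy arrays.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels for three classes.
toy_true = np.array([0, 0, 1, 1, 2, 2])
toy_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows correspond to the true class, columns to the predicted class.
matrix = confusion_matrix(toy_true, toy_pred)

# Fraction of each true class that was predicted correctly.
per_class_accuracy = matrix.diagonal() / matrix.sum(axis=1).astype(float)

print(matrix)
print(per_class_accuracy)
```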

End Comments -

Evidently, the DNN is doing much better than random predictions, which would be right only about 10% of the time on 10 balanced classes.
The training accuracy in the final epoch (86.4%) is noticeably higher than the validation (73.3%) and test (74.5%) accuracy, and the validation loss stops improving after the first few epochs. The model does generalize to unseen data, but it also shows signs of overfitting.
Finally, the purpose of this article is to highlight the process of building a DNN using Keras. Feel free to experiment with different architectures and check how accuracy improves.
Note that the bigger the model, the longer the training times.
I will soon try and upload an article in which we shall leverage pre-trained models for faster training times and better accuracy.

Trouble Shooting -

Why is my loss NaN/Inf?

Be very careful while choosing the activation function, especially ReLU. ReLU activations are unbounded above and may contribute to an exploding gradient problem.
Use Batch Normalization to reduce the probability of encountering exploding gradients.
Gradient clipping is another widely adopted strategy to counter exploding gradients.
If you are using a custom loss function, ensure that a 0/0 situation cannot arise, since it may introduce NaN/Inf into the computations.
Check out the strategies suggested for a similar issue on Stack Overflow.
If the problem persists, reach out to the community and seek help.
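
Two of the points above, gradient clipping and guarding against 0/0, can be sketched in a few lines of NumPy. This is a hand-written illustration and the helper names are made up for the example. In Keras itself, optimizers accept clipnorm and clipvalue arguments for clipping.

```python
import numpy as np

def clip_by_norm(gradient, max_norm):
    # Rescale the gradient so that its L2 norm never exceeds max_norm.
    norm = np.linalg.norm(gradient)
    if norm > max_norm:
        return gradient * (max_norm / norm)
    return gradient

def safe_ratio(numerator, denominator, eps=1e-7):
    # A small epsilon in the denominator turns 0/0 into 0 instead of NaN.
    return numerator / (denominator + eps)

# A gradient with norm 5 is scaled down to norm 1, keeping its direction.
clipped = clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0)
print(clipped, np.linalg.norm(clipped))

# A gradient that is already small enough is left untouched.
print(clip_by_norm(np.array([0.3, 0.4]), max_norm=1.0))

print(safe_ratio(0.0, 0.0))
```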

Why is my accuracy not changing?

Such a problem can be encountered due to multiple reasons. Start by checking the class distribution. If there is a severe class imbalance, a model may resort to predicting all zeros/ all ones resulting in stagnant accuracy.
Ensure that the generator used for generating mini-batches is working as expected.
Instead of building an ambitious and complex model, start by building a minimalistic version and add layers based on performance achieved.