MSDS458 Research Assignment 1:¶

  • In this notebook, we will build a DNN model for classifying MNIST digits. The DNN model will consist of 784 input nodes, a hidden layer with 128 nodes and 10 output nodes (corresponding to the 10 digits).
  • We use mnist.load_data() to get the 70,000 images divided into a set of 60,000 training images and 10,000 test images. We hold back 5,000 of the 60,000 training images for validation.
  • After training and evaluating our DNN model we analyze its performance. In particular, we use confusion matrices to compare the predicted classes with the class labels to try to determine why some images were misclassified by the model.
  • We then obtain the 60,000 activation values of one of the hidden nodes for the (original) set of training data. We want to use these activation values as "proxies" for the predicted classes of the 60,000 images.
  • And just like we compared the predicted classes with the class labels using confusion matrices to determine the efficacy of the model, we use box plots to visualize the relationship between the activation values of one hidden node and the class labels. We don't expect these activation values to have much "predictive power". In fact, the same activation values can be associated with multiple class labels resulting in a lot of overlap in the box plots.
  • We also perform similar experiments comparing the values at two pixel locations in the images with the class labels. This time we use scatter plots to visualize the relationship between the pair of pixel values with the class labels (represented by different colored dots).
  • Pixel values at two locations in image should not have much predictive value. To improve on this approach, we the PCA decomposition on both the raw data of 784 pixel values and 128 hidden node activation values to reduce the number of features to 2 in each case. Once again, we use a scatter plot to visualize the correlation between the two principal component values and the class labels.
  • Finally, we use a Random Forest Classifier to find the relative importance of the 784 features (pixels) in the training set. We then select the 70 most important feature (pixels) from the training, validation and test images to test our 'best' model on.

Importing Packages¶

  • First we import all the packages that will be used in the assignment.

  • Since Keras is integrated in TensorFlow 2.x, we import keras from tensorflow and use tenserflow.keras.xxx to import all other Keras packages. The seed argument produces a deterministic sequence of tensors across multiple calls.

In [79]:
import datetime
from packaging import version
from collections import Counter
import numpy as np
import pandas as pd

import matplotlib as mpl  # EA
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import confusion_matrix, classification_report
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import accuracy_score

import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
In [3]:
%matplotlib inline
np.set_printoptions(precision=3, suppress=True) 

Verify TensorFlow version and Keras version¶

In [4]:
print("This notebook requires TensorFlow 2.0 or above")
print("TensorFlow version: ", tf.__version__)
assert version.parse(tf.__version__).release[0] >=2
This notebook requires TensorFlow 2.0 or above
TensorFlow version:  2.10.0
In [5]:
print("Keras version: ", keras.__version__)
Keras version:  2.10.0

Mount Google Drive to Colab environment¶

In [6]:
#from google.colab import drive
#drive.mount('/content/gdrive')

Research Assignment Reporting Functions¶

In [7]:
def print_validation_report(test_labels, predictions):
    print("Classification Report")
    print(classification_report(test_labels, predictions))
    print('Accuracy Score: {}'.format(accuracy_score(test_labels, predictions)))
    print('Root Mean Square Error: {}'.format(np.sqrt(MSE(test_labels, predictions)))) 
    
def plot_confusion_matrix(y_true, y_pred):
    mtx = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(8,8))
    sns.heatmap(mtx, annot=True, fmt='d', linewidths=.75,  cbar=False, ax=ax,cmap='Blues',linecolor='white')
    #  square=True,
    plt.ylabel('true label')
    plt.xlabel('predicted label')

def plot_history(history):
  losses = history.history['loss']
  accs = history.history['accuracy']
  val_losses = history.history['val_loss']
  val_accs = history.history['val_accuracy']
  epochs = len(losses)

  plt.figure(figsize=(16, 4))
  for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
  plt.show()

def plot_digits(instances, pos, images_per_row=5, **options):
    size = 28
    images_per_row = min(len(instances), images_per_row)
    images = [instance.reshape(size,size) for instance in instances]
    n_rows = (len(instances) - 1) // images_per_row + 1
    row_images = []
    n_empty = n_rows * images_per_row - len(instances)
    images.append(np.zeros((size, size * n_empty)))
    for row in range(n_rows):
        rimages = images[row * images_per_row : (row + 1) * images_per_row]
        row_images.append(np.concatenate(rimages, axis=1))
    image = np.concatenate(row_images, axis=0)
    pos.imshow(image, cmap = 'binary', **options)
    pos.axis("off")

def plot_digit(data):
    image = data.reshape(28, 28)
    plt.imshow(image, cmap = 'hot',
               interpolation="nearest")
    plt.axis("off")

Loading MNIST Dataset¶

  • The MNIST dataset of handwritten digits has a training set of 60,000 images, and a test set of 10,000 images. It comes prepackaged as part of tf.Keras. Use the tf.keras.datasets.mnist.load_data to the get these datasets (and the corresponding labels) as Numpy arrays.
In [8]:
(x_train, y_train), (x_test, y_test)= tf.keras.datasets.mnist.load_data()
  • Tuples of Numpy arrays: (x_train, y_train), (x_test, y_test)
  • x_train, x_test: uint8 arrays of grayscale image data with shapes (num_samples, 28, 28).
  • y_train, y_test: uint8 arrays of digit labels (integers in range 0-9)

EDA Training and Test Sets¶

  • Inspect the training and test sets as well as their labels as follows.
In [9]:
print('x_train:\t{}'.format(x_train.shape))
print('y_train:\t{}'.format(y_train.shape))
print('x_test:\t\t{}'.format(x_test.shape))
print('y_test:\t\t{}'.format(y_test.shape))
x_train:	(60000, 28, 28)
y_train:	(60000,)
x_test:		(10000, 28, 28)
y_test:		(10000,)

Review labels for training set¶

In [10]:
print("First ten labels training dataset:\n {}\n".format(y_train[0:10]))
First ten labels training dataset:
 [5 0 4 1 9 2 1 3 1 4]

Find frequency of each label in training and test sets¶

In [11]:
items = [{'Class': x, 'Count': y} for x, y in Counter(y_train).items()]
distribution = pd.DataFrame(items).sort_values(['Class'])
sns.barplot(x=distribution.Class, y=distribution.Count);
In [12]:
Counter(y_train).most_common()
Out[12]:
[(1, 6742),
 (7, 6265),
 (3, 6131),
 (2, 5958),
 (9, 5949),
 (0, 5923),
 (6, 5918),
 (8, 5851),
 (4, 5842),
 (5, 5421)]
In [13]:
Counter(y_test).most_common()
Out[13]:
[(1, 1135),
 (2, 1032),
 (7, 1028),
 (3, 1010),
 (9, 1009),
 (4, 982),
 (0, 980),
 (8, 974),
 (6, 958),
 (5, 892)]

Plot sample images with their labels¶

In [14]:
fig = plt.figure(figsize = (15, 9))

for i in range(50):
    plt.subplot(5, 10, 1+i)
    plt.title(y_train[i])
    plt.xticks([])
    plt.yticks([])
    plt.imshow(x_train[i].reshape(28,28), cmap='binary')

Preprocessing Data¶

  • Before we build our model, we need to prepare the data into the shape the network expected
  • More specifically, we will convert the labels (integers 0 to 9) to 1D numpy arrays of shape (10,) with elements 0s and 1s.
  • We also reshape the images from 2D arrays of shape (28,28) to 1D float32 arrays of shape (784,) and then rescale their elements to values between 0 and 1.

Apply one-hot encoding on the labels¶

We will change the way the labels are represented from numbers (0 to 9) to vectors (1D arrays) of shape (10, ) with all the elements set to 0 except the one which the label belongs to - which will be set to 1. For example:

original label one-hot encoded label
5 [0 0 0 0 0 1 0 0 0 0]
7 [0 0 0 0 0 0 0 1 0 0]
1 [0 1 0 0 0 0 0 0 0 0]
In [15]:
y_train_encoded = to_categorical(y_train)
y_test_encoded = to_categorical(y_test)

print("First ten entries of y_train:\n {}\n".format(y_train[0:10]))
print("First ten rows of one-hot y_train:\n {}".format(y_train_encoded[0:10,]))
First ten entries of y_train:
 [5 0 4 1 9 2 1 3 1 4]

First ten rows of one-hot y_train:
 [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
In [16]:
print('y_train_encoded shape: ', y_train_encoded.shape)
print('y_test_encoded shape: ', y_test_encoded.shape)
y_train_encoded shape:  (60000, 10)
y_test_encoded shape:  (10000, 10)

Reshape the images to 1D arrays¶

Reshape the images from shape (28, 28) 2D arrays to shape (784, ) vectors (1D arrays).

In [17]:
# Before reshape:
print('x_train:\t{}'.format(x_train.shape))
print('x_test:\t\t{}'.format(x_test.shape))
x_train:	(60000, 28, 28)
x_test:		(10000, 28, 28)
In [18]:
np.set_printoptions(linewidth=np.inf)
print("{}".format(x_train[2020]))
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0 167 208  19   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0  13 235 254  99   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0  74 254 234   4   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0 154 254 145   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0 224 254  92   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0  51 245 211  13   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   2 169 254 101   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  27 254 254  88   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  72 255 241  15   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  88 254 153   0   0  33  53 155 156 102  15   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 130 254  31   0 128 235 254 254 254 254 186  10   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 190 254  51 178 254 246 213 111 109 186 254 145   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 192 254 229 254 216  90   0   0   0  57 254 234   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 235 254 254 247  85   0   0   0   0  32 254 234   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 235 254 254 118   0   0   0   0   0 107 254 201   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 235 255 254 102  12   0   0   0   8 188 248 119   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 207 254 254 238 107   0   0  39 175 254 148   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  84 254 248  74  11  32 115 238 254 176  11   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  21 214 254 254 254 254 254 254 132   6   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0  14  96 176 254 254 214  48  12   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]
In [19]:
# Reshape the images:
x_train_reshaped = np.reshape(x_train, (60000, 784))
x_test_reshaped = np.reshape(x_test, (10000, 784))

# After reshape:
print('x_train_reshaped shape: ', x_train_reshaped.shape)
print('x_test_reshaped shape: ', x_test_reshaped.shape)
x_train_reshaped shape:  (60000, 784)
x_test_reshaped shape:  (10000, 784)
  1. Each element in an image is a pixel value
  2. Pixel values range from 0 to 255
  3. 0 = White
  4. 255 = Black

Review unique values with set from 1st image¶

In [20]:
print(set(x_train_reshaped[0]))
{0, 1, 2, 3, 9, 11, 14, 16, 18, 23, 24, 25, 26, 27, 30, 35, 36, 39, 43, 45, 46, 49, 55, 56, 64, 66, 70, 78, 80, 81, 82, 90, 93, 94, 107, 108, 114, 119, 126, 127, 130, 132, 133, 135, 136, 139, 148, 150, 154, 156, 160, 166, 170, 171, 172, 175, 182, 183, 186, 187, 190, 195, 198, 201, 205, 207, 212, 213, 219, 221, 225, 226, 229, 238, 240, 241, 242, 244, 247, 249, 250, 251, 252, 253, 255}

Rescale the elements of the reshaped images¶

Rescale the elements between [0 and 1]

In [21]:
x_train_norm = x_train_reshaped.astype('float32') / 255
x_test_norm = x_test_reshaped.astype('float32') / 255
In [22]:
# Take a look at the first reshaped and normalized training image:
print(set(x_train_norm[0]))
{0.0, 0.011764706, 0.53333336, 0.07058824, 0.49411765, 0.6862745, 0.101960786, 0.6509804, 1.0, 0.96862745, 0.49803922, 0.11764706, 0.14117648, 0.36862746, 0.6039216, 0.6666667, 0.043137256, 0.05490196, 0.03529412, 0.85882354, 0.7764706, 0.7137255, 0.94509804, 0.3137255, 0.6117647, 0.41960785, 0.25882354, 0.32156864, 0.21960784, 0.8039216, 0.8666667, 0.8980392, 0.7882353, 0.52156866, 0.18039216, 0.30588236, 0.44705883, 0.3529412, 0.15294118, 0.6745098, 0.88235295, 0.99215686, 0.9490196, 0.7647059, 0.2509804, 0.19215687, 0.93333334, 0.9843137, 0.74509805, 0.7294118, 0.5882353, 0.50980395, 0.8862745, 0.105882354, 0.09019608, 0.16862746, 0.13725491, 0.21568628, 0.46666667, 0.3647059, 0.27450982, 0.8352941, 0.7176471, 0.5803922, 0.8117647, 0.9764706, 0.98039216, 0.73333335, 0.42352942, 0.003921569, 0.54509807, 0.67058825, 0.5294118, 0.007843138, 0.31764707, 0.0627451, 0.09411765, 0.627451, 0.9411765, 0.9882353, 0.95686275, 0.83137256, 0.5176471, 0.09803922, 0.1764706}

Create a dataframe with the pixel values and class labels¶

In [66]:
#Get the dataframe of all the pixel values
pixel_data = {'actual_class':y_train}
for k in range(0,784): 
    pixel_data[f"pix_val_{k}"] = x_train_norm[:,k]
pixel_df = pd.DataFrame(pixel_data)
pixel_df.head(15).round(3).T
Out[66]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
actual_class 5.0 0.0 4.0 1.0 9.0 2.0 1.0 3.0 1.0 4.0 3.0 5.0 3.0 6.0 1.0
pix_val_0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
pix_val_779 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_780 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_781 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_782 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_783 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

785 rows × 15 columns

PCA Feature Reduction / Model Optimization¶

sklearn.decomposition.PCA
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

Use PCA decomposition to reduce the number of features from 784 features to 154 features¶

In [70]:
# Separating out the features
features = [*pixel_data][1:] # ['pix_val_0', 'pix_val_1',...]
x = pixel_df.loc[:, features].values 
len(x[0])
Out[70]:
784
In [71]:
pca = PCA(n_components=154)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents)
In [72]:
pixel_pca_df = pd.concat([principalDf, pixel_df[['actual_class']]], axis = 1)
In [73]:
pixel_pca_df.head().round(3)
Out[73]:
0 1 2 3 4 5 6 7 8 9 ... 145 146 147 148 149 150 151 152 153 actual_class
0 0.486 -1.226 -0.096 -2.179 -0.107 -0.912 0.918 0.627 -1.426 0.778 ... -0.229 -0.001 0.151 0.021 -0.049 -0.185 -0.005 -0.146 -0.077 5
1 3.968 -1.156 2.339 -1.807 -3.244 -0.714 -0.177 -0.412 0.159 0.592 ... -0.236 0.105 -0.065 0.072 0.144 0.083 -0.049 -0.120 -0.077 0
2 -0.203 1.538 -0.739 2.043 -1.203 -0.007 -3.369 1.445 -0.449 -0.700 ... -0.124 -0.135 -0.069 -0.190 -0.136 -0.086 0.285 -0.095 0.507 4
3 -3.134 -2.381 1.073 0.415 -0.007 2.744 -1.858 -0.264 1.187 0.044 ... -0.153 -0.022 0.012 -0.029 0.114 -0.104 -0.055 0.144 0.061 1
4 -1.501 2.865 0.064 -0.948 0.385 0.170 -0.359 -1.590 0.884 0.408 ... 0.177 -0.147 0.001 -0.092 -0.035 0.056 0.128 -0.053 -0.043 9

5 rows × 155 columns

In [77]:
pca.explained_variance_ratio_.sum()
Out[77]:
0.9498408

Build the DNN model¶

We use a Sequential class defined in Keras to create our model. All the layers are going to be Dense layers. This means, like the figure shown above, all the nodes of a layer would be connected to all the nodes of the preceding layer i.e. densely connected.

After the model is built, we view ....

In [81]:
model = Sequential([
    Dense(input_shape=[154], units=128, activation = tf.nn.relu,kernel_regularizer=tf.keras.regularizers.L2(0.001)),
    Dense(name = "output_layer", units = 10, activation = tf.nn.softmax)
])
In [82]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_1 (Dense)             (None, 128)               19840     
                                                                 
 output_layer (Dense)        (None, 10)                1290      
                                                                 
=================================================================
Total params: 21,130
Trainable params: 21,130
Non-trainable params: 0
_________________________________________________________________
In [83]:
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True) 
You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model to work.

Compile the DNN model¶

In addition to setting up our model architecture, we also need to define which algorithm should the model use in order to optimize the weights and biases as per the given data. We will use stochastic gradient descent.

We also need to define a loss function. Think of this function as the difference between the predicted outputs and the actual outputs given in the dataset. This loss needs to be minimized in order to have a higher model accuracy. That's what the optimization algorithm essentially does - it minimizes the loss during model training. For our multi-class classification problem, categorical cross entropy is commonly used.

Finally, we will use the accuracy during training as a metric to keep track of as the model trains.

tf.keras.optimizers.RMSprop
https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/RMSprop

tf.keras.losses.CategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy

In [84]:
model.compile(optimizer='rmsprop',           
               loss = 'categorical_crossentropy',
               metrics=['accuracy'])

Train the DNN model¶

tf.keras.model.fit
https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

tf.keras.callbacks.EarlyStopping
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping

In [85]:
history = model.fit(
    principalComponents
    ,y_train_encoded
    ,epochs = 200
    ,validation_split=0.20 
    ,callbacks=[tf.keras.callbacks.ModelCheckpoint("DNN_model.h5",save_best_only=True,save_weights_only=False)
                ,tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2)] 
    )
Epoch 1/200
1500/1500 [==============================] - 2s 1ms/step - loss: 0.4238 - accuracy: 0.9104 - val_loss: 0.2290 - val_accuracy: 0.9562
Epoch 2/200
1500/1500 [==============================] - 2s 1ms/step - loss: 0.2014 - accuracy: 0.9593 - val_loss: 0.1793 - val_accuracy: 0.9663
Epoch 3/200
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1622 - accuracy: 0.9694 - val_loss: 0.1560 - val_accuracy: 0.9706
Epoch 4/200
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1426 - accuracy: 0.9747 - val_loss: 0.1441 - val_accuracy: 0.9737
Epoch 5/200
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1296 - accuracy: 0.9762 - val_loss: 0.1392 - val_accuracy: 0.9734
Epoch 6/200
1500/1500 [==============================] - 2s 1ms/step - loss: 0.1199 - accuracy: 0.9786 - val_loss: 0.1344 - val_accuracy: 0.9725

Evaluate the DNN model¶

In order to ensure that this is not a simple "memorization" by the machine, we should evaluate the performance on the test set. This is easy to do, we simply use the evaluate method on our model.

In [89]:
principalComponentstest = pca.fit_transform(x_test_norm)
In [90]:
model = tf.keras.models.load_model("DNN_model.h5")
print(f"Test acc: {model.evaluate(principalComponentstest, y_test_encoded)[1]:.3f}")
313/313 [==============================] - 0s 705us/step - loss: 6.8867 - accuracy: 0.1260
Test acc: 0.126

Making Predictions¶

In [92]:
preds = model.predict(principalComponentstest)
print('shape of preds: ', preds.shape)
313/313 [==============================] - 0s 587us/step
shape of preds:  (10000, 10)

Look at the first 25 - Plot test set images along with their predicted and actual labels to understand how the trained model actually performed

In [93]:
plt.figure(figsize = (12, 8))

start_index = 0

for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    pred = np.argmax(preds[start_index + i])
    actual = np.argmax(y_test_encoded[start_index + i])
    col = 'g'
    if pred != actual:
        col = 'r'
    plt.xlabel('i={} | pred={} | true={}'.format(start_index + i, pred, actual), color = col)
    plt.imshow(x_test[start_index + i], cmap='binary')
plt.show()

Reviewing Performance¶

In [94]:
history_dict = history.history
history_dict.keys()
Out[94]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

Plot performance metrics¶

We use Matplotlib to create 2 plots--displaying the training and validation loss (resp. accuracy) for each (training) epoch side by side.

In [95]:
history_dict = history.history
history_dict.keys()
Out[95]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [96]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [97]:
history_df=pd.DataFrame(history_dict)
history_df.tail().round(3)
Out[97]:
loss accuracy val_loss val_accuracy
1 0.201 0.959 0.179 0.966
2 0.162 0.969 0.156 0.971
3 0.143 0.975 0.144 0.974
4 0.130 0.976 0.139 0.973
5 0.120 0.979 0.134 0.973
In [98]:
plot_history(history)
In [100]:
pred1= model.predict(principalComponentstest)
pred1=np.argmax(pred1, axis=1)
313/313 [==============================] - 0s 586us/step
In [101]:
print_validation_report(y_test, pred1)
Classification Report
              precision    recall  f1-score   support

           0       0.73      0.74      0.73       980
           1       0.04      0.01      0.02      1135
           2       0.11      0.13      0.12      1032
           3       0.05      0.07      0.06      1010
           4       0.02      0.02      0.02       982
           5       0.16      0.18      0.17       892
           6       0.13      0.09      0.10       958
           7       0.00      0.00      0.00      1028
           8       0.04      0.04      0.04       974
           9       0.01      0.02      0.01      1009

    accuracy                           0.13     10000
   macro avg       0.13      0.13      0.13     10000
weighted avg       0.13      0.13      0.12     10000

Accuracy Score: 0.126
Root Mean Square Error: 4.158064453564903

Create the confusion matrix¶

Let us see what the confusion matrix looks like. Using both sklearn.metrics. Then we visualize the confusion matrix and see what that tells us.

In [103]:
# Get the predicted classes:
# pred_classes = model.predict_classes(x_train_norm)# give deprecation warning
pred_classes = np.argmax(model.predict(principalComponentstest), axis=-1)
pred_classes;
313/313 [==============================] - 0s 570us/step
Correlation matrix that measures the linear relationships
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html
In [104]:
conf_mx = tf.math.confusion_matrix(y_test, pred_classes)
conf_mx;
In [105]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(preds[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
Out[105]:
  0 1 2 3 4 5 6 7 8 9
0 0.00% 21.18% 0.03% 0.11% 0.00% 78.63% 0.01% 0.00% 0.04% 0.00%
1 0.00% 0.00% 0.05% 0.13% 1.04% 0.01% 0.00% 92.42% 0.01% 6.33%
2 0.07% 0.30% 10.66% 0.57% 0.23% 4.14% 0.01% 18.34% 0.34% 65.34%
3 97.14% 0.00% 2.73% 0.04% 0.00% 0.05% 0.00% 0.04% 0.00% 0.00%
4 0.00% 0.10% 92.31% 1.43% 0.00% 1.82% 4.27% 0.00% 0.07% 0.00%
5 0.00% 0.14% 0.38% 0.20% 0.04% 0.03% 0.00% 18.90% 0.04% 80.26%
6 0.00% 0.37% 5.87% 81.68% 0.00% 11.44% 0.08% 0.04% 0.52% 0.00%
7 0.01% 24.76% 43.71% 8.25% 8.70% 0.28% 0.01% 0.13% 13.66% 0.49%
8 0.53% 0.00% 21.63% 2.82% 1.71% 22.61% 12.01% 0.03% 38.68% 0.00%
9 0.01% 2.96% 0.39% 47.63% 0.00% 34.22% 0.00% 0.00% 14.79% 0.00%
10 82.22% 0.00% 0.08% 0.00% 9.96% 0.01% 0.34% 7.34% 0.00% 0.05%
11 1.17% 0.00% 66.30% 1.20% 20.12% 2.72% 2.63% 0.27% 4.37% 1.21%
12 0.00% 0.05% 0.07% 34.39% 0.00% 1.50% 0.00% 0.00% 63.98% 0.01%
13 96.78% 0.00% 0.75% 1.03% 0.00% 0.82% 0.08% 0.53% 0.00% 0.00%
14 0.00% 1.57% 0.17% 0.38% 0.01% 0.02% 0.00% 96.69% 0.95% 0.22%
15 0.07% 0.00% 0.24% 0.06% 36.47% 0.16% 1.83% 52.27% 0.12% 8.79%
16 0.00% 0.04% 14.30% 8.65% 0.00% 74.45% 1.21% 0.00% 1.35% 0.00%
17 0.00% 12.58% 0.07% 0.34% 0.00% 86.95% 0.02% 0.00% 0.04% 0.00%
18 0.42% 0.00% 4.61% 0.01% 13.18% 0.05% 3.46% 14.69% 54.85% 8.74%
19 0.04% 13.21% 26.12% 3.49% 0.03% 2.47% 0.24% 0.14% 54.22% 0.04%

Visualize the confusion matrix¶

We use code from chapter 3 of Hands on Machine Learning (A. Geron) (cf. https://github.com/ageron/handson-ml2/blob/master/03_classification.ipynb) to display a "heat map" of the confusion matrix. Then we normalize the confusion matrix so we can compare error rates.

See https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ch03.html#classification_chapter

Correlation matrix that measures the linear relationships
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html
In [106]:
plot_confusion_matrix(y_test,pred_classes)