MSDS458 Research Assignment 1:¶

  • In this notebook, we will build a DNN model for classifying MNIST digits. The DNN model will consist of 784 input nodes, a hidden layer with 128 nodes and 10 output nodes (corresponding to the 10 digits).
  • We use mnist.load_data() to get the 70,000 images, divided into a set of 60,000 training images and 10,000 test images. We hold back 20% (12,000) of the 60,000 training images for validation.
  • After training and evaluating our DNN model we analyze its performance. In particular, we use confusion matrices to compare the predicted classes with the class labels to try to determine why some images were misclassified by the model.
  • We then obtain the 60,000 activation values of one of the hidden nodes for the (original) set of training data. We want to use these activation values as "proxies" for the predicted classes of the 60,000 images.
  • Just as we compared the predicted classes with the class labels using confusion matrices to determine the efficacy of the model, we use box plots to visualize the relationship between the activation values of one hidden node and the class labels. We don't expect these activation values to have much "predictive power"; in fact, the same activation values can be associated with multiple class labels, resulting in a lot of overlap in the box plots.
  • We also perform similar experiments comparing the values at two pixel locations in the images with the class labels. This time we use scatter plots to visualize the relationship between the pair of pixel values and the class labels (represented by different colored dots).
  • Pixel values at two locations in an image should not have much predictive value. To improve on this approach, we apply PCA decomposition to both the raw data of 784 pixel values and the 128 hidden-node activation values to reduce the number of features to 2 in each case. Once again, we use a scatter plot to visualize the correlation between the two principal-component values and the class labels.
  • Finally, we use a Random Forest Classifier to find the relative importance of the 784 features (pixels) in the training set. We then select the 70 most important features (pixels) from the training, validation and test images to test our 'best' model on. (A sketch of these last two steps appears right after this list.)
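
The sketch below (not the code used later in the notebook) illustrates those last two steps. It assumes x_train_norm is the flattened, normalized (60000, 784) training matrix and y_train the integer labels prepared further down in the notebook.

# A minimal sketch of the PCA and Random Forest steps described above.
# Assumes x_train_norm (60000, 784) and y_train are the arrays built later in this notebook.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# Reduce the 784 pixel features to 2 principal components and color points by class label.
pca = PCA(n_components=2)
components = pca.fit_transform(x_train_norm)
plt.scatter(components[:, 0], components[:, 1], c=y_train, cmap='tab10', s=2)
plt.colorbar(label='class label')
plt.show()

# Rank pixel importance with a Random Forest and keep the 70 most important pixels.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(x_train_norm, y_train)
top70 = np.argsort(rf.feature_importances_)[-70:]
x_train_top70 = x_train_norm[:, top70]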

Importing Packages¶

  • First we import all the packages that will be used in the assignment.

  • Since Keras is integrated into TensorFlow 2.x, we import keras from tensorflow and use tensorflow.keras.xxx to import all other Keras packages. The seed argument produces a deterministic sequence of tensors across multiple calls.
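
For reproducible runs, one option (a sketch, not something the cells below rely on) is to fix the NumPy and TensorFlow seeds up front; the seed value here is arbitrary.

# Optional: fix random seeds so weight initialization and shuffling are repeatable.
import numpy as np
import tensorflow as tf

SEED = 42                 # arbitrary seed value
np.random.seed(SEED)      # seeds NumPy operations (e.g., shuffling)
tf.random.set_seed(SEED)  # seeds TensorFlow weight initialization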

In [3]:
import datetime
from packaging import version
from collections import Counter
import numpy as np
import pandas as pd

import matplotlib as mpl  # EA
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import confusion_matrix, classification_report
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import accuracy_score

import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
2022-10-18 00:04:54.712252: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
In [4]:
%matplotlib inline
np.set_printoptions(precision=3, suppress=True) 

Verify TensorFlow version and Keras version¶

In [5]:
print("This notebook requires TensorFlow 2.0 or above")
print("TensorFlow version: ", tf.__version__)
assert version.parse(tf.__version__).release[0] >=2
This notebook requires TensorFlow 2.0 or above
TensorFlow version:  2.10.0
In [6]:
print("Keras version: ", keras.__version__)
Keras version:  2.10.0

Mount Google Drive to Colab environment¶

In [7]:
#from google.colab import drive
#drive.mount('/content/gdrive')

Research Assignment Reporting Functions¶

In [8]:
def print_validation_report(test_labels, predictions):
    print("Classification Report")
    print(classification_report(test_labels, predictions))
    print('Accuracy Score: {}'.format(accuracy_score(test_labels, predictions)))
    print('Root Mean Square Error: {}'.format(np.sqrt(MSE(test_labels, predictions)))) 
    
def plot_confusion_matrix(y_true, y_pred):
    mtx = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(8,8))
    sns.heatmap(mtx, annot=True, fmt='d', linewidths=.75,  cbar=False, ax=ax,cmap='Blues',linecolor='white')
    #  square=True,
    plt.ylabel('true label')
    plt.xlabel('predicted label')

def plot_history(history):
  losses = history.history['loss']
  accs = history.history['accuracy']
  val_losses = history.history['val_loss']
  val_accs = history.history['val_accuracy']
  epochs = len(losses)

  plt.figure(figsize=(16, 4))
  for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
  plt.show()

def plot_digits(instances, pos, images_per_row=5, **options):
    size = 28
    images_per_row = min(len(instances), images_per_row)
    images = [instance.reshape(size,size) for instance in instances]
    n_rows = (len(instances) - 1) // images_per_row + 1
    row_images = []
    n_empty = n_rows * images_per_row - len(instances)
    images.append(np.zeros((size, size * n_empty)))
    for row in range(n_rows):
        rimages = images[row * images_per_row : (row + 1) * images_per_row]
        row_images.append(np.concatenate(rimages, axis=1))
    image = np.concatenate(row_images, axis=0)
    pos.imshow(image, cmap = 'binary', **options)
    pos.axis("off")

def plot_digit(data):
    image = data.reshape(28, 28)
    plt.imshow(image, cmap = 'hot',
               interpolation="nearest")
    plt.axis("off")

Loading MNIST Dataset¶

  • The MNIST dataset of handwritten digits has a training set of 60,000 images and a test set of 10,000 images. It comes prepackaged as part of tf.keras. Use tf.keras.datasets.mnist.load_data to get these datasets (and the corresponding labels) as Numpy arrays.
In [9]:
(x_train, y_train), (x_test, y_test)= tf.keras.datasets.mnist.load_data()
  • Tuples of Numpy arrays: (x_train, y_train), (x_test, y_test)
  • x_train, x_test: uint8 arrays of grayscale image data with shapes (num_samples, 28, 28).
  • y_train, y_test: uint8 arrays of digit labels (integers in range 0-9)

EDA Training and Test Sets¶

  • Inspect the training and test sets as well as their labels as follows.
In [10]:
print('x_train:\t{}'.format(x_train.shape))
print('y_train:\t{}'.format(y_train.shape))
print('x_test:\t\t{}'.format(x_test.shape))
print('y_test:\t\t{}'.format(y_test.shape))
x_train:	(60000, 28, 28)
y_train:	(60000,)
x_test:		(10000, 28, 28)
y_test:		(10000,)

Review labels for training set¶

In [11]:
print("First ten labels training dataset:\n {}\n".format(y_train[0:10]))
First ten labels training dataset:
 [5 0 4 1 9 2 1 3 1 4]

Find frequency of each label in training and test sets¶

In [12]:
items = [{'Class': x, 'Count': y} for x, y in Counter(y_train).items()]
distribution = pd.DataFrame(items).sort_values(['Class'])
sns.barplot(x=distribution.Class, y=distribution.Count);
In [13]:
items = [{'Class': x, 'Count': y} for x, y in Counter(y_test).items()]
distribution = pd.DataFrame(items).sort_values(['Class'])
sns.barplot(x=distribution.Class, y=distribution.Count);
In [14]:
Counter(y_train).most_common()
Out[14]:
[(1, 6742),
 (7, 6265),
 (3, 6131),
 (2, 5958),
 (9, 5949),
 (0, 5923),
 (6, 5918),
 (8, 5851),
 (4, 5842),
 (5, 5421)]
In [15]:
Counter(y_test).most_common()
Out[15]:
[(1, 1135),
 (2, 1032),
 (7, 1028),
 (3, 1010),
 (9, 1009),
 (4, 982),
 (0, 980),
 (8, 974),
 (6, 958),
 (5, 892)]

Plot sample images with their labels¶

In [16]:
fig = plt.figure(figsize = (15, 9))

for i in range(50):
    plt.subplot(5, 10, 1+i)
    plt.title(y_train[i])
    plt.xticks([])
    plt.yticks([])
    plt.imshow(x_train[i].reshape(28,28), cmap='binary')

Preprocessing Data¶

  • Before we build our model, we need to prepare the data in the shape the network expects.
  • More specifically, we will convert the labels (integers 0 to 9) to 1D numpy arrays of shape (10,) with elements 0s and 1s.
  • We also reshape the images from 2D arrays of shape (28,28) to 1D float32 arrays of shape (784,) and then rescale their elements to values between 0 and 1.

Apply one-hot encoding on the labels¶

We will change the way the labels are represented from numbers (0 to 9) to vectors (1D arrays) of shape (10,) with all the elements set to 0 except the element at the index of the label, which is set to 1. For example:

original label    one-hot encoded label
5                 [0 0 0 0 0 1 0 0 0 0]
7                 [0 0 0 0 0 0 0 1 0 0]
1                 [0 1 0 0 0 0 0 0 0 0]
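
For illustration only, the same encoding can be built by hand with NumPy (the notebook itself uses to_categorical in the next cell):

# A NumPy equivalent of to_categorical for integer labels 0-9 (illustration only).
import numpy as np

labels = np.array([5, 7, 1])                 # example labels from the table above
one_hot = np.zeros((labels.size, 10))
one_hot[np.arange(labels.size), labels] = 1  # set the column matching each label to 1
print(one_hot)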
In [17]:
y_train_encoded = to_categorical(y_train)
y_test_encoded = to_categorical(y_test)

print("First ten entries of y_train:\n {}\n".format(y_train[0:10]))
print("First ten rows of one-hot y_train:\n {}".format(y_train_encoded[0:10,]))
First ten entries of y_train:
 [5 0 4 1 9 2 1 3 1 4]

First ten rows of one-hot y_train:
 [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
In [18]:
print('y_train_encoded shape: ', y_train_encoded.shape)
print('y_test_encoded shape: ', y_test_encoded.shape)
y_train_encoded shape:  (60000, 10)
y_test_encoded shape:  (10000, 10)

Reshape the images to 1D arrays¶

Reshape the images from shape (28, 28) 2D arrays to shape (784, ) vectors (1D arrays).

In [19]:
# Before reshape:
print('x_train:\t{}'.format(x_train.shape))
print('x_test:\t\t{}'.format(x_test.shape))
x_train:	(60000, 28, 28)
x_test:		(10000, 28, 28)
In [20]:
# Reshape the images:
x_train_reshaped = np.reshape(x_train, (60000, 784))
x_test_reshaped = np.reshape(x_test, (10000, 784))

# After reshape:
print('x_train_reshaped shape: ', x_train_reshaped.shape)
print('x_test_reshaped shape: ', x_test_reshaped.shape)
x_train_reshaped shape:  (60000, 784)
x_test_reshaped shape:  (10000, 784)
  1. Each element in an image is a pixel value
  2. Pixel values range from 0 to 255
  3. 0 = White
  4. 255 = Black

Review unique values with set from 1st image¶

In [21]:
print(set(x_train_reshaped[0]))
{0, 1, 2, 3, 9, 11, 14, 16, 18, 23, 24, 25, 26, 27, 30, 35, 36, 39, 43, 45, 46, 49, 55, 56, 64, 66, 70, 78, 80, 81, 82, 90, 93, 94, 107, 108, 114, 119, 126, 127, 130, 132, 133, 135, 136, 139, 148, 150, 154, 156, 160, 166, 170, 171, 172, 175, 182, 183, 186, 187, 190, 195, 198, 201, 205, 207, 212, 213, 219, 221, 225, 226, 229, 238, 240, 241, 242, 244, 247, 249, 250, 251, 252, 253, 255}

Rescale the elements of the reshaped images¶

Rescale the elements between [0 and 1]

In [22]:
x_train_norm = x_train_reshaped.astype('float32') / 255
x_test_norm = x_test_reshaped.astype('float32') / 255
In [23]:
# Take a look at the first reshaped and normalized training image:
print(set(x_train_norm[0]))
{0.0, 0.011764706, 0.53333336, 0.07058824, 0.49411765, 0.6862745, 0.101960786, 0.6509804, 1.0, 0.96862745, 0.49803922, 0.11764706, 0.14117648, 0.36862746, 0.6039216, 0.6666667, 0.043137256, 0.05490196, 0.03529412, 0.85882354, 0.7764706, 0.7137255, 0.94509804, 0.3137255, 0.6117647, 0.41960785, 0.25882354, 0.32156864, 0.21960784, 0.8039216, 0.8666667, 0.8980392, 0.7882353, 0.52156866, 0.18039216, 0.30588236, 0.44705883, 0.3529412, 0.15294118, 0.6745098, 0.88235295, 0.99215686, 0.9490196, 0.7647059, 0.2509804, 0.19215687, 0.93333334, 0.9843137, 0.74509805, 0.7294118, 0.5882353, 0.50980395, 0.8862745, 0.105882354, 0.09019608, 0.16862746, 0.13725491, 0.21568628, 0.46666667, 0.3647059, 0.27450982, 0.8352941, 0.7176471, 0.5803922, 0.8117647, 0.9764706, 0.98039216, 0.73333335, 0.42352942, 0.003921569, 0.54509807, 0.67058825, 0.5294118, 0.007843138, 0.31764707, 0.0627451, 0.09411765, 0.627451, 0.9411765, 0.9882353, 0.95686275, 0.83137256, 0.5176471, 0.09803922, 0.1764706}

Build the DNN model¶

We use the Sequential class defined in Keras to create our model. All the layers are going to be Dense layers, which means that every node of a layer is connected to every node of the preceding layer, i.e. densely connected.

After the model is built, we view a summary of its layers and parameter counts.

In [24]:
model = Sequential([
    Dense(input_shape=[784], units=1, activation = tf.nn.relu,kernel_regularizer=tf.keras.regularizers.L2(0.001)),
    Dense(name = "output_layer", units = 10, activation = tf.nn.softmax)
])
2022-10-18 00:05:01.923961: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
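
Note that the cell above uses a single hidden node (units=1). A hypothetical variant matching the 784-128-10 architecture described in the introduction might look like the sketch below; it is not the model trained in this notebook.

# Sketch of a 128-node hidden-layer variant (not the model trained below).
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model_128 = Sequential([
    Dense(input_shape=[784], units=128, activation=tf.nn.relu,
          kernel_regularizer=tf.keras.regularizers.L2(0.001)),
    Dense(name="output_layer", units=10, activation=tf.nn.softmax)
])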
In [25]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 1)                 785       
                                                                 
 output_layer (Dense)        (None, 10)                20        
                                                                 
=================================================================
Total params: 805
Trainable params: 805
Non-trainable params: 0
_________________________________________________________________
In [26]:
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True) 
You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model to work.

Compile the DNN model¶

In addition to setting up our model architecture, we also need to define which algorithm the model should use to optimize the weights and biases from the given data. We will use the RMSprop optimizer.

We also need to define a loss function. Think of this function as the difference between the predicted outputs and the actual outputs given in the dataset. This loss needs to be minimized in order to have a higher model accuracy. That's what the optimization algorithm essentially does - it minimizes the loss during model training. For our multi-class classification problem, categorical cross entropy is commonly used.

Finally, we will use the accuracy during training as a metric to keep track of as the model trains.

tf.keras.optimizers.RMSprop
https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/RMSprop

tf.keras.losses.CategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy
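
As a quick illustration of what categorical cross entropy measures, the sketch below computes the loss for a single example by hand (the predicted probabilities are made up):

# Categorical cross-entropy for one example, computed by hand (illustration only).
import numpy as np

y_true = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])         # one-hot label for digit 5
y_pred = np.array([0.05, 0.05, 0.05, 0.05, 0.05, 0.50,
                   0.05, 0.05, 0.05, 0.10])                # hypothetical softmax output
loss = -np.sum(y_true * np.log(y_pred))                    # = -log(0.50) ≈ 0.693
print(loss)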

In [27]:
model.compile(optimizer='rmsprop',           
               loss = 'categorical_crossentropy',
               metrics=['accuracy'])

Train the DNN model¶

tf.keras.model.fit
https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

tf.keras.callbacks.EarlyStopping
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping
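
One possible callback configuration is sketched below; the cell that follows monitors val_accuracy and relies on a ModelCheckpoint rather than restore_best_weights.

# Sketch of an alternative callback setup: stop on stalled validation loss and
# roll back to the best weights seen during training.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True)
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'DNN_model.h5', save_best_only=True)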

In [28]:
history = model.fit(
    x_train_norm
    ,y_train_encoded
    ,epochs = 200
    ,validation_split=0.20 
    ,callbacks=[tf.keras.callbacks.ModelCheckpoint("DNN_model.h5",save_best_only=True,save_weights_only=False)
                ,tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2)] 
    )
Epoch 1/200
1500/1500 [==============================] - 2s 938us/step - loss: 2.0451 - accuracy: 0.1872 - val_loss: 1.9215 - val_accuracy: 0.2074
Epoch 2/200
1500/1500 [==============================] - 1s 838us/step - loss: 1.8586 - accuracy: 0.2775 - val_loss: 1.7894 - val_accuracy: 0.3047
Epoch 3/200
1500/1500 [==============================] - 1s 833us/step - loss: 1.7613 - accuracy: 0.3082 - val_loss: 1.7260 - val_accuracy: 0.3273
Epoch 4/200
1500/1500 [==============================] - 1s 837us/step - loss: 1.7178 - accuracy: 0.3245 - val_loss: 1.6929 - val_accuracy: 0.3334
Epoch 5/200
1500/1500 [==============================] - 1s 833us/step - loss: 1.6914 - accuracy: 0.3292 - val_loss: 1.6702 - val_accuracy: 0.3079
Epoch 6/200
1500/1500 [==============================] - 1s 822us/step - loss: 1.6730 - accuracy: 0.3238 - val_loss: 1.6558 - val_accuracy: 0.3154

Evaluate the DNN model¶

To ensure that the model is not simply "memorizing" the training data, we should evaluate its performance on the test set. This is easy to do: we simply call the evaluate method on our model.

In [29]:
model = tf.keras.models.load_model("DNN_model.h5")
print(f"Test acc: {model.evaluate(x_test_norm, y_test_encoded)[1]:.3f}")
313/313 [==============================] - 0s 650us/step - loss: 1.6782 - accuracy: 0.3131
Test acc: 0.313
In [30]:
loss, accuracy = model.evaluate(x_test_norm, y_test_encoded)
print('test set accuracy: ', accuracy * 100)
313/313 [==============================] - 0s 786us/step - loss: 1.6782 - accuracy: 0.3131
test set accuracy:  31.310001015663147

Making Predictions¶

In [31]:
preds = model.predict(x_test_norm)
print('shape of preds: ', preds.shape)
313/313 [==============================] - 0s 545us/step
shape of preds:  (10000, 10)

Look at the first 25: plot test set images along with their predicted and actual labels to understand how the trained model actually performed.

In [32]:
plt.figure(figsize = (12, 8))

start_index = 0

for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    pred = np.argmax(preds[start_index + i])
    actual = np.argmax(y_test_encoded[start_index + i])
    col = 'g'
    if pred != actual:
        col = 'r'
    plt.xlabel('i={} | pred={} | true={}'.format(start_index + i, pred, actual), color = col)
    plt.imshow(x_test[start_index + i], cmap='binary')
plt.show()

Reviewing Performance¶

In [33]:
history_dict = history.history
history_dict.keys()
Out[33]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

Plot performance metrics¶

We use Matplotlib to create two plots side by side, displaying the training and validation loss and the training and validation accuracy for each training epoch.

In [34]:
history_dict = history.history
history_dict.keys()
Out[34]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [35]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [36]:
history_df=pd.DataFrame(history_dict)
history_df.tail().round(3)
Out[36]:
loss accuracy val_loss val_accuracy
1 1.859 0.278 1.789 0.305
2 1.761 0.308 1.726 0.327
3 1.718 0.324 1.693 0.333
4 1.691 0.329 1.670 0.308
5 1.673 0.324 1.656 0.315
In [37]:
plot_history(history)
In [38]:
pred1= model.predict(x_test_norm)
pred1=np.argmax(pred1, axis=1)
313/313 [==============================] - 0s 552us/step
In [39]:
print_validation_report(y_test, pred1)
Classification Report
              precision    recall  f1-score   support

           0       0.26      0.38      0.31       980
           1       0.32      0.59      0.41      1135
           2       0.44      0.62      0.51      1032
           3       0.26      0.36      0.30      1010
           4       0.21      0.06      0.10       982
           5       0.00      0.00      0.00       892
           6       0.27      0.13      0.18       958
           7       0.32      0.88      0.47      1028
           8       0.00      0.00      0.00       974
           9       0.00      0.00      0.00      1009

    accuracy                           0.31     10000
   macro avg       0.21      0.30      0.23     10000
weighted avg       0.21      0.31      0.23     10000

Accuracy Score: 0.3131
Root Mean Square Error: 3.1530144306678967
/Users/apoorvsara/opt/anaconda3/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/apoorvsara/opt/anaconda3/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/Users/apoorvsara/opt/anaconda3/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

Create the confusion matrix¶

Let us see what the confusion matrix looks like, using both sklearn.metrics and tf.math. Then we visualize the confusion matrix and see what it tells us.

In [40]:
# Get the predicted classes:
# pred_classes = model.predict_classes(x_train_norm)# give deprecation warning
pred_classes = np.argmax(model.predict(x_test_norm), axis=-1)
pred_classes;
313/313 [==============================] - 0s 536us/step
In [41]:
conf_mx = tf.math.confusion_matrix(y_test, pred_classes)
conf_mx;
In [42]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(preds[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
Out[42]:
  0 1 2 3 4 5 6 7 8 9
0 6.89% 3.79% 0.12% 1.11% 24.16% 2.34% 0.17% 29.98% 1.55% 29.88%
1 0.00% 0.01% 61.62% 0.96% 0.00% 0.04% 37.08% 0.00% 0.28% 0.00%
2 22.68% 20.85% 2.75% 11.68% 5.40% 15.28% 3.53% 2.62% 13.53% 1.68%
3 20.62% 21.22% 3.83% 13.72% 2.71% 16.15% 4.80% 1.07% 15.25% 0.62%
4 6.89% 3.79% 0.12% 1.11% 24.16% 2.34% 0.17% 29.98% 1.55% 29.88%
5 22.84% 20.71% 2.63% 11.40% 5.84% 15.11% 3.39% 2.91% 13.28% 1.89%
6 6.89% 3.79% 0.12% 1.11% 24.16% 2.34% 0.17% 29.98% 1.55% 29.88%
7 6.89% 3.79% 0.12% 1.11% 24.16% 2.34% 0.17% 29.98% 1.55% 29.88%
8 22.04% 16.95% 1.36% 7.56% 13.41% 11.70% 1.81% 9.03% 9.36% 6.76%
9 6.89% 3.79% 0.12% 1.11% 24.16% 2.34% 0.17% 29.98% 1.55% 29.88%
10 21.75% 21.19% 3.29% 12.78% 3.81% 15.84% 4.17% 1.66% 14.50% 1.02%
11 6.79% 13.35% 14.46% 19.65% 0.03% 12.60% 15.91% 0.00% 17.20% 0.00%
12 6.89% 3.79% 0.12% 1.11% 24.16% 2.34% 0.17% 29.98% 1.55% 29.88%
13 7.46% 4.17% 0.14% 1.24% 24.27% 2.59% 0.20% 29.33% 1.73% 28.87%
14 17.29% 20.43% 5.41% 15.75% 1.10% 16.28% 6.60% 0.34% 16.63% 0.17%
15 17.50% 20.51% 5.30% 15.64% 1.17% 16.29% 6.48% 0.36% 16.57% 0.19%
16 6.89% 3.79% 0.12% 1.11% 24.16% 2.34% 0.17% 29.98% 1.55% 29.88%
17 7.45% 4.16% 0.14% 1.24% 24.27% 2.59% 0.20% 29.34% 1.72% 28.89%
18 2.77% 7.88% 23.84% 18.59% 0.00% 8.41% 24.33% 0.00% 14.19% 0.00%
19 6.89% 3.79% 0.12% 1.11% 24.16% 2.34% 0.17% 29.98% 1.55% 29.88%

Visualize the confusion matrix¶

We use code from chapter 3 of Hands on Machine Learning (A. Geron) (cf. https://github.com/ageron/handson-ml2/blob/master/03_classification.ipynb) to display a "heat map" of the confusion matrix. Then we normalize the confusion matrix so we can compare error rates.

See https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ch03.html#classification_chapter

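
The normalization step mentioned above is not shown in the cells below; a minimal sketch of it (after Geron, chapter 3), assuming pred_classes from the earlier cell, is:

# Row-normalize the confusion matrix and zero the diagonal to highlight error rates.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

mtx = confusion_matrix(y_test, pred_classes).astype(float)
norm_mtx = mtx / mtx.sum(axis=1, keepdims=True)  # each row now sums to 1
np.fill_diagonal(norm_mtx, 0)                    # keep only the misclassification rates
plt.matshow(norm_mtx, cmap=plt.cm.gray)
plt.show()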
In [43]:
plot_confusion_matrix(y_test,pred_classes)

Analyzing the Activation Values of the Hidden Nodes¶

We want to examine the contribution of the individual hidden nodes to the classifications made by the model. We first get the activation values of all the hidden nodes for each of the 60,000 training images and treat these 128 activations as the features that determine the classification class. For the sake of comparison, we also consider the 784 pixels of each training image and determine the contribution of the individual pixels to the predicted classification class.

Our goal is to use box and scatter plots to visualize how these features (pixel and activation values) correlate with the class labels. Because of the high dimension of the feature spaces, we apply PCA decomposition and t-Distributed stochastic neighbor embedding (t-SNE) to reduce the number of features in each case.

We use the following two articles as reference

  • https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60
  • https://towardsdatascience.com/visualising-high-dimensional-datasets-using-pca-and-t-sne-in-python-8ef87e7915b
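
The t-SNE step is only referenced here and not run below; a minimal sketch on a random subsample of the normalized training pixels (t-SNE on all 60,000 images would be slow) might look like this:

# Sketch: t-SNE embedding of a 2,000-image subsample, colored by class label.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

idx = np.random.choice(len(x_train_norm), 2000, replace=False)  # subsample size is arbitrary
embedding = TSNE(n_components=2, init='pca', random_state=42).fit_transform(x_train_norm[idx])
plt.scatter(embedding[:, 0], embedding[:, 1], c=y_train[idx], cmap='tab10', s=5)
plt.colorbar(label='class label')
plt.show()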

Get the activation values of the hidden nodes¶

To get the activation values of the hidden nodes, we need to create a new model, activation_model, that takes the same input as our current model but outputs the activation values of the hidden layer, i.e. of the hidden node. We then use the predict method to get the activation values.

In [45]:
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

print(f"There are {len(layer_outputs)} layers")
layer_outputs; # description of the layers
There are 2 layers
In [46]:
# Get the outputs of all the hidden nodes for each of the 60000 training images
activations = activation_model.predict(x_train_norm)
hidden_layer_activation = activations[0]
output_layer_activations = activations[1]
hidden_layer_activation.shape   # each hidden node has one activation value per training image
1875/1875 [==============================] - 1s 581us/step
Out[46]:
(60000, 1)
In [47]:
output_layer_activations.shape
Out[47]:
(60000, 10)
In [48]:
print(f"The maximum activation value of the hidden nodes in the hidden layer is \
{hidden_layer_activation.max()}")
The maximum activation value of the hidden nodes in the hidden layer is 19.58106231689453
In [49]:
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True)  # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output layer has shape {output_layer_activation.shape}")
print(f"The outputs for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output layer has shape (60000, 10)
The outputs for the first image are [0.133 0.186 0.077 0.176 0.004 0.156 0.091 0.001 0.175 0.   ]
The sum of the probabilities is (approximately) 1.0000001192092896

Create a dataframe with the activation values and the class labels¶

In [51]:
#Get the dataframe of all the node values
activation_data = {'actual_class':y_train}
for k in range(0,1): 
    activation_data[f"act_val_{k}"] = hidden_layer_activation[:,k]

activation_df = pd.DataFrame(activation_data)
activation_df.head(15).round(3).T
Out[51]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
actual_class 5.000 0.000 4.000 1.000 9.0 2.000 1.000 3.000 1.000 4.0 3.000 5.000 3.000 6.00 1.000
act_val_0 2.617 0.298 0.426 2.375 0.0 3.019 2.262 3.189 1.768 0.0 2.458 3.321 4.357 5.95 2.088

Visualize the activation values with boxplots¶

We get the activation values of the first hidden node and combine them with the corresponding class labels into a DataFrame. We then use seaborn to create boxplots from the dataframe.

seaborn.boxplot
https://seaborn.pydata.org/generated/seaborn.boxplot.html
In [52]:
# To see how closely the hidden node activation values correlate with the class labels
# Let us use seaborn for the boxplots this time.
bplot = sns.boxplot(y='act_val_0', x='actual_class', 
                 data=activation_df[['act_val_0','actual_class']], 
                 width=0.5,
                 palette="colorblind")

Displaying The Range Of Activation Values For Each Class Label¶

In [53]:
activation_df.groupby("actual_class")["act_val_0"].apply(lambda x: [round(min(x.tolist()),2),
 round(max(x.tolist()),2)]).reset_index().rename(columns={"act_val_0": "range_of_act_values"})
Out[53]:
actual_class range_of_act_values
0 0 [0.0, 9.65]
1 1 [0.0, 12.25]
2 2 [0.0, 19.58]
3 3 [0.0, 13.1]
4 4 [0.0, 9.16]
5 5 [0.0, 13.41]
6 6 [0.0, 13.91]
7 7 [0.0, 6.7]
8 8 [0.0, 10.21]
9 9 [0.0, 3.87]
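
For reference, the same per-class ranges can also be obtained with pandas' built-in aggregation (a sketch equivalent to the cell above, reusing activation_df):

# Equivalent per-class min/max of the activation values using pandas agg.
activation_df.groupby("actual_class")["act_val_0"].agg(['min', 'max']).round(2)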