The aim of this lab is pattern recognition, in our case of hand-written digits and characters.
To do this, we use two given data sets: training data and testing data. Both sets contain images of digits together with the correct answer to which digit is written in each image (the labels).
First, we construct a model from the training data using a machine learning method. Afterwards, we check the quality of this model by applying it to the testing data.
We shall proceed in steps.
We shall use the Extreme Learning Machine (ELM) method for image classification. This method minimizes two quantities: the norm of the weights $\boldsymbol{\beta}$ and the training errors $\boldsymbol{\xi}_i$.
Thus, we set the following problem for the training data:
$$ \underset{\boldsymbol{\beta}}{\mathrm{Min}:}\quad L_{ELM}=\frac{1}{2}\Vert\boldsymbol{\beta}\Vert^{2}+\frac{C}{2}\sum_{i=1}^{N}\Vert\boldsymbol{\xi}_{i}\Vert^{2} =\frac{1}{2}\Vert\boldsymbol{\beta}\Vert^{2}+\frac{C}{2}\Vert\mathbf{Y}-\mathbf{H}\boldsymbol{\beta}\Vert^{2}. $$Here, $\Vert\cdot\Vert$ denotes the Frobenius norm of a matrix, $N$ is the number of training samples, $C$ is a regularizing parameter, $\mathbf{H}$ is the matrix whose rows are the training samples, $\mathbf{Y}$ is the matrix of target outputs, and $\boldsymbol{\xi}_{i}$ is the training error of the $i$-th sample, i.e. the $i$-th row of $\mathbf{Y}-\mathbf{H}\boldsymbol{\beta}$.
Since the minimization problem is quadratic, its solution is given by $$ \overset{\star}{\boldsymbol{\beta}}=\mathbf{H}^{T}\mathbf{W}, $$ where the weights $\mathbf{W}$ are defined as $$ \mathbf{W}=\left(\frac{\mathbf{I}}{C}+\boldsymbol{\Omega}\right)^{-1}\mathbf{Y}, $$ with $\boldsymbol{\Omega}=\mathbf{H}\mathbf{H}^{T}$ the linear kernel matrix. In expanded form, the elements of $\boldsymbol{\Omega}$ are written as $$ \boldsymbol{\Omega}_{ij}=\mathbf{x}_i \centerdot \mathbf{x}_j = \sum_{k=1}^{n} x_{ik} x_{jk}, $$ where $\mathbf{x}_i$ denotes the $i$-th row of $\mathbf{H}$ (training data). We shall see later other (nonlinear) choices for the kernel matrix.
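For completeness, the solution above can be obtained by setting the gradient of $L_{ELM}$ with respect to $\boldsymbol{\beta}$ to zero:
$$ \boldsymbol{\beta}-C\,\mathbf{H}^{T}\left(\mathbf{Y}-\mathbf{H}\boldsymbol{\beta}\right)=\mathbf{0} \quad\Longrightarrow\quad \boldsymbol{\beta}=\mathbf{H}^{T}\mathbf{W}\quad\mathrm{with}\quad \mathbf{W}=C\left(\mathbf{Y}-\mathbf{H}\boldsymbol{\beta}\right). $$Substituting $\boldsymbol{\beta}=\mathbf{H}^{T}\mathbf{W}$ back into $\mathbf{W}=C\left(\mathbf{Y}-\mathbf{H}\boldsymbol{\beta}\right)$ gives $\mathbf{W}=C\left(\mathbf{Y}-\boldsymbol{\Omega}\mathbf{W}\right)$, that is, $\left(\frac{\mathbf{I}}{C}+\boldsymbol{\Omega}\right)\mathbf{W}=\mathbf{Y}$.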
The output for the testing samples is then given by $$ \mathbf{Y}_{te}=\mathbf{H}_{te} \overset{\star}{\boldsymbol{\beta}} = \mathbf{H}_{te}\mathbf{H}^{T}\mathbf{W}=\boldsymbol{\Omega}_{te}\mathbf{W}, $$
with
$$ \boldsymbol{\Omega}_{te}= \mathbf{H}_{te}\mathbf{H}^{T}. $$In our application, the samples are $28\times28$ images of hand-written digits. In the training matrix, each row corresponds to one image (flattened to a vector with $784$ entries).
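Before turning to the data, here is a minimal sketch (with synthetic arrays, not the MNIST files used below) illustrating that the kernel form $\boldsymbol{\Omega}_{te}\mathbf{W}$ and the explicit-weights form $\mathbf{H}_{te}\overset{\star}{\boldsymbol{\beta}}$ give the same predictions; the sizes and variable names are arbitrary:
import numpy as np
rng = np.random.default_rng(0)
N, n_feat, n_classes, C = 50, 20, 3, 1.0
H = rng.standard_normal((N, n_feat))                    # training samples, one per row
Y = np.eye(n_classes)[rng.integers(0, n_classes, N)]    # one-hot targets
H_te = rng.standard_normal((5, n_feat))                 # testing samples
Omega = H @ H.T                                         # linear kernel matrix
W = np.linalg.solve(np.eye(N)/C + Omega, Y)             # weights
beta_star = H.T @ W                                     # explicit output weights
print(np.allclose((H_te @ H.T) @ W, H_te @ beta_star))  # True: both forms agree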
We have randomly chosen, from the MNIST data set, $900$ images for training and $100$ images for testing. They are saved, together with their corresponding labels, in the file data_numbers.zip.
import numpy as np
import matplotlib.pyplot as plt
We read the data
data_train = np.loadtxt('data_train1.txt')
labels_train = np.loadtxt('labels_train1.txt')
data_test = np.loadtxt('data_test1.txt')
labels_test = np.loadtxt('labels_test1.txt')
Let's have a look at the training images. First, we have to reshape each one to size $28\times28$.
plt.figure(figsize=(8,8))
for k in range(0, 100):
    plt.subplot(10, 10, k+1)
    image = data_train[k, :]
    image = image.reshape(28, 28)
    plt.imshow(image, cmap='gray')
    plt.axis('off')
plt.show()
We define some parameters
C = 1 # regularization
number_classes = 10 # classes of digits
n = data_train.shape[0] # number of rows (images) in training matrix
m = data_test.shape[0] # number of rows (images) in testing matrix
I = np.identity(n)
We convert the data from integer to float and normalize both the training data and the test data to $[0,1]$:
H = data_train/255. #the images are in [0,255]
H_te = data_test/255.
Now we define the label matrices from the labels contained in labels_train and labels_test, which are vectors containing the correct labels. These matrices (one for training and another for testing) have $10$ columns (the number of classes) and $900$ or $100$ rows (the numbers of training and testing samples).
For sample $k$, we fill row $k$ with zeros except in the column corresponding to the correct class (first column for label $0$, second column for label $1$, etc.), where we write $1$:
Y = np.zeros((n, number_classes))
for i in range(0, n):
    Y[i, int(labels_train[i])] = 1
We then compute the prediction given by the model,
$$ \mathbf{Y}_{te} = \mathbf{H}_{te}\mathbf{H}^{T}\mathbf{W} = \boldsymbol{\Omega}_{te}\mathbf{W},\quad \mathrm{with}\quad \boldsymbol{\Omega}_{te}=\mathbf{H}_{te}\mathbf{H}^{T}, $$for which we need to compute $\mathbf{W}$, given as the solution of the linear system
$$ \left(\frac{\mathbf{I}}{C}+\boldsymbol{\Omega}\right)\mathbf{W}=\mathbf{Y},\quad \mathrm{with}\quad \boldsymbol{\Omega}=\mathbf{H}\mathbf{H}^{T}. $$
Omega = np.dot(H, H.transpose())
W = np.linalg.solve(I/C + Omega, Y)
Therefore
Omega_te = np.dot( H_te, H.transpose())
Y_te = np.dot(Omega_te, W)
The label predicted for each testing image is obtained by finding the position of the maximum value in the corresponding row of Y_te:
predicted_test = Y_te.argmax(axis=1)
We now check the success percentage:
ttsp = np.sum(predicted_test == labels_test)/float(m)*100.
print('Testing success = %.1f%%' % ttsp)
To look into the results in more detail, we compute the confusion matrix.
from sklearn.metrics import confusion_matrix
mc = confusion_matrix(labels_test, predicted_test)
print('Confusion matrix')
print(mc)
plt.figure(figsize=(6,6))
ticks = range(10)
plt.xticks(ticks)
plt.yticks(ticks)
plt.imshow(mc,cmap=plt.cm.Blues)
plt.colorbar(shrink=0.8)
w, h = mc.shape
for i in range(w):
    for j in range(h):
        plt.annotate(str(mc[i][j]), xy=(j, i),
                     horizontalalignment='center',
                     verticalalignment='center')
plt.xlabel('Predicted label')
plt.ylabel('Actual label')
plt.title('Confusion matrix')
plt.show()
Let us see the corresponding images:
print('\nLabels predicted for testing samples')
for i in range(0, m):
    if i%10 == 9:
        print(predicted_test[i])
    else:
        print(predicted_test[i], end=" ")
print('\n')
print('Images corresponding to the above labels')
for k in range(0, 100):
    plt.subplot(10, 10, k+1)
    image = data_test[k, :]
    image = image.reshape(28, 28)
    plt.imshow(image, cmap=plt.cm.gray)
    plt.axis('off')
plt.show()
For computing the kernel matrices $\Omega$ and $\Omega_{te}$ we used the dot product between elements of the training and testing data matrices $H$ and $H_{te}$. In machine learning, this is known as a linear kernel. Generically,
$$ K_{Lin}(\mathbf{x}_i,\mathbf{x}_j)=\mathbf{x}_i \centerdot \mathbf{x}_j = \sum_{k=1}^{n} x_{ik} x_{jk}. $$However, we may use a large class of kernel functions (satisfying some suitable properties), among which the most common are the Gaussian kernel, also called the RBF (radial basis function) kernel, given by
$$ K_{RBF}(\mathbf{x}_i,\mathbf{x}_j)=\mathrm{exp}\left(-\dfrac{||\mathbf{x}_i-\mathbf{x}_j||^{2}}{\sigma}\right), $$or the polynomial kernel
$$ K_{Poly}(\mathbf{x}_i,\mathbf{x}_j)=(\mathbf{x}_i\centerdot\mathbf{x}_j+a)^{b}. $$In these nonlinear kernels we are introducing parameters that must be fixed in advance: $a$ and $b$ for the polynomial kernel and $\sigma$ for the RBF kernel.
Let's see whether the previous result, obtained with the linear kernel, can be improved. For instance, using the polynomial kernel:
C = 1 # Regularization
a = 1 # polynomial kernel
b = 3
We have already constructed the matrices $\mathbf{H}$, $\mathbf{H}_{te}$ and $\mathbf{Y}$.
# Polynomial kernel function
KernelPoly = lambda X, Y : (np.dot(X,Y.T)+a)**b
# Omega's
OmegaP = KernelPoly(H, H)
W = np.linalg.solve(I/C + OmegaP, Y)
OmegaP_te = KernelPoly(H_te, H)
YP_te = np.dot(OmegaP_te, W)
# prediction
predictedP_test = YP_te.argmax(axis=1)
# success percentage
percent = np.sum(predictedP_test == labels_test)/float(m)*100.
print('Testing success = %.1f%%' % percent)
# confusion matrix
mc = confusion_matrix(labels_test, predictedP_test)
print('Confusion matrix')
print(mc)
plt.figure(figsize=(6,6))
ticks = range(10)
plt.xticks(ticks)
plt.yticks(ticks)
plt.imshow(mc,cmap=plt.cm.Blues)
plt.colorbar(shrink=0.8)
w, h = mc.shape
for i in range(w):
    for j in range(h):
        plt.annotate(str(mc[i][j]), xy=(j, i),
                     horizontalalignment='center',
                     verticalalignment='center')
plt.xlabel('Predicted label')
plt.ylabel('Actual label')
plt.title('Confusion matrix')
plt.show()
# Viewing results
print('\r')
print('Labels predicted for testing samples')
for i in range(0, m):
    if i%10 == 9:
        print(predictedP_test[i])
    else:
        print(predictedP_test[i], end=" ")
print('\n')
print('Images corresponding to the above labels')
for k in range(0, 100):
    plt.subplot(10, 10, k+1)
    image = data_test[k, :]
    image = image.reshape(28, 28)
    plt.imshow(image, cmap=plt.cm.gray)
    plt.axis('off')
plt.show()
def KernelRBF(X, Y, g):
    """
    Computes the kernel matrix
    for the Gaussian (RBF) kernel
    """
    m = X.shape[0]
    n = Y.shape[0]
    K = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            dif = np.linalg.norm(X[i, :] - Y[j, :])
            K[i, j] = np.exp(-dif**2/g)
    return K
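The double loop above works but may be slow for larger data sets. An equivalent vectorized version is sketched below, using the identity $\Vert\mathbf{x}-\mathbf{y}\Vert^{2}=\Vert\mathbf{x}\Vert^{2}+\Vert\mathbf{y}\Vert^{2}-2\,\mathbf{x}\centerdot\mathbf{y}$ (the name KernelRBF_vec is our own):
def KernelRBF_vec(X, Y, g):
    """
    Same Gaussian kernel matrix as KernelRBF,
    computed without explicit Python loops
    """
    sq_X = np.sum(X**2, axis=1)[:, np.newaxis]      # column of squared norms, shape (m,1)
    sq_Y = np.sum(Y**2, axis=1)[np.newaxis, :]      # row of squared norms, shape (1,n)
    sq_dist = sq_X + sq_Y - 2.0*np.dot(X, Y.T)      # matrix of squared distances
    return np.exp(-np.maximum(sq_dist, 0.0)/g)      # clip tiny negatives due to rounding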
Using the RBF kernel defined above, choose the best parameters $(C,\sigma)$, in the sense of maximum success percentage, where $(C,\sigma)$ may take the following values:
C_list = [ 1., 10., 100., 1000.]
sigma_list = [ 1., 10., 100., 1000.]
Give a table with the success percentages, like the one produced below (running may take several minutes), and give the predicted labels.
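A possible skeleton for this grid search (only a sketch: the loop structure and variable names are our own; the actual solution is run below from Exercise1):
best_success, best_C, best_sigma = 0., None, None
for C_try in C_list:
    for sigma in sigma_list:
        Omega_rbf = KernelRBF(H, H, sigma)                                # training kernel matrix
        W_rbf = np.linalg.solve(I/C_try + Omega_rbf, Y)                   # weights for this (C, sigma)
        predicted_rbf = np.dot(KernelRBF(H_te, H, sigma), W_rbf).argmax(axis=1)
        success = np.sum(predicted_rbf == labels_test)/float(m)*100.
        print('C = %7.1f   sigma = %7.1f   success = %.1f%%' % (C_try, sigma, success))
        if success > best_success:
            best_success, best_C, best_sigma = success, C_try, sigma
print('Best: C = %.1f, sigma = %.1f (%.1f%%)' % (best_C, best_sigma, best_success))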
%run Exercise1
The file data_char.txt contains images of size $20\times16$ of hand-written characters, each of them stored as a row. The file labels_char.txt contains the corresponding labels (26 labels, from 0 to 25, one for each letter of the alphabet).
Use $90\%$ of the samples for training and $10\%$ for testing with an RBF kernel, $\sigma = 100$ and $C=1$. Plot the testing images together with the predicted labels.
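One possible way to prepare the data (a sketch: the file names come from the statement above; the seed, the $255$ normalization and the variable names are our own assumptions):
data_char = np.loadtxt('data_char.txt')
labels_char = np.loadtxt('labels_char.txt')
np.random.seed(0)                                   # arbitrary seed for reproducibility
perm = np.random.permutation(data_char.shape[0])    # shuffle the sample indices
n_train = int(0.9*data_char.shape[0])               # 90% for training
H_char    = data_char[perm[:n_train]]/255.          # assuming pixel values in [0,255]
H_char_te = data_char[perm[n_train:]]/255.
labels_char_train = labels_char[perm[:n_train]]
labels_char_te    = labels_char[perm[n_train:]]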
%run Exercise2