MNIST digits classification using logistic regression from Scikit-Learn

Digits OCR

This notebook is broadly adapted from this blog and this scikit-learn example

Logistic regression on smaller built-in subset

Load the dataset

In [1]:
from sklearn.datasets import load_digits
digits = load_digits()
In [2]:
type(digits.data)
Out[2]:
numpy.ndarray
In [3]:
(digits.data.shape, digits.target.shape, digits.images.shape)
Out[3]:
((1797, 64), (1797,), (1797, 8, 8))

1797 images, each 8x8 pixels, and 1797 labels.
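digits.data is simply digits.images flattened from (1797, 8, 8) to (1797, 64); a quick check confirming this (a minimal sketch, not part of the original notebook):

    import numpy as np
    # each 8x8 image flattened row-wise equals the corresponding row of digits.data
    print(np.array_equal(digits.images.reshape(len(digits.images), -1), digits.data))  # True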

Display sample data

In [4]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [5]:
plt.figure(figsize=(20,4))
for index, (image, label) in enumerate(zip(digits.data[0:5], 
                                           digits.target[0:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (8,8)), cmap=plt.cm.gray)
    plt.title('Training: %i\n' % label, fontsize = 20);

Split into training and test

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(digits.data, 
                                                    digits.target,
                                                   test_size=0.25,
                                                   random_state=0)
In [7]:
X_train.shape, X_test.shape
Out[7]:
((1347, 64), (450, 64))

Learning

Refer to the LogisticRegression API reference for these parameters and the user guide for the equations, particularly how the penalties are applied.
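For orientation, the L2-penalized objective minimized here is roughly (binary case, with labels y_i in {-1, 1}, as written in the user guide):

    \min_{w, c} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp(-y_i (x_i^T w + c)) + 1\right)

C is the inverse of the regularization strength, so C=50 below means relatively weak regularization.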

In [6]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(fit_intercept=True,
                        multi_class='auto',
                        penalty='l2', #ridge regression
                        solver='saga',
                        max_iter=10000,
                        C=50)
clf
Out[6]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='saga', tol=0.0001, verbose=0,
                   warm_start=False)
In [9]:
%%time
clf.fit(X_train, y_train)
CPU times: user 6.81 s, sys: 9.52 ms, total: 6.81 s
Wall time: 6.82 s
Out[9]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='saga', tol=0.0001, verbose=0,
                   warm_start=False)

Let us see what the classifier has learned

In [10]:
clf.classes_
Out[10]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [11]:
clf.coef_.shape
Out[11]:
(10, 64)
In [12]:
clf.coef_[0].round(2) # prints weights for 8x8 image for class 0
Out[12]:
array([ 0.  , -0.  , -0.04,  0.1 ,  0.06, -0.14, -0.16, -0.02, -0.  ,
       -0.03, -0.04,  0.2 ,  0.09,  0.08, -0.05, -0.01, -0.  ,  0.06,
        0.15, -0.03, -0.39,  0.25,  0.09, -0.  , -0.  ,  0.13,  0.16,
       -0.18, -0.57,  0.02,  0.12, -0.  ,  0.  ,  0.16,  0.11, -0.16,
       -0.41,  0.05,  0.08,  0.  , -0.  , -0.06,  0.27, -0.11, -0.2 ,
        0.15,  0.04, -0.  , -0.  , -0.12,  0.08, -0.05,  0.2 ,  0.1 ,
       -0.04, -0.01, -0.  , -0.01, -0.09,  0.21, -0.04, -0.06, -0.1 ,
       -0.05])
In [13]:
clf.intercept_ # one intercept for each of the 10 classes
Out[13]:
array([ 0.0010181 , -0.07236521,  0.00379207,  0.00459855,  0.04585855,
        0.00014299, -0.00442972,  0.01179654,  0.04413398, -0.03454583])
In [14]:
clf.n_iter_[0] # num of iterations before tolerance was reached
Out[14]:
1876

Viewing coefficients as an image

Since there is a coefficient for each pixel of the 8x8 image, we can view the coefficients of each class as an image themselves. The code below is similar to the image display code above, but runs on the coefficients.

In [16]:
coef = clf.coef_.copy()
plt.imshow(coef[0].reshape(8,8).round(2));  # proof of concept
In [17]:
coef = clf.coef_.copy()
scale = np.abs(coef).max()
plt.figure(figsize=(10,5))

for i in range(10): # 0-9
    coef_plot = plt.subplot(2, 5, i + 1) # 2x5 plot

    coef_plot.imshow(coef[i].reshape(8,8), 
                     cmap=plt.cm.RdBu,
                     vmin=-scale, vmax=scale,
                    interpolation='bilinear')
    
    coef_plot.set_xticks(()); coef_plot.set_yticks(()) # remove ticks
    coef_plot.set_xlabel(f'Class {i}')

plt.suptitle('Coefficients for various classes');

Prediction and scoring

Now predict on the unseen test data and compare with the ground truth

In [18]:
print(clf.predict(X_test[0:9]))
print(y_test[0:9])
[2 8 2 6 6 7 1 9 8]
[2 8 2 6 6 7 1 9 8]
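Besides hard predictions, LogisticRegression also exposes per-class probabilities via predict_proba; a small sketch (illustrative, not run in the original notebook):

    probs = clf.predict_proba(X_test[0:1])     # class probabilities for the first test image
    print(probs.round(3))
    print(clf.classes_[probs.argmax(axis=1)])  # the argmax agrees with clf.predict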

Score against training and test data

In [19]:
clf.score(X_train, y_train) # training score
Out[19]:
1.0
In [20]:
score = clf.score(X_test, y_test) # test score
score
Out[20]:
0.9555555555555556

Test score: 0.9555

Confusion matrix
In [21]:
from sklearn import metrics
In [22]:
predictions = clf.predict(X_test)

cm = metrics.confusion_matrix(y_true=y_test, 
                         y_pred = predictions, 
                        labels = clf.classes_)
cm
Out[22]:
array([[37,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 40,  0,  0,  0,  0,  0,  0,  2,  1],
       [ 0,  0, 42,  2,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 43,  0,  0,  0,  0,  1,  1],
       [ 0,  0,  0,  0, 37,  0,  0,  1,  0,  0],
       [ 0,  0,  0,  0,  0, 46,  0,  0,  0,  2],
       [ 0,  1,  0,  0,  0,  0, 51,  0,  0,  0],
       [ 0,  0,  0,  1,  1,  0,  0, 46,  0,  0],
       [ 0,  3,  1,  0,  0,  0,  0,  0, 43,  1],
       [ 0,  0,  0,  0,  0,  1,  0,  0,  1, 45]])
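For per-class precision and recall to complement the raw confusion matrix, metrics.classification_report can be used (a one-line sketch reusing the metrics import and predictions from above; output not reproduced here):

    print(metrics.classification_report(y_true=y_test, y_pred=predictions))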

Visualize confusion matrix as a heatmap

In [23]:
import seaborn as sns

plt.figure(figsize=(10,10))
sns.heatmap(cm, annot=True, 
            linewidths=.5, square = True, cmap = 'Blues_r');

plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(score)
plt.title(all_sample_title);

Inspecting misclassified images

We compare predictions with labels to find which images are wrongly classified, then display them.

In [24]:
index = 0
misclassified_images = []
for label, predict in zip(y_test, predictions):
    if label != predict: 
        misclassified_images.append(index)
    index +=1
In [25]:
print(misclassified_images)
[56, 94, 118, 124, 130, 169, 181, 196, 213, 251, 315, 325, 331, 335, 378, 398, 425, 429, 430, 440]
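The same indices can be computed without the explicit loop; a minimal NumPy equivalent (a sketch, assuming predictions and y_test as above):

    misclassified_images = np.flatnonzero(predictions != y_test)  # indices where prediction != label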
In [26]:
plt.figure(figsize=(10,10))
plt.suptitle('Misclassifications');

for plot_index, bad_index in enumerate(misclassified_images[0:20]):
    p = plt.subplot(4,5, plot_index+1) # 4x5 plot
    
    p.imshow(X_test[bad_index].reshape(8,8), cmap=plt.cm.gray,
            interpolation='bilinear')
    p.set_xticks(()); p.set_yticks(()) # remove ticks
    
    p.set_title(f'Pred: {predictions[bad_index]}, Actual: {y_test[bad_index]}');

Predicting on the full MNIST database

In the previous section, we worked with a tiny built-in subset. In this section, we download and work with the full MNIST dataset. Downloading it for the first time from the OpenML database takes me about half a minute. Since the dataset is cached locally, subsequent runs should not take as long.

In [7]:
%%time
from sklearn.datasets import fetch_openml
mnist = fetch_openml(data_id=554) # https://www.openml.org/d/554
CPU times: user 15.3 s, sys: 348 ms, total: 15.6 s
Wall time: 15.6 s
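fetch_openml caches the download under scikit-learn's data home (~/scikit_learn_data by default). If you want the cache elsewhere, the data_home parameter can point to another directory; a sketch with an illustrative path:

    mnist = fetch_openml(data_id=554, data_home='/tmp/openml_cache')  # the path is just an example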
In [8]:
type(mnist)
Out[8]:
sklearn.utils.Bunch
In [9]:
type(mnist.data), type(mnist.categories), type(mnist.feature_names), type(mnist.target)
Out[9]:
(numpy.ndarray, dict, list, numpy.ndarray)
In [10]:
mnist.data.shape, mnist.target.shape
Out[10]:
((70000, 784), (70000,))

There are 70,000 images, each of dimension 28x28 pixels.

Preview some images

In [11]:
mnist.target[0]
Out[11]:
'5'
In [12]:
plt.figure(figsize=(20,4))
for index, (image, label) in enumerate(zip(mnist.data[0:5], 
                                           mnist.target[0:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (28,28)), cmap=plt.cm.gray)
    plt.title('Training: ' + label, fontsize = 20);

Split into training and test

In [13]:
mnist.target.astype('int')
Out[13]:
array([5, 0, 4, ..., 4, 5, 6])
In [14]:
from sklearn.model_selection import train_test_split
X2_train, X2_test, y2_train, y2_test = train_test_split(mnist.data, 
                                                    mnist.target.astype('int'), #targets str to int convert
                                                   test_size=1/7.0,
                                                   random_state=0)
In [15]:
X2_train.shape, X2_test.shape
Out[15]:
((60000, 784), (10000, 784))

Are the different classes evenly distributed? We can check by plotting a histogram of the labels in both the training and test datasets.

In [77]:
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.hist(y2_train);
plt.title('Frequency of different classes - Training data');

plt.subplot(1,2,2)
plt.hist(y2_test);
plt.title('Frequency of different classes - Test data');
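For exact counts rather than a histogram, np.unique with return_counts gives the per-class frequencies; a quick sketch (counts not reproduced here):

    labels, counts = np.unique(y2_train, return_counts=True)
    print(dict(zip(labels, counts)))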

Learning

In [57]:
from sklearn.linear_model import LogisticRegression
clf2 = LogisticRegression(fit_intercept=True,
                        multi_class='auto',
                        penalty='l1', #lasso regression
                        solver='saga',
                        max_iter=1000,
                        C=50,
                        verbose=2, # output progress
                        n_jobs=5, # parallelize over 5 processes
                        tol=0.01
                         )
clf2
Out[57]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=1000,
                   multi_class='auto', n_jobs=5, penalty='l1',
                   random_state=None, solver='saga', tol=0.01, verbose=2,
                   warm_start=False)

Since there are 10 classes and 12 available cores, we will try to run the learning step across 5 parallel jobs. Earlier, without parallelization, the fit had not finished within an hour, at which point I had to put the machine to sleep for a meeting.

In [58]:
%%time
clf2.fit(X2_train, y2_train)
[Parallel(n_jobs=5)]: Using backend ThreadingBackend with 5 concurrent workers.
convergence after 47 epochs took 143 seconds
CPU times: user 9min 30s, sys: 469 ms, total: 9min 30s
Wall time: 2min 22s
[Parallel(n_jobs=5)]: Done   1 out of   1 | elapsed:  2.4min finished
Out[58]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=1000,
                   multi_class='auto', n_jobs=5, penalty='l1',
                   random_state=None, solver='saga', tol=0.01, verbose=2,
                   warm_start=False)

Note: since verbose is set above 0, progress messages were printed, but they appeared on the terminal rather than in the notebook.

Let us see what the classifier has learned

In [29]:
clf2.classes_
Out[29]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [30]:
clf2.coef_.shape
Out[30]:
(10, 784)

Get the coefficients for a single class, 1 in this case:

In [59]:
clf2.coef_[1].round(3) # weights for the 28x28 image for class 1
Out[59]:
array([ 0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   , -0.   ,  0.   ,  0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   , -0.   , -0.   , -0.   ,
       -0.001, -0.001, -0.001,  0.   ,  0.002,  0.004,  0.001,  0.002,
        0.002,  0.001, -0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
       -0.   , -0.   , -0.   , -0.001, -0.001, -0.002, -0.002, -0.003,
        0.001,  0.002, -0.001,  0.001,  0.002,  0.   , -0.002,  0.   ,
       -0.001, -0.001, -0.001, -0.001, -0.001, -0.   , -0.   ,  0.   ,
        0.   ,  0.   ,  0.   , -0.   ,  0.   ,  0.001,  0.   , -0.   ,
       -0.   , -0.   ,  0.   , -0.   ,  0.   ,  0.001,  0.   ,  0.   ,
       -0.   ,  0.001, -0.001,  0.001,  0.   , -0.   , -0.001, -0.001,
       -0.003, -0.002, -0.   , -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.003,  0.001, -0.002, -0.003, -0.002, -0.003, -0.003,
       -0.002,  0.   , -0.001,  0.001, -0.001, -0.001,  0.   , -0.001,
        0.   ,  0.   ,  0.002,  0.002, -0.001, -0.002, -0.   , -0.   ,
        0.   ,  0.   , -0.   , -0.   ,  0.   ,  0.001,  0.   , -0.003,
       -0.002, -0.001,  0.   ,  0.   , -0.002, -0.002, -0.001, -0.002,
       -0.001, -0.003,  0.   , -0.001, -0.   , -0.   , -0.   ,  0.   ,
       -0.001, -0.002, -0.   , -0.   ,  0.   , -0.   , -0.   , -0.   ,
        0.   , -0.   , -0.002, -0.001, -0.003,  0.   , -0.   , -0.002,
        0.001, -0.002,  0.001, -0.003,  0.   , -0.001, -0.002, -0.   ,
        0.001,  0.   , -0.002, -0.001, -0.003, -0.002, -0.   , -0.   ,
        0.   , -0.   , -0.   , -0.   ,  0.001, -0.001, -0.002,  0.001,
       -0.002, -0.003, -0.001, -0.002, -0.   ,  0.001, -0.001,  0.   ,
       -0.003,  0.001, -0.001,  0.001, -0.002, -0.001, -0.001, -0.003,
       -0.004, -0.002, -0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   , -0.002, -0.001,  0.001,  0.   , -0.002, -0.001, -0.001,
       -0.001, -0.   , -0.   ,  0.003,  0.001, -0.001,  0.   , -0.002,
       -0.002, -0.001, -0.001, -0.003, -0.002, -0.002, -0.   , -0.   ,
       -0.   , -0.   , -0.   , -0.001, -0.001, -0.003, -0.002, -0.001,
        0.   , -0.001, -0.001,  0.001, -0.   ,  0.002,  0.003,  0.003,
        0.002,  0.001, -0.001, -0.   , -0.002,  0.   , -0.001, -0.001,
       -0.002, -0.001, -0.   ,  0.   ,  0.   ,  0.   , -0.   , -0.   ,
       -0.001, -0.002, -0.004, -0.003, -0.002, -0.001, -0.001, -0.001,
       -0.001,  0.001,  0.003,  0.003,  0.   , -0.001,  0.   , -0.001,
       -0.   , -0.002, -0.001, -0.001, -0.001, -0.   , -0.   ,  0.   ,
        0.   , -0.   , -0.   , -0.   ,  0.   , -0.001, -0.001, -0.001,
       -0.001,  0.   , -0.001, -0.002, -0.   ,  0.002,  0.005,  0.003,
        0.   , -0.   , -0.001, -0.001, -0.001, -0.   , -0.   , -0.   ,
       -0.001, -0.   , -0.   ,  0.   ,  0.   , -0.   , -0.   ,  0.   ,
        0.001,  0.   ,  0.   , -0.001,  0.   ,  0.001, -0.004, -0.002,
        0.   ,  0.001,  0.002,  0.002,  0.001,  0.003, -0.003, -0.002,
        0.   ,  0.   , -0.001, -0.001, -0.001, -0.   , -0.   , -0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.001,  0.   , -0.001, -0.001,
        0.001, -0.001, -0.003, -0.002, -0.   ,  0.001,  0.004,  0.002,
        0.001, -0.   , -0.003, -0.003, -0.   , -0.001, -0.001, -0.001,
       -0.   , -0.   , -0.   , -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
       -0.   , -0.001, -0.001, -0.   ,  0.001, -0.002, -0.003, -0.   ,
        0.001,  0.002,  0.004, -0.001,  0.003, -0.001, -0.002, -0.005,
       -0.002, -0.001, -0.001, -0.001, -0.   , -0.   , -0.   , -0.   ,
        0.   ,  0.   ,  0.   , -0.   , -0.   , -0.001, -0.001, -0.001,
       -0.001,  0.001, -0.   , -0.   , -0.001,  0.001,  0.003, -0.002,
        0.001, -0.005, -0.003, -0.003, -0.001, -0.   , -0.001, -0.001,
        0.001, -0.001, -0.   , -0.   ,  0.   ,  0.   ,  0.   , -0.   ,
       -0.   , -0.001, -0.002, -0.002, -0.001, -0.001,  0.   , -0.001,
        0.001,  0.003,  0.002,  0.001, -0.001, -0.005, -0.001, -0.   ,
       -0.   ,  0.   ,  0.   , -0.   , -0.   , -0.001,  0.   , -0.   ,
       -0.   ,  0.   , -0.   , -0.   , -0.   , -0.003, -0.005, -0.003,
        0.   ,  0.   , -0.002, -0.001,  0.   , -0.   ,  0.002, -0.001,
       -0.003,  0.   ,  0.002,  0.   ,  0.002,  0.   , -0.001, -0.   ,
        0.001,  0.002,  0.001,  0.   ,  0.   ,  0.   , -0.   , -0.   ,
       -0.001, -0.005, -0.002,  0.001, -0.001,  0.001, -0.001, -0.001,
       -0.   ,  0.002, -0.002, -0.001,  0.002, -0.001, -0.001,  0.001,
        0.001, -0.001, -0.001,  0.   ,  0.002,  0.001,  0.001,  0.   ,
        0.   , -0.   , -0.   , -0.   , -0.   , -0.005, -0.   ,  0.   ,
        0.   ,  0.002,  0.003,  0.002,  0.001, -0.001,  0.001, -0.   ,
        0.001,  0.002,  0.003,  0.001,  0.   , -0.001, -0.   ,  0.001,
        0.001,  0.001,  0.001,  0.   ,  0.   ,  0.   , -0.   ,  0.001,
        0.002,  0.003,  0.003,  0.002,  0.002, -0.001,  0.001, -0.001,
       -0.   ,  0.001,  0.002,  0.001,  0.001,  0.001,  0.003,  0.001,
        0.003,  0.003, -0.001,  0.   ,  0.003,  0.001,  0.   ,  0.   ,
        0.   ,  0.   , -0.   ,  0.   ,  0.001,  0.007,  0.003,  0.   ,
       -0.   ,  0.001, -0.002, -0.001, -0.002, -0.004, -0.001, -0.   ,
       -0.   ,  0.   ,  0.002,  0.001,  0.003,  0.002, -0.   , -0.   ,
        0.001,  0.001,  0.   ,  0.   ,  0.   ,  0.   , -0.   , -0.001,
       -0.002,  0.002,  0.001,  0.001,  0.   ,  0.001,  0.   ,  0.   ,
        0.001,  0.001, -0.001,  0.002,  0.001,  0.002, -0.   ,  0.   ,
       -0.001, -0.001, -0.001, -0.001, -0.   , -0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   , -0.001, -0.001, -0.001, -0.001, -0.   ,
       -0.001,  0.001, -0.001, -0.001, -0.001, -0.001, -0.001, -0.   ,
       -0.004,  0.001,  0.001,  0.   , -0.001, -0.001, -0.001, -0.   ,
       -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   , -0.   ,
       -0.   , -0.001, -0.001, -0.002, -0.001, -0.003, -0.006, -0.004,
       -0.001, -0.002, -0.002, -0.003, -0.004, -0.003, -0.002, -0.002,
       -0.001, -0.   , -0.   , -0.   , -0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   , -0.   , -0.001, -0.001, -0.002, -0.002, -0.001, -0.001,
       -0.   , -0.   , -0.001, -0.001, -0.   , -0.   , -0.   , -0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   , -0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   , -0.   , -0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ])
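Because penalty='l1' (lasso) is used, many weights are driven exactly to zero, which is why the array above is mostly zeros. A short sketch to quantify the sparsity (numbers not from the original run):

    sparsity = np.mean(clf2.coef_ == 0) * 100  # percentage of exactly-zero weights
    print('Sparsity: %.1f%%' % sparsity)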
In [60]:
clf2.intercept_ # one intercept for each of the 10 classes
Out[60]:
array([-1.11398188e-04,  1.38709472e-04,  1.16909054e-04, -2.37842193e-04,
        6.62466316e-05,  8.48133979e-04, -4.22181499e-05,  2.66499796e-04,
       -8.62715013e-04, -1.82325388e-04])
In [78]:
clf2.n_iter_[0] # num of iterations before tolerance was reached
Out[78]:
47
Visualize coefficients as an image
In [62]:
coef = clf2.coef_.copy()
scale = np.abs(coef).max()
plt.figure(figsize=(13,7))

for i in range(10): # 0-9
    coef_plot = plt.subplot(2, 5, i + 1) # 2x5 plot

    coef_plot.imshow(coef[i].reshape(28,28), 
                     cmap=plt.cm.RdBu,
                     vmin=-scale, vmax=scale,
                    interpolation='bilinear')
    
    coef_plot.set_xticks(()); coef_plot.set_yticks(()) # remove ticks
    coef_plot.set_xlabel(f'Class {i}')

plt.suptitle('Coefficients for various classes');

Prediction and scoring

Now predict on the unseen test data and compare with the ground truth

In [63]:
print(clf2.predict(X2_test[0:9]))
print(y2_test[0:9])
[0 4 1 2 4 7 7 1 1]
[0 4 1 2 7 9 7 1 1]

Score against training and test data

In [64]:
clf2.score(X2_train, y2_train) # training score
Out[64]:
0.9374333333333333
In [65]:
score2 = clf2.score(X2_test, y2_test) # test score
score2
Out[65]:
0.9191

Test score: 0.9191, or about 92%

Confusion matrix

In [46]:
from sklearn import metrics
In [66]:
predictions2 = clf2.predict(X2_test)

cm = metrics.confusion_matrix(y_true=y2_test, 
                         y_pred = predictions2, 
                        labels = clf2.classes_)
cm
Out[66]:
array([[ 967,    0,    1,    2,    1,    9,    9,    0,    7,    0],
       [   0, 1114,    5,    3,    1,    5,    0,    4,    7,    2],
       [   3,   13,  931,   18,   11,    1,   15,   10,   34,    4],
       [   1,    5,   33,  894,    0,   26,    2,   12,   27,   13],
       [   1,    2,    5,    1,  897,    1,   11,    9,    7,   28],
       [  10,    2,    6,   30,    9,  747,   16,    6,   30,    7],
       [   7,    3,    6,    0,   11,   18,  938,    1,    5,    0],
       [   2,    5,   13,    2,   11,    2,    1,  982,    4,   42],
       [   4,   18,    8,   18,    6,   25,    9,    2,  861,   12],
       [   3,    5,    6,   10,   35,    7,    2,   32,    9,  860]])
In [71]:
import seaborn as sns

plt.figure(figsize=(12,12))
sns.heatmap(cm, annot=True, 
            linewidths=.5, square = True, cmap = 'Blues_r', fmt='0.4g');

plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(score2)
plt.title(all_sample_title);

Conclusion

This notebook demonstrates multi-class classification using scikit-learn's logistic regression. When run on the full MNIST database, the test accuracy is still only about 92%, so there is scope for improvement.