MNIST digits classification using logistic regression from Scikit-Learn

Digits OCR

This notebook is broadly adapted from this blog and this scikit-learn example

Logistic regression on smaller built-in subset

Load the dataset

In [1]:
from sklearn.datasets import load_digits
digits = load_digits()
In [2]:
type(digits.data)
Out[2]:
numpy.ndarray
In [3]:
(digits.data.shape, digits.target.shape, digits.images.shape)
Out[3]:
((1797, 64), (1797,), (1797, 8, 8))

1797 images, each 8x8 pixels, and 1797 labels.
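digits.data is simply digits.images flattened from (1797, 8, 8) to (1797, 64); a quick check confirming this (a minimal sketch, not part of the original notebook):

    import numpy as np
    # each 8x8 image flattened row-wise equals the corresponding row of digits.data
    print(np.array_equal(digits.images.reshape(len(digits.images), -1), digits.data))  # True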

Display sample data

In [4]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [5]:
plt.figure(figsize=(20,4))
for index, (image, label) in enumerate(zip(digits.data[0:5], 
                                           digits.target[0:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (8,8)), cmap=plt.cm.gray)
    plt.title('Training: %i\n' % label, fontsize = 20);

Split into training and test

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(digits.data, 
                                                    digits.target,
                                                   test_size=0.25,
                                                   random_state=0)
In [7]:
X_train.shape, X_test.shape
Out[7]:
((1347, 64), (450, 64))

Learning

Refer to the LogisticRegression API reference for these parameters and the user guide for the equations, particularly how the penalties are applied.
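For orientation, the L2-penalized objective minimized here is roughly (binary case, with labels y_i in {-1, 1}, as written in the user guide):

    \min_{w, c} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\left(\exp(-y_i (x_i^T w + c)) + 1\right)

C is the inverse of the regularization strength, so C=50 below means relatively weak regularization.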

In [6]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(fit_intercept=True,
                        multi_class='auto',
                        penalty='l2', #ridge regression
                        solver='saga',
                        max_iter=10000,
                        C=50)
clf
Out[6]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='saga', tol=0.0001, verbose=0,
                   warm_start=False)
In [9]:
%%time
clf.fit(X_train, y_train)
CPU times: user 6.81 s, sys: 9.52 ms, total: 6.81 s
Wall time: 6.82 s
Out[9]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='saga', tol=0.0001, verbose=0,
                   warm_start=False)

Let us see what the classifier has learned

In [10]:
clf.classes_
Out[10]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [11]:
clf.coef_.shape
Out[11]:
(10, 64)
In [12]:
clf.coef_[0].round(2) # prints weights for 8x8 image for class 0
Out[12]:
array([ 0.  , -0.  , -0.04,  0.1 ,  0.06, -0.14, -0.16, -0.02, -0.  ,
       -0.03, -0.04,  0.2 ,  0.09,  0.08, -0.05, -0.01, -0.  ,  0.06,
        0.15, -0.03, -0.39,  0.25,  0.09, -0.  , -0.  ,  0.13,  0.16,
       -0.18, -0.57,  0.02,  0.12, -0.  ,  0.  ,  0.16,  0.11, -0.16,
       -0.41,  0.05,  0.08,  0.  , -0.  , -0.06,  0.27, -0.11, -0.2 ,
        0.15,  0.04, -0.  , -0.  , -0.12,  0.08, -0.05,  0.2 ,  0.1 ,
       -0.04, -0.01, -0.  , -0.01, -0.09,  0.21, -0.04, -0.06, -0.1 ,
       -0.05])
In [13]:
clf.intercept_ # one intercept for each of the 10 classes
Out[13]:
array([ 0.0010181 , -0.07236521,  0.00379207,  0.00459855,  0.04585855,
        0.00014299, -0.00442972,  0.01179654,  0.04413398, -0.03454583])
In [14]:
clf.n_iter_[0] # num of iterations before tolerance was reached
Out[14]:
1876

Viewing coefficients as an image

Since there is a coefficient for each pixel of the 8x8 image, we can view the coefficients of each class as an image themselves. The code below is similar to the image display code above, but runs on the coefficients.

In [16]:
coef = clf.coef_.copy()
plt.imshow(coef[0].reshape(8,8).round(2));  # proof of concept
In [17]:
coef = clf.coef_.copy()
scale = np.abs(coef).max()
plt.figure(figsize=(10,5))

for i in range(10): # 0-9
    coef_plot = plt.subplot(2, 5, i + 1) # 2x5 plot

    coef_plot.imshow(coef[i].reshape(8,8), 
                     cmap=plt.cm.RdBu,
                     vmin=-scale, vmax=scale,
                    interpolation='bilinear')
    
    coef_plot.set_xticks(()); coef_plot.set_yticks(()) # remove ticks
    coef_plot.set_xlabel(f'Class {i}')

plt.suptitle('Coefficients for various classes');

Prediction and scoring

Now predict on the unseen test data and compare with the ground truth

In [18]:
print(clf.predict(X_test[0:9]))
print(y_test[0:9])
[2 8 2 6 6 7 1 9 8]
[2 8 2 6 6 7 1 9 8]
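Besides hard predictions, LogisticRegression also exposes per-class probabilities via predict_proba; a small sketch (illustrative, not run in the original notebook):

    probs = clf.predict_proba(X_test[0:1])     # class probabilities for the first test image
    print(probs.round(3))
    print(clf.classes_[probs.argmax(axis=1)])  # the argmax agrees with clf.predict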

Score against training and test data

In [19]:
clf.score(X_train, y_train) # training score
Out[19]:
1.0
In [20]:
score = clf.score(X_test, y_test) # test score
score
Out[20]:
0.9555555555555556

Test score: 0.9555

Confusion matrix
In [21]:
from sklearn import metrics
In [22]:
predictions = clf.predict(X_test)

cm = metrics.confusion_matrix(y_true=y_test, 
                         y_pred = predictions, 
                        labels = clf.classes_)
cm
Out[22]:
array([[37,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 40,  0,  0,  0,  0,  0,  0,  2,  1],
       [ 0,  0, 42,  2,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0, 43,  0,  0,  0,  0,  1,  1],
       [ 0,  0,  0,  0, 37,  0,  0,  1,  0,  0],
       [ 0,  0,  0,  0,  0, 46,  0,  0,  0,  2],
       [ 0,  1,  0,  0,  0,  0, 51,  0,  0,  0],
       [ 0,  0,  0,  1,  1,  0,  0, 46,  0,  0],
       [ 0,  3,  1,  0,  0,  0,  0,  0, 43,  1],
       [ 0,  0,  0,  0,  0,  1,  0,  0,  1, 45]])
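For per-class precision and recall to complement the raw confusion matrix, metrics.classification_report can be used (a one-line sketch reusing the metrics import and predictions from above; output not reproduced here):

    print(metrics.classification_report(y_true=y_test, y_pred=predictions))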

Visualize confusion matrix as a heatmap

In [23]:
import seaborn as sns

plt.figure(figsize=(10,10))
sns.heatmap(cm, annot=True, 
            linewidths=.5, square = True, cmap = 'Blues_r');

plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(score)
plt.title(all_sample_title);

Inspecting misclassified images

We compare predictions with labels to find which images are wrongly classified, then display them.

In [24]:
index = 0
misclassified_images = []
for label, predict in zip(y_test, predictions):
    if label != predict: 
        misclassified_images.append(index)
    index +=1
In [25]:
print(misclassified_images)
[56, 94, 118, 124, 130, 169, 181, 196, 213, 251, 315, 325, 331, 335, 378, 398, 425, 429, 430, 440]
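The same indices can be computed without the explicit loop; a minimal NumPy equivalent (a sketch, assuming predictions and y_test as above):

    misclassified_images = np.flatnonzero(predictions != y_test)  # indices where prediction != label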
In [26]:
plt.figure(figsize=(10,10))
plt.suptitle('Misclassifications');

for plot_index, bad_index in enumerate(misclassified_images[0:20]):
    p = plt.subplot(4,5, plot_index+1) # 4x5 plot
    
    p.imshow(X_test[bad_index].reshape(8,8), cmap=plt.cm.gray,
            interpolation='bilinear')
    p.set_xticks(()); p.set_yticks(()) # remove ticks
    
    p.set_title(f'Pred: {predictions[bad_index]}, Actual: {y_test[bad_index]}');

Predicting on the full MNIST database

In the previous section, we worked with a tiny built-in subset. In this section, we download and work with the full MNIST dataset. Downloading it for the first time from the OpenML database takes me about half a minute. Since the dataset is cached locally, subsequent runs should not take as long.

In [7]:
%%time
from sklearn.datasets import fetch_openml
mnist = fetch_openml(data_id=554) # https://www.openml.org/d/554
CPU times: user 15.3 s, sys: 348 ms, total: 15.6 s
Wall time: 15.6 s
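fetch_openml caches the download under scikit-learn's data home (~/scikit_learn_data by default). If you want the cache elsewhere, the data_home parameter can point to another directory; a sketch with an illustrative path:

    mnist = fetch_openml(data_id=554, data_home='/tmp/openml_cache')  # the path is just an example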
In [8]:
type(mnist)
Out[8]:
sklearn.utils.Bunch
In [9]:
type(mnist.data), type(mnist.categories), type(mnist.feature_names), type(mnist.target)
Out[9]:
(numpy.ndarray, dict, list, numpy.ndarray)
In [10]:
mnist.data.shape, mnist.target.shape
Out[10]:
((70000, 784), (70000,))

There are 70,000 images, each of dimension 28x28 pixels.

Preview some images

In [11]:
mnist.target[0]
Out[11]:
'5'
In [12]:
plt.figure(figsize=(20,4))
for index, (image, label) in enumerate(zip(mnist.data[0:5], 
                                           mnist.target[0:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (28,28)), cmap=plt.cm.gray)
    plt.title('Training: ' + label, fontsize = 20);

Split into training and test

In [13]:
mnist.target.astype('int')
Out[13]:
array([5, 0, 4, ..., 4, 5, 6])
In [14]:
from sklearn.model_selection import train_test_split
X2_train, X2_test, y2_train, y2_test = train_test_split(mnist.data, 
                                                    mnist.target.astype('int'), #targets str to int convert
                                                   test_size=1/7.0,
                                                   random_state=0)
In [15]:
X2_train.shape, X2_test.shape
Out[15]:
((60000, 784), (10000, 784))

Are the different classes evenly distributed? We can check by plotting a histogram of the labels in both the training and test datasets.

In [77]:
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.hist(y2_train);
plt.title('Frequency of different classes - Training data');

plt.subplot(1,2,2)
plt.hist(y2_test);
plt.title('Frequency of different classes - Test data');
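For exact counts rather than a histogram, np.unique with return_counts gives the per-class frequencies; a quick sketch (counts not reproduced here):

    labels, counts = np.unique(y2_train, return_counts=True)
    print(dict(zip(labels, counts)))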

Learning

In [57]:
from sklearn.linear_model import LogisticRegression
clf2 = LogisticRegression(fit_intercept=True,
                        multi_class='auto',
                        penalty='l1', #lasso regression
                        solver='saga',
                        max_iter=1000,
                        C=50,
                        verbose=2, # output progress
                        n_jobs=5, # parallelize over 5 processes
                        tol=0.01
                         )
clf2
Out[57]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=1000,
                   multi_class='auto', n_jobs=5, penalty='l1',
                   random_state=None, solver='saga', tol=0.01, verbose=2,
                   warm_start=False)

Since there are 10 classes and 12 available cores, we will try to run the learning step across 5 parallel jobs. Earlier, without parallelization, the fit had not finished within an hour, at which point I had to put the machine to sleep for a meeting.

In [58]:
%%time
clf2.fit(X2_train, y2_train)
[Parallel(n_jobs=5)]: Using backend ThreadingBackend with 5 concurrent workers.
convergence after 47 epochs took 143 seconds
CPU times: user 9min 30s, sys: 469 ms, total: 9min 30s
Wall time: 2min 22s
[Parallel(n_jobs=5)]: Done   1 out of   1 | elapsed:  2.4min finished
Out[58]:
LogisticRegression(C=50, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=1000,
                   multi_class='auto', n_jobs=5, penalty='l1',
                   random_state=None, solver='saga', tol=0.01, verbose=2,
                   warm_start=False)

Note: since verbose is set above 0, progress messages were printed, but they appeared on the terminal rather than in the notebook.

Let us see what the classifier has learned

In [29]:
clf2.classes_
Out[29]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [30]:
clf2.coef_.shape
Out[30]:
(10, 784)

Get the coefficients for a single class, 1 in this case:

In [59]:
clf2.coef_[1].round(3) # weights for the 28x28 image for class 1
Out[59]:
array([ 0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   , -0.   ,  0.   ,  0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   , -0.   , -0.   , -0.   ,
       -0.001, -0.001, -0.001,  0.   ,  0.002,  0.004,  0.001,  0.002,
        0.002,  0.001, -0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
       -0.   , -0.   , -0.   , -0.001, -0.001, -0.002, -0.002, -0.003,
        0.001,  0.002, -0.001,  0.001,  0.002,  0.   , -0.002,  0.   ,
       -0.001, -0.001, -0.001, -0.001, -0.001, -0.   , -0.   ,  0.   ,
        0.   ,  0.   ,  0.   , -0.   ,  0.   ,  0.001,  0.   , -0.   ,
       -0.   , -0.   ,  0.   , -0.   ,  0.   ,  0.001,  0.   ,  0.   ,
       -0.   ,  0.001, -0.001,  0.001,  0.   , -0.   , -0.001, -0.001,
       -0.003, -0.002, -0.   , -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.003,  0.001, -0.002, -0.003, -0.002, -0.003, -0.003,
       -0.002,  0.   , -0.001,  0.001, -0.001, -0.001,  0.   , -0.001,
        0.   ,  0.   ,  0.002,  0.002, -0.001, -0.002, -0.   , -0.   ,
        0.   ,  0.   , -0.   , -0.   ,  0.   ,  0.001,  0.   , -0.003,
       -0.002, -0.001,  0.   ,  0.   , -0.002, -0.002, -0.001, -0.002,
       -0.001, -0.003,  0.   , -0.001, -0.   , -0.   , -0.   ,  0.   ,
       -0.001, -0.002, -0.   , -0.   ,  0.   , -0.   , -0.   , -0.   ,
        0.   , -0.   , -0.002, -0.001, -0.003,  0.   , -0.   , -0.002,
        0.001, -0.002,  0.001, -0.003,  0.   , -0.001, -0.002, -0.   ,
        0.001,  0.   , -0.002, -0.001, -0.003, -0.002, -0.   , -0.   ,
        0.   , -0.   , -0.   , -0.   ,  0.001, -0.001, -0.002,  0.001,
       -0.002, -0.003, -0.001, -0.002, -0.   ,  0.001, -0.001,  0.   ,
       -0.003,  0.001, -0.001,  0.001, -0.002, -0.001, -0.001, -0.003,
       -0.004, -0.002, -0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   , -0.002, -0.001,  0.001,  0.   , -0.002, -0.001, -0.001,
       -0.001, -0.   , -0.   ,  0.003,  0.001, -0.001,  0.   , -0.002,
       -0.002, -0.001, -0.001, -0.003, -0.002, -0.002, -0.   , -0.   ,
       -0.   , -0.   , -0.   , -0.001, -0.001, -0.003, -0.002, -0.001,
        0.   , -0.001, -0.001,  0.001, -0.   ,  0.002,  0.003,  0.003,
        0.002,  0.001, -0.001, -0.   , -0.002,  0.   , -0.001, -0.001,
       -0.002, -0.001, -0.   ,  0.   ,  0.   ,  0.   , -0.   , -0.   ,
       -0.001, -0.002, -0.004, -0.003, -0.002, -0.001, -0.001, -0.001,
       -0.001,  0.001,  0.003,  0.003,  0.   , -0.001,  0.   , -0.001,
       -0.   , -0.002, -0.001, -0.001, -0.001, -0.   , -0.   ,  0.   ,
        0.   , -0.   , -0.   , -0.   ,  0.   , -0.001, -0.001, -0.001,
       -0.001,  0.   , -0.001, -0.002, -0.   ,  0.002,  0.005,  0.003,
        0.   , -0.   , -0.001, -0.001, -0.001, -0.   , -0.   , -0.   ,
       -0.001, -0.   , -0.   ,  0.   ,  0.   , -0.   , -0.   ,  0.   ,
        0.001,  0.   ,  0.   , -0.001,  0.   ,  0.001, -0.004, -0.002,
        0.   ,  0.001,  0.002,  0.002,  0.001,  0.003, -0.003, -0.002,
        0.   ,  0.   , -0.001, -0.001, -0.001, -0.   , -0.   , -0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.001,  0.   , -0.001, -0.001,
        0.001, -0.001, -0.003, -0.002, -0.   ,  0.001,  0.004,  0.002,
        0.001, -0.   , -0.003, -0.003, -0.   , -0.001, -0.001, -0.001,
       -0.   , -0.   , -0.   , -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
       -0.   , -0.001, -0.001, -0.   ,  0.001, -0.002, -0.003, -0.   ,
        0.001,  0.002,  0.004, -0.001,  0.003, -0.001, -0.002, -0.005,
       -0.002, -0.001, -0.001, -0.001, -0.   , -0.   , -0.   , -0.   ,
        0.   ,  0.   ,  0.   , -0.   , -0.   , -0.001, -0.001, -0.001,
       -0.001,  0.001, -0.   , -0.   , -0.001,  0.001,  0.003, -0.002,
        0.001, -0.005, -0.003, -0.003, -0.001, -0.   , -0.001, -0.001,
        0.001, -0.001, -0.   , -0.   ,  0.   ,  0.   ,  0.   , -0.   ,
       -0.   , -0.001, -0.002, -0.002, -0.001, -0.001,  0.   , -0.001,
        0.001,  0.003,  0.002,  0.001, -0.001, -0.005, -0.001, -0.   ,
       -0.   ,  0.   ,  0.   , -0.   , -0.   , -0.001,  0.   , -0.   ,
       -0.   ,  0.   , -0.   , -0.   , -0.   , -0.003, -0.005, -0.003,
        0.   ,  0.   , -0.002, -0.001,  0.   , -0.   ,  0.002, -0.001,
       -0.003,  0.   ,  0.002,  0.   ,  0.002,  0.   , -0.001, -0.   ,
        0.001,  0.002,  0.001,  0.   ,  0.   ,  0.   , -0.   , -0.   ,
       -0.001, -0.005, -0.002,  0.001, -0.001,  0.001, -0.001, -0.001,
       -0.   ,  0.002, -0.002, -0.001,  0.002, -0.001, -0.001,  0.001,
        0.001, -0.001, -0.001,  0.   ,  0.002,  0.001,  0.001,  0.   ,
        0.   , -0.   , -0.   , -0.   , -0.   , -0.005, -0.   ,  0.   ,
        0.   ,  0.002,  0.003,  0.002,  0.001, -0.001,  0.001, -0.   ,
        0.001,  0.002,  0.003,  0.001,  0.   , -0.001, -0.   ,  0.001,
        0.001,  0.001,  0.001,  0.   ,  0.   ,  0.   , -0.   ,  0.001,
        0.002,  0.003,  0.003,  0.002,  0.002, -0.001,  0.001, -0.001,
       -0.   ,  0.001,  0.002,  0.001,  0.001,  0.001,  0.003,  0.001,
        0.003,  0.003, -0.001,  0.   ,  0.003,  0.001,  0.   ,  0.   ,
        0.   ,  0.   , -0.   ,  0.   ,  0.001,  0.007,  0.003,  0.   ,
       -0.   ,  0.001, -0.002, -0.001, -0.002, -0.004, -0.001, -0.   ,
       -0.   ,  0.   ,  0.002,  0.001,  0.003,  0.002, -0.   , -0.   ,
        0.001,  0.001,  0.   ,  0.   ,  0.   ,  0.   , -0.   , -0.001,
       -0.002,  0.002,  0.001,  0.001,  0.   ,  0.001,  0.   ,  0.   ,
        0.001,  0.001, -0.001,  0.002,  0.001,  0.002, -0.   ,  0.   ,
       -0.001, -0.001, -0.001, -0.001, -0.   , -0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   , -0.001, -0.001, -0.001, -0.001, -0.   ,
       -0.001,  0.001, -0.001, -0.001, -0.001, -0.001, -0.001, -0.   ,
       -0.004,  0.001,  0.001,  0.   , -0.001, -0.001, -0.001, -0.   ,
       -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   , -0.   ,
       -0.   , -0.001, -0.001, -0.002, -0.001, -0.003, -0.006, -0.004,
       -0.001, -0.002, -0.002, -0.003, -0.004, -0.003, -0.002, -0.002,
       -0.001, -0.   , -0.   , -0.   , -0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   , -0.   , -0.001, -0.001, -0.002, -0.002, -0.001, -0.001,
       -0.   , -0.   , -0.001, -0.001, -0.   , -0.   , -0.   , -0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   , -0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   , -0.   , -0.   , -0.   , -0.   , -0.   , -0.   , -0.   ,
       -0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ])
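Because penalty='l1' (lasso) is used, many weights are driven exactly to zero, which is why the array above is mostly zeros. A short sketch to quantify the sparsity (numbers not from the original run):

    sparsity = np.mean(clf2.coef_ == 0) * 100  # percentage of exactly-zero weights
    print('Sparsity: %.1f%%' % sparsity)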
In [60]:
clf2.intercept_ # one intercept for each of the 10 classes
Out[60]:
array([-1.11398188e-04,  1.38709472e-04,  1.16909054e-04, -2.37842193e-04,
        6.62466316e-05,  8.48133979e-04, -4.22181499e-05,  2.66499796e-04,
       -8.62715013e-04, -1.82325388e-04])
In [78]:
clf2.n_iter_[0] # num of iterations before tolerance was reached
Out[78]:
47
Visualize coefficients as an image
In [62]:
coef = clf2.coef_.copy()
scale = np.abs(coef).max()
plt.figure(figsize=(13,7))

for i in range(10): # 0-9
    coef_plot = plt.subplot(2, 5, i + 1) # 2x5 plot

    coef_plot.imshow(coef[i].reshape(28,28), 
                     cmap=plt.cm.RdBu,
                     vmin=-scale, vmax=scale,
                    interpolation='bilinear')
    
    coef_plot.set_xticks(()); coef_plot.set_yticks(()) # remove ticks
    coef_plot.set_xlabel(f'Class {i}')

plt.suptitle('Coefficients for various classes');

Prediction and scoring

Now predict on the unseen test data and compare with the ground truth

In [63]:
print(clf2.predict(X2_test[0:9]))
print(y2_test[0:9])
[0 4 1 2 4 7 7 1 1]
[0 4 1 2 7 9 7 1 1]

Score against training and test data

In [64]:
clf2.score(X2_train, y2_train) # training score
Out[64]:
0.9374333333333333
In [65]:
score2 = clf2.score(X2_test, y2_test) # test score
score2
Out[65]:
0.9191

Test score: 0.9191, or about 92%

Confusion matrix

In [46]:
from sklearn import metrics
In [66]:
predictions2 = clf2.predict(X2_test)

cm = metrics.confusion_matrix(y_true=y2_test, 
                         y_pred = predictions2, 
                        labels = clf2.classes_)
cm
Out[66]:
array([[ 967,    0,    1,    2,    1,    9,    9,    0,    7,    0],
       [   0, 1114,    5,    3,    1,    5,    0,    4,    7,    2],
       [   3,   13,  931,   18,   11,    1,   15,   10,   34,    4],
       [   1,    5,   33,  894,    0,   26,    2,   12,   27,   13],
       [   1,    2,    5,    1,  897,    1,   11,    9,    7,   28],
       [  10,    2,    6,   30,    9,  747,   16,    6,   30,    7],
       [   7,    3,    6,    0,   11,   18,  938,    1,    5,    0],
       [   2,    5,   13,    2,   11,    2,    1,  982,    4,   42],
       [   4,   18,    8,   18,    6,   25,    9,    2,  861,   12],
       [   3,    5,    6,   10,   35,    7,    2,   32,    9,  860]])
In [71]:
import seaborn as sns

plt.figure(figsize=(12,12))
sns.heatmap(cm, annot=True, 
            linewidths=.5, square = True, cmap = 'Blues_r', fmt='0.4g');

plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(score2)
plt.title(all_sample_title);

Conclusion

This notebook demonstrates multi-class classification using scikit-learn's logistic regression. When run on the full MNIST database, the test accuracy is still only about 92%, so there is scope for improvement.