# Classification of hand written letters or digits using Neural Network

So
far we have learned how to run a simple tensorflow example in python. Today I
will discuss the classification problem of hand written letters or digits using
Neural Network with a single hidden layer. Will also show accuracy comparison between logistic regression and neural network.

TensorFlow does all the work by itself
and you will not understand anything if you are not familiar with the basic of
Machine Learning techniques. I suppose you know the basics, lets proceed.

Neural
Networks are the combination of linear and nonlinear techniques. The linear is
the logistic regression used with an activation function (Non Linear) which can be Sigmoid
or ReLu (Rectified Linear Unit), this activation function are also referred as
the hidden layer(s) since the output of these cannot be observed. The below figure is the neural network structure with 2 NN layers and 1 hidden layers. From input to hidden there is a logistic function and same from hidden to output, while in hidden there is an activation.

Lets start with the python code, following
are some of the functions needed to be explained.

Graph
is a tensorflow core part, everything is dependent on the graph, so all variables,
constants, statements that are set to be run on tensorflow needed to be in the
graph and upon training that graph must be explicitly mentioned so that the
training could easily do what it wants.

graph
= tf.Graph()

with
graph.as_default():

Using the above graph in session:

with tf.Session(graph=graph) as session:

Placeholder
are just like Variable in tensorflow but with no initialization required, while
Constant from the name are unchangeable variables.

tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

tf_valid_dataset = tf.constant(valid_dataset)

Neural_Network
function does simply the logistic regression function and feeds its output to
the relu function which further feeds hidden layer output to the final layer.

def
Neural_Network(x, weights, biases):

# Hidden layer with RELU activation

layer_1 = tf.add(tf.matmul(x,
weights['h1']), biases['b1'])

hidden_layer = tf.nn.relu(layer_1)

# Output layer with linear activation

out_layer = tf.matmul(hidden_layer,
weights['out']) + biases['out']

return out_layer

To
calculate the cost/loss:

loss= tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits,
tf_train_labels))

To
optimize it and then minimize using a learning rate of 0.5:

optimizer
= tf.train.GradientDescentOptimizer(0.5).minimize(loss)

I
have used Stochastic Gradient Descent (mini batch) rather than simple batch
Gradient Descent because of its quickness in converging to the global minimum, SGD are very useful in deep learning.
You can see that the batch size is mentioned and the loss is minimized by
iterating over batches rather than the whole dataset.

Softmax
function converts the scores into probabilities.

train_prediction
= tf.nn.softmax(logits)

This
is a 2 layer neural network with 1 hidden layer. Before you run it, you must download notMNIST.pickle file. This file contains 10,000 training, 500 validation and
500 test datasets, each having 28x28 images of letters from (A to J) randomly
shuffled (The same would work for digits). The pickle file is created so that it can be loaded in the memory
easily. I assigned 3GB RAM to Ubuntu and this data size is enough, if you have
more memory or want to try with different amount of data then you might need to
create the pickle file yourselves. For
help, I would prefer you to start Deep Learning course from Udacity, or you can
go directly to this assignment to create your own pickle file.

Following
is the full code, you can run directly using Jupyter Console. Run this from the
path of the notMNIST file. You might need to download several dependencies to run the code.

from __future__ import print_function

import numpy as np

import tensorflow as tf

from six.moves import cPickle as pickle

from six.moves import range

pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:

save = pickle.load(f)

train_dataset = save['train_dataset']

train_labels = save['train_labels']

valid_dataset = save['valid_dataset']

valid_labels = save['valid_labels']

test_dataset = save['test_dataset']

test_labels = save['test_labels']

del save # hint to help gc free up memory

print('Training set', train_dataset.shape, train_labels.shape)

print('Validation set', valid_dataset.shape, valid_labels.shape)

print('Test set', test_dataset.shape, test_labels.shape)

image_size = 28

num_labels = 10

def reformat(dataset, labels):

dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)

# Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]

labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)

return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)

valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)

test_dataset, test_labels = reformat(test_dataset, test_labels)

print('Training set', train_dataset.shape, train_labels.shape)

print('Validation set', valid_dataset.shape, valid_labels.shape)

print('Test set', test_dataset.shape, test_labels.shape)

# With gradient descent training, even this much data is prohibitive.

# Subset the training data for faster turnaround.

train_subset = 10000

batch_size = 100

def Neural_Network(x, weights, biases):

# Hidden layer with RELU activation

layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])

hidden_layer = tf.nn.relu(layer_1)

# Output layer with linear activation

out_layer = tf.matmul(hidden_layer, weights['out']) + biases['out']

return out_layer

graph = tf.Graph()

with graph.as_default():

# Input data. For the training data, we use a placeholder that will be fed

# at run time with a training minibatch.

tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))

tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

tf_valid_dataset = tf.constant(valid_dataset)

tf_test_dataset = tf.constant(test_dataset)

n_hidden_1 = 1024 # 1st layer number of features

n_input = 784 # MNIST data input (img shape: 28*28)

n_classes = 10 # MNIST total classes (0-9 digits)

# Training computation.

#Store layers weight & bias

weights = {

'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),

'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))

}

biases = {

'b1': tf.Variable(tf.random_normal([n_hidden_1])),

'out': tf.Variable(tf.random_normal([n_classes]))

}

# Construct model

logits = Neural_Network(tf_train_dataset, weights, biases)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

# Optimizer.

optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# Predictions for the training, validation, and test data.

train_prediction = tf.nn.softmax(logits)

logits1 = Neural_Network(tf_valid_dataset, weights, biases)

valid_prediction = tf.nn.softmax(logits1)

logits2 = Neural_Network(tf_test_dataset, weights, biases)

test_prediction = tf.nn.softmax(logits2)

num_steps = 801

def accuracy(predictions, labels):

return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))

/ predictions.shape[0])

with tf.Session(graph=graph) as session:

tf.global_variables_initializer().run()

print("Initialized")

for step in range(num_steps):

# Pick an offset within the training data, which has been randomized.

# Note: we could use better randomization across epochs.

offset = (step * batch_size) % (train_labels.shape[0] - batch_size)

# Generate a minibatch.

batch_data = train_dataset[offset:(offset + batch_size), :]

batch_labels = train_labels[offset:(offset + batch_size), :]

# Prepare a dictionary telling the session where to feed the minibatch.

# The key of the dictionary is the placeholder node of the graph to be fed,

# and the value is the numpy array to feed to it.

feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}

_, l, predictions = session.run(

[optimizer, loss, train_prediction], feed_dict=feed_dict)

if (step % 500 == 0):

print("Minibatch loss at step %d: %f" % (step, l))

print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))

print("Validation accuracy: %.1f%%" % accuracy(

valid_prediction.eval(), valid_labels))

#print(weights['out'].eval())

print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

import numpy as np

import tensorflow as tf

from six.moves import cPickle as pickle

from six.moves import range

pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:

save = pickle.load(f)

train_dataset = save['train_dataset']

train_labels = save['train_labels']

valid_dataset = save['valid_dataset']

valid_labels = save['valid_labels']

test_dataset = save['test_dataset']

test_labels = save['test_labels']

del save # hint to help gc free up memory

print('Training set', train_dataset.shape, train_labels.shape)

print('Validation set', valid_dataset.shape, valid_labels.shape)

print('Test set', test_dataset.shape, test_labels.shape)

image_size = 28

num_labels = 10

def reformat(dataset, labels):

dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)

# Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]

labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)

return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)

valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)

test_dataset, test_labels = reformat(test_dataset, test_labels)

print('Training set', train_dataset.shape, train_labels.shape)

print('Validation set', valid_dataset.shape, valid_labels.shape)

print('Test set', test_dataset.shape, test_labels.shape)

# With gradient descent training, even this much data is prohibitive.

# Subset the training data for faster turnaround.

train_subset = 10000

batch_size = 100

def Neural_Network(x, weights, biases):

# Hidden layer with RELU activation

layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])

hidden_layer = tf.nn.relu(layer_1)

# Output layer with linear activation

out_layer = tf.matmul(hidden_layer, weights['out']) + biases['out']

return out_layer

graph = tf.Graph()

with graph.as_default():

# Input data. For the training data, we use a placeholder that will be fed

# at run time with a training minibatch.

tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size * image_size))

tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

tf_valid_dataset = tf.constant(valid_dataset)

tf_test_dataset = tf.constant(test_dataset)

n_hidden_1 = 1024 # 1st layer number of features

n_input = 784 # MNIST data input (img shape: 28*28)

n_classes = 10 # MNIST total classes (0-9 digits)

# Training computation.

#Store layers weight & bias

weights = {

'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),

'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))

}

biases = {

'b1': tf.Variable(tf.random_normal([n_hidden_1])),

'out': tf.Variable(tf.random_normal([n_classes]))

}

# Construct model

logits = Neural_Network(tf_train_dataset, weights, biases)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

# Optimizer.

optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# Predictions for the training, validation, and test data.

train_prediction = tf.nn.softmax(logits)

logits1 = Neural_Network(tf_valid_dataset, weights, biases)

valid_prediction = tf.nn.softmax(logits1)

logits2 = Neural_Network(tf_test_dataset, weights, biases)

test_prediction = tf.nn.softmax(logits2)

num_steps = 801

def accuracy(predictions, labels):

return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))

/ predictions.shape[0])

with tf.Session(graph=graph) as session:

tf.global_variables_initializer().run()

print("Initialized")

for step in range(num_steps):

# Pick an offset within the training data, which has been randomized.

# Note: we could use better randomization across epochs.

offset = (step * batch_size) % (train_labels.shape[0] - batch_size)

# Generate a minibatch.

batch_data = train_dataset[offset:(offset + batch_size), :]

batch_labels = train_labels[offset:(offset + batch_size), :]

# Prepare a dictionary telling the session where to feed the minibatch.

# The key of the dictionary is the placeholder node of the graph to be fed,

# and the value is the numpy array to feed to it.

feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}

_, l, predictions = session.run(

[optimizer, loss, train_prediction], feed_dict=feed_dict)

if (step % 500 == 0):

print("Minibatch loss at step %d: %f" % (step, l))

print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))

print("Validation accuracy: %.1f%%" % accuracy(

valid_prediction.eval(), valid_labels))

#print(weights['out'].eval())

print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Also available on my Github repo.

I have already mentioned in the past that I would not go in detail of each of the function since it is already present at the official website of TensorFlow, my focus is to discuss the problems that most of the people usually face while using tensorflow.

One
of the question I had was

**Why cannot Logistic Regression classify the hand written problems?**
Hand
written letters or digits classification is difficult to solve using linear
approaches because of its non-linearity. To deal with non-linearity there will
be the addition of thousands of extra features in the logistic regression
equation to represent the high polynomial degree which will try to fit it
perfectly. You can try it but you will see a great difference in the accuracy
as compared to neural networks.

The
code can be customized easily, only few lines of code needs to be changed.
Replace the weights and biases with the following

weights = tf.Variable(tf.truncated_normal([image_size *
image_size, num_labels]))

biases = tf.Variable(tf.zeros([num_labels]))

Change
the Neural_Network function to this:

def Neural_Network(x, weights, biases):

layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])

return layer_1

I
got around 87% from this and 91% from neural network, that seems to be a great
improvement. Let me know how much accuracy you got from both of the versions. You
might get different in case you make your own pickle.

In
the next part I will introduce the regularization techniques to overcome over
fitting issues in neural network which will further increase the accuracy.

## 0 comments: