Deep Learning Scribbling : Deep Learning Frameworks I

Using Deep Learning Frameworks an Intro

This is a series of posts about deep learning, not how to classify Fashion MNIST but more on how to use the science and it’s tools . I will discuss Frameworks, Architecturing, Solving Problems and a Bunch of flash notes for things that we forget about , alas we are not machines .

Deep Learning Frameworks

Deep Learning frameworks are quite interesting, because they require a very hardcore feat of engineering in between providing cross-platform software, ultra-fast computation, numerical correction and most of all a PYTHON interface .

They come in two varieties Dynamic and Static . I like to separate them using another metric UX there are those that can be used , and those that can’t be used . Usage = (Time to Solve) - (Time to Fight Tool) .

Away from this comedy of emacs vs vim as it’s all about user preference (but really tensorflow ?) Let’s try to decipher how the framework is built how can it be used and how to go from architecture to code .

P.S : I didn’t mention Keras because Keras is actually the knife compared to Tensorflow the rusty chainsaw or PyTorch the scalpel .

What is Deep Learning in 1 Line .

Deep Learning is trying to approximate an unknown function using a set of examples .

In More Lines

Learning Representations

Deep Learning is a model that approximates any function by learning representations of the functions and trying to generalize from . Learning Representations is what happens when layers or neurons weights are optimized . The Linear Operation $ Y = W*X+b$ is a way to learn linear relations where as by introducing a new component the activation function that introduces non linearities to the learning process , non-linear as $ Y = Z(W*X+b) = max(0,W*X+b) $ .

Representations a/k/a Features are the characteristics that describe your input data , features are essentially Random Variables , engineering features means constructing meaningful characteristics for your inputs . Good Features are essential for a successful and easier learning t , Deep Neural Networks have this ability to learn good features by training , the subsequent stacking of layers is like a representation filter that tries to learn good representations to give to the output layer for example softmax that act as a classifier .

The hidden layers act slike a feature engineering pipeline that does what used to be a manual domain-driven task automatically and maybe better (ConvNets) .

Layers such as the Convolutional Layer are efficient representation learner that learns small patterns in parts of images . Images are Tensors mathematically but more importantly Images have a visual structure a Flat Image would be hard to understand where as a Normal Image can tell a 1000 words . The Convolution operation that essentially scans a tensor and multiplies by a Filter or Kernel will learn a specific representation from each part and also keeps the structure of the image intact , * Nose is in the center * becomes a learnable representation (more on this another time )

**N.B : ** * A Note

Getting Started with Keras

Keras is a modular wrapper around Tensorflow it’s the actual reason Tensorflow is used by so many people* . Keras let’s you build models brick by brick in the literal sense (Sequential) or by Tele-Kinesis (Functional API <3 ) .

Keras provides you with a highly friendly API to turn any architecture you have in mind to code and train and test it at the same time .

Blocks

Deep Learning requires some tools , first you have to design the network architecture whether you are using a Fully-Connected-Layer or a series of Conv -> MaxPool you need to have in mind a way to approach the problem . Generally as a rule of thumb we have these heuristics :

CNNs for Images
RNN, 1D-CNNs for Text
Boosting, Random Forests or Categorical Embeddings or Wide & Deep for Structured Data
VAE, GAN , {xyz}-GAN for Generative
N.B : *
Tensorflow is the default Keras Backend

Next you need to preprocess your data , as you may know NNs love (0,1) values so you will have to often standarize your dataset .

You’ll also need to understand ** Loss Functions ** yes because you see different problems need different loss functions and the choice of your loss function may affect your convergence .

And a GPU or even better a colab

I love Google

At this point you are ready to train your neural network and watch it reach 99% accuracy on MNIST .

# KERAS SEQUENTIAL EXAMPLE FOR CLASSIFICATIOn

from keras import models,layers,datasets


(x_train,y_train),(x_test,y_test) = datasets.mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step

import numpy as np

from keras import utils

y_test = utils.to_categorical(y_test)
y_train = utils.to_categorical(y_train)

# NORMALIZE THE DATA

def normalizer(x):
    x = x.reshape((x.shape[0],28*28))
    x = x.astype('float32')
    x -= x.mean()
    x /= x.std()
    
    return x

x_train = normalizer(x_train)
x_test = normalizer(x_test)


# BUILD AN MLP BY STACKING LAYER OVER LAYER

model = models.Sequential()
model.add(layers.Dense(512,input_shape=(28*28,)))
model.add(layers.Dense(396,activation="relu"))
model.add(layers.Dense(256,activation="relu"))
model.add(layers.Dense(128,activation="elu"))
model.add(layers.Dense(64,activation="elu"))
model.add(layers.Dense(32,activation="elu"))
model.add(layers.Dense(10,activation="softmax"))

# COMPILE THE MODEL
model.compile(optimizer="adam",loss="categorical_crossentropy",metrics=["accuracy"])

# FIT THE MODEL

model.fit(x_train,y_train,epochs=10,batch_size=32,validation_data=(x_test,y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 22s 374us/step - loss: 0.2573 - acc: 0.9224 - val_loss: 0.1543 - val_acc: 0.9540
Epoch 2/10
60000/60000 [==============================] - 21s 343us/step - loss: 0.1489 - acc: 0.9561 - val_loss: 0.1494 - val_acc: 0.9594
Epoch 3/10
60000/60000 [==============================] - 21s 345us/step - loss: 0.1210 - acc: 0.9641 - val_loss: 0.1572 - val_acc: 0.9561
Epoch 4/10
60000/60000 [==============================] - 20s 341us/step - loss: 0.1052 - acc: 0.9693 - val_loss: 0.1074 - val_acc: 0.9706
Epoch 5/10
60000/60000 [==============================] - 21s 346us/step - loss: 0.0923 - acc: 0.9738 - val_loss: 0.1204 - val_acc: 0.9660
Epoch 6/10
60000/60000 [==============================] - 21s 354us/step - loss: 0.0823 - acc: 0.9768 - val_loss: 0.1306 - val_acc: 0.9656
Epoch 7/10
60000/60000 [==============================] - 21s 350us/step - loss: 0.0735 - acc: 0.9790 - val_loss: 0.1211 - val_acc: 0.9707
Epoch 8/10
60000/60000 [==============================] - 21s 349us/step - loss: 0.0735 - acc: 0.9799 - val_loss: 0.1092 - val_acc: 0.9730
Epoch 9/10
60000/60000 [==============================] - 21s 351us/step - loss: 0.0626 - acc: 0.9829 - val_loss: 0.1066 - val_acc: 0.9706
Epoch 10/10
60000/60000 [==============================] - 21s 346us/step - loss: 0.0620 - acc: 0.9834 - val_loss: 0.1070 - val_acc: 0.9709





<keras.callbacks.History at 0x7fa4672b96a0>

A Look Into PyTorch

Models as Code

PyTorch is essentially a library to build and train deep neural nets and also serves as a Numpy on GPU library .

PyTorch gives you modules (optim,nn,torchvision) that can be used collectively to write your model as code and computations will be done dynamically (no model compilation like in Tensorflow)

Let me show you an example similar to what we just did with Keras

# http://pytorch.org/
import torch

import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    
    def __init__(self):
        super(NeuralNet,self).__init__()
        
        # Linear is the affine transformation y = w*x + b
        self.fc1 = nn.Linear(784,512)
        self.fc2 = nn.Linear(512,396)
        self.fc3 = nn.Linear(396,256)
        self.fc4 = nn.Linear(256,128)
        self.fc5 = nn.Linear(128,64)
        self.fc6 = nn.Linear(64,32)
        self.fc7 = nn.Linear(32,10)
        
    def forward(self,x):
        
        # The forward pass is what happens from each layer to layer in other words
        # the flow of inputs trough the network
        # here we describe the activation and maxpooling operations ...
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.elu(self.fc5(x))
        x = F.elu(self.fc6(x))
        x = self.fc7(x)
        
        return x

net = NeuralNet()

print(net)

NeuralNet(
  (fc1): Linear(in_features=784, out_features=512)
  (fc2): Linear(in_features=512, out_features=396)
  (fc3): Linear(in_features=396, out_features=256)
  (fc4): Linear(in_features=256, out_features=128)
  (fc5): Linear(in_features=128, out_features=64)
  (fc6): Linear(in_features=64, out_features=32)
  (fc7): Linear(in_features=32, out_features=10)
)

Now we built a class NeuralNetwork with a specific architecture as you may notice you can create a NeuralNetwork factory that generates different models based on inputs but let’s keep that for later .

Let’s define a train function that trains a Neural Network

P.S : PyTorch deduces and executes the backward pass (backpropagation) from the forward operation .

from torchvision import datasets, transforms, utils

normalizer = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,), (1.0,))])
# load dataset
train_set = datasets.MNIST(root='./data', train=True, transform=normalizer, download=True)
test_set = datasets.MNIST(root='./data', train=False, transform=normalizer, download=True)

batch_size = 100

train_loader = torch.utils.data.DataLoader(
                 dataset=train_set,
                 batch_size=batch_size,
                 shuffle=True)
test_loader = torch.utils.data.DataLoader(
                dataset=test_set,
                batch_size=batch_size,
                shuffle=False)

Now that our data loaders (PyTorch data generators) are created we can move to implement a train function this could’ve been a method for the Neural Network defined above or just a function like we did now .



# loss function Categorical Cross Entropy
criterion = nn.CrossEntropyLoss()

# optimizer
optimizer = torch.optim.SGD(net.parameters(),lr=0.01,momentum=0.9)

def train(epochs):

    for epoch in range(epochs):
        # trainning
        loss = 0
        for batch_idx, (x, y) in enumerate(train_loader):
            
            optimizer.zero_grad()
            x = x.view(-1, 28*28)
            x = torch.autograd.Variable(x)
            y = torch.autograd.Variable(y)
            out = net(x)
            loss = criterion(out, y)
            loss = loss * 0.9 + loss.data[0] * 0.1
            loss.backward()
            optimizer.step()
            if (batch_idx+1) % 100 == 0 or (batch_idx+1) == len(train_loader):
                print('==>>> epoch: {} , loss : {}'.format(epoch,loss))

train(10)

==>>> epoch: 0 , loss : Variable containing:
 0.1895
[torch.FloatTensor of size 1]

==>>> epoch: 0 , loss : Variable containing:
 0.2116
[torch.FloatTensor of size 1]

==>>> epoch: 0 , loss : Variable containing:
 0.2514
[torch.FloatTensor of size 1]

==>>> epoch: 0 , loss : Variable containing:
 0.2384
[torch.FloatTensor of size 1]

==>>> epoch: 0 , loss : Variable containing:
 0.1595
[torch.FloatTensor of size 1]

==>>> epoch: 0 , loss : Variable containing:
 0.1628
[torch.FloatTensor of size 1]

==>>> epoch: 1 , loss : Variable containing:
 0.1872
[torch.FloatTensor of size 1]

==>>> epoch: 1 , loss : Variable containing:
 0.2300
[torch.FloatTensor of size 1]

==>>> epoch: 1 , loss : Variable containing:
 0.1256
[torch.FloatTensor of size 1]


==>>> epoch: 2 , loss : Variable containing:
1.00000e-02 *
  2.0228
[torch.FloatTensor of size 1]

    **many lines later**

==>>> epoch: 9 , loss : Variable containing:
 0.1006
[torch.FloatTensor of size 1]


# test function will evaluate the neural network accuracy 

def test(epoch):
    net.eval() 
    test_loss = 0
    correct = 0
    acc_history = []    
    for data, target in test_loader:
        data = data.view(-1,28*28) # view is equivalent to np.reshape
        data = torch.autograd.Variable(data, volatile=True) 
        target = torch.autograd.Variable(target)
        
        output = net(data)
        test_loss += F.nll_loss(output, target).data[0]
        pred = output.data.max(1)[1] # get the index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

        test_loss = test_loss
        test_loss /= len(test_loader) # loss function already averages over batch size
        accuracy = 100. * correct / len(test_loader.dataset)
        acc_history.append(accuracy)
        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(test_loader.dataset),
            accuracy))

test(10)

Test set: Average loss: -0.1730, Accuracy: 9147/10000 (91%)


Test set: Average loss: -0.1756, Accuracy: 9247/10000 (92%)


Test set: Average loss: -0.1712, Accuracy: 9344/10000 (93%)


Test set: Average loss: -0.1444, Accuracy: 9438/10000 (94%)


Test set: Average loss: -0.1326, Accuracy: 9533/10000 (95%)


Test set: Average loss: -0.1468, Accuracy: 9628/10000 (96%)


Test set: Average loss: -0.1493, Accuracy: 9728/10000 (97%)

Conclusion (for now)

This is was an appetizer to the difference between PyTorch and Keras later on we will explore how build models differ between the two and dive into the specifics of each such as how the Functional API works in Keras , and how to use Pytorch Loaders for your own data . Till then happy learning .