Deep Learning Scribbling: Deep Learning Frameworks I
Using Deep Learning Frameworks: An Intro
This is a series of posts about deep learning: not how to classify Fashion MNIST, but how to use the science and its tools. I will discuss frameworks, architecting models, solving problems, and a bunch of flash notes for the things we tend to forget (alas, we are not machines).
Deep Learning Frameworks
Deep Learning frameworks are quite interesting because they require a hardcore feat of engineering: providing cross-platform software, ultra-fast computation, numerical correctness and, most of all, a Python interface.
They come in two varieties, dynamic and static. I like to separate them using another metric, UX: there are those that can be used, and those that can't. Usage = (Time to Solve) - (Time to Fight the Tool).
Setting aside this Emacs-vs-Vim comedy, since it's all about user preference (but really, TensorFlow?), let's try to decipher how a framework is built, how it can be used, and how to go from architecture to code.
P.S.: I didn't mention Keras because Keras is actually the knife, compared to TensorFlow the rusty chainsaw or PyTorch the scalpel.
What is Deep Learning in 1 Line?
Deep Learning is trying to approximate an unknown function using a set of examples.
In More Lines
Learning Representations
Deep Learning is a model that approximates any function by learning representations of the data and trying to generalize from them. Learning representations is what happens when the weights of layers (neurons) are optimized. The linear operation $Y = WX + b$ can only learn linear relations, whereas introducing a new component, the activation function, brings non-linearities into the learning process, e.g. $Y = Z(WX + b) = \max(0, WX + b)$ for the ReLU activation $Z$.
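To make that concrete, here is a minimal NumPy sketch of a single layer (the shapes are made up for illustration): an affine transform followed by a ReLU non-linearity.
import numpy as np
# A sketch of one dense layer: Y = max(0, W*X + b), with made-up sizes (4 inputs -> 3 units)
rng = np.random.RandomState(0)
X = rng.randn(4)            # input vector
W = rng.randn(3, 4) * 0.1   # weights, learned during training
b = np.zeros(3)             # biases
Z = W @ X + b               # the linear (affine) part
Y = np.maximum(0, Z)        # the ReLU activation introduces the non-linearity
print(Y)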
Representations, a.k.a. features, are the characteristics that describe your input data; features are essentially random variables, and engineering features means constructing meaningful characteristics for your inputs. Good features are essential for successful and easier learning. Deep neural networks have the ability to learn good features through training: the successive stacking of layers acts like a representation filter that tries to learn good representations to hand to the output layer, for example a softmax that acts as a classifier.
The hidden layers act like a feature engineering pipeline that does automatically, and sometimes better (ConvNets), what used to be a manual, domain-driven task.
Layers such as the convolutional layer are efficient representation learners that pick up small patterns in parts of images. Mathematically, images are tensors, but more importantly images have a visual structure: a flattened image would be hard to understand, whereas a normal image can tell a thousand words. The convolution operation, which essentially scans a tensor and multiplies it by a filter (or kernel), learns a specific representation from each part while keeping the structure of the image intact; *nose is in the center* becomes a learnable representation (more on this another time).
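Here is a rough NumPy sketch of that scanning operation (the kernel values are just an illustrative edge detector, not something learned):
import numpy as np
# Slide a small kernel over the image and take a weighted sum at every position,
# so local patterns are detected while the spatial structure is preserved.
image = np.random.rand(6, 6)          # a tiny fake grayscale image
kernel = np.array([[1., 0., -1.],     # a 3x3 filter (a vertical-edge detector)
                   [1., 0., -1.],
                   [1., 0., -1.]])
out = np.zeros((4, 4))                # output size is (6-3+1) x (6-3+1)
for i in range(4):
    for j in range(4):
        out[i, j] = (image[i:i+3, j:j+3] * kernel).sum()
print(out)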
Getting Started with Keras
Keras is a modular wrapper around TensorFlow; it's the actual reason TensorFlow is used by so many people*. Keras lets you build models brick by brick in the literal sense (Sequential) or by telekinesis (Functional API <3).
Keras provides a highly friendly API to turn any architecture you have in mind into code, and to train and test it along the way.
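As a tiny teaser (a sketch with arbitrary layer sizes), here is the same toy model written both ways; the Functional API gets its own deep dive later:
from keras import models, layers
# brick by brick: Sequential
seq = models.Sequential()
seq.add(layers.Dense(32, activation="relu", input_shape=(784,)))
seq.add(layers.Dense(10, activation="softmax"))
# wiring tensors explicitly: Functional API
inputs = layers.Input(shape=(784,))
x = layers.Dense(32, activation="relu")(inputs)
outputs = layers.Dense(10, activation="softmax")(x)
func = models.Model(inputs=inputs, outputs=outputs)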
Blocks
Deep Learning requires some tools. First you have to design the network architecture: whether you are using fully-connected layers or a series of Conv -> MaxPool blocks, you need to have in mind a way to approach the problem. Generally, as a rule of thumb, we have these heuristics:
- CNNs for Images
- RNN, 1D-CNNs for Text
- Boosting, Random Forests or Categorical Embeddings or Wide & Deep for Structured Data
- VAE, GAN, {xyz}-GAN for Generative Models
N.B: * TensorFlow is the default Keras backend.
Next you need to preprocess your data; as you may know, neural networks love small, standardized values (roughly zero mean and unit variance), so you will often have to standardize your dataset.
You'll also need to understand **Loss Functions**: different problems need different loss functions, and the choice of loss function may affect your convergence.
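A rough sketch of common pairings (these are standard Keras loss names; the one-layer model is just a placeholder):
from keras import models, layers
clf = models.Sequential()
clf.add(layers.Dense(10, activation="softmax", input_shape=(784,)))
clf.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])  # multi-class, one-hot labels
# clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy")  # multi-class, integer labels
# clf.compile(optimizer="adam", loss="binary_crossentropy")              # binary / multi-label (sigmoid output)
# clf.compile(optimizer="adam", loss="mse")                              # regression (linear output)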
And a GPU, or even better, a Colab.
I love Google
At this point you are ready to train your neural network and watch it reach 99% accuracy on MNIST.
# KERAS SEQUENTIAL EXAMPLE FOR CLASSIFICATION
from keras import models,layers,datasets
(x_train,y_train),(x_test,y_test) = datasets.mnist.load_data()
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
import numpy as np
from keras import utils
y_test = utils.to_categorical(y_test)
y_train = utils.to_categorical(y_train)
# NORMALIZE THE DATA
def normalizer(x):
    x = x.reshape((x.shape[0], 28*28))
    x = x.astype('float32')
    x -= x.mean()
    x /= x.std()
    return x
x_train = normalizer(x_train)
x_test = normalizer(x_test)
# BUILD AN MLP BY STACKING LAYER OVER LAYER
model = models.Sequential()
model.add(layers.Dense(512,input_shape=(28*28,)))
model.add(layers.Dense(396,activation="relu"))
model.add(layers.Dense(256,activation="relu"))
model.add(layers.Dense(128,activation="elu"))
model.add(layers.Dense(64,activation="elu"))
model.add(layers.Dense(32,activation="elu"))
model.add(layers.Dense(10,activation="softmax"))
# COMPILE THE MODEL
model.compile(optimizer="adam",loss="categorical_crossentropy",metrics=["accuracy"])
# FIT THE MODEL
model.fit(x_train,y_train,epochs=10,batch_size=32,validation_data=(x_test,y_test))
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 22s 374us/step - loss: 0.2573 - acc: 0.9224 - val_loss: 0.1543 - val_acc: 0.9540
Epoch 2/10
60000/60000 [==============================] - 21s 343us/step - loss: 0.1489 - acc: 0.9561 - val_loss: 0.1494 - val_acc: 0.9594
Epoch 3/10
60000/60000 [==============================] - 21s 345us/step - loss: 0.1210 - acc: 0.9641 - val_loss: 0.1572 - val_acc: 0.9561
Epoch 4/10
60000/60000 [==============================] - 20s 341us/step - loss: 0.1052 - acc: 0.9693 - val_loss: 0.1074 - val_acc: 0.9706
Epoch 5/10
60000/60000 [==============================] - 21s 346us/step - loss: 0.0923 - acc: 0.9738 - val_loss: 0.1204 - val_acc: 0.9660
Epoch 6/10
60000/60000 [==============================] - 21s 354us/step - loss: 0.0823 - acc: 0.9768 - val_loss: 0.1306 - val_acc: 0.9656
Epoch 7/10
60000/60000 [==============================] - 21s 350us/step - loss: 0.0735 - acc: 0.9790 - val_loss: 0.1211 - val_acc: 0.9707
Epoch 8/10
60000/60000 [==============================] - 21s 349us/step - loss: 0.0735 - acc: 0.9799 - val_loss: 0.1092 - val_acc: 0.9730
Epoch 9/10
60000/60000 [==============================] - 21s 351us/step - loss: 0.0626 - acc: 0.9829 - val_loss: 0.1066 - val_acc: 0.9706
Epoch 10/10
60000/60000 [==============================] - 21s 346us/step - loss: 0.0620 - acc: 0.9834 - val_loss: 0.1070 - val_acc: 0.9709
<keras.callbacks.History at 0x7fa4672b96a0>
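Once training is done, the same model object can be scored and used for predictions; here is a quick sketch reusing the model and data defined above:
# evaluate on the held-out set and predict a few digits (a sketch)
test_loss, test_acc = model.evaluate(x_test, y_test, batch_size=32)
print("test accuracy:", test_acc)
probabilities = model.predict(x_test[:5])        # softmax outputs, shape (5, 10)
predicted_digits = probabilities.argmax(axis=1)  # most likely class per sample
print(predicted_digits)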
A Look Into PyTorch
Models as Code
PyTorch is essentially a library to build and train deep neural networks, and it also serves as a NumPy-on-GPU library.
PyTorch gives you modules (optim, nn, torchvision) that can be used together to write your model as code; computations are executed dynamically (no graph compilation like in TensorFlow).
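The "NumPy on GPU" part in a nutshell, as a minimal sketch (it assumes a CUDA device may or may not be present):
import numpy as np
import torch
# tensors convert back and forth from NumPy arrays and move to the GPU if one is
# available; everything runs eagerly, there is no compilation step
a = torch.from_numpy(np.random.randn(2, 3).astype('float32'))
b = torch.from_numpy(np.random.randn(3, 4).astype('float32'))
if torch.cuda.is_available():
    a, b = a.cuda(), b.cuda()
c = torch.mm(a, b)       # matrix multiply, executed immediately
print(c.cpu().numpy())   # back to NumPy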
Let me show you an example similar to what we just did with Keras:
# http://pytorch.org/
import torch
import torch.nn as nn
import torch.nn.functional as F
class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        # Linear is the affine transformation y = w*x + b
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 396)
        self.fc3 = nn.Linear(396, 256)
        self.fc4 = nn.Linear(256, 128)
        self.fc5 = nn.Linear(128, 64)
        self.fc6 = nn.Linear(64, 32)
        self.fc7 = nn.Linear(32, 10)

    def forward(self, x):
        # The forward pass is what happens from layer to layer, in other words
        # the flow of inputs through the network.
        # Here we describe the activation applied after each linear layer.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.elu(self.fc5(x))
        x = F.elu(self.fc6(x))
        x = self.fc7(x)
        return x
net = NeuralNet()
print(net)
NeuralNet(
(fc1): Linear(in_features=784, out_features=512)
(fc2): Linear(in_features=512, out_features=396)
(fc3): Linear(in_features=396, out_features=256)
(fc4): Linear(in_features=256, out_features=128)
(fc5): Linear(in_features=128, out_features=64)
(fc6): Linear(in_features=64, out_features=32)
(fc7): Linear(in_features=32, out_features=10)
)
We have now built a NeuralNet class with a specific architecture; as you may notice, you could create a NeuralNet factory that generates different models based on its inputs, but let's keep that for later.
Let's define a train function that trains the neural network.
P.S.: PyTorch derives and executes the backward pass (backpropagation) automatically from the forward operations.
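Here is a minimal autograd sketch (using the old-style Variable API to match the code below): run the forward computation, call .backward(), and the gradients are filled in for you.
import torch
from torch.autograd import Variable
x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * 3).sum()   # forward pass: y = 3 * sum(x)
y.backward()        # backward pass derived automatically
print(x.grad)       # dy/dx = 3 for every element of x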
from torchvision import datasets, transforms, utils
normalizer = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,), (1.0,))])
# load dataset
train_set = datasets.MNIST(root='./data', train=True, transform=normalizer, download=True)
test_set = datasets.MNIST(root='./data', train=False, transform=normalizer, download=True)
batch_size = 100
train_loader = torch.utils.data.DataLoader(
    dataset=train_set,
    batch_size=batch_size,
    shuffle=True)
test_loader = torch.utils.data.DataLoader(
    dataset=test_set,
    batch_size=batch_size,
    shuffle=False)
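A quick sanity check on the loaders (a sketch reusing train_loader from above): each iteration yields a batch of images and a batch of labels.
# images come out as (batch, 1, 28, 28) tensors, labels as (batch,) tensors
images, labels = next(iter(train_loader))
print(images.size(), labels.size())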
Now that our data loaders (PyTorch data generators) are created, we can implement a train function; this could have been a method of the NeuralNet class defined above, but here we keep it as a standalone function.
# loss function: categorical cross entropy
criterion = nn.CrossEntropyLoss()
# optimizer: stochastic gradient descent with momentum
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

def train(epochs):
    for epoch in range(epochs):
        # training
        running_loss = 0
        for batch_idx, (x, y) in enumerate(train_loader):
            optimizer.zero_grad()
            x = x.view(-1, 28*28)  # flatten the 28x28 images into 784-dim vectors
            x = torch.autograd.Variable(x)
            y = torch.autograd.Variable(y)
            out = net(x)
            loss = criterion(out, y)
            # keep an exponentially smoothed loss for logging
            running_loss = running_loss * 0.9 + loss.data[0] * 0.1
            loss.backward()
            optimizer.step()
            if (batch_idx+1) % 100 == 0 or (batch_idx+1) == len(train_loader):
                print('==>>> epoch: {} , loss : {}'.format(epoch, running_loss))
train(10)
==>>> epoch: 0 , loss : Variable containing:
0.1895
[torch.FloatTensor of size 1]
==>>> epoch: 0 , loss : Variable containing:
0.2116
[torch.FloatTensor of size 1]
==>>> epoch: 0 , loss : Variable containing:
0.2514
[torch.FloatTensor of size 1]
==>>> epoch: 0 , loss : Variable containing:
0.2384
[torch.FloatTensor of size 1]
==>>> epoch: 0 , loss : Variable containing:
0.1595
[torch.FloatTensor of size 1]
==>>> epoch: 0 , loss : Variable containing:
0.1628
[torch.FloatTensor of size 1]
==>>> epoch: 1 , loss : Variable containing:
0.1872
[torch.FloatTensor of size 1]
==>>> epoch: 1 , loss : Variable containing:
0.2300
[torch.FloatTensor of size 1]
==>>> epoch: 1 , loss : Variable containing:
0.1256
[torch.FloatTensor of size 1]
==>>> epoch: 2 , loss : Variable containing:
1.00000e-02 *
2.0228
[torch.FloatTensor of size 1]
**many lines later**
==>>> epoch: 9 , loss : Variable containing:
0.1006
[torch.FloatTensor of size 1]
# the test function evaluates the neural network's accuracy on the test set
def test(epoch):
    net.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data = data.view(-1, 28*28)  # view is equivalent to np.reshape
        data = torch.autograd.Variable(data, volatile=True)
        target = torch.autograd.Variable(target)
        output = net(data)
        # use the same cross entropy criterion as in training (the net outputs raw logits)
        test_loss += F.cross_entropy(output, target).data[0]
        pred = output.data.max(1)[1]  # get the index of the max score
        correct += pred.eq(target.data).cpu().sum()
    test_loss /= len(test_loader)  # the loss function already averages over the batch size
    accuracy = 100. * correct / len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset), accuracy))
test(10)
Test set: Average loss: -0.1730, Accuracy: 9147/10000 (91%)
Test set: Average loss: -0.1756, Accuracy: 9247/10000 (92%)
Test set: Average loss: -0.1712, Accuracy: 9344/10000 (93%)
Test set: Average loss: -0.1444, Accuracy: 9438/10000 (94%)
Test set: Average loss: -0.1326, Accuracy: 9533/10000 (95%)
Test set: Average loss: -0.1468, Accuracy: 9628/10000 (96%)
Test set: Average loss: -0.1493, Accuracy: 9728/10000 (97%)
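And to use the trained net for a single prediction, here is a small sketch reusing net and test_set from above:
# pick one test image, flatten it, and read off the most likely class
image, label = test_set[0]
x = torch.autograd.Variable(image.view(-1, 28*28), volatile=True)
predicted = net(x).data.max(1)[1]   # index of the highest score
print('predicted:', int(predicted[0]), 'actual:', int(label))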
Conclusion (for now)
This was an appetizer on the difference between PyTorch and Keras. Later on we will explore how building models differs between the two and dive into the specifics of each, such as how the Functional API works in Keras and how to use PyTorch loaders for your own data. Till then, happy learning.