Building a CIFAR classifier neural network with PyTorch

PyTorch 101, Part 2: Building Your First Neural Network

In this part, we will implement a neural network to classify CIFAR-10 images. We cover implementing the neural network, the data loading pipeline, and a decaying learning rate schedule.

Ayoosh Kathuria
Currently a research assistant at IIIT-Delhi working on representation learning in Deep RL. Ex - MathWorks, DRDO.

17 Jun 2019 • 13 min read

In this article, we will discuss how to use PyTorch to build custom neural network architectures, and how to configure your training loop. We will implement a ResNet to classify images from the CIFAR-10 Dataset.

Before we begin, let me say that the purpose of this tutorial is not to achieve the best possible accuracy on the task, but to show you how to use PyTorch.

Let me also remind you that this is Part 2 of our tutorial series on PyTorch. Reading the first part, though not necessary for this article, is highly recommended.

  1. Understanding Graphs, Automatic Differentiation and Autograd
  2. Building Your First Neural Network
  3. Going Deep with PyTorch
  4. Memory Management and Using Multiple GPUs
  5. Understanding Hooks

You can get all the code in this post (and other posts as well) in the GitHub repo here.


In this post, we will cover

  1. How to build neural networks using the nn.Module class
  2. How to build custom data input pipelines with data augmentation, using the Dataset and Dataloader classes
  3. How to configure your learning rate with different learning rate schedules
  4. Training a ResNet-based image classifier to classify images from the CIFAR-10 dataset

Prerequisites

  1. Chain rule
  2. Basic Understanding of Deep Learning
  3. PyTorch 1.0
  4. Part 1 of this tutorial

A Simple Neural Network

In this tutorial, we will be implementing a very simple neural network.

[Figure: Diagram of the Network]

Building the Network

The torch.nn.Module class is the cornerstone of designing neural networks in PyTorch. It can be used to implement a single layer like a fully connected layer, a convolutional layer, a pooling layer or an activation function, as well as an entire neural network, by subclassing it. (From now on, I'll refer to it as just nn.Module.)

Multiple nn.Module objects can be strung together to form a bigger nn.Module object, which is how we can implement a neural network using many layers. In fact, nn.Module can be used to represent an arbitrary function f in PyTorch.

The nn.Module class has two methods that you have to override.

  1. The __init__ function. This function is invoked when you create an instance of the nn.Module. Here you define the various parameters of a layer, such as the number of filters and kernel size for a convolutional layer, or the dropout probability for a dropout layer.

  2. The forward function. This is where you define how your output is computed. This function doesn't need to be called explicitly; it is run by calling the nn.Module instance like a function with the input as its argument.

A very simple layer that just multiplies the input by a number:

```
import torch
import torch.nn as nn

class MyLayer(nn.Module):
    def __init__(self, param):
        super().__init__()
        self.param = param

    def forward(self, x):
        return x * self.param

myLayerObject = MyLayer(5)
output = myLayerObject(torch.Tensor([5, 4, 3]))   # forward is called implicitly
print(output)
```

Another widely used and important class is the nn.Sequential class. When instantiating this class, we pass it a number of nn.Module objects in a particular sequence. The object returned by nn.Sequential is itself an nn.Module object. When this object is run with an input, it sequentially runs the input through all the nn.Module objects we passed to it, in the very same order we passed them.

```
combinedNetwork = nn.Sequential(MyLayer(5), MyLayer(10))
output = combinedNetwork(torch.Tensor([3, 4]))

# equivalent to...
# out = MyLayer(5)(torch.Tensor([3, 4]))
# out = MyLayer(10)(out)
```

Let us now start implementing our classification network. We will make use of convolutional and pooling layers, as well as a custom-implemented residual block.

[Figure: Diagram of the Residual Block]

While PyTorch provides many layers out of the box in its torch.nn module, we will have to implement the residual block ourselves. Before implementing the full network, we implement the ResNet block.

```
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()

        # Conv Layer 1
        self.conv1 = nn.Conv2d(
            in_channels=in_channels, out_channels=out_channels,
            kernel_size=(3, 3), stride=stride, padding=1, bias=False
        )
        self.bn1 = nn.BatchNorm2d(out_channels)

        # Conv Layer 2
        self.conv2 = nn.Conv2d(
            in_channels=out_channels, out_channels=out_channels,
            kernel_size=(3, 3), stride=1, padding=1, bias=False
        )
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Shortcut connection to downsample the residual.
        # In case the output dimensions of the residual block are not the same
        # as its input, have a convolutional layer downsample the identity
        # being brought forward, using appropriate striding and filters.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(
                    in_channels=in_channels, out_channels=out_channels,
                    kernel_size=(1, 1), stride=stride, bias=False
                ),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = nn.ReLU()(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = nn.ReLU()(out)
        return out
```

As you can see, we define the layers, or the components, of our network in the __init__ function. In the forward function, we string these components together to compute the output from our input.

Now, we can define our full network.

```
class ResNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet, self).__init__()

        # Initial input conv
        self.conv1 = nn.Conv2d(
            in_channels=3, out_channels=64, kernel_size=(3, 3),
            stride=1, padding=1, bias=False
        )
        self.bn1 = nn.BatchNorm2d(64)

        # Create blocks
        self.block1 = self._create_block(64, 64, stride=1)
        self.block2 = self._create_block(64, 128, stride=2)
        self.block3 = self._create_block(128, 256, stride=2)
        self.block4 = self._create_block(256, 512, stride=2)
        self.linear = nn.Linear(512, num_classes)

    # A block is just two residual blocks for ResNet18
    def _create_block(self, in_channels, out_channels, stride):
        return nn.Sequential(
            ResidualBlock(in_channels, out_channels, stride),
            ResidualBlock(out_channels, out_channels, 1)
        )

    def forward(self, x):
        # Output of one layer becomes input to the next
        out = nn.ReLU()(self.bn1(self.conv1(x)))
        out = self.block1(out)
        out = self.block2(out)
        out = self.block3(out)
        out = self.block4(out)
        out = nn.AvgPool2d(4)(out)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
```

Input Format

Now that we have our network object, we turn our focus to the input. We come across different types of input while working with deep learning: images, audio, or high-dimensional structured data.

The kind of data we are dealing with dictates the input format. Generally, in PyTorch, you will find that the batch is always the first dimension. Since we are dealing with images here, I will describe the input format required for images.

The input format for images is [B, C, H, W], where B is the batch size, C is the number of channels, H is the height and W is the width.

The output of our neural network is gibberish right now, since we have used random weights. A quick sanity check below illustrates this, along with the expected input and output shapes.
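
As a quick check, we can instantiate the ResNet defined above and push a random batch of four 3 x 32 x 32 tensors through it. The class scores are meaningless at this point, but the shapes confirm the [B, C, H, W] convention. This is just an illustrative sketch, not part of the training code.

```
net = ResNet()
dummy_batch = torch.randn(4, 3, 32, 32)   # [B, C, H, W]
scores = net(dummy_batch)

print(scores.shape)   # torch.Size([4, 10]) -- one score per class, per image
```

Let us now train our network.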

Loading The Data

Let us now load the data. We will make use of the torch.utils.data.Dataset and torch.utils.data.DataLoader classes for this.

We start by downloading the CIFAR-10 dataset into the same directory as our code file.

Fire up the terminal, cd to your code directory and run the following commands.

```
wget http://pjreddie.com/media/files/cifar.tgz
tar xzf cifar.tgz
```

You might need to use curl if you're on macOS, or download the archive manually if you're on Windows.

We now read the labels of the classes present in the CIFAR dataset.

```
data_dir = "cifar/train/"

with open("cifar/labels.txt") as label_file:
    labels = label_file.read().split()
    label_mapping = dict(zip(labels, list(range(len(labels)))))
```

We will be reading images using the PIL library. Before we write the functionality to load our data, we write a preprocessing function that does the following things.

  1. Randomly horizontally flip the image with a probability of 0.5

  2. Normalise the image with the mean and standard deviation of the CIFAR dataset

  3. Reshape it from H W C to C H W

```
import random
import numpy as np

def preprocess(image):
    image = np.array(image)

    # Random horizontal flip
    if random.random() > 0.5:
        image = image[:, ::-1, :]

    # Scale to [0, 1], then normalise with the CIFAR mean and std
    image = image / 255.0
    cifar_mean = np.array([0.4914, 0.4822, 0.4465]).reshape(1, 1, -1)
    cifar_std = np.array([0.2023, 0.1994, 0.2010]).reshape(1, 1, -1)
    image = (image - cifar_mean) / cifar_std

    # H W C -> C H W
    image = image.transpose(2, 0, 1)
    return image
```

PyTorch provides two classes for building input pipelines to load data:

  1. `torch.utils.data.Dataset`, which we will just refer to as the `Dataset` class from now on.
  2. `torch.utils.data.DataLoader`, which we will just refer to as the `DataLoader` class from now on.

torch.utils.data.Dataset

`Dataset` is a class that loads the data and returns a generator so that you can iterate over it. It also lets you incorporate data augmentation techniques into the input pipeline.

If you want to create a `Dataset` object for your data, you need to overload three functions.

  1. The `__init__` function. Here, you define things related to your dataset, most importantly the location of your data. You can also define the various data augmentations you want to apply.
  2. The `__len__` function. Here, you just return the length of the dataset.
  3. The `__getitem__` function. This function takes an index `i` as an argument and returns a data example. It is called with a different `i` every iteration of our training loop.

Here is an implementation of our `Dataset` object for the CIFAR dataset.

```
import os
from PIL import Image

class Cifar10Dataset(torch.utils.data.Dataset):
    def __init__(self, data_dir, data_size=0, transforms=None):
        files = os.listdir(data_dir)
        files = [os.path.join(data_dir, x) for x in files]

        if data_size < 0 or data_size > len(files):
            raise ValueError("Data size should be between 0 and the number of files in the dataset")

        if data_size == 0:
            data_size = len(files)

        self.data_size = data_size
        self.files = random.sample(files, self.data_size)
        self.transforms = transforms

    def __len__(self):
        return self.data_size

    def __getitem__(self, idx):
        image_address = self.files[idx]
        image = Image.open(image_address)
        image = preprocess(image)

        label_name = image_address[:-4].split("_")[-1]
        label = label_mapping[label_name]

        image = image.astype(np.float32)

        if self.transforms:
            image = self.transforms(image)

        return image, label
```
We also use the `__getitem__` function to extract the label for an image, which is encoded in its file name.

The `Dataset` class allows us to incorporate the lazy data loading principle. Instead of loading all the data into memory at once (which could be done by loading all the images, rather than just their addresses, in the `__init__` function), it only loads a data example when it is needed, i.e. when `__getitem__` is called.

When you create an object of the `Dataset` class, you can iterate over it as you would over any Python iterable. Each iteration, `__getitem__` is called with the incremented index `i` as its input argument, as the short sketch below shows.
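
As a small illustration, using the `Cifar10Dataset` and the cifar/train/ directory set up above, indexing into a `Dataset` object calls `__getitem__` under the hood, and `len()` calls `__len__`:

```
dataset = Cifar10Dataset(data_dir="cifar/train/", transforms=None)

print(len(dataset))          # calls __len__
image, label = dataset[0]    # calls __getitem__(0)
print(image.shape, label)    # (3, 32, 32) and an integer class index
```
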
Data Augmentations

I've also passed a `transforms` argument to the `__init__` function. This can be any Python function that does data augmentation. While you can do the data augmentation right inside your preprocess code, doing it inside `__getitem__` is rather a matter of taste.

These data augmentations can be implemented as either functions or classes; you just have to make sure that you are able to apply them in the `__getitem__` function.

We have a plethora of data augmentation libraries that can be used. For our case, the `torchvision` library provides a lot of pre-built transforms, along with the ability to compose them into one bigger transform, as sketched below. But we are going to keep our discussion limited to plain PyTorch in this post.
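
For the curious, here is a hedged sketch of what such a composed `torchvision` transform could look like for our case. The exact transform list is illustrative; it is not wired into the pipeline used in this post, which relies on our own preprocess function instead.

```
from torchvision import transforms

# Illustrative only: the same three steps as preprocess(), expressed with torchvision.
# ToTensor() converts a PIL image (H x W x C, values 0-255) into a C x H x W float tensor in [0, 1].
cifar_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                         std=[0.2023, 0.1994, 0.2010]),
])
```

Such a transform would be applied to the PIL image directly (for example, inside `__getitem__`), replacing the call to preprocess.
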
torch.utils.data.DataLoader

The `DataLoader` class facilitates:

  1. Batching of data
  2. Shuffling of data
  3. Loading multiple data examples at a time using threads
  4. Prefetching, that is, while the GPU crunches the current batch, the `DataLoader` can load the next batch into memory in the meantime. This means the GPU doesn't have to wait for the next batch, which speeds up training.

You instantiate a `DataLoader` object with a `Dataset` object. Then you can iterate over a `DataLoader` instance just like you did with a `Dataset` instance. However, you can specify various options that give you more control over the looping.

```
trainset = Cifar10Dataset(data_dir="cifar/train/", transforms=None)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

testset = Cifar10Dataset(data_dir="cifar/test/", transforms=None)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=True, num_workers=2)
```

Both the `trainset` and `trainloader` objects can be iterated over in the following fashion.

```
for data in trainloader:   # or trainset
    img, label = data
```

However, the `DataLoader` class makes things much more convenient than the `Dataset` class. While on each iteration the `Dataset` class would only return us the output of the `__getitem__` function, the `DataLoader` does much more than that.

  1. Notice that our `__getitem__` method of `trainset` returns a numpy array of shape `3 x 32 x 32`. The `DataLoader` batches the images into a Tensor of shape `128 x 3 x 32 x 32` (since `batch_size` = 128 in our code).
  2. Also notice that while our `__getitem__` method outputs a numpy array, the `DataLoader` class automatically converts it into a `Tensor`.
  3. Even if the `__getitem__` method returns an object of a non-numerical type, the `DataLoader` class turns it into a list / tuple of size `B` (128 in our case). Suppose `__getitem__` also returned a string, say the label string. If we set `batch_size` = 128 while instantiating the `DataLoader`, each iteration it would give us a tuple of 128 strings.

Adding prefetching and multi-threaded loading to the above benefits, using the `DataLoader` class is preferred almost every time. A quick way to verify this batching behaviour is sketched below.
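
As an illustrative check (assuming the `trainloader` defined above), we can pull a single batch and inspect its shapes and types:

```
# Grab one batch from the DataLoader and inspect it.
images, labels = next(iter(trainloader))

print(images.shape)   # torch.Size([128, 3, 32, 32])
print(images.dtype)   # torch.float32 -- the numpy arrays were converted to Tensors
print(labels.shape)   # torch.Size([128]) -- the integer labels, batched
```
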
Training and Evaluation

Before we start writing our training loop, we need to decide on our hyperparameters and our optimisation algorithm. PyTorch provides us with many pre-built optimisation algorithms through its `torch.optim` module.

torch.optim

The `torch.optim` module provides multiple functionalities associated with training / optimisation, such as:

  1. Different optimisation algorithms (like `optim.SGD` and `optim.Adam`)
  2. The ability to schedule the learning rate (with `optim.lr_scheduler`)
  3. The ability to have different learning rates for different parameters (we will not discuss this in this post though).

We use a cross entropy loss with a momentum-based SGD optimisation algorithm. Our learning rate is decayed by a factor of 0.1 at the 150th and 200th epochs.

```
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")   # Check whether a GPU is present

clf = ResNet()
clf.to(device)   # Put the network on the GPU if one is present

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(clf.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150, 200], gamma=0.1)
```

In the first line of code, `device` is set to `cuda:0` if GPU number 0 is present, and to `cpu` if not.

By default, when we initialise a network, it resides on the CPU. `clf.to(device)` moves the network to the GPU if one is present. We will cover how to use multiple GPUs in more detail in another part. Alternatively, we can use `clf.cuda(0)` to move our network `clf` to GPU `0` (replace `0` with the index of the GPU in the general case).

`criterion` is basically an `nn.CrossEntropyLoss` object which, as the name suggests, implements the cross entropy loss. It subclasses `nn.Module`.

We then define the variable `optimizer` as an `optim.SGD` object. The first argument to `optim.SGD` is `clf.parameters()`. The `parameters()` function of an `nn.Module` object returns its so-called parameters (implemented as `nn.Parameter` objects; we will learn about this class in a later part where we explore advanced PyTorch functionality. For now, think of them as a list of associated Tensors which are **learnable**). `clf.parameters()` are basically the weights of our neural network.

As you will see in the code, we will call the `step()` function on `optimizer`. When `step()` is called, the optimizer updates each of the Tensors in `clf.parameters()` using the gradient update rule. The gradients are accessed through the `grad` attribute of each Tensor.

Generally, the first argument to any optimiser, whether it be SGD, Adam or RMSprop, is the list of Tensors it is supposed to update. The rest of the arguments define the various hyperparameters. A quick peek below shows what these parameters look like for our network.
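
A purely illustrative peek at the learnable parameters of our network (the exact shapes follow from the `ResNet` definition above):

```
n_params = sum(p.numel() for p in clf.parameters())
print(n_params)                      # total number of learnable scalars

first = next(iter(clf.parameters()))
print(first.shape)                   # torch.Size([64, 3, 3, 3]) -- the conv1 weights
print(first.grad)                    # None until backward() has been called
```
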
`scheduler`, as the name suggests, can schedule various hyperparameters of the `optimizer` (which is used to instantiate it). It updates those hyperparameters every time we call `scheduler.step()`, as the short sketch below illustrates.
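
Here is a tiny, self-contained sketch, using a throwaway parameter and optimizer so it does not disturb our real training state, showing how `MultiStepLR` drops the learning rate at the milestones:

```
# Throwaway parameter and optimizer, just to visualise the schedule.
p = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.SGD([p], lr=0.1)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[150, 200], gamma=0.1)

for epoch in range(250):
    sched.step()
    # lr stays at 0.1, drops to 0.01 around epoch 150, and to 0.001 around epoch 200
    if epoch in (0, 150, 200):
        print(epoch, opt.param_groups[0]["lr"])
```
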
Writing the training loop

We finally train the network. The loop below runs for 10 epochs to keep things short; you can increase the number of epochs (to 200 or more, which is what the learning rate schedule above assumes). This might take a while on a GPU. Again, the idea of this tutorial is to show how PyTorch works, and not to attain the best accuracy.

We evaluate the classification accuracy every epoch.

```
import time

for epoch in range(10):
    losses = []
    scheduler.step()

    # Train
    start = time.time()
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()                       # Zero the gradients

        outputs = clf(inputs)                       # Forward pass
        loss = criterion(outputs, targets)          # Compute the loss
        loss.backward()                             # Compute the gradients
        optimizer.step()                            # Update the weights

        losses.append(loss.item())
        end = time.time()

        if batch_idx % 100 == 0:
            print('Batch Index : %d Loss : %.3f Time : %.3f seconds ' % (batch_idx, np.mean(losses), end - start))
            start = time.time()

    # Evaluate
    clf.eval()
    total = 0
    correct = 0

    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = clf(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += targets.size(0)
            correct += predicted.eq(targets.data).cpu().sum().item()

    print('Epoch : %d Test Acc : %.3f' % (epoch, 100. * correct / total))
    print('--------------------------------------------------------------')

    clf.train()
```

Now, the above is a large chunk of code. I didn't break it into smaller pieces so as not to break continuity. While I've added comments in the code to inform the reader of what's going on, I will now explain the not-so-trivial parts.

We first call scheduler.step() at the beginning of each epoch to make sure that the optimizer uses the correct learning rate.

The first thing we do inside the loop is move our input and target to GPU 0. This should be the same device on which our model resides, otherwise PyTorch will throw an error and halt.

Notice we call optimizer.zero_grad() before our forward pass. This is because leaf Tensors (which our weights are) retain the gradients from previous passes. If backward is called again on the loss, the new gradients are simply added to the earlier gradients stored in the grad attribute. This functionality comes in handy when working with RNNs, but for now we need to zero the gradients so they don't accumulate between subsequent passes. The small sketch below demonstrates the accumulation.
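
A minimal, standalone demonstration of gradient accumulation (using a single dummy tensor, not our network):

```
w = torch.ones(1, requires_grad=True)

(2 * w).backward()
print(w.grad)        # tensor([2.])

(2 * w).backward()   # without zeroing, the new gradient is added to the old one
print(w.grad)        # tensor([4.])

w.grad.zero_()       # this is what optimizer.zero_grad() does for every parameter
print(w.grad)        # tensor([0.])
```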

We also put our evaluation code inside the torch.no_grad context, so that no graph is created for evaluation. If you find this confusing, you can go back to Part 1 to refresh your autograd concepts.

Also notice that we call clf.eval() on our model before evaluation, and then clf.train() after it. A model in PyTorch has two modes, set by eval() and train(). The difference between the modes is rooted in stateful layers like Batch Norm (batch statistics in training vs. population statistics in inference) and Dropout, which behave differently during inference and training. eval() tells the nn.Module to put these layers in inference mode, while train() tells it to put them in training mode. The short Dropout sketch below makes the difference concrete.
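
A small illustration using a standalone nn.Dropout layer (not part of our classifier): the same input gives different results depending on the mode the module is in.

```
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 4)

drop.train()       # training mode: entries are randomly zeroed, survivors scaled by 1/(1-p)
print(drop(x))

drop.eval()        # inference mode: dropout is a no-op
print(drop(x))     # tensor([[1., 1., 1., 1.]])
```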

Conclusion

This was an exhaustive tutorial in which we showed you how to build a basic image classifier and its training loop. While this is only a start, we have covered all the building blocks you need to get started developing deep networks with PyTorch.

In the next part of this series, we will look into some of the advanced functionality present in PyTorch that will supercharge your deep learning designs. This includes ways to create even more complex architectures, and ways to customise training, such as having different learning rates for different parameters.

Further Reading

  1. PyTorch documentation
  2. More PyTorch Tutorials
  3. How to use Tensorboard with PyTorch
