# The Image Classification Dataset

:label:`sec_fashion_mnist`

One of the most widely used datasets for image classification is the MNIST dataset :cite:`LeCun.Bottou.Bengio.ea.1998`. While it had a good run as a benchmark dataset, even simple models by today's standards achieve classification accuracy over 95%, making it unsuitable for distinguishing between stronger models and weaker ones. Today, MNIST serves more as a sanity check than as a benchmark. To up the ante just a bit, we will focus our discussion in the coming sections on the qualitatively similar, but comparatively more complex Fashion-MNIST dataset :cite:`Xiao.Rasul.Vollgraf.2017`, which was released in 2017.

```{.python .input}
%matplotlib inline
from d2l import mxnet as d2l
from mxnet import gluon
import sys

d2l.use_svg_display()
```

```{.python .input}
#@tab pytorch
%matplotlib inline
from d2l import torch as d2l
import torch
import torchvision
from torchvision import transforms
from torch.utils import data

d2l.use_svg_display()
```

```{.python .input}
#@tab tensorflow
%matplotlib inline
from d2l import tensorflow as d2l
import tensorflow as tf

d2l.use_svg_display()
```

## Reading the Dataset

We can download and read the Fashion-MNIST dataset into memory via the built-in functions in the framework.

```{.python .input}
mnist_train = gluon.data.vision.FashionMNIST(train=True)
mnist_test = gluon.data.vision.FashionMNIST(train=False)
```

```{.python .input}
#@tab pytorch
# `ToTensor` converts the image data from PIL type to 32-bit floating point
# tensors. It divides all numbers by 255 so that all pixel values are between
# 0 and 1
trans = transforms.ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(
    root="../data", train=True, transform=trans, download=True)
mnist_test = torchvision.datasets.FashionMNIST(
    root="../data", train=False, transform=trans, download=True)
```

```{.python .input}
#@tab tensorflow
mnist_train, mnist_test = tf.keras.datasets.fashion_mnist.load_data()
```

Fashion-MNIST consists of images from 10 categories, each represented by 6000 images in the training dataset and by 1000 in the test dataset. A test dataset (or test set) is used for evaluating model performance and not for training. Consequently the training set and the test set contain 60000 and 10000 images, respectively.

```{.python .input}
#@tab mxnet, pytorch
len(mnist_train), len(mnist_test)
```

```{.python .input}
#@tab tensorflow
len(mnist_train[0]), len(mnist_test[0])
```
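
The per-class counts stated above can also be checked from the labels alone. The following is a minimal, framework-independent sketch: `count_per_class` is a hypothetical helper (not part of `d2l`), and the label list is synthetic stand-in data rather than the real Fashion-MNIST labels.

```python
from collections import Counter

def count_per_class(labels):
    """Return a dict mapping each class index to its number of examples."""
    return dict(sorted(Counter(int(y) for y in labels).items()))

# Synthetic stand-in for the training labels: 10 classes, 6000 examples each.
# With the real data, pass the label sequence of the loaded dataset instead.
fake_train_labels = [c for c in range(10) for _ in range(6000)]

counts = count_per_class(fake_train_labels)
print(len(fake_train_labels))  # total number of examples
print(set(counts.values()))    # a single value means the classes are balanced
```

Running the same helper on the real training labels should, if the dataset is balanced as described, report one count shared by all 10 classes.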

The height and width of each input image are both 28 pixels. Note that the dataset consists of grayscale images, whose number of channels is 1. For brevity, throughout this book we store the shape of any image with height $h$ and width $w$ pixels as $h \times w$ or ($h$, $w$).

```{.python .input}
#@tab all
mnist_train[0][0].shape
```
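
Depending on the framework, the reported shape may carry an extra channel axis of size 1, placed either last or first. The sketch below illustrates the two layouts with plain nested lists instead of framework tensors; `nested_shape` is a hypothetical helper introduced only for this illustration, and the image contents are synthetic.

```python
def nested_shape(x):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0]
    return tuple(shape)

h, w = 28, 28
# Synthetic grayscale image: h rows, each with w pixel intensities in [0, 255].
image = [[(r * w + c) % 256 for c in range(w)] for r in range(h)]

# Channel-last layout (h, w, 1): every pixel becomes a one-element list.
channel_last = [[[p] for p in row] for row in image]
# Channel-first layout (1, h, w): the whole image is wrapped in one channel.
channel_first = [image]

print(nested_shape(image))          # (28, 28)
print(nested_shape(channel_last))   # (28, 28, 1)
print(nested_shape(channel_first))  # (1, 28, 28)
```

In all three cases the image itself is still $28 \times 28$; only the position of the singleton channel axis differs.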

The images in Fashion-MNIST are associated with the following categories: t-shirt, trousers, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. The following function converts between numeric label indices and their names in text.

```{.python .input}
#@tab all
def get_fashion_mnist_labels(labels):  #@save
    """Return text labels for the Fashion-MNIST dataset."""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
```
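
As a quick illustration of the mapping, the sketch below converts a few indices to names and back. It is self-contained and does not rely on the framework: `labels_to_names` mirrors the function above, while `names_to_labels` is a hypothetical inverse added only for this example.

```python
TEXT_LABELS = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
               'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

def labels_to_names(labels):
    """Map numeric label indices to text names (mirrors the function above)."""
    return [TEXT_LABELS[int(i)] for i in labels]

def names_to_labels(names):
    """Hypothetical inverse: map text names back to numeric indices."""
    index_of = {name: i for i, name in enumerate(TEXT_LABELS)}
    return [index_of[name] for name in names]

names = labels_to_names([9, 0, 0, 3])
print(names)                   # ['ankle boot', 't-shirt', 't-shirt', 'dress']
print(names_to_labels(names))  # [9, 0, 0, 3]
```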

We can now create a function to visualize these examples.

```{.python .input}
#@tab all
def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):  #@save
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        ax.imshow(d2l.numpy(img))
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])
    return axes
```

Here are the images and their corresponding labels (in text) for the first few examples in the training dataset.

```{.python .input}
X, y = mnist_train[:18]
show_images(X.squeeze(axis=-1), 2, 9, titles=get_fashion_mnist_labels(y));
```

```{.python .input}
#@tab pytorch
X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))
show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y));
```

```{.python .input}
#@tab tensorflow
X = tf.constant(mnist_train[0][:18])
y = tf.constant(mnist_train[1][:18])
show_images(X, 2, 9, titles=get_fashion_mnist_labels(y));
```

## Reading a Minibatch

To make our life easier when reading from the training and test sets, we use the built-in data iterator rather than creating one from scratch. Recall that at each iteration, a data iterator reads a minibatch of data of size `batch_size`. We also randomly shuffle the examples for the training data iterator.

```{.python .input}
batch_size = 256

def get_dataloader_workers():  #@save
    """Use 4 processes to read the data except for Windows."""
    return 0 if sys.platform.startswith('win') else 4

# `ToTensor` converts the image data from uint8 to 32-bit floating point. It
# divides all numbers by 255 so that all pixel values are between 0 and 1
transformer = gluon.data.vision.transforms.ToTensor()
train_iter = gluon.data.DataLoader(mnist_train.transform_first(transformer),
                                   batch_size, shuffle=True,
                                   num_workers=get_dataloader_workers())
```

```{.python .input}
#@tab pytorch
batch_size = 256

def get_dataloader_workers():  #@save
    """Use 4 processes to read the data."""
    return 4

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                             num_workers=get_dataloader_workers())
```

```{.python .input}
#@tab tensorflow
batch_size = 256
train_iter = tf.data.Dataset.from_tensor_slices(
    mnist_train).batch(batch_size).shuffle(len(mnist_train[0]))
```

Let us look at the time it takes to read the training data.

```{.python .input}
#@tab all
timer = d2l.Timer()
for X, y in train_iter:
    continue
f'{timer.stop():.2f} sec'
```
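
To make clear what a shuffling data iterator does behind the scenes, here is a minimal from-scratch sketch in plain Python, together with the same kind of timing loop. It is an illustration under simplifying assumptions, not how the framework iterators are implemented: `minibatches` is a hypothetical helper, the data are synthetic, and no worker processes are involved.

```python
import random
import time

def minibatches(features, labels, batch_size, shuffle=True, seed=None):
    """Yield (features, labels) minibatches, optionally in shuffled order."""
    indices = list(range(len(features)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [features[i] for i in batch], [labels[i] for i in batch]

# Synthetic stand-in data: 1000 "examples" with labels in 0..9.
features = list(range(1000))
labels = [i % 10 for i in features]

start_time = time.perf_counter()
batch_sizes = [len(X) for X, y in minibatches(features, labels, 256, seed=0)]
elapsed = time.perf_counter() - start_time

print(batch_sizes)  # [256, 256, 256, 232]
print(f'{elapsed:.4f} sec')
```

Note that the last minibatch is smaller whenever the number of examples is not divisible by the batch size.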

## Putting All Things Together

Now we define the `load_data_fashion_mnist` function that obtains and reads the Fashion-MNIST dataset. It returns the data iterators for both the training set and the test set (which we use for validation). In addition, it accepts an optional argument to resize images to another shape.

```{.python .input}
def load_data_fashion_mnist(batch_size, resize=None):  #@save
    """Download the Fashion-MNIST dataset and then load it into memory."""
    dataset = gluon.data.vision
    trans = [dataset.transforms.ToTensor()]
    if resize:
        trans.insert(0, dataset.transforms.Resize(resize))
    trans = dataset.transforms.Compose(trans)
    mnist_train = dataset.FashionMNIST(train=True).transform_first(trans)
    mnist_test = dataset.FashionMNIST(train=False).transform_first(trans)
    return (gluon.data.DataLoader(mnist_train, batch_size, shuffle=True,
                                  num_workers=get_dataloader_workers()),
            gluon.data.DataLoader(mnist_test, batch_size, shuffle=False,
                                  num_workers=get_dataloader_workers()))
```

```{.python .input}
#@tab pytorch
def load_data_fashion_mnist(batch_size, resize=None):  #@save
    """Download the Fashion-MNIST dataset and then load it into memory."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))
```

```{.python .input}
#@tab tensorflow
def load_data_fashion_mnist(batch_size, resize=None):  #@save
    """Download the Fashion-MNIST dataset and then load it into memory."""
    mnist_train, mnist_test = tf.keras.datasets.fashion_mnist.load_data()
    # Divide all numbers by 255 so that all pixel values are between
    # 0 and 1, add a channel dimension at the end, and cast labels to int32
    process = lambda X, y: (tf.expand_dims(X, axis=3) / 255,
                            tf.cast(y, dtype='int32'))
    resize_fn = lambda X, y: (
        tf.image.resize_with_pad(X, resize, resize) if resize else X, y)
    return (
        tf.data.Dataset.from_tensor_slices(process(*mnist_train)).batch(
            batch_size).shuffle(len(mnist_train[0])).map(resize_fn),
        tf.data.Dataset.from_tensor_slices(process(*mnist_test)).batch(
            batch_size).map(resize_fn))
```

Below we test the image resizing feature of the `load_data_fashion_mnist` function by specifying the `resize` argument.

```{.python .input}
#@tab all
train_iter, test_iter = load_data_fashion_mnist(32, resize=64)
for X, y in train_iter:
    print(X.shape, X.dtype, y.shape, y.dtype)
    break
```
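
To see what resizing does to a single image, the following sketch enlarges a synthetic $28 \times 28$ image to $64 \times 64$ using nearest-neighbor sampling on nested lists. This is a deliberate simplification: `resize_nearest` is a hypothetical helper, and the framework transforms used above typically interpolate between pixels rather than copying the nearest one.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2D nested-list image by nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    return [[image[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

# Synthetic 28 x 28 grayscale image with values in [0, 255].
image = [[(r + c) % 256 for c in range(28)] for r in range(28)]
resized = resize_nearest(image, 64, 64)

print(len(image), len(image[0]))      # 28 28
print(len(resized), len(resized[0]))  # 64 64
```

Only the height and width change; the batch size and the number of channels are unaffected by the resize.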

We are now ready to work with the Fashion-MNIST dataset in the sections that follow.

## Summary

* Fashion-MNIST is an apparel classification dataset consisting of images representing 10 categories. We will use this dataset in subsequent sections and chapters to evaluate various classification algorithms.
* We store the shape of any image with height $h$ and width $w$ pixels as $h \times w$ or ($h$, $w$).
* Data iterators are a key component for efficient performance. Rely on well-implemented data iterators that exploit high-performance computing to avoid slowing down your training loop.

## Exercises

1. Does reducing the `batch_size` (for instance, to 1) affect the reading performance?
1. The data iterator performance is important. Do you think the current implementation is fast enough? Explore various options to improve it.
1. Check out the framework's online API documentation. Which other datasets are available?

:begin_tab:`mxnet`
Discussions
:end_tab:

:begin_tab:`pytorch`
Discussions
:end_tab:

:begin_tab:`tensorflow`
Discussions
:end_tab: