# Dog Breed Identification (ImageNet Dogs) on Kaggle
In this section, we will practice the dog breed identification problem on Kaggle. The web address of this competition is https://www.kaggle.com/c/dog-breed-identification.

In this competition, 120 different breeds of dogs will be recognized. In fact, the dataset for this competition is a subset of the ImageNet dataset. Unlike the images in the CIFAR-10 dataset in :numref:`sec_kaggle_cifar10`, the images in the ImageNet dataset are both taller and wider, with varying dimensions.

:numref:`fig_kaggle_dog` shows the information on the competition's webpage. You need a Kaggle account to submit your results.

:width:`400px`
:label:`fig_kaggle_dog`
```{.python .input}
from d2l import mxnet as d2l
from mxnet import autograd, gluon, init, npx
from mxnet.gluon import nn
import os

npx.set_np()
```

```{.python .input}
#@tab pytorch
from d2l import torch as d2l
import torch
import torchvision
from torch import nn
import os
```
## Obtaining and Organizing the Dataset
The competition dataset is divided into a training set and a test set, which contain 10222 and 10357 JPEG images with three RGB (color) channels, respectively. The training set covers 120 breeds of dogs, such as Labradors, Poodles, Dachshunds, Samoyeds, Huskies, Chihuahuas, and Yorkshire Terriers.
### Downloading the Dataset
After logging into Kaggle, you can click on the "Data" tab on the competition webpage shown in :numref:`fig_kaggle_dog` and download the dataset by clicking the "Download All" button. After unzipping the downloaded file in `../data`, you will find the entire dataset in the following paths:

- ../data/dog-breed-identification/labels.csv
- ../data/dog-breed-identification/sample_submission.csv
- ../data/dog-breed-identification/train
- ../data/dog-breed-identification/test
You may have noticed that the above structure is similar to that of the CIFAR-10 competition in :numref:`sec_kaggle_cifar10`, where folders `train/` and `test/` contain training and testing dog images, respectively, and `labels.csv` contains the labels for the training images.

Similarly, to make it easier to get started, we provide a small sample of the dataset mentioned above: `train_valid_test_tiny.zip`. If you are going to use the full dataset for the Kaggle competition, you need to change the `demo` variable below to `False`.
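As a quick sanity check on the label file, the two-column `id,breed` format of `labels.csv` can be parsed with the standard `csv` module. The sketch below uses a handful of made-up rows in the same format (the image ids and breed assignments here are illustrative, not taken from the real file):

```python
import collections
import csv
import io

# A few rows in the same two-column format as labels.csv (the real file
# has 10222 data rows); the header line is 'id,breed'
sample = io.StringIO(
    "id,breed\n"
    "000bec180eb18c7604dcecc8fe0dba07,boston_bull\n"
    "001513dfcb2ffafc82cccf4d8bbaba97,dingo\n"
    "001cdf01b096e06d78e9e5112d419397,pekinese\n"
    "00214f311d5d2247d5dfe4fe24b2303d,bluetick\n"
    "0021f9ceb3235effd7fcde7f7538ed62,golden_retriever\n")

reader = csv.reader(sample)
next(reader)  # skip the header row
counts = collections.Counter(breed for _, breed in reader)
print(len(counts))  # number of distinct breeds in this small sample
```

Running the same loop over the real `labels.csv` would report 120 distinct breeds and their per-breed image counts.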
```{.python .input}
#@tab all
#@save
d2l.DATA_HUB['dog_tiny'] = (d2l.DATA_URL + 'kaggle_dog_tiny.zip',
                            '0cb91d09b814ecdc07b50f31f8dcad3e81d6a86d')

# If you use the full dataset downloaded for the Kaggle competition, change
# the variable below to `False`
demo = True
if demo:
    data_dir = d2l.download_extract('dog_tiny')
else:
    data_dir = os.path.join('..', 'data', 'dog-breed-identification')
```
### Organizing the Dataset

We can organize the dataset similarly to what we did in :numref:`sec_kaggle_cifar10`, namely splitting out a validation set from the original training set, and moving images into subfolders grouped by labels.

The `reorg_dog_data` function below reads the training data labels, splits out the validation set, and organizes the training set.

```{.python .input}
#@tab all
def reorg_dog_data(data_dir, valid_ratio):
    labels = d2l.read_csv_labels(os.path.join(data_dir, 'labels.csv'))
    d2l.reorg_train_valid(data_dir, labels, valid_ratio)
    d2l.reorg_test(data_dir)


batch_size = 4 if demo else 128
valid_ratio = 0.1
reorg_dog_data(data_dir, valid_ratio)
```
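To make the resulting layout concrete: organizing the dataset copies each training image into a per-label subfolder under `train_valid_test/`. The following is a minimal sketch of that copy step on a throwaway directory with empty placeholder files; `copy_to_label_dir` is a hypothetical helper written for illustration, not part of `d2l`:

```python
import os
import shutil
import tempfile

def copy_to_label_dir(filename, label, data_dir, split):
    """Copy one training image into train_valid_test/<split>/<label>/,
    mirroring the layout that the d2l reorganization helpers produce."""
    target = os.path.join(data_dir, 'train_valid_test', split, label)
    os.makedirs(target, exist_ok=True)
    shutil.copy(os.path.join(data_dir, 'train', filename), target)

# Demonstrate on a temporary directory with two empty "images"
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'train'))
for name in ('a.jpg', 'b.jpg'):
    open(os.path.join(root, 'train', name), 'w').close()

copy_to_label_dir('a.jpg', 'dingo', root, 'train')
copy_to_label_dir('b.jpg', 'pekinese', root, 'valid')

print(sorted(os.listdir(os.path.join(root, 'train_valid_test', 'train'))))
```

Grouping images by label like this is what lets `ImageFolderDataset`/`ImageFolder` infer the class of each image from its parent folder name.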
## Image Augmentation
Recall that this dog breed dataset is a subset of the ImageNet dataset, whose images are larger than those of the CIFAR-10 dataset in :numref:`sec_kaggle_cifar10`. The following lists a few image augmentation operations that might be useful for relatively larger images.
```{.python .input}
transform_train = gluon.data.vision.transforms.Compose([
    # Randomly crop the image to obtain an image with an area of 0.08 to 1 of
    # the original area and height-to-width ratio between 3/4 and 4/3. Then,
    # scale the image to create a new 224 x 224 image
    gluon.data.vision.transforms.RandomResizedCrop(224, scale=(0.08, 1.0),
                                                   ratio=(3.0/4.0, 4.0/3.0)),
    gluon.data.vision.transforms.RandomFlipLeftRight(),
    # Randomly change the brightness, contrast, and saturation
    gluon.data.vision.transforms.RandomColorJitter(brightness=0.4,
                                                   contrast=0.4,
                                                   saturation=0.4),
    # Add random noise
    gluon.data.vision.transforms.RandomLighting(0.1),
    gluon.data.vision.transforms.ToTensor(),
    # Standardize each channel of the image
    gluon.data.vision.transforms.Normalize([0.485, 0.456, 0.406],
                                           [0.229, 0.224, 0.225])])
```
```{.python .input}
#@tab pytorch
transform_train = torchvision.transforms.Compose([
    # Randomly crop the image to obtain an image with an area of 0.08 to 1 of
    # the original area and height-to-width ratio between 3/4 and 4/3. Then,
    # scale the image to create a new 224 x 224 image
    torchvision.transforms.RandomResizedCrop(224, scale=(0.08, 1.0),
                                             ratio=(3.0/4.0, 4.0/3.0)),
    torchvision.transforms.RandomHorizontalFlip(),
    # Randomly change the brightness, contrast, and saturation
    torchvision.transforms.ColorJitter(brightness=0.4,
                                       contrast=0.4,
                                       saturation=0.4),
    torchvision.transforms.ToTensor(),
    # Standardize each channel of the image
    torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                     [0.229, 0.224, 0.225])])
```
During prediction, we only use image preprocessing operations without randomness.
```{.python .input}
transform_test = gluon.data.vision.transforms.Compose([
    gluon.data.vision.transforms.Resize(256),
    # Crop a 224 x 224 square area from the center of the image
    gluon.data.vision.transforms.CenterCrop(224),
    gluon.data.vision.transforms.ToTensor(),
    gluon.data.vision.transforms.Normalize([0.485, 0.456, 0.406],
                                           [0.229, 0.224, 0.225])])
```

```{.python .input}
#@tab pytorch
transform_test = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    # Crop a 224 x 224 square area from the center of the image
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.485, 0.456, 0.406],
                                     [0.229, 0.224, 0.225])])
```
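The `Normalize` step in both pipelines computes `(x - mean) / std` per channel, using the ImageNet channel statistics, after `ToTensor` has scaled pixel values to [0, 1]. A plain-Python sketch of that arithmetic for a single RGB pixel:

```python
# ImageNet per-channel means and standard deviations used above
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# One RGB pixel after ToTensor's scaling to [0, 1]
pixel = [0.5, 0.5, 0.5]

# The per-channel standardization that Normalize applies
standardized = [(p - m) / s for p, m, s in zip(pixel, mean, std)]
print(standardized)
```

The same statistics are used at training and test time, so the network always sees inputs on the same scale.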
## Reading the Dataset
As in :numref:`sec_kaggle_cifar10`, we can read the organized dataset consisting of raw image files.
```{.python .input}
train_ds, valid_ds, train_valid_ds, test_ds = [
    gluon.data.vision.ImageFolderDataset(
        os.path.join(data_dir, 'train_valid_test', folder))
    for folder in ('train', 'valid', 'train_valid', 'test')]
```

```{.python .input}
#@tab pytorch
train_ds, train_valid_ds = [torchvision.datasets.ImageFolder(
    os.path.join(data_dir, 'train_valid_test', folder),
    transform=transform_train) for folder in ['train', 'train_valid']]

valid_ds, test_ds = [torchvision.datasets.ImageFolder(
    os.path.join(data_dir, 'train_valid_test', folder),
    transform=transform_test) for folder in ['valid', 'test']]
```
Below we create data loader instances the same way as in :numref:`sec_kaggle_cifar10`.
```{.python .input}
train_iter, train_valid_iter = [gluon.data.DataLoader(
    dataset.transform_first(transform_train), batch_size, shuffle=True,
    last_batch='discard') for dataset in (train_ds, train_valid_ds)]

valid_iter = gluon.data.DataLoader(
    valid_ds.transform_first(transform_test), batch_size, shuffle=False,
    last_batch='discard')

test_iter = gluon.data.DataLoader(
    test_ds.transform_first(transform_test), batch_size, shuffle=False,
    last_batch='keep')
```

```{.python .input}
#@tab pytorch
train_iter, train_valid_iter = [torch.utils.data.DataLoader(
    dataset, batch_size, shuffle=True, drop_last=True)
    for dataset in (train_ds, train_valid_ds)]

valid_iter = torch.utils.data.DataLoader(valid_ds, batch_size, shuffle=False,
                                         drop_last=True)

test_iter = torch.utils.data.DataLoader(test_ds, batch_size, shuffle=False,
                                        drop_last=False)
```
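Note the different `drop_last` (or `last_batch`) settings: discarding the incomplete final batch keeps training batches uniformly sized, while the test loader must keep it so that every test image receives a prediction. A small illustration of the batch counts, using the training-set size quoted earlier purely as an example:

```python
import math

num_examples, batch_size = 10222, 128  # example sizes, for illustration only

# drop_last=True (training/validation): the incomplete final batch is dropped
batches_dropped = num_examples // batch_size

# drop_last=False (test): the final partial batch is kept, so all
# num_examples images are covered
batches_kept = math.ceil(num_examples / batch_size)

print(batches_dropped, batches_kept)
```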
## Fine-Tuning a Pretrained Model
Again,
the dataset for this competition is a subset of the ImageNet dataset.
Therefore, we can use the approach discussed in
:numref:`sec_fine_tuning`
to select a model pretrained on the
full ImageNet dataset and use it to extract image features to be fed into a
custom small-scale output network.
High-level APIs of deep learning frameworks
provide a wide range of models
pretrained on the ImageNet dataset.
Here, we choose
a pretrained ResNet-34 model,
where we simply reuse
the input of this model’s output layer
(i.e., the extracted
features).
Then we can replace the original output layer with a small custom
output network that can be trained,
such as stacking two
fully-connected layers.
Different from the experiment in
:numref:`sec_fine_tuning`,
the following does
not retrain the pretrained model used for feature
extraction. This reduces training time and
memory for storing gradients.
Recall that we standardized images using the means and standard deviations of the three RGB channels for the full ImageNet dataset. This is consistent with the standardization performed when the model was pretrained on ImageNet.
```{.python .input}
def get_net(devices):
    finetune_net = gluon.model_zoo.vision.resnet34_v2(pretrained=True)
    # Define a new output network
    finetune_net.output_new = nn.HybridSequential(prefix='')
    finetune_net.output_new.add(nn.Dense(256, activation='relu'))
    # There are 120 output categories
    finetune_net.output_new.add(nn.Dense(120))
    # Initialize the output network
    finetune_net.output_new.initialize(init.Xavier(), ctx=devices)
    # Distribute the model parameters to the CPUs or GPUs used for
    # computation
    finetune_net.collect_params().reset_ctx(devices)
    return finetune_net
```
```{.python .input}
#@tab pytorch
def get_net(devices):
    finetune_net = nn.Sequential()
    finetune_net.features = torchvision.models.resnet34(pretrained=True)
    # Define a new output network (there are 120 output categories)
    finetune_net.output_new = nn.Sequential(nn.Linear(1000, 256),
                                            nn.ReLU(),
                                            nn.Linear(256, 120))
    # Move the model to devices
    finetune_net = finetune_net.to(devices[0])
    # Freeze parameters of feature layers
    for param in finetune_net.features.parameters():
        param.requires_grad = False
    return finetune_net
```
Before calculating the loss, we first obtain the input of the pretrained model’s output layer, i.e., the extracted feature. Then we use this feature as the input for our small custom output network to calculate the loss.
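To see why freezing the feature extractor is cheap and still effective, here is a tiny pure-Python caricature of the setup: a fixed random "backbone" produces features, and only a small linear "head" is updated by softmax cross-entropy gradient steps. All sizes, names, and data here are made up for illustration; no deep learning framework is involved.

```python
import math
import random

random.seed(0)

# Stand-in for the frozen pretrained backbone: a fixed random projection
# from raw inputs to a feature vector; its weights are never updated
IN, FEAT, CLASSES = 8, 4, 3
backbone_w = [[random.gauss(0, 1) for _ in range(IN)] for _ in range(FEAT)]

def backbone(x):  # frozen: weights are only ever read
    return [sum(w * xi for w, xi in zip(row, x)) for row in backbone_w]

# The small trainable output network: a single linear layer starting at zero
head_w = [[0.0] * FEAT for _ in range(CLASSES)]

def head(f):
    return [sum(w * fi for w, fi in zip(row, f)) for row in head_w]

def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def sgd_step(x, y, lr=0.1):
    """One SGD step on a single (x, y) pair. The cross-entropy gradient
    w.r.t. the head weights is (p - onehot(y)) times the features; the
    frozen backbone receives no update at all."""
    f = backbone(x)
    p = softmax(head(f))
    for c in range(CLASSES):
        err = p[c] - (1.0 if c == y else 0.0)
        for j in range(FEAT):
            head_w[c][j] -= lr * err * f[j]

x, y = [random.random() for _ in range(IN)], 1
before = softmax(head(backbone(x)))[y]  # uniform 1/3 since the head is zero
for _ in range(20):
    sgd_step(x, y)
after = softmax(head(backbone(x)))[y]
print(before, after)
```

Only the `head_w` entries ever change, which is exactly why this scheme needs neither gradients through the backbone nor memory to store them.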
```{.python .input}
loss = gluon.loss.SoftmaxCrossEntropyLoss()

def evaluate_loss(data_iter, net, devices):
    l_sum, n = 0.0, 0
    for features, labels in data_iter:
        X_shards, y_shards = d2l.split_batch(features, labels, devices)
        output_features = [net.features(X_shard) for X_shard in X_shards]
        outputs = [net.output_new(feature) for feature in output_features]
        ls = [loss(output, y_shard).sum() for output, y_shard
              in zip(outputs, y_shards)]
        l_sum += sum([float(l.sum()) for l in ls])
        n += labels.size
    return l_sum / n
```
```{.python .input}
#@tab pytorch
loss = nn.CrossEntropyLoss(reduction='none')

def evaluate_loss(data_iter, net, devices):
    l_sum, n = 0.0, 0
    for features, labels in data_iter:
        features, labels = features.to(devices[0]), labels.to(devices[0])
        outputs = net(features)
        l = loss(outputs, labels)
        # Accumulate (not overwrite) the summed loss over all batches
        l_sum += l.sum()
        n += labels.numel()
    return l_sum / n
```
## Defining the Training Function
We will select the model and tune hyperparameters according to the model's performance on the validation set. The model training function `train` only updates the parameters of the small custom output network.
```{.python .input}
def train(net, train_iter, valid_iter, num_epochs, lr, wd, devices,
          lr_period, lr_decay):
    # Only train the small custom output network
    trainer = gluon.Trainer(net.output_new.collect_params(), 'sgd',
                            {'learning_rate': lr, 'momentum': 0.9, 'wd': wd})
    num_batches, timer = len(train_iter), d2l.Timer()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'valid loss'])
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(2)
        if epoch > 0 and epoch % lr_period == 0:
            trainer.set_learning_rate(trainer.learning_rate * lr_decay)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            X_shards, y_shards = d2l.split_batch(features, labels, devices)
            output_features = [net.features(X_shard) for X_shard in X_shards]
            with autograd.record():
                outputs = [net.output_new(feature)
                           for feature in output_features]
                ls = [loss(output, y_shard).sum() for output, y_shard
                      in zip(outputs, y_shards)]
            for l in ls:
                l.backward()
            trainer.step(batch_size)
            metric.add(sum([float(l.sum()) for l in ls]), labels.shape[0])
            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (metric[0] / metric[1], None))
        if valid_iter is not None:
            valid_loss = evaluate_loss(valid_iter, net, devices)
            animator.add(epoch + 1, (None, valid_loss))
    if valid_iter is not None:
        print(f'train loss {metric[0] / metric[1]:.3f}, '
              f'valid loss {valid_loss:.3f}')
    else:
        print(f'train loss {metric[0] / metric[1]:.3f}')
    print(f'{metric[1] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(devices)}')
```
```{.python .input}
#@tab pytorch
def train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,
          lr_decay):
    # Only train the small custom output network
    net = nn.DataParallel(net, device_ids=devices).to(devices[0])
    trainer = torch.optim.SGD((param for param in net.parameters()
                               if param.requires_grad), lr=lr,
                              momentum=0.9, weight_decay=wd)
    scheduler = torch.optim.lr_scheduler.StepLR(trainer, lr_period, lr_decay)
    num_batches, timer = len(train_iter), d2l.Timer()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'valid loss'])
    for epoch in range(num_epochs):
        metric = d2l.Accumulator(2)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            features, labels = features.to(devices[0]), labels.to(devices[0])
            trainer.zero_grad()
            output = net(features)
            l = loss(output, labels).sum()
            l.backward()
            trainer.step()
            metric.add(l, labels.shape[0])
            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (metric[0] / metric[1], None))
        if valid_iter is not None:
            valid_loss = evaluate_loss(valid_iter, net, devices)
            animator.add(epoch + 1, (None, valid_loss))
        scheduler.step()
    if valid_iter is not None:
        print(f'train loss {metric[0] / metric[1]:.3f}, '
              f'valid loss {valid_loss:.3f}')
    else:
        print(f'train loss {metric[0] / metric[1]:.3f}')
    print(f'{metric[1] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(devices)}')
```
## Training and Validating the Model
Now we can train and validate the model. The following hyperparameters are all tunable. For example, the number of epochs can be increased. Because `lr_period` and `lr_decay` are set to 10 and 0.1, respectively, the learning rate of the optimization algorithm will be multiplied by 0.1 after every 10 epochs.
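The step schedule is easy to state in closed form: at epoch `e`, the learning rate is `lr * lr_decay ** (e // lr_period)`. A quick sketch of the first 30 epochs under the settings just described:

```python
lr, lr_period, lr_decay = 0.01, 10, 0.1

# Learning rate in effect at each epoch under the step schedule:
# multiplied by lr_decay after every lr_period epochs
schedule = [lr * lr_decay ** (epoch // lr_period) for epoch in range(30)]
print(schedule[0], schedule[10], schedule[20])
```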
```{.python .input}
devices, num_epochs, lr, wd = d2l.try_all_gpus(), 5, 0.01, 1e-4
lr_period, lr_decay, net = 10, 0.1, get_net(devices)
net.hybridize()
train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,
      lr_decay)
```

```{.python .input}
#@tab pytorch
devices, num_epochs, lr, wd = d2l.try_all_gpus(), 5, 0.001, 1e-4
lr_period, lr_decay, net = 10, 0.1, get_net(devices)
train(net, train_iter, valid_iter, num_epochs, lr, wd, devices, lr_period,
      lr_decay)
```
## Classifying the Testing Set and Submitting Results on Kaggle
Similar to the final step in :numref:`sec_kaggle_cifar10`, in the end all the labeled data (including the validation set) are used for training the model and classifying the testing set. We will use the trained custom output network for classification.
```{.python .input}
net = get_net(devices)
net.hybridize()
train(net, train_valid_iter, None, num_epochs, lr, wd, devices, lr_period,
      lr_decay)

preds = []
for data, label in test_iter:
    output_features = net.features(data.as_in_ctx(devices[0]))
    output = npx.softmax(net.output_new(output_features))
    preds.extend(output.asnumpy())
ids = sorted(os.listdir(
    os.path.join(data_dir, 'train_valid_test', 'test', 'unknown')))
with open('submission.csv', 'w') as f:
    f.write('id,' + ','.join(train_valid_ds.synsets) + '\n')
    for i, output in zip(ids, preds):
        f.write(i.split('.')[0] + ',' + ','.join(
            [str(num) for num in output]) + '\n')
```
```{.python .input}
#@tab pytorch
net = get_net(devices)
train(net, train_valid_iter, None, num_epochs, lr, wd, devices, lr_period,
      lr_decay)

preds = []
for data, label in test_iter:
    # Softmax over the class dimension turns logits into probabilities
    output = torch.nn.functional.softmax(net(data.to(devices[0])), dim=1)
    preds.extend(output.cpu().detach().numpy())
ids = sorted(os.listdir(
    os.path.join(data_dir, 'train_valid_test', 'test', 'unknown')))
with open('submission.csv', 'w') as f:
    f.write('id,' + ','.join(train_valid_ds.classes) + '\n')
    for i, output in zip(ids, preds):
        f.write(i.split('.')[0] + ',' + ','.join(
            [str(num) for num in output]) + '\n')
```
The above code will generate a `submission.csv` file to be submitted to Kaggle in the same way described in :numref:`sec_kaggle_house`.
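For reference, the submission file is a plain CSV with an `id` column followed by one probability column per breed. A minimal sketch of that format with hypothetical class names and probabilities (standing in for `train_valid_ds.classes` and the softmax outputs computed above):

```python
import csv
import io

# Hypothetical class list and per-image probabilities, for illustration only
classes = ['affenpinscher', 'afghan_hound', 'airedale']
preds = {'000621fb3cbb32d8935728e48679680e': [0.7, 0.2, 0.1]}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['id'] + classes)        # header: id plus one column per breed
for image_id, probs in preds.items():
    writer.writerow([image_id] + probs)  # each row's probabilities sum to 1

print(buf.getvalue())
```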
## Summary
- Images in the ImageNet dataset are larger (with varying dimensions) than CIFAR-10 images. We may modify image augmentation operations for tasks on a different dataset.
- To classify a subset of the ImageNet dataset, we can leverage models pretrained on the full ImageNet dataset to extract features and only train a custom small-scale output network. This reduces computation time and memory cost.
## Exercises
- When using the full Kaggle competition dataset, what results can you achieve when you increase `batch_size` (batch size) and `num_epochs` (number of epochs)?
- Do you get better results if you use a deeper pretrained model? How do you tune hyperparameters? Can you further improve the results?
:begin_tab:mxnet
Discussions
:end_tab:
:begin_tab:pytorch
Discussions
:end_tab:
