Computing the mean and standard deviation
Data augmentation
MNIST
GTSRB
Cifar-10
Cifar-100
- Classes
ImageNet
- ILSVRC 2012 (the commonly used subset)
TinyImageNet
miniImageNet

Computing the mean and standard deviation

import numpy

def compute_mean_std(dataset):
    """Compute the per-channel mean and std of a CIFAR-10/100 dataset.

    Args:
        dataset: the CIFAR training or test set, loaded without a transform,
            whose items are (PIL image, label) pairs, e.g. an instance of
            torchvision.datasets.CIFAR100 (a torch.utils.data.Dataset)
    Returns:
        a tuple (mean, std), each holding one value per RGB channel in [0, 1]
    """
    # Stack every image into a single (N, 32, 32, 3) uint8 array
    data = numpy.stack([numpy.asarray(dataset[i][0]) for i in range(len(dataset))])
    # Per-channel statistics, divided by 255 to match ToTensor()-scaled inputs
    mean = tuple(float(data[..., c].mean()) / 255 for c in range(3))
    std = tuple(float(data[..., c].std()) / 255 for c in range(3))
    return mean, std
# Results
mean = {
'cifar10': (0.4914, 0.4822, 0.4465),
'cifar100': (0.5071, 0.4867, 0.4408),
}
std = {
'cifar10': (0.2470, 0.2435, 0.2616),
'cifar100': (0.2675, 0.2565, 0.2761),
}
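Stacking the whole dataset in memory works for CIFAR (50,000 × 32 × 32 × 3 bytes, about 150 MB as uint8) but does not scale to larger datasets. A minimal sketch of a streaming alternative, assuming the images arrive as uint8 batches shaped (N, H, W, 3): it accumulates per-channel sums and sums of squares and uses Var[x] = E[x²] − E[x]², which is numerically fine in float64 for pixel values in [0, 1]. The function name `streaming_mean_std` is hypothetical, and random data stands in for real images.

```python
import numpy as np


def streaming_mean_std(batches):
    """Per-channel mean/std accumulated batch by batch (hypothetical helper).

    batches: iterable of uint8 arrays shaped (N, H, W, 3).
    Returns (mean, std) as length-3 float64 arrays on the [0, 1] scale.
    """
    count = 0
    total = np.zeros(3)     # running sum of pixel values per channel
    total_sq = np.zeros(3)  # running sum of squared pixel values per channel
    for batch in batches:
        x = batch.astype(np.float64) / 255.0
        count += x.shape[0] * x.shape[1] * x.shape[2]
        total += x.sum(axis=(0, 1, 2))
        total_sq += (x ** 2).sum(axis=(0, 1, 2))
    mean = total / count
    # Population std (ddof=0), the same definition numpy.std uses by default
    std = np.sqrt(total_sq / count - mean ** 2)
    return mean, std


# Synthetic stand-in for CIFAR images, fed in 7 uneven batches
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
mean, std = streaming_mean_std(np.array_split(images, 7))
```

The result matches computing the statistics over all pixels at once, while only one batch is held in float64 at a time.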

Data augmentation

https://blog.csdn.net/see_you_yu/article/details/106722787
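As a self-contained illustration (not a summary of the linked post), the sketch below shows the pad-and-crop plus horizontal-flip augmentation commonly used for 32x32 CIFAR images. The 4-pixel zero padding and the 0.5 flip probability are conventional choices assumed here, and `random_crop_flip` is a hypothetical name.

```python
import numpy as np


def random_crop_flip(image, rng, padding=4):
    """Random crop after zero-padding, then a random horizontal flip (HxWxC input)."""
    h, w = image.shape[:2]
    # Zero-pad the two spatial dimensions by `padding` pixels on every side
    padded = np.pad(image, ((padding, padding), (padding, padding), (0, 0)))
    # Choose a random top-left corner and cut out a window of the original size
    top = rng.integers(0, 2 * padding + 1)
    left = rng.integers(0, 2 * padding + 1)
    out = padded[top:top + h, left:left + w]
    # Mirror left-right with probability 0.5
    if rng.random() < 0.5:
        out = out[:, ::-1]
    return out


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = random_crop_flip(img, rng)
```

The output keeps the input's shape and dtype, so it can be applied per image before normalization.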

MNIST

A training set of 60,000 examples and a test set of 10,000 examples

Image size: 28*28 = 784 pixels (grayscale)

Mean: 0.1307
Std: 0.3081
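These statistics apply to pixel values scaled to [0, 1], so a raw uint8 pixel p is normalized as (p / 255 − 0.1307) / 0.3081. A minimal sketch of that mapping (`normalize_mnist` is a hypothetical helper name):

```python
import numpy as np

MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def normalize_mnist(pixels):
    """Map uint8 MNIST pixels (0-255) to roughly zero-mean, unit-std values."""
    x = np.asarray(pixels, dtype=np.float64) / 255.0  # scale to [0, 1] first
    return (x - MNIST_MEAN) / MNIST_STD


print(normalize_mnist([0, 255]))  # approximately [-0.4242, 2.8215]
```

So a black pixel maps to about −0.42 and a white pixel to about 2.82.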

GTSRB

43 classes of traffic signs, split into 39,209 training images and 12,630 test images

Cifar-10

Reference: https://www.cnblogs.com/Jerry-Dong/p/8109938.html

Basic information

A labeled subset of the Tiny Images dataset. The authors of Tiny Images have decided to withdraw it because it contains offensive content, and have asked the community to stop using it.

10 classes, 6,000 images per class, 60,000 images in total
50,000 images for training and 10,000 for testing
Shape: 32x32 RGB (3 channels)
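The "python version" of CIFAR-10 ships as pickled batch files (data_batch_1 to data_batch_5 and test_batch), each a dict whose `data` entry is an N×3072 uint8 array: every row holds the 1,024 red values, then 1,024 green, then 1,024 blue, each plane in row-major order. A minimal loading sketch under that layout; `load_cifar10_batch` and `rows_to_hwc` are hypothetical names.

```python
import pickle

import numpy as np


def rows_to_hwc(rows):
    """Turn (N, 3072) rows (R plane, G plane, B plane) into (N, 32, 32, 3) images."""
    return rows.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)


def load_cifar10_batch(path):
    """Load one batch file; returns (N, 32, 32, 3) uint8 images and int64 labels."""
    with open(path, "rb") as f:
        # encoding="bytes" is needed for the Python 2 pickles; keys come back as bytes
        batch = pickle.load(f, encoding="bytes")
    images = rows_to_hwc(batch[b"data"])
    labels = np.asarray(batch[b"labels"], dtype=np.int64)
    return images, labels
```

Usage, assuming the archive was extracted to its default folder: `images, labels = load_cifar10_batch("cifar-10-batches-py/data_batch_1")`.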

Mean and standard deviation

Reference: https://gist.github.com/weiaicunzai/e623931921efefd4c331622c344d8151
Mean (per RGB channel, pixels scaled to [0, 1]) [0.4913997551666284, 0.48215855929893703, 0.4465309133731618]
Std [0.24703225141799082, 0.24348516474564, 0.26158783926049628]

Classes

0 airplane
1 automobile
2 bird
3 cat
4 deer
5 dog
6 frog
7 horse
8 ship
9 truck
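The label indices above map directly onto a lookup tuple; a small sketch for converting between integer labels and class names (the names `CIFAR10_CLASSES` and `name_to_label` are illustrative):

```python
# Index i holds the name of CIFAR-10 label i, in the order listed above
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)

# Reverse lookup: class name -> integer label
name_to_label = {name: i for i, name in enumerate(CIFAR10_CLASSES)}

labels = [3, 5, 9]
print([CIFAR10_CLASSES[i] for i in labels])  # ['cat', 'dog', 'truck']
```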

Image datasets - Figure 1

Cifar-100

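60,000 32x32 images in total
100 classes, grouped into 20 superclasses
600 images per class: 500 for training and 100 for testing
Each image has two labels: a fine label (its class) and a coarse label (its superclass)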

Classes

Reference: https://blog.csdn.net/qq_36653505/article/details/87864405

Superclass	Classes
aquatic mammals	beaver, dolphin, otter, seal, whale
fish	aquarium fish, flatfish, ray, shark, trout
flowers	orchids, poppies, roses, sunflowers, tulips
food containers	bottles, bowls, cans, cups, plates
fruit and vegetables	apples, mushrooms, oranges, pears, sweet peppers
household electrical devices	clock, computer keyboard, lamp, telephone, television
household furniture	bed, chair, couch, table, wardrobe
insects	bee, beetle, butterfly, caterpillar, cockroach
large carnivores	bear, leopard, lion, tiger, wolf
large man-made outdoor things	bridge, castle, house, road, skyscraper
large natural outdoor scenes	cloud, forest, mountain, plain, sea
large omnivores and herbivores	camel, cattle, chimpanzee, elephant, kangaroo
medium-sized mammals	fox, porcupine, possum, raccoon, skunk
non-insect invertebrates	crab, lobster, snail, spider, worm
people	baby, boy, girl, man, woman
reptiles	crocodile, dinosaur, lizard, snake, turtle
small mammals	hamster, mouse, rabbit, shrew, squirrel
trees	maple, oak, palm, pine, willow
vehicles 1	bicycle, bus, motorcycle, pickup truck, train
vehicles 2	lawn-mower, rocket, streetcar, tank, tractor
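The table above fully determines each class's coarse label, so it can be encoded as a dict and inverted. A sketch using the class names exactly as written in the table; the dataset's own metadata may spell some of them differently (e.g. singular forms), and the names `SUPERCLASSES` and `FINE_TO_COARSE` are hypothetical.

```python
# Superclass -> its five fine classes, copied from the table above
SUPERCLASSES = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "fish": ["aquarium fish", "flatfish", "ray", "shark", "trout"],
    "flowers": ["orchids", "poppies", "roses", "sunflowers", "tulips"],
    "food containers": ["bottles", "bowls", "cans", "cups", "plates"],
    "fruit and vegetables": ["apples", "mushrooms", "oranges", "pears", "sweet peppers"],
    "household electrical devices": ["clock", "computer keyboard", "lamp", "telephone", "television"],
    "household furniture": ["bed", "chair", "couch", "table", "wardrobe"],
    "insects": ["bee", "beetle", "butterfly", "caterpillar", "cockroach"],
    "large carnivores": ["bear", "leopard", "lion", "tiger", "wolf"],
    "large man-made outdoor things": ["bridge", "castle", "house", "road", "skyscraper"],
    "large natural outdoor scenes": ["cloud", "forest", "mountain", "plain", "sea"],
    "large omnivores and herbivores": ["camel", "cattle", "chimpanzee", "elephant", "kangaroo"],
    "medium-sized mammals": ["fox", "porcupine", "possum", "raccoon", "skunk"],
    "non-insect invertebrates": ["crab", "lobster", "snail", "spider", "worm"],
    "people": ["baby", "boy", "girl", "man", "woman"],
    "reptiles": ["crocodile", "dinosaur", "lizard", "snake", "turtle"],
    "small mammals": ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
    "trees": ["maple", "oak", "palm", "pine", "willow"],
    "vehicles 1": ["bicycle", "bus", "motorcycle", "pickup truck", "train"],
    "vehicles 2": ["lawn-mower", "rocket", "streetcar", "tank", "tractor"],
}

# Invert: fine class -> superclass, i.e. each class's coarse label
FINE_TO_COARSE = {
    fine: coarse for coarse, fines in SUPERCLASSES.items() for fine in fines
}

print(FINE_TO_COARSE["shark"])  # fish
```

Checking that the inverted dict has 100 entries confirms the 20 × 5 structure with no class listed twice.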

ImageNet

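14,197,122 images (about 14 million)
More than 20,000 classes
Images vary in size; models usually take 224*224 inputs after resizing and cropping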
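One common (not universal) evaluation preprocessing for 224×224 models is to resize the image so its shorter side is 256 pixels, keeping the aspect ratio, and then take the central 224×224 crop. A minimal sketch of the size arithmetic and the crop on a NumPy array; the actual resampling is left to an image library, the helper names are hypothetical, and the rounding rule here may differ from a given library's.

```python
import numpy as np


def resized_shape(h, w, shorter=256):
    """Target (h, w) after scaling so the shorter side equals `shorter`."""
    scale = shorter / min(h, w)
    return round(h * scale), round(w * scale)


def center_crop(image, size=224):
    """Cut the central size x size window out of an HxW(xC) array."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]


print(resized_shape(375, 500))  # (256, 341)
resized = np.zeros((256, 341, 3), dtype=np.uint8)  # stand-in for a resized image
print(center_crop(resized).shape)  # (224, 224, 3)
```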

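ILSVRC 2012 (the commonly used subset)

Training set: 1,281,167 images with labels
Validation set: 50,000 images with labels
Test set: 100,000 images (labels not publicly released)
Classes (1,000): https://image-net.org/challenges/LSVRC/2014/browse-synsets
Usage: https://blog.csdn.net/weixin_43002433/article/details/106225771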

TinyImageNet

Number of classes: 200
Total training images: 100,000
Training images per class: 500
~~Total test images: 10,000~~
Validation images per class: 50
Test images per class: 50
Image size: 64 x 64
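Unlike the training images, the validation images are usually not sorted into per-class folders. Assuming the layout of the standard tiny-imagenet-200 archive, where `val/val_annotations.txt` has one tab-separated line per image (file name, class wnid, then four bounding-box numbers), a minimal parsing sketch follows. The function name is hypothetical and the sample lines are illustrative, written in that format.

```python
def parse_val_annotations(text):
    """Map each validation file name to its class wnid (second tab-separated field)."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        mapping[fields[0]] = fields[1]
    return mapping


# Illustrative lines: file name, wnid, x_min, y_min, x_max, y_max
sample = "val_0.JPEG\tn01443537\t0\t32\t44\t62\nval_1.JPEG\tn01443537\t52\t55\t57\t59\n"
print(parse_val_annotations(sample))
```

With the real file, the resulting dict should contain 200 × 50 = 10,000 entries.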

miniImageNet

miniImageNet contains 60,000 color images in 100 classes, with 600 samples per class; each image is 84 × 84.