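The Keras Applications module (keras.applications) provides Keras models with pre-trained weights. These models can be used for:
- prediction
- feature extraction
- fine-tuning

The pre-trained weights are downloaded to ~/.keras/models/ and are loaded automatically when a model is instantiated.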
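
To check which weight files have already been downloaded, it is enough to list the cache directory. The following is a minimal sketch using only the Python standard library; the helper name `list_cached_weights` is hypothetical (it is not part of Keras), and the default path is simply the one named above.

```python
from pathlib import Path


def list_cached_weights(cache_dir=None):
    """Return the sorted file names found in the Keras model cache directory."""
    # Default to the location named in the text; pass another path to inspect it instead.
    if cache_dir is None:
        cache_dir = Path.home() / ".keras" / "models"
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # nothing has been downloaded yet
    return sorted(p.name for p in cache_dir.iterdir() if p.is_file())
```

Calling `list_cached_weights()` after instantiating a model with weights='imagenet' should show the downloaded weight file; an empty list means the cache directory does not exist or is empty.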
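## Available models
- Models for image classification, with weights trained on ImageNet:
  - Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, MobileNet, DenseNet, NASNet, MobileNetV2

All of these models except Xception and MobileNet are compatible with both Theano and TensorFlow, and they are built automatically according to the image data format configured in ~/.keras/keras.json. For example, if you have set image_data_format="channels_last", any loaded model is built following the TensorFlow dimension ordering, "Height-Width-Depth".
- The Xception model is only available with TensorFlow, because the SeparableConvolution layer it relies on is only available there. MobileNet is likewise only available with TensorFlow, because it relies on the DepthwiseConvolution layer.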
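
The effect of the data format setting can be illustrated without Keras itself. The sketch below reads the `image_data_format` key from a keras.json file and orders the dimensions of a single image accordingly. Both function names are hypothetical, and the fallback to "channels_last" when the file is missing is an assumption based on the Keras default.

```python
import json
from pathlib import Path


def read_image_data_format(config_path=None, default="channels_last"):
    """Return the image_data_format value stored in a keras.json file."""
    if config_path is None:
        config_path = Path.home() / ".keras" / "keras.json"
    try:
        config = json.loads(Path(config_path).read_text())
    except (OSError, ValueError):
        return default  # file missing or not valid JSON: fall back to the assumed default
    if not isinstance(config, dict):
        return default
    return config.get("image_data_format", default)


def input_shape_for(height, width, channels, data_format):
    """Order the dimensions of one image according to the data format."""
    if data_format == "channels_last":
        return (height, width, channels)
    if data_format == "channels_first":
        return (channels, height, width)
    raise ValueError("unknown data format: %r" % (data_format,))
```

For a 224x224 RGB image, `input_shape_for(224, 224, 3, "channels_last")` gives (224, 224, 3), while "channels_first" gives (3, 224, 224).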
#### VGG16
The VGG16 model, with weights trained on ImageNet.
```python
keras.applications.vgg16.VGG16(include_top=True, weights='imagenet',
                               input_tensor=None, input_shape=None,
                               pooling=None,
                               classes=1000)
```
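This model can be used with both the Theano and TensorFlow backends, and accepts both the channels_first and channels_last input dimension orderings.

The default input size for this model is 224x224.

Parameters: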
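- include_top: whether to include the 3 fully connected layers at the top of the network.
- weights: None means random initialization, i.e. no pre-trained weights are loaded; 'imagenet' loads the weights pre-trained on ImageNet.
- input_tensor: optional Keras tensor to use as the image input of the model.
- input_shape: optional; only takes effect when include_top=False. It should be a tuple of length 3 giving the shape of the input images; width and height should be no smaller than 48, e.g. (200, 200, 3).
- pooling: when include_top=False, this parameter selects the pooling mode. None means no pooling, so the output of the last convolutional layer is a 4D tensor; 'avg' applies global average pooling and 'max' applies global max pooling.
- classes: optional number of classes to classify images into; only usable when include_top=True and no pre-trained weights are loaded.

Returns
- A Keras model object.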
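
As a rough illustration of how include_top, pooling and classes interact, the sketch below predicts the per-sample output shape of VGG16 (without the batch dimension) without building the model. It assumes the channels_last ordering and the standard VGG16 layout of five 2x2 max-pooling stages with 512 filters in the last convolutional block; the function name is hypothetical.

```python
def vgg16_output_shape(input_hw=(224, 224), include_top=True, pooling=None, classes=1000):
    """Per-sample output shape of VGG16 (channels_last), batch dimension omitted."""
    if include_top:
        return (classes,)  # one score per class from the final layer
    height, width = input_hw
    for _ in range(5):
        # each of the five 2x2 max-pooling stages halves height and width (rounding down)
        height, width = height // 2, width // 2
    if pooling is None:
        return (height, width, 512)  # raw feature maps of the last convolutional block
    if pooling in ("avg", "max"):
        return (512,)  # global pooling collapses the two spatial dimensions
    raise ValueError("pooling must be None, 'avg' or 'max'")
```

With the default 224x224 input and include_top=False this gives (7, 7, 512); with a 150x150 input it gives (4, 4, 512), and with pooling='avg' or 'max' it gives (512,).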
#### ResNet50
```python
keras.applications.resnet50.ResNet50(include_top=True, weights='imagenet',
                                     input_tensor=None, input_shape=None,
                                     pooling=None,
                                     classes=1000)
```
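A 50-layer residual network, with weights trained on ImageNet.

This model can be used with both the Theano and TensorFlow backends, and accepts both the channels_first and channels_last input dimension orderings.

The default input size for this model is 224x224.

The parameters are similar to those of VGG16.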
#### InceptionV3
```python
keras.applications.inception_v3.InceptionV3(include_top=True,
                                            weights='imagenet',
                                            input_tensor=None,
                                            input_shape=None,
                                            pooling=None,
                                            classes=1000)
```
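The InceptionV3 network, with weights trained on ImageNet.

This model can be used with both the Theano and TensorFlow backends, and accepts both the channels_first and channels_last input dimension orderings.

The default input size for this model is 299x299.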
#### DenseNet
```python
keras.applications.densenet.DenseNet121(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.densenet.DenseNet169(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)
keras.applications.densenet.DenseNet201(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)
```
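The DenseNet networks, with weights trained on ImageNet.

These models can use either the channels_first ordering (channels, height, width) or the channels_last ordering (height, width, channels).

The default input size for these models is 224x224.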
## Examples
#### 1. Prediction: classify ImageNet classes with ResNet50
```python
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

model = ResNet50(weights='imagenet')

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]
```
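
Conceptually, decoding the predictions amounts to ranking the class probabilities and keeping the highest-scoring entries. The sketch below shows that ranking step in plain Python. The function `top_k` is hypothetical and is not the Keras implementation, and the labels and probabilities in the usage note are made-up toy values.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    # Pair every label with its probability, then sort from most to least likely.
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For example, with the toy labels `["cat", "dog", "elephant", "zebra"]` and probabilities `[0.1, 0.05, 0.8, 0.05]`, `top_k(..., k=2)` returns `[("elephant", 0.8), ("cat", 0.1)]`.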
The include_top parameter defaults to True and controls whether the fully connected layers at the top of the network are included.
#### 2. Feature extraction: extract features with VGG16
Feature extraction does not need the final fully connected layers, so set include_top=False.
```python
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np

model = VGG16(weights='imagenet', include_top=False)

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

features = model.predict(x)
```
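
The preprocess_input step matters because the ImageNet weights expect inputs normalized the same way as during training. For VGG16, VGG19 and ResNet50 this is, as far as I know, a "caffe"-style conversion: RGB is reordered to BGR and the per-channel ImageNet mean is subtracted, without scaling. The sketch below reproduces that arithmetic on plain Python lists; the function names are hypothetical, and the real Keras function operates on numpy arrays.

```python
# Per-channel ImageNet means in BGR order, as used by the "caffe"-style preprocessing.
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def preprocess_pixel(rgb):
    """Convert one RGB pixel (values in 0-255) to zero-centred BGR."""
    r, g, b = rgb
    bgr = (b, g, r)  # swap the channel order RGB -> BGR
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_MEAN_BGR))


def preprocess_image(image):
    """Apply preprocess_pixel to a nested list shaped [height][width][3]."""
    return [[preprocess_pixel(pixel) for pixel in row] for row in image]
```

A pixel whose RGB value equals the ImageNet mean, (123.68, 116.779, 103.939), maps to (0.0, 0.0, 0.0).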
<a name="ICWJg"></a>
#### 2. Feature extraction: extract features from an arbitrary intermediate layer of VGG19
```python
from keras.applications.vgg19 import VGG19
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input
from keras.models import Model
import numpy as np

base_model = VGG19(weights='imagenet')
# build a model that stops at the chosen intermediate layer
model = Model(inputs=base_model.input, outputs=base_model.get_layer('block4_pool').output)

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

block4_pool_features = model.predict(x)
```
#### 3. Fine-tuning: fine-tune InceptionV3 on a new set of classes
```python
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K

# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# train the model on the new data for a few epochs
model.fit_generator(...)

# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.

# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)

# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 249 layers and unfreeze the rest:
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')

# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)
```
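
The freeze-then-unfreeze logic in the example above depends only on the trainable flag of each layer, so it can be checked in isolation. The sketch below uses a minimal stand-in class instead of real Keras layers; `StandInLayer` and `freeze_first` are hypothetical names introduced purely for illustration.

```python
class StandInLayer(object):
    """Minimal stand-in for a Keras layer: just a name and a trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


def freeze_first(layers, n):
    """Freeze the first n layers, unfreeze the rest, and return the number left trainable."""
    for layer in layers[:n]:
        layer.trainable = False
    for layer in layers[n:]:
        layer.trainable = True
    return sum(1 for layer in layers if layer.trainable)
```

With ten stand-in layers, `freeze_first(layers, 7)` leaves the last three trainable. With a real model, the flags only take effect after the model is compiled again, as the comments in the example point out.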
#### Build InceptionV3 over a custom input tensor
```python
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Input

# this could also be the output of a different Keras model or layer
input_tensor = Input(shape=(224, 224, 3))  # this assumes K.image_data_format() == 'channels_last'

model = InceptionV3(input_tensor=input_tensor, weights='imagenet', include_top=True)
```
<a name="JNNVa"></a>
## Building image classification models with small datasets
Reference: [https://keras-cn.readthedocs.io/en/latest/legacy/blog/image_classification_using_very_little_data/](https://keras-cn.readthedocs.io/en/latest/legacy/blog/image_classification_using_very_little_data/)<br />The most instructive steps from it are recorded below:
<a name="85HcX"></a>
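#### Using the bottleneck features of a pre-trained network
A more refined approach is to leverage a network pre-trained on a large-scale dataset. Such a network has already learned features that are useful for most computer vision problems, and using these features lets us reach a higher accuracy.<br />We will use the VGG16 network, trained on the ImageNet dataset, which was covered above. Because ImageNet contains several "cat" classes and several "dog" classes, this model has already learned features that are relevant to our dataset. In fact, merely recording the original network's output predictions, rather than the bottleneck features, would probably be enough to solve our problem quite well. The method described here, however, generalizes better to other similar problems, including classification of classes that do not appear in ImageNet.<br />Our approach is as follows: we keep only the convolutional part of the network and discard everything from the fully connected layers up. We then run this model once over our training and validation data and record the output (the "bottleneck features", i.e. the last activation maps before the fully connected layers) in two numpy arrays. Finally, we train a small fully connected network on top of the recorded features.
- The reason we store the features offline, rather than adding our fully connected model directly on top of the network and training with the earlier layers frozen, is computational efficiency. Running VGG16 is very expensive, especially on a CPU, so we only want to do it once. This is also why we do not use data augmentation here.

We will not go over how to build the VGG16 network again; it was covered earlier and can also be found in the Keras examples. Let's look instead at how to record the bottleneck features.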
```python
import numpy as np
from keras.applications.vgg16 import VGG16
from keras.preprocessing.image import ImageDataGenerator

# Assumed setup (not shown in the original notes): rescaled inputs and
# only the convolutional part of VGG16
datagen = ImageDataGenerator(rescale=1. / 255)
model = VGG16(weights='imagenet', include_top=False)

generator = datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode=None,  # this means our generator will only yield batches of data, no labels
    shuffle=False)  # our data will be in order, so all first 1000 images will be cats, then 1000 dogs
# the predict_generator method returns the output of a model, given
# a generator that yields batches of numpy data
# (Keras 2 counts batches here, not samples: ceil(2000 / 32) = 63)
bottleneck_features_train = model.predict_generator(generator, 63)
# save the output as a Numpy array
np.save('bottleneck_features_train.npy', bottleneck_features_train)

generator = datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode=None,
    shuffle=False)
# 800 / 32 = 25 batches
bottleneck_features_validation = model.predict_generator(generator, 25)
np.save('bottleneck_features_validation.npy', bottleneck_features_validation)
```
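
One detail worth spelling out is how many batches the generator has to yield. In the Keras 2 API the second argument of predict_generator is a number of batches ("steps"), whereas the Keras 1 API counted samples. The sketch below computes the number of steps for a given sample count and batch size, assuming the generator yields a smaller final batch when the sample count is not a multiple of the batch size; both helper names are hypothetical.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample exactly once."""
    return int(math.ceil(num_samples / float(batch_size)))


def batch_sizes(num_samples, batch_size):
    """Sizes of the successive batches; the last one may be smaller."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])
```

For 2000 training images and a batch size of 32 this gives 63 steps, the last batch holding 16 images; for 800 validation images it gives exactly 25 steps.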
Once the features are recorded, we can load them and use them to train our fully connected network:
```python
import numpy as np
from keras.models import Sequential
from keras.layers import Flatten, Dense, Dropout

train_data = np.load('bottleneck_features_train.npy')
# the features were saved in order, so recreating the labels is easy
train_labels = np.array([0] * 1000 + [1] * 1000)

validation_data = np.load('bottleneck_features_validation.npy')
validation_labels = np.array([0] * 400 + [1] * 400)

# small fully connected classifier on top of the recorded features
model = Sequential()
model.add(Flatten(input_shape=train_data.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, train_labels,
          epochs=50, batch_size=32,
          validation_data=(validation_data, validation_labels))
model.save_weights('bottleneck_fc_model.h5')
```
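
Rebuilding the labels by hand works only because the features were recorded with shuffle=False, so the samples come out class by class. To my understanding, flow_from_directory orders the classes by sorted subdirectory name, so a folder named cats would come before one named dogs (these folder names are an assumption here). A small sketch of the label reconstruction, with a hypothetical helper name:

```python
def labels_from_counts(counts):
    """Build a flat label list: class index i repeated counts[i] times."""
    labels = []
    for class_index, count in enumerate(counts):
        labels.extend([class_index] * count)
    return labels
```

`labels_from_counts([1000, 1000])` reproduces the training labels above, and `labels_from_counts([400, 400])` the validation labels.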
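Because the features are small, the model trains quickly even on a CPU, at roughly 1s per epoch. We end up with an accuracy of 90% to 91%, a result that owes much to the pre-trained VGG network extracting the features for us.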
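#### Fine-tuning a pre-trained network
To further improve on the previous result, we can try to fine-tune the last few layers of the network. Fine-tuning starts from a pre-trained network and re-trains a small part of its weights on a new dataset. In this experiment, fine-tuning takes three steps:
- build the VGG16 network and load its weights
- add the previously defined fully connected network on top of the model and load its weights
- freeze part of the parameters of the VGG16 network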
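Note:
- To fine-tune, all layers should start from properly trained weights. For instance, you should not put a randomly initialized fully connected network on top of pre-trained convolutional layers, because the large gradients produced by the random weights would wreck the pre-trained convolutional weights. This is why we first train the top-level classifier and only then fine-tune on top of it.
- We choose to fine-tune only the last convolutional block rather than the whole network in order to prevent overfitting. The whole network has a very large entropic capacity and therefore a strong tendency to overfit. The features learned by the lower convolutional blocks are more general and less abstract than those higher up, so we keep the first few blocks fixed (general features) and only fine-tune the last one (more specialized features).
- Fine-tuning should be done with a very low learning rate, typically with the SGD optimizer rather than an adaptive learning rate optimizer such as RMSProp. This keeps the magnitude of the updates small so that the pre-trained features are not destroyed.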
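
To see why a low learning rate keeps the updates small, it helps to write out a single SGD-with-momentum step. The sketch below applies the usual update rule (velocity = momentum * velocity - lr * gradient, then weight + velocity) to one scalar weight. The function name is hypothetical, and the defaults simply mirror the SGD(lr=0.0001, momentum=0.9) setting used in the InceptionV3 example earlier.

```python
def sgd_momentum_step(weight, velocity, gradient, lr=0.0001, momentum=0.9):
    """One SGD-with-momentum update for a single scalar weight."""
    velocity = momentum * velocity - lr * gradient  # decay the old velocity, add the new gradient step
    return weight + velocity, velocity
```

Starting from a weight of 1.0 with zero velocity and a gradient of 10.0, the default learning rate moves the weight by about 0.001, whereas a learning rate of 0.1 would move it by about 1.0, enough to overwrite a pre-trained value.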
See the referenced article for the full code.