ResNet largely solved the problem that very deep models (hundreds or even thousands of layers) cannot be trained because gradients vanish or explode; the key is the shortcut connections between layers. Inspired by this, can we add even more connections and make full use of the features of every layer? DenseNet is exactly such a model.

DenseNet Structure

The core component of DenseNet is the Dense Block, shown in the figure below. Compared with ResNet's residual block, a Dense Block has far more cross-layer shortcut connections: starting from the input layer, the output of every layer is fed as input to all subsequent layers.

DenseNet1.jpg

Concretely, in ResNet the input $x_{\ell-1}$ to the $\ell$-th layer is passed through that layer's transformation $H_\ell$ to produce the output $H_\ell(x_{\ell-1})$; the sum of this output and the input $x_{\ell-1}$ becomes the input $x_\ell$ to the next layer, i.e.:

$$x_\ell = H_\ell(x_{\ell-1}) + x_{\ell-1}$$

In a Dense Block, by contrast, the new input $x_{\ell-1}$ of the $\ell$-th layer is concatenated along the channel dimension with all preceding inputs to form the actual input $[x_0, x_1, \dots, x_{\ell-1}]$. This concatenated input is passed through a Batch Normalization layer, ReLU, and a convolution layer to produce the hidden output $x_\ell$, which in turn is the new input to the next layer, i.e.:

$$x_\ell = H_\ell([x_0, x_1, \dots, x_{\ell-1}])$$

$x_\ell$ is then concatenated with all previous inputs to form $[x_0, x_1, \dots, x_\ell]$, the input to the next layer. The number of channels $k$ of each new input $x_\ell$ is generally small (4 in the figure above; the models in the original paper typically use $k = 32$). This per-layer channel increment $k$ has a dedicated name: the growth rate. Because of this concatenation scheme, and because each hidden layer is very thin (the growth rate $k$ is small), DenseNet looks densely connected but actually has fewer parameters and lower computational cost. DenseNet also holds a performance edge over ResNet: to reach the same accuracy on the ImageNet classification dataset, DenseNet needs only roughly half the parameters and computation of ResNet.
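The difference between the two update rules can be seen in a couple of lines. The following minimal sketch (shapes, growth rate, and layer choices are illustrative assumptions, written in the same Keras 1.x style as the implementation below) contrasts ResNet's element-wise addition with DenseNet's channel-wise concatenation:

    # Minimal sketch (assumed shapes/names): residual addition vs. dense concatenation
    from keras.layers import Input, Activation, BatchNormalization, Convolution2D, merge

    inp = Input(shape=(56, 56, 64))   # x_{l-1}, 'tf' dimension ordering assumed

    # ResNet: x_l = H_l(x_{l-1}) + x_{l-1}  -> channel count stays at 64
    h = Convolution2D(64, 3, 3, border_mode='same')(Activation('relu')(BatchNormalization()(inp)))
    res_out = merge([inp, h], mode='sum')                      # shape (56, 56, 64)

    # DenseNet: x_l = H_l([x_0, ..., x_{l-1}]), new features are concatenated
    # -> channel count grows by the growth rate k (here k = 32)
    h = Convolution2D(32, 3, 3, border_mode='same')(Activation('relu')(BatchNormalization()(inp)))
    dense_out = merge([inp, h], mode='concat', concat_axis=3)  # shape (56, 56, 64 + 32)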

The full DenseNet is built from Dense Blocks plus transition layers (Transition Layer). A transition layer typically consists of a Batch Normalization layer, a convolution layer with 1×1 kernels, and a pooling layer; the 1×1 convolution serves mainly to slim the network down, i.e. to reduce the number of channels. The figure below shows a DenseNet model with three Dense Blocks.
DenseNet2.jpg

DenseNet Architectures for ImageNet
DenseNet3.jpg
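To make the channel bookkeeping of this architecture concrete, the short sketch below computes how many feature maps enter and leave each block of DenseNet-121 (assuming growth rate k = 32, 64 channels after the initial convolution, and a transition compression factor of 0.5, i.e. reduction = 0.5):

    # Sketch: feature-map counts through DenseNet-121 (assumes k = 32, initial
    # channels 64, compression theta = 0.5; layer counts per block from the table above)
    growth_rate = 32
    theta = 0.5
    channels = 64                    # after the initial 7x7 convolution
    nb_layers = [6, 12, 24, 16]      # conv blocks in each of the 4 dense blocks

    for i, n in enumerate(nb_layers, start=1):
        channels += n * growth_rate  # each conv block appends k feature maps
        print('after dense block %d: %d channels' % (i, channels))
        if i < len(nb_layers):       # transition layer compresses the channel count
            channels = int(channels * theta)
            print('after transition %d: %d channels' % (i, channels))

Each dense block adds nb_layers × k channels, and each transition layer then halves the count with its 1×1 convolution, which is what keeps the concatenated feature maps from growing without bound.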

Code Implementation

# Keras 1.x implementation of DenseNet-121.
from keras.models import Model
from keras.layers import Input, merge, ZeroPadding2D
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.convolutional import Convolution2D
from keras.layers.pooling import AveragePooling2D, GlobalAveragePooling2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
import keras.backend as K

# Scale is a custom per-channel scaling layer (not built into Keras); the module
# name below is assumed from the reference Keras DenseNet implementation.
from custom_layers import Scale


def DenseNet121(nb_dense_block=4, growth_rate=32, nb_filter=64, reduction=0.0,
                dropout_rate=0.0, weight_decay=1e-4, classes=1000, weights_path=None):
    '''Instantiate the DenseNet-121 architecture.
    # Arguments
        nb_dense_block: number of dense blocks to add to end
        growth_rate: number of filters to add per dense block
        nb_filter: initial number of filters
        reduction: reduction factor of transition blocks
        dropout_rate: dropout rate
        weight_decay: weight decay factor
        classes: optional number of classes to classify images
        weights_path: path to pre-trained weights
    # Returns
        A Keras model instance.
    '''
    eps = 1.1e-5

    # compute compression factor
    compression = 1.0 - reduction

    # Handle dimension ordering for different backends
    global concat_axis
    if K.image_dim_ordering() == 'tf':
        concat_axis = 3
        img_input = Input(shape=(224, 224, 3), name='data')
    else:
        concat_axis = 1
        img_input = Input(shape=(3, 224, 224), name='data')

    # From architecture for ImageNet (Table 1 in the paper)
    nb_filter = 64
    nb_layers = [6, 12, 24, 16]  # For DenseNet-121

    # Initial convolution
    x = ZeroPadding2D((3, 3), name='conv1_zeropadding')(img_input)
    x = Convolution2D(nb_filter, 7, 7, subsample=(2, 2), name='conv1', bias=False)(x)
    x = BatchNormalization(epsilon=eps, axis=concat_axis, name='conv1_bn')(x)
    x = Scale(axis=concat_axis, name='conv1_scale')(x)
    x = Activation('relu', name='relu1')(x)
    x = ZeroPadding2D((1, 1), name='pool1_zeropadding')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), name='pool1')(x)

    # Add dense blocks, each followed by a transition block
    for block_idx in range(nb_dense_block - 1):
        stage = block_idx + 2
        x, nb_filter = dense_block(x, stage, nb_layers[block_idx], nb_filter, growth_rate,
                                   dropout_rate=dropout_rate, weight_decay=weight_decay)

        # Add transition_block
        x = transition_block(x, stage, nb_filter, compression=compression,
                             dropout_rate=dropout_rate, weight_decay=weight_decay)
        nb_filter = int(nb_filter * compression)

    # Last dense block has no transition block after it
    final_stage = stage + 1
    x, nb_filter = dense_block(x, final_stage, nb_layers[-1], nb_filter, growth_rate,
                               dropout_rate=dropout_rate, weight_decay=weight_decay)

    x = BatchNormalization(epsilon=eps, axis=concat_axis, name='conv'+str(final_stage)+'_blk_bn')(x)
    x = Scale(axis=concat_axis, name='conv'+str(final_stage)+'_blk_scale')(x)
    x = Activation('relu', name='relu'+str(final_stage)+'_blk')(x)
    x = GlobalAveragePooling2D(name='pool'+str(final_stage))(x)

    x = Dense(classes, name='fc6')(x)
    x = Activation('softmax', name='prob')(x)

    model = Model(img_input, x, name='densenet')

    if weights_path is not None:
        model.load_weights(weights_path)

    return model


def conv_block(x, stage, branch, nb_filter, dropout_rate=None, weight_decay=1e-4):
    '''Apply BatchNorm, ReLU, bottleneck 1x1 Conv2D, 3x3 Conv2D, and optional dropout
    # Arguments
        x: input tensor
        stage: index for dense block
        branch: layer index within each dense block
        nb_filter: number of filters
        dropout_rate: dropout rate
        weight_decay: weight decay factor
    '''
    eps = 1.1e-5
    conv_name_base = 'conv' + str(stage) + '_' + str(branch)
    relu_name_base = 'relu' + str(stage) + '_' + str(branch)

    # 1x1 convolution (bottleneck layer)
    inter_channel = nb_filter * 4
    x = BatchNormalization(epsilon=eps, axis=concat_axis, name=conv_name_base+'_x1_bn')(x)
    x = Scale(axis=concat_axis, name=conv_name_base+'_x1_scale')(x)
    x = Activation('relu', name=relu_name_base+'_x1')(x)
    x = Convolution2D(inter_channel, 1, 1, name=conv_name_base+'_x1', bias=False)(x)

    if dropout_rate:
        x = Dropout(dropout_rate)(x)

    # 3x3 convolution
    x = BatchNormalization(epsilon=eps, axis=concat_axis, name=conv_name_base+'_x2_bn')(x)
    x = Scale(axis=concat_axis, name=conv_name_base+'_x2_scale')(x)
    x = Activation('relu', name=relu_name_base+'_x2')(x)
    x = ZeroPadding2D((1, 1), name=conv_name_base+'_x2_zeropadding')(x)
    x = Convolution2D(nb_filter, 3, 3, name=conv_name_base+'_x2', bias=False)(x)

    if dropout_rate:
        x = Dropout(dropout_rate)(x)

    return x


def transition_block(x, stage, nb_filter, compression=1.0, dropout_rate=None, weight_decay=1e-4):
    '''Apply BatchNorm, 1x1 Convolution, AveragePooling, optional compression, dropout
    # Arguments
        x: input tensor
        stage: index for dense block
        nb_filter: number of filters
        compression: calculated as 1 - reduction; reduces the number of feature maps in the transition block
        dropout_rate: dropout rate
        weight_decay: weight decay factor
    '''
    eps = 1.1e-5
    conv_name_base = 'conv' + str(stage) + '_blk'
    relu_name_base = 'relu' + str(stage) + '_blk'
    pool_name_base = 'pool' + str(stage)

    x = BatchNormalization(epsilon=eps, axis=concat_axis, name=conv_name_base+'_bn')(x)
    x = Scale(axis=concat_axis, name=conv_name_base+'_scale')(x)
    x = Activation('relu', name=relu_name_base)(x)
    x = Convolution2D(int(nb_filter * compression), 1, 1, name=conv_name_base, bias=False)(x)

    if dropout_rate:
        x = Dropout(dropout_rate)(x)

    x = AveragePooling2D((2, 2), strides=(2, 2), name=pool_name_base)(x)

    return x


def dense_block(x, stage, nb_layers, nb_filter, growth_rate, dropout_rate=None,
                weight_decay=1e-4, grow_nb_filters=True):
    '''Build a dense_block where the output of each conv_block is fed to subsequent ones
    # Arguments
        x: input tensor
        stage: index for dense block
        nb_layers: the number of layers of conv_block to append to the model
        nb_filter: number of filters
        growth_rate: growth rate
        dropout_rate: dropout rate
        weight_decay: weight decay factor
        grow_nb_filters: flag to decide to allow number of filters to grow
    '''
    eps = 1.1e-5
    concat_feat = x

    for i in range(nb_layers):
        branch = i + 1
        x = conv_block(concat_feat, stage, branch, growth_rate, dropout_rate, weight_decay)
        # channel-wise concatenation: new features are appended to all previous ones
        concat_feat = merge([concat_feat, x], mode='concat', concat_axis=concat_axis,
                            name='concat_'+str(stage)+'_'+str(branch))

        if grow_nb_filters:
            nb_filter += growth_rate

    return concat_feat, nb_filter
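A minimal usage sketch, assuming the TensorFlow dimension-ordering branch is taken; it simply builds the network and runs a forward pass on random data (no pre-trained weights are loaded here):

    import numpy as np

    # Build DenseNet-121 with compression 0.5 (reduction=0.5); pass weights_path to load pre-trained weights
    model = DenseNet121(reduction=0.5, classes=1000)
    model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

    x = np.random.rand(1, 224, 224, 3).astype('float32')  # one random 224x224 RGB image
    probs = model.predict(x)
    print(probs.shape)  # (1, 1000) class probabilities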

Source

https://github.com/liuzhuang13/DenseNet
https://www.jiqizhixin.com/articles/042201?from=synced&keyword=resnet
https://www.cnblogs.com/skyfsm/p/8451834.html