Introduction

TensorFlow is Google's second-generation machine intelligence system, developed on the basis of DistBelief, and it is widely used in deep learning applications such as speech recognition and image recognition. The name describes how it works: a Tensor is an N-dimensional array, Flow refers to computation on a dataflow graph, and TensorFlow is the process of tensors flowing from one end of the graph to the other, i.e. feeding complex data structures into an artificial neural network for analysis and processing.
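To make the idea of "tensors flowing through a graph" concrete, here is a minimal sketch using the TF 1.x-style API that was current when this note was written; the tensor names and values are purely illustrative:

    import tensorflow as tf

    # Build a dataflow graph: nodes are ops, edges carry tensors.
    a = tf.placeholder(tf.float32, shape=[2, 3], name='a')  # a 2-D tensor (matrix)
    b = tf.constant(1.0, shape=[2, 3], name='b')
    c = tf.add(a, b, name='c')  # 'c' is a symbolic tensor; nothing is computed yet

    # Only when a Session runs the graph do tensors actually "flow".
    with tf.Session() as sess:
        print(sess.run(c, feed_dict={a: [[1, 2, 3], [4, 5, 6]]}))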

TensorFlow is fully open source and anyone can use it. It runs on devices ranging from a single smartphone all the way up to thousands of servers in a data center.

The "Machine Learning Advanced Notes" series digs into hands-on practice with the TensorFlow system, starting from scratch and moving step by step from the basics to the advanced, so we can walk the road of machine learning together.

GoogLeNet was the winner of ILSVRC 2014. It mainly builds on the classic LeNet-5 line of work and was developed largely by a team at Google; the paper is "Going Deeper with Convolutions". Related work includes LeNet-5, Gabor filters, and Network-in-Network. Network-in-Network improved on the traditional CNN, comfortably beating AlexNet with only a small number of parameters: the final Network-in-Network Caffe model is only about 29 MB. GoogLeNet borrows ideas from Network-in-Network, so let me describe that first.

Network-in-Network

[Figure 1: a conventional linear convolution layer (left) vs. an MLPconv layer (right)]

On the left is the linear convolution layer of an ordinary CNN. A linear convolution layer is fine for extracting linearly separable features, but when the features to be extracted are highly nonlinear, we need more and more filters to capture all the latent patterns. This creates a problem: too many filters means too many network parameters, and the network becomes overly complex and computationally expensive.

The paper makes improvements on two fronts:

1. An improved convolution layer, MLPconv: at each local patch it performs a richer computation than a traditional convolution (upper right of the figure), raising each layer's ability to recognize complex features. A rough analogy: in a traditional CNN, each convolution layer is a worker who can only do one single task, so you must add huge numbers of filters to cover a given variety of task types; each MLPconv layer is far more capable and can handle several different kinds of tasks, so only a small number of filters needs to be selected.
2. Global average pooling replaces the final fully connected layers, which in a traditional CNN hold an outsized share of the parameters. Fully connected layers also hurt generalization, which is why AlexNet had to resort to dropout to improve its generalization ability.

[Figure 2: overall structure of the Network-in-Network model]

Finally, the authors stack four Network-in-Network blocks plus a global average pooling layer to tackle ImageNet classification:

from kaffe.tensorflow import Network

class NiN(Network):
    def setup(self):
        (self.feed('data')
             .conv(11, 11, 96, 4, 4, padding='VALID', name='conv1')
             .conv(1, 1, 96, 1, 1, name='cccp1')
             .conv(1, 1, 96, 1, 1, name='cccp2')
             .max_pool(3, 3, 2, 2, name='pool1')
             .conv(5, 5, 256, 1, 1, name='conv2')
             .conv(1, 1, 256, 1, 1, name='cccp3')
             .conv(1, 1, 256, 1, 1, name='cccp4')
             .max_pool(3, 3, 2, 2, padding='VALID', name='pool2')
             .conv(3, 3, 384, 1, 1, name='conv3')
             .conv(1, 1, 384, 1, 1, name='cccp5')
             .conv(1, 1, 384, 1, 1, name='cccp6')
             .max_pool(3, 3, 2, 2, padding='VALID', name='pool3')
             .conv(3, 3, 1024, 1, 1, name='conv4-1024')
             .conv(1, 1, 1024, 1, 1, name='cccp7-1024')
             .conv(1, 1, 1000, 1, 1, name='cccp8-1024')
             .avg_pool(6, 6, 1, 1, padding='VALID', name='pool4')
             .softmax(name='prob'))

The basic network structure is shown above; the code is from GitHub - ethereon/caffe-tensorflow: Caffe models in TensorFlow. Because of a recent job change I have no machine to run it on for now, nor to draw a proper diagram of the network structure; I will add both later. One thing worth pointing out: the cccp1 and cccp2 (cross channel pooling) layers are equivalent to convolutions with 1×1 kernels (a quick numerical check follows the prototxt below). The Caffe implementation of NIN is as follows:

  1. name: "nin_imagenet"
  2. layers {
  3. top: "data"
  4. top: "label"
  5. name: "data"
  6. type: DATA
  7. data_param {
  8. source: "/home/linmin/IMAGENET-LMDB/imagenet-train-lmdb"
  9. backend: LMDB
  10. batch_size: 64
  11. }
  12. transform_param {
  13. crop_size: 224
  14. mirror: true
  15. mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"
  16. }
  17. include: { phase: TRAIN }
  18. }
  19. layers {
  20. top: "data"
  21. top: "label"
  22. name: "data"
  23. type: DATA
  24. data_param {
  25. source: "/home/linmin/IMAGENET-LMDB/imagenet-val-lmdb"
  26. backend: LMDB
  27. batch_size: 89
  28. }
  29. transform_param {
  30. crop_size: 224
  31. mirror: false
  32. mean_file: "/home/linmin/IMAGENET-LMDB/imagenet-train-mean"
  33. }
  34. include: { phase: TEST }
  35. }
  36. layers {
  37. bottom: "data"
  38. top: "conv1"
  39. name: "conv1"
  40. type: CONVOLUTION
  41. blobs_lr: 1
  42. blobs_lr: 2
  43. weight_decay: 1
  44. weight_decay: 0
  45. convolution_param {
  46. num_output: 96
  47. kernel_size: 11
  48. stride: 4
  49. weight_filler {
  50. type: "gaussian"
  51. mean: 0
  52. std: 0.01
  53. }
  54. bias_filler {
  55. type: "constant"
  56. value: 0
  57. }
  58. }
  59. }
  60. layers {
  61. bottom: "conv1"
  62. top: "conv1"
  63. name: "relu0"
  64. type: RELU
  65. }
  66. layers {
  67. bottom: "conv1"
  68. top: "cccp1"
  69. name: "cccp1"
  70. type: CONVOLUTION
  71. blobs_lr: 1
  72. blobs_lr: 2
  73. weight_decay: 1
  74. weight_decay: 0
  75. convolution_param {
  76. num_output: 96
  77. kernel_size: 1
  78. stride: 1
  79. weight_filler {
  80. type: "gaussian"
  81. mean: 0
  82. std: 0.05
  83. }
  84. bias_filler {
  85. type: "constant"
  86. value: 0
  87. }
  88. }
  89. }
  90. layers {
  91. bottom: "cccp1"
  92. top: "cccp1"
  93. name: "relu1"
  94. type: RELU
  95. }
  96. layers {
  97. bottom: "cccp1"
  98. top: "cccp2"
  99. name: "cccp2"
  100. type: CONVOLUTION
  101. blobs_lr: 1
  102. blobs_lr: 2
  103. weight_decay: 1
  104. weight_decay: 0
  105. convolution_param {
  106. num_output: 96
  107. kernel_size: 1
  108. stride: 1
  109. weight_filler {
  110. type: "gaussian"
  111. mean: 0
  112. std: 0.05
  113. }
  114. bias_filler {
  115. type: "constant"
  116. value: 0
  117. }
  118. }
  119. }
  120. layers {
  121. bottom: "cccp2"
  122. top: "cccp2"
  123. name: "relu2"
  124. type: RELU
  125. }
  126. layers {
  127. bottom: "cccp2"
  128. top: "pool0"
  129. name: "pool0"
  130. type: POOLING
  131. pooling_param {
  132. pool: MAX
  133. kernel_size: 3
  134. stride: 2
  135. }
  136. }
  137. layers {
  138. bottom: "pool0"
  139. top: "conv2"
  140. name: "conv2"
  141. type: CONVOLUTION
  142. blobs_lr: 1
  143. blobs_lr: 2
  144. weight_decay: 1
  145. weight_decay: 0
  146. convolution_param {
  147. num_output: 256
  148. pad: 2
  149. kernel_size: 5
  150. stride: 1
  151. weight_filler {
  152. type: "gaussian"
  153. mean: 0
  154. std: 0.05
  155. }
  156. bias_filler {
  157. type: "constant"
  158. value: 0
  159. }
  160. }
  161. }
  162. layers {
  163. bottom: "conv2"
  164. top: "conv2"
  165. name: "relu3"
  166. type: RELU
  167. }
  168. layers {
  169. bottom: "conv2"
  170. top: "cccp3"
  171. name: "cccp3"
  172. type: CONVOLUTION
  173. blobs_lr: 1
  174. blobs_lr: 2
  175. weight_decay: 1
  176. weight_decay: 0
  177. convolution_param {
  178. num_output: 256
  179. kernel_size: 1
  180. stride: 1
  181. weight_filler {
  182. type: "gaussian"
  183. mean: 0
  184. std: 0.05
  185. }
  186. bias_filler {
  187. type: "constant"
  188. value: 0
  189. }
  190. }
  191. }
  192. layers {
  193. bottom: "cccp3"
  194. top: "cccp3"
  195. name: "relu5"
  196. type: RELU
  197. }
  198. layers {
  199. bottom: "cccp3"
  200. top: "cccp4"
  201. name: "cccp4"
  202. type: CONVOLUTION
  203. blobs_lr: 1
  204. blobs_lr: 2
  205. weight_decay: 1
  206. weight_decay: 0
  207. convolution_param {
  208. num_output: 256
  209. kernel_size: 1
  210. stride: 1
  211. weight_filler {
  212. type: "gaussian"
  213. mean: 0
  214. std: 0.05
  215. }
  216. bias_filler {
  217. type: "constant"
  218. value: 0
  219. }
  220. }
  221. }
  222. layers {
  223. bottom: "cccp4"
  224. top: "cccp4"
  225. name: "relu6"
  226. type: RELU
  227. }
  228. layers {
  229. bottom: "cccp4"
  230. top: "pool2"
  231. name: "pool2"
  232. type: POOLING
  233. pooling_param {
  234. pool: MAX
  235. kernel_size: 3
  236. stride: 2
  237. }
  238. }
  239. layers {
  240. bottom: "pool2"
  241. top: "conv3"
  242. name: "conv3"
  243. type: CONVOLUTION
  244. blobs_lr: 1
  245. blobs_lr: 2
  246. weight_decay: 1
  247. weight_decay: 0
  248. convolution_param {
  249. num_output: 384
  250. pad: 1
  251. kernel_size: 3
  252. stride: 1
  253. weight_filler {
  254. type: "gaussian"
  255. mean: 0
  256. std: 0.01
  257. }
  258. bias_filler {
  259. type: "constant"
  260. value: 0
  261. }
  262. }
  263. }
  264. layers {
  265. bottom: "conv3"
  266. top: "conv3"
  267. name: "relu7"
  268. type: RELU
  269. }
  270. layers {
  271. bottom: "conv3"
  272. top: "cccp5"
  273. name: "cccp5"
  274. type: CONVOLUTION
  275. blobs_lr: 1
  276. blobs_lr: 2
  277. weight_decay: 1
  278. weight_decay: 0
  279. convolution_param {
  280. num_output: 384
  281. kernel_size: 1
  282. stride: 1
  283. weight_filler {
  284. type: "gaussian"
  285. mean: 0
  286. std: 0.05
  287. }
  288. bias_filler {
  289. type: "constant"
  290. value: 0
  291. }
  292. }
  293. }
  294. layers {
  295. bottom: "cccp5"
  296. top: "cccp5"
  297. name: "relu8"
  298. type: RELU
  299. }
  300. layers {
  301. bottom: "cccp5"
  302. top: "cccp6"
  303. name: "cccp6"
  304. type: CONVOLUTION
  305. blobs_lr: 1
  306. blobs_lr: 2
  307. weight_decay: 1
  308. weight_decay: 0
  309. convolution_param {
  310. num_output: 384
  311. kernel_size: 1
  312. stride: 1
  313. weight_filler {
  314. type: "gaussian"
  315. mean: 0
  316. std: 0.05
  317. }
  318. bias_filler {
  319. type: "constant"
  320. value: 0
  321. }
  322. }
  323. }
  324. layers {
  325. bottom: "cccp6"
  326. top: "cccp6"
  327. name: "relu9"
  328. type: RELU
  329. }
  330. layers {
  331. bottom: "cccp6"
  332. top: "pool3"
  333. name: "pool3"
  334. type: POOLING
  335. pooling_param {
  336. pool: MAX
  337. kernel_size: 3
  338. stride: 2
  339. }
  340. }
  341. layers {
  342. bottom: "pool3"
  343. top: "pool3"
  344. name: "drop"
  345. type: DROPOUT
  346. dropout_param {
  347. dropout_ratio: 0.5
  348. }
  349. }
  350. layers {
  351. bottom: "pool3"
  352. top: "conv4"
  353. name: "conv4-1024"
  354. type: CONVOLUTION
  355. blobs_lr: 1
  356. blobs_lr: 2
  357. weight_decay: 1
  358. weight_decay: 0
  359. convolution_param {
  360. num_output: 1024
  361. pad: 1
  362. kernel_size: 3
  363. stride: 1
  364. weight_filler {
  365. type: "gaussian"
  366. mean: 0
  367. std: 0.05
  368. }
  369. bias_filler {
  370. type: "constant"
  371. value: 0
  372. }
  373. }
  374. }
  375. layers {
  376. bottom: "conv4"
  377. top: "conv4"
  378. name: "relu10"
  379. type: RELU
  380. }
  381. layers {
  382. bottom: "conv4"
  383. top: "cccp7"
  384. name: "cccp7-1024"
  385. type: CONVOLUTION
  386. blobs_lr: 1
  387. blobs_lr: 2
  388. weight_decay: 1
  389. weight_decay: 0
  390. convolution_param {
  391. num_output: 1024
  392. kernel_size: 1
  393. stride: 1
  394. weight_filler {
  395. type: "gaussian"
  396. mean: 0
  397. std: 0.05
  398. }
  399. bias_filler {
  400. type: "constant"
  401. value: 0
  402. }
  403. }
  404. }
  405. layers {
  406. bottom: "cccp7"
  407. top: "cccp7"
  408. name: "relu11"
  409. type: RELU
  410. }
  411. layers {
  412. bottom: "cccp7"
  413. top: "cccp8"
  414. name: "cccp8-1024"
  415. type: CONVOLUTION
  416. blobs_lr: 1
  417. blobs_lr: 2
  418. weight_decay: 1
  419. weight_decay: 0
  420. convolution_param {
  421. num_output: 1000
  422. kernel_size: 1
  423. stride: 1
  424. weight_filler {
  425. type: "gaussian"
  426. mean: 0
  427. std: 0.01
  428. }
  429. bias_filler {
  430. type: "constant"
  431. value: 0
  432. }
  433. }
  434. }
  435. layers {
  436. bottom: "cccp8"
  437. top: "cccp8"
  438. name: "relu12"
  439. type: RELU
  440. }
  441. layers {
  442. bottom: "cccp8"
  443. top: "pool4"
  444. name: "pool4"
  445. type: POOLING
  446. pooling_param {
  447. pool: AVE
  448. kernel_size: 6
  449. stride: 1
  450. }
  451. }
  452. layers {
  453. name: "accuracy"
  454. type: ACCURACY
  455. bottom: "pool4"
  456. bottom: "label"
  457. top: "accuracy"
  458. include: { phase: TEST }
  459. }
  460. layers {
  461. bottom: "pool4"
  462. bottom: "label"
  463. name: "loss"
  464. type: SOFTMAX_LOSS
  465. include: { phase: TRAIN }
  466. }
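As claimed above, a 1×1 convolution such as cccp1 is nothing more than a fully connected layer applied independently at every spatial position, mixing information across channels. A quick numerical check (all shapes here are arbitrary, chosen just for illustration):

    import numpy as np

    h, w, c_in, c_out = 6, 6, 96, 96        # arbitrary sizes for illustration
    x = np.random.randn(h, w, c_in)
    weights = np.random.randn(c_in, c_out)  # a 1x1 kernel is just a C_in x C_out matrix

    # "Convolving" with a 1x1 kernel: at each pixel, mix channels with the same matrix.
    conv_1x1 = np.einsum('hwi,io->hwo', x, weights)

    # A per-pixel fully connected layer over the channel dimension.
    fc_per_pixel = x.reshape(-1, c_in).dot(weights).reshape(h, w, c_out)

    print(np.allclose(conv_1x1, fc_per_pixel))  # True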

The NIN design can also be viewed as deepening the network: by making each block deeper (increasing the representational power of a single NIN unit) and replacing the original fully connected layers with an average pooling layer, it greatly reduces the number of filters needed and hence the number of model parameters. Experiments in the paper show it matches AlexNet's performance, with a final model size of only about 29 MB.
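A back-of-the-envelope calculation makes the saving concrete: AlexNet's fully connected head dominates its parameter count, while global average pooling contributes no parameters at all. The layer sizes below are the standard AlexNet ones (fc6/fc7/fc8):

    # AlexNet: last conv output is 6*6*256, followed by fc6 (4096), fc7 (4096), fc8 (1000).
    fc_params = 6 * 6 * 256 * 4096 + 4096 * 4096 + 4096 * 1000
    print('AlexNet FC params: %.1fM' % (fc_params / 1e6))  # ~58.6M, the bulk of ~60M total

    # NIN: the last conv already emits 1000 feature maps; global average pooling
    # just averages each 6x6 map down to a scalar, contributing zero parameters.
    gap_params = 0
    print('NIN classifier params: %d' % gap_params)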

Once you understand NIN, GoogLeNet no longer feels mysterious.

GoogLeNet

Pain points

  • The bigger a CNN gets, the more model parameters it has and the more computing power it needs, and an overly complex model will overfit;
  • In a CNN, adding layers brings a corresponding increase in the computing resources required;
  • A sparse network would be acceptable in principle, but sparse data structures are usually very inefficient to compute with.

Inception module

[Figure 3: the naive Inception module]

The Inception module is motivated by the idea that convolution kernels of several different sizes can capture information from differently sized clusters in the image. For convenience the paper uses 1×1, 3×3, and 5×5 kernels, plus a parallel 3×3 max pooling branch.
There is a serious computational catch, however: the number of output filters of each Inception module is the sum of the filters across all of its branches, so after several layers the model blows up, and the naive Inception design becomes ever more dependent on computing resources.
As we saw with the Network-in-Network model, 1×1 convolutions are an effective way to reduce dimensionality (expressing as much information as possible with fewer channels), so the paper proposes the "Inception module with dimension reduction": cut the number of filters as much as possible without hurting the model's representational power, thereby reducing model complexity:

[Figure 4: Inception module with dimension reduction]
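To see how much the 1×1 reduction buys, count multiplications for just the 5×5 branch of inception (3a). For a 224×224 input, the feature map entering that module is 28×28×192, and the branch reduces to 16 channels before the 5×5 convolution outputs 32 (the same numbers as in the code below); this rough estimate ignores biases:

    h = w = 28; c_in = 192    # input feature map to inception (3a)
    reduce_c, out_c = 16, 32  # 5x5 branch: 1x1 reduce to 16, then 5x5 conv to 32

    naive = h * w * c_in * 5 * 5 * out_c  # 5x5 conv applied directly to 192 channels
    reduced = h * w * c_in * reduce_c + h * w * reduce_c * 5 * 5 * out_c
    print('naive:   %.1fM multiplies' % (naive / 1e6))    # ~120.4M
    print('reduced: %.1fM multiplies' % (reduced / 1e6))  # ~12.4M, roughly a 10x saving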

Overall architecture of GoogLeNet

[Figure 5: the overall GoogLeNet network]

The basic code for constructing GoogLeNet in TensorFlow:

from kaffe.tensorflow import Network

class GoogleNet(Network):
    def setup(self):
        (self.feed('data')
             .conv(7, 7, 64, 2, 2, name='conv1_7x7_s2')
             .max_pool(3, 3, 2, 2, name='pool1_3x3_s2')
             .lrn(2, 2e-05, 0.75, name='pool1_norm1')
             .conv(1, 1, 64, 1, 1, name='conv2_3x3_reduce')
             .conv(3, 3, 192, 1, 1, name='conv2_3x3')
             .lrn(2, 2e-05, 0.75, name='conv2_norm2')
             .max_pool(3, 3, 2, 2, name='pool2_3x3_s2')
             .conv(1, 1, 64, 1, 1, name='inception_3a_1x1'))

        (self.feed('pool2_3x3_s2')
             .conv(1, 1, 96, 1, 1, name='inception_3a_3x3_reduce')
             .conv(3, 3, 128, 1, 1, name='inception_3a_3x3'))

        (self.feed('pool2_3x3_s2')
             .conv(1, 1, 16, 1, 1, name='inception_3a_5x5_reduce')
             .conv(5, 5, 32, 1, 1, name='inception_3a_5x5'))

        (self.feed('pool2_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_3a_pool')
             .conv(1, 1, 32, 1, 1, name='inception_3a_pool_proj'))

        (self.feed('inception_3a_1x1',
                   'inception_3a_3x3',
                   'inception_3a_5x5',
                   'inception_3a_pool_proj')
             .concat(3, name='inception_3a_output')
             .conv(1, 1, 128, 1, 1, name='inception_3b_1x1'))

        (self.feed('inception_3a_output')
             .conv(1, 1, 128, 1, 1, name='inception_3b_3x3_reduce')
             .conv(3, 3, 192, 1, 1, name='inception_3b_3x3'))

        (self.feed('inception_3a_output')
             .conv(1, 1, 32, 1, 1, name='inception_3b_5x5_reduce')
             .conv(5, 5, 96, 1, 1, name='inception_3b_5x5'))

        (self.feed('inception_3a_output')
             .max_pool(3, 3, 1, 1, name='inception_3b_pool')
             .conv(1, 1, 64, 1, 1, name='inception_3b_pool_proj'))

        (self.feed('inception_3b_1x1',
                   'inception_3b_3x3',
                   'inception_3b_5x5',
                   'inception_3b_pool_proj')
             .concat(3, name='inception_3b_output')
             .max_pool(3, 3, 2, 2, name='pool3_3x3_s2')
             .conv(1, 1, 192, 1, 1, name='inception_4a_1x1'))

        (self.feed('pool3_3x3_s2')
             .conv(1, 1, 96, 1, 1, name='inception_4a_3x3_reduce')
             .conv(3, 3, 208, 1, 1, name='inception_4a_3x3'))

        (self.feed('pool3_3x3_s2')
             .conv(1, 1, 16, 1, 1, name='inception_4a_5x5_reduce')
             .conv(5, 5, 48, 1, 1, name='inception_4a_5x5'))

        (self.feed('pool3_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_4a_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4a_pool_proj'))

        (self.feed('inception_4a_1x1',
                   'inception_4a_3x3',
                   'inception_4a_5x5',
                   'inception_4a_pool_proj')
             .concat(3, name='inception_4a_output')
             .conv(1, 1, 160, 1, 1, name='inception_4b_1x1'))

        (self.feed('inception_4a_output')
             .conv(1, 1, 112, 1, 1, name='inception_4b_3x3_reduce')
             .conv(3, 3, 224, 1, 1, name='inception_4b_3x3'))

        (self.feed('inception_4a_output')
             .conv(1, 1, 24, 1, 1, name='inception_4b_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4b_5x5'))

        (self.feed('inception_4a_output')
             .max_pool(3, 3, 1, 1, name='inception_4b_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4b_pool_proj'))

        (self.feed('inception_4b_1x1',
                   'inception_4b_3x3',
                   'inception_4b_5x5',
                   'inception_4b_pool_proj')
             .concat(3, name='inception_4b_output')
             .conv(1, 1, 128, 1, 1, name='inception_4c_1x1'))

        (self.feed('inception_4b_output')
             .conv(1, 1, 128, 1, 1, name='inception_4c_3x3_reduce')
             .conv(3, 3, 256, 1, 1, name='inception_4c_3x3'))

        (self.feed('inception_4b_output')
             .conv(1, 1, 24, 1, 1, name='inception_4c_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4c_5x5'))

        (self.feed('inception_4b_output')
             .max_pool(3, 3, 1, 1, name='inception_4c_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4c_pool_proj'))

        (self.feed('inception_4c_1x1',
                   'inception_4c_3x3',
                   'inception_4c_5x5',
                   'inception_4c_pool_proj')
             .concat(3, name='inception_4c_output')
             .conv(1, 1, 112, 1, 1, name='inception_4d_1x1'))

        (self.feed('inception_4c_output')
             .conv(1, 1, 144, 1, 1, name='inception_4d_3x3_reduce')
             .conv(3, 3, 288, 1, 1, name='inception_4d_3x3'))

        (self.feed('inception_4c_output')
             .conv(1, 1, 32, 1, 1, name='inception_4d_5x5_reduce')
             .conv(5, 5, 64, 1, 1, name='inception_4d_5x5'))

        (self.feed('inception_4c_output')
             .max_pool(3, 3, 1, 1, name='inception_4d_pool')
             .conv(1, 1, 64, 1, 1, name='inception_4d_pool_proj'))

        (self.feed('inception_4d_1x1',
                   'inception_4d_3x3',
                   'inception_4d_5x5',
                   'inception_4d_pool_proj')
             .concat(3, name='inception_4d_output')
             .conv(1, 1, 256, 1, 1, name='inception_4e_1x1'))

        (self.feed('inception_4d_output')
             .conv(1, 1, 160, 1, 1, name='inception_4e_3x3_reduce')
             .conv(3, 3, 320, 1, 1, name='inception_4e_3x3'))

        (self.feed('inception_4d_output')
             .conv(1, 1, 32, 1, 1, name='inception_4e_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_4e_5x5'))

        (self.feed('inception_4d_output')
             .max_pool(3, 3, 1, 1, name='inception_4e_pool')
             .conv(1, 1, 128, 1, 1, name='inception_4e_pool_proj'))

        (self.feed('inception_4e_1x1',
                   'inception_4e_3x3',
                   'inception_4e_5x5',
                   'inception_4e_pool_proj')
             .concat(3, name='inception_4e_output')
             .max_pool(3, 3, 2, 2, name='pool4_3x3_s2')
             .conv(1, 1, 256, 1, 1, name='inception_5a_1x1'))

        (self.feed('pool4_3x3_s2')
             .conv(1, 1, 160, 1, 1, name='inception_5a_3x3_reduce')
             .conv(3, 3, 320, 1, 1, name='inception_5a_3x3'))

        (self.feed('pool4_3x3_s2')
             .conv(1, 1, 32, 1, 1, name='inception_5a_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_5a_5x5'))

        (self.feed('pool4_3x3_s2')
             .max_pool(3, 3, 1, 1, name='inception_5a_pool')
             .conv(1, 1, 128, 1, 1, name='inception_5a_pool_proj'))

        (self.feed('inception_5a_1x1',
                   'inception_5a_3x3',
                   'inception_5a_5x5',
                   'inception_5a_pool_proj')
             .concat(3, name='inception_5a_output')
             .conv(1, 1, 384, 1, 1, name='inception_5b_1x1'))

        (self.feed('inception_5a_output')
             .conv(1, 1, 192, 1, 1, name='inception_5b_3x3_reduce')
             .conv(3, 3, 384, 1, 1, name='inception_5b_3x3'))

        (self.feed('inception_5a_output')
             .conv(1, 1, 48, 1, 1, name='inception_5b_5x5_reduce')
             .conv(5, 5, 128, 1, 1, name='inception_5b_5x5'))

        (self.feed('inception_5a_output')
             .max_pool(3, 3, 1, 1, name='inception_5b_pool')
             .conv(1, 1, 128, 1, 1, name='inception_5b_pool_proj'))

        (self.feed('inception_5b_1x1',
                   'inception_5b_3x3',
                   'inception_5b_5x5',
                   'inception_5b_pool_proj')
             .concat(3, name='inception_5b_output')
             .avg_pool(7, 7, 1, 1, padding='VALID', name='pool5_7x7_s1')
             .fc(1000, relu=False, name='loss3_classifier')
             .softmax(name='prob'))

The code lives in GitHub - ethereon/caffe-tensorflow: Caffe models in TensorFlow; the author has wrapped up the basic operations, so once you understand the network structure, constructing GoogLeNet is easy (a typical way to use such a class is sketched below). Once I settle in at the new company, I will try to write the GoogLeNet network code on top of tflearn as well.
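For reference, instantiating a class generated this way usually looks like the sketch below. The helper names (`load`, `get_output`) follow my reading of the repo's conventions at the time, and the weight file path is a placeholder, so treat this as an illustration rather than gospel:

    import numpy as np
    import tensorflow as tf

    images = tf.placeholder(tf.float32, [None, 224, 224, 3])
    net = GoogleNet({'data': images})  # map the 'data' input to a tensor

    with tf.Session() as sess:
        net.load('googlenet.npy', sess)  # placeholder path: weights converted from Caffe
        batch = np.random.randn(1, 224, 224, 3).astype(np.float32)
        probs = sess.run(net.get_output(), feed_dict={images: batch})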

GoogLeNet on TensorFlow

For convenience, I reimplemented GoogLeNet with tflearn. The only differences from the Caffe model are a few padding settings: matching them exactly is fiddly, since the branches must agree in spatial size at each inception concat, and I did not see how to carry over the pad values from the Caffe prototxt, so I set padding uniformly to 'same' (a quick shape sanity check follows the listing). The code:

# -*- coding: utf-8 -*-
""" GoogLeNet.
Applying 'GoogLeNet' to Oxford's 17 Category Flower Dataset classification task.
References:
    - Szegedy, Christian, et al. Going deeper with convolutions.
    - 17 Category Flower Dataset. Maria-Elena Nilsback and Andrew Zisserman.
Links:
    - [GoogLeNet Paper](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf)
    - [Flower Dataset (17)](http://www.robots.ox.ac.uk/~vgg/data/flowers/17/)
"""

from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.merge_ops import merge
from tflearn.layers.estimator import regression
import tflearn.datasets.oxflower17 as oxflower17

X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))

network = input_data(shape=[None, 227, 227, 3])

# stem: conv - pool - LRN - conv - conv - LRN - pool
conv1_7_7 = conv_2d(network, 64, 7, strides=2, activation='relu', name='conv1_7_7_s2')
pool1_3_3 = max_pool_2d(conv1_7_7, 3, strides=2)
pool1_3_3 = local_response_normalization(pool1_3_3)
conv2_3_3_reduce = conv_2d(pool1_3_3, 64, 1, activation='relu', name='conv2_3_3_reduce')
conv2_3_3 = conv_2d(conv2_3_3_reduce, 192, 3, activation='relu', name='conv2_3_3')
conv2_3_3 = local_response_normalization(conv2_3_3)
pool2_3_3 = max_pool_2d(conv2_3_3, kernel_size=3, strides=2, name='pool2_3_3_s2')

# inception (3a)
inception_3a_1_1 = conv_2d(pool2_3_3, 64, 1, activation='relu', name='inception_3a_1_1')
inception_3a_3_3_reduce = conv_2d(pool2_3_3, 96, 1, activation='relu', name='inception_3a_3_3_reduce')
inception_3a_3_3 = conv_2d(inception_3a_3_3_reduce, 128, filter_size=3, activation='relu', name='inception_3a_3_3')
inception_3a_5_5_reduce = conv_2d(pool2_3_3, 16, filter_size=1, activation='relu', name='inception_3a_5_5_reduce')
inception_3a_5_5 = conv_2d(inception_3a_5_5_reduce, 32, filter_size=5, activation='relu', name='inception_3a_5_5')
inception_3a_pool = max_pool_2d(pool2_3_3, kernel_size=3, strides=1)
inception_3a_pool_1_1 = conv_2d(inception_3a_pool, 32, filter_size=1, activation='relu', name='inception_3a_pool_1_1')
# merge the inception_3a_* branches along the channel axis
inception_3a_output = merge([inception_3a_1_1, inception_3a_3_3, inception_3a_5_5, inception_3a_pool_1_1], mode='concat', axis=3)

# inception (3b)
inception_3b_1_1 = conv_2d(inception_3a_output, 128, filter_size=1, activation='relu', name='inception_3b_1_1')
inception_3b_3_3_reduce = conv_2d(inception_3a_output, 128, filter_size=1, activation='relu', name='inception_3b_3_3_reduce')
inception_3b_3_3 = conv_2d(inception_3b_3_3_reduce, 192, filter_size=3, activation='relu', name='inception_3b_3_3')
inception_3b_5_5_reduce = conv_2d(inception_3a_output, 32, filter_size=1, activation='relu', name='inception_3b_5_5_reduce')
inception_3b_5_5 = conv_2d(inception_3b_5_5_reduce, 96, filter_size=5, activation='relu', name='inception_3b_5_5')
inception_3b_pool = max_pool_2d(inception_3a_output, kernel_size=3, strides=1, name='inception_3b_pool')
inception_3b_pool_1_1 = conv_2d(inception_3b_pool, 64, filter_size=1, activation='relu', name='inception_3b_pool_1_1')
# merge the inception_3b_* branches
inception_3b_output = merge([inception_3b_1_1, inception_3b_3_3, inception_3b_5_5, inception_3b_pool_1_1], mode='concat', axis=3, name='inception_3b_output')

pool3_3_3 = max_pool_2d(inception_3b_output, kernel_size=3, strides=2, name='pool3_3_3')

# inception (4a)
inception_4a_1_1 = conv_2d(pool3_3_3, 192, filter_size=1, activation='relu', name='inception_4a_1_1')
inception_4a_3_3_reduce = conv_2d(pool3_3_3, 96, filter_size=1, activation='relu', name='inception_4a_3_3_reduce')
inception_4a_3_3 = conv_2d(inception_4a_3_3_reduce, 208, filter_size=3, activation='relu', name='inception_4a_3_3')
inception_4a_5_5_reduce = conv_2d(pool3_3_3, 16, filter_size=1, activation='relu', name='inception_4a_5_5_reduce')
inception_4a_5_5 = conv_2d(inception_4a_5_5_reduce, 48, filter_size=5, activation='relu', name='inception_4a_5_5')
inception_4a_pool = max_pool_2d(pool3_3_3, kernel_size=3, strides=1, name='inception_4a_pool')
inception_4a_pool_1_1 = conv_2d(inception_4a_pool, 64, filter_size=1, activation='relu', name='inception_4a_pool_1_1')
inception_4a_output = merge([inception_4a_1_1, inception_4a_3_3, inception_4a_5_5, inception_4a_pool_1_1], mode='concat', axis=3, name='inception_4a_output')

# inception (4b)
inception_4b_1_1 = conv_2d(inception_4a_output, 160, filter_size=1, activation='relu', name='inception_4b_1_1')
inception_4b_3_3_reduce = conv_2d(inception_4a_output, 112, filter_size=1, activation='relu', name='inception_4b_3_3_reduce')
inception_4b_3_3 = conv_2d(inception_4b_3_3_reduce, 224, filter_size=3, activation='relu', name='inception_4b_3_3')
inception_4b_5_5_reduce = conv_2d(inception_4a_output, 24, filter_size=1, activation='relu', name='inception_4b_5_5_reduce')
inception_4b_5_5 = conv_2d(inception_4b_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4b_5_5')
inception_4b_pool = max_pool_2d(inception_4a_output, kernel_size=3, strides=1, name='inception_4b_pool')
inception_4b_pool_1_1 = conv_2d(inception_4b_pool, 64, filter_size=1, activation='relu', name='inception_4b_pool_1_1')
inception_4b_output = merge([inception_4b_1_1, inception_4b_3_3, inception_4b_5_5, inception_4b_pool_1_1], mode='concat', axis=3, name='inception_4b_output')

# inception (4c)
inception_4c_1_1 = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu', name='inception_4c_1_1')
inception_4c_3_3_reduce = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu', name='inception_4c_3_3_reduce')
inception_4c_3_3 = conv_2d(inception_4c_3_3_reduce, 256, filter_size=3, activation='relu', name='inception_4c_3_3')
inception_4c_5_5_reduce = conv_2d(inception_4b_output, 24, filter_size=1, activation='relu', name='inception_4c_5_5_reduce')
inception_4c_5_5 = conv_2d(inception_4c_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4c_5_5')
inception_4c_pool = max_pool_2d(inception_4b_output, kernel_size=3, strides=1)
inception_4c_pool_1_1 = conv_2d(inception_4c_pool, 64, filter_size=1, activation='relu', name='inception_4c_pool_1_1')
inception_4c_output = merge([inception_4c_1_1, inception_4c_3_3, inception_4c_5_5, inception_4c_pool_1_1], mode='concat', axis=3, name='inception_4c_output')

# inception (4d)
inception_4d_1_1 = conv_2d(inception_4c_output, 112, filter_size=1, activation='relu', name='inception_4d_1_1')
inception_4d_3_3_reduce = conv_2d(inception_4c_output, 144, filter_size=1, activation='relu', name='inception_4d_3_3_reduce')
inception_4d_3_3 = conv_2d(inception_4d_3_3_reduce, 288, filter_size=3, activation='relu', name='inception_4d_3_3')
inception_4d_5_5_reduce = conv_2d(inception_4c_output, 32, filter_size=1, activation='relu', name='inception_4d_5_5_reduce')
inception_4d_5_5 = conv_2d(inception_4d_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4d_5_5')
inception_4d_pool = max_pool_2d(inception_4c_output, kernel_size=3, strides=1, name='inception_4d_pool')
inception_4d_pool_1_1 = conv_2d(inception_4d_pool, 64, filter_size=1, activation='relu', name='inception_4d_pool_1_1')
inception_4d_output = merge([inception_4d_1_1, inception_4d_3_3, inception_4d_5_5, inception_4d_pool_1_1], mode='concat', axis=3, name='inception_4d_output')

# inception (4e)
inception_4e_1_1 = conv_2d(inception_4d_output, 256, filter_size=1, activation='relu', name='inception_4e_1_1')
inception_4e_3_3_reduce = conv_2d(inception_4d_output, 160, filter_size=1, activation='relu', name='inception_4e_3_3_reduce')
inception_4e_3_3 = conv_2d(inception_4e_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_4e_3_3')
inception_4e_5_5_reduce = conv_2d(inception_4d_output, 32, filter_size=1, activation='relu', name='inception_4e_5_5_reduce')
inception_4e_5_5 = conv_2d(inception_4e_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_4e_5_5')
inception_4e_pool = max_pool_2d(inception_4d_output, kernel_size=3, strides=1, name='inception_4e_pool')
inception_4e_pool_1_1 = conv_2d(inception_4e_pool, 128, filter_size=1, activation='relu', name='inception_4e_pool_1_1')
inception_4e_output = merge([inception_4e_1_1, inception_4e_3_3, inception_4e_5_5, inception_4e_pool_1_1], axis=3, mode='concat')

pool4_3_3 = max_pool_2d(inception_4e_output, kernel_size=3, strides=2, name='pool_3_3')

# inception (5a)
inception_5a_1_1 = conv_2d(pool4_3_3, 256, filter_size=1, activation='relu', name='inception_5a_1_1')
inception_5a_3_3_reduce = conv_2d(pool4_3_3, 160, filter_size=1, activation='relu', name='inception_5a_3_3_reduce')
inception_5a_3_3 = conv_2d(inception_5a_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_5a_3_3')
inception_5a_5_5_reduce = conv_2d(pool4_3_3, 32, filter_size=1, activation='relu', name='inception_5a_5_5_reduce')
inception_5a_5_5 = conv_2d(inception_5a_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_5a_5_5')
inception_5a_pool = max_pool_2d(pool4_3_3, kernel_size=3, strides=1, name='inception_5a_pool')
inception_5a_pool_1_1 = conv_2d(inception_5a_pool, 128, filter_size=1, activation='relu', name='inception_5a_pool_1_1')
inception_5a_output = merge([inception_5a_1_1, inception_5a_3_3, inception_5a_5_5, inception_5a_pool_1_1], axis=3, mode='concat')

# inception (5b)
inception_5b_1_1 = conv_2d(inception_5a_output, 384, filter_size=1, activation='relu', name='inception_5b_1_1')
inception_5b_3_3_reduce = conv_2d(inception_5a_output, 192, filter_size=1, activation='relu', name='inception_5b_3_3_reduce')
inception_5b_3_3 = conv_2d(inception_5b_3_3_reduce, 384, filter_size=3, activation='relu', name='inception_5b_3_3')
inception_5b_5_5_reduce = conv_2d(inception_5a_output, 48, filter_size=1, activation='relu', name='inception_5b_5_5_reduce')
inception_5b_5_5 = conv_2d(inception_5b_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_5b_5_5')
inception_5b_pool = max_pool_2d(inception_5a_output, kernel_size=3, strides=1, name='inception_5b_pool')
inception_5b_pool_1_1 = conv_2d(inception_5b_pool, 128, filter_size=1, activation='relu', name='inception_5b_pool_1_1')
inception_5b_output = merge([inception_5b_1_1, inception_5b_3_3, inception_5b_5_5, inception_5b_pool_1_1], axis=3, mode='concat')

# classifier: global average pooling + dropout + softmax
pool5_7_7 = avg_pool_2d(inception_5b_output, kernel_size=7, strides=1)
pool5_7_7 = dropout(pool5_7_7, 0.4)
loss = fully_connected(pool5_7_7, 17, activation='softmax')

network = regression(loss, optimizer='momentum',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)
model = tflearn.DNN(network, checkpoint_path='model_googlenet',
                    max_checkpoints=1, tensorboard_verbose=2)
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, snapshot_step=200,
          snapshot_epoch=False, run_id='googlenet_oxflowers17')
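As a quick sanity check on the padding choice: with 'same' padding and stride 1, every branch preserves the input's spatial size, so the channel-axis concat is always legal. A small sketch with plain TensorFlow 1.x ops (shapes and filter counts are illustrative):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 28, 28, 192])

    # With SAME padding and stride 1, H and W are preserved for any kernel size,
    # so all four inception branches can be concatenated along the channel axis.
    b1 = tf.nn.conv2d(x, tf.zeros([1, 1, 192, 64]), [1, 1, 1, 1], padding='SAME')
    b2 = tf.nn.conv2d(x, tf.zeros([5, 5, 192, 32]), [1, 1, 1, 1], padding='SAME')
    b3 = tf.nn.max_pool(x, [1, 3, 3, 1], [1, 1, 1, 1], padding='SAME')

    print(b1.get_shape(), b2.get_shape(), b3.get_shape())  # all (?, 28, 28, C)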

If you are interested, please compare this part against the Caffe model prototxt and help check it for problems. I have submitted the code to the official tflearn repo ("add GoogLeNet(Inception) in Example"); if you have TensorFlow, just install tflearn and see whether you can help verify it. I have no GPU machine here, so training is slow, and the TensorBoard curves are not as clear-cut as the earlier AlexNet ones (mainly because I have not run as many epochs; the host also ran out of disk space partway through writing, so, awkwardly, I added restore logic and resumed from a checkpoint, which seems to have confused the TensorBoard graphs a little, as they look slightly different on every reload). Still, the raw logs show the model steadily converging, so I am posting the plots here anyway.
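For completeness, the resume-from-checkpoint logic mentioned above can be as simple as the sketch below; the step-numbered checkpoint file name is a placeholder for whatever tflearn last wrote:

    # Rebuild the same graph as above, then restore the latest checkpoint
    # before continuing training. 'model_googlenet-9600' is a placeholder
    # for the step-numbered checkpoint tflearn actually saved.
    model = tflearn.DNN(network, checkpoint_path='model_googlenet',
                        max_checkpoints=1, tensorboard_verbose=2)
    model.load('model_googlenet-9600')
    model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, snapshot_step=200,
              snapshot_epoch=False, run_id='googlenet_oxflowers17')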

[Figure 6: TensorBoard training curves]

As for the network structure, I could not download it directly from TensorBoard either, so I captured it piece by piece in screenshots (bear with me, it is a bit clumsy):

[Figure 7: GoogLeNet graph, stitched together from TensorBoard screenshots]

For convenience, here are some of the run logs I saved; the convergence is clearly visible:

[Figures 8 and 9: excerpts of the saved training logs, showing the loss steadily decreasing]

Related reading:

Machine Learning Advanced Notes #3 | Understanding AlexNet

Machine Learning Advanced Notes #2 | Understanding Neural Style

Machine Learning Advanced Notes #1 | Installing and Getting Started with TensorFlow

This article is provided by the UCloud Kernel and Virtualization R&D Team.

About the author:

Burness is a deep learning engineer at UCloud's Platform R&D Center and a tflearn contributor. He has worked on e-commerce recommendation and precision-marketing algorithms, and now focuses on distributed deep learning frameworks and computer vision research. He likes tinkering with algorithms and open-source projects, occasionally dabbles in data competitions, and is a geek at heart, obsessed with new technologies and new skills.

You can find him on GitHub: http://hacker.duanshishi.com/

The UCloud official account exclusively shares technical insights on cloud computing, industry news, and everything else you might want to know about the field.

欢迎提问 & 求关注 o(////▽////)q~

That's all.
https://zhuanlan.zhihu.com/p/22817228