原文:keras-tutorials

译者:飞龙

协议:CC BY-NC-SA 4.0

4.1 深度学习导论

致谢:派生于 Valerio Maggio 的 deep-learning-keras-tensorflow

深度学习让由多个处理层组成的计算模型,能够学习具有多个抽象级别的数据表示。这些方法极大地改进了语音识别,视觉对象识别,物体检测,以及药物发现和基因组学等许多其他领域的最新技术。

深度学习是目前数据分析领域的主要工具之一,而 Keras 则是最常见的深度学习框架之一。本教程将结合实际的 Keras 代码示例来介绍深度学习。

人工神经网络(ANN)

在机器学习和认知科学中,人工神经网络(ANN)是受生物神经网络启发的网络,用于估计或近似可取决于大量输入的函数,这些输入通常是未知的。

ANN 从堆叠的节点(神经元)构建,它们位于特征向量和目标向量之间的层中。神经网络中的节点根据权重和激活函数构建。只由一个节点构成的 ANN 的早期版本被称为感知机。

四、Keras - 图1

感知机是用于二元分类器的监督学习算法。它是一个函数,可以决定输入(由数字向量表示)属于一个类还是另一个类。与逻辑回归非常相似,权重与输入向量相乘并求和,结果再馈送给激活函数。感知机可以设计为多层网络,从而产生多层感知机(又名“MLP”)。
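下面是一个极简的感知机前向计算示意(仅用于说明上面的描述;输入、权重和偏置均为随意取的示例值,并以阶跃函数作为激活):

  import numpy as np

  def perceptron(x, w, b):
      """单个感知机:权重与输入向量相乘并求和,再经过阶跃激活,输出 0 或 1(即类别)"""
      z = np.dot(w, x) + b        # 加权求和(加上偏置)
      return 1 if z > 0 else 0    # 阶跃激活:决定输入属于哪一类

  x = np.array([0.5, -1.0, 2.0])  # 输入向量(示例值)
  w = np.array([0.1, 0.4, -0.2])  # 权重(示例值)
  print(perceptron(x, w, b=0.1))  # 输出 0 或 1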

四、Keras - 图2

每个神经元的权重通过梯度下降来学习,其中误差(损失)对每个权重求导。在称为反向传播的技术中,误差从输出层开始逐层向前传播,从而针对前一层来优化每一层的权重。
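用公式概括:对每个权重 $w$,梯度下降的更新规则为 $w \leftarrow w - \eta \, \partial E / \partial w$,其中 $\eta$ 是学习率,$E$ 是误差(损失)。反向传播利用链式法则,从输出层开始逐层向前计算每个权重的 $\partial E / \partial w$,这也正是下面 backPropagate 方法所实现的内容。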

四、Keras - 图3

从零构建神经网络

点子:

我们将从第一原则构建神经网络。我们将创建一个非常简单的模型并理解它是如何工作的。我们还将实现反向传播算法。请注意,此代码未经过优化,不能用于生产。这是出于教学目的 - 让我们了解 ANN 的工作原理。像 Theano 这样的库具有高度优化的代码。

(以下代码受到这个非常棒的笔记本的启发)

  1. # 导入所需的包
  2. import numpy as np
  3. import pandas as pd
  4. import matplotlib
  5. import matplotlib.pyplot as plt
  6. import scipy
  7. # 内联展示绘图
  8. %matplotlib inline
  9. # 定义绘图的默认图形大小
  10. matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)
  11. import random
  12. random.seed(123)
  13. # 读取数据集
  14. train = pd.read_csv("data/intro_to_ann.csv")
  15. X, y = np.array(train.iloc[:,0:2]), np.array(train.iloc[:,2])  # 较新的 pandas 已移除 .ix,这里改用 .iloc
  16. X.shape
  17. # (500, 2)
  18. y.shape
  19. # (500,)
  20. # 让我们绘制数据集,来看看它什么样
  21. plt.scatter(X[:,0], X[:,1], s=40, c=y, cmap=plt.cm.BuGn)
  22. # <matplotlib.collections.PathCollection at 0x110b4b0f0>

四、Keras - 图4

开始构建我们的 ANN 积木

注意:此过程最终将产生我们自己的神经网络类

看一看细节

四、Keras - 图5

接受两个数字并生成一个随机数的函数

它将用在哪里?:当我们初始化神经网络时,必须随机分配权重。

  1. # 计算满足 a <= rand < b 的随机数
  2. def rand(a, b):
  3. return (b-a)*random.random() + a
  4. # 创建矩阵
  5. def makeMatrix(I, J, fill=0.0):
  6. return np.zeros([I,J])

定义我们的激活函数。让我们使用 sigmoid 函数

  1. # 我们的 sigmoid 函数
  2. def sigmoid(x):
  3. # 返回 math.tanh(x)
  4. return 1/(1+np.exp(-x))

对我们的激活函数求导

注意:当我们运行反向传播算法时,我们需要这个

  1. # sigmoid 函数对输出(也就是 y)的导数
  2. def dsigmoid(y):
  3. return y - y**2
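这里利用了 sigmoid 的一个性质:$\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$。由于 $y = \sigma(x)$,导数可以直接用输出表示为 $y(1 - y) = y - y^2$,这正是上面 dsigmoid 返回的值。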

我们的神经网络类

当我们首次创建神经网络架构时,我们需要知道输入节点的数量,隐层节点的数量和输出节点的数量。权重必须随机初始化。

  1. class ANN:
  2. def __init__(self, ni, nh, no):
  3. # 输入,隐层和输出节点的数量
  4. self.ni = ni + 1 # +1 用于偏置节点
  5. self.nh = nh
  6. self.no = no
  7. # 节点的激活
  8. self.ai = [1.0]*self.ni
  9. self.ah = [1.0]*self.nh
  10. self.ao = [1.0]*self.no
  11. # 创建权重
  12. self.wi = makeMatrix(self.ni, self.nh)
  13. self.wo = makeMatrix(self.nh, self.no)
  14. # 将它们设为随机值
  15. self.wi = np.random.uniform(-0.2, 0.2, size=self.wi.shape)  # 上面定义的 rand 不接受 size 参数,这里改用 np.random.uniform
  16. self.wo = np.random.uniform(-2.0, 2.0, size=self.wo.shape)
  17. # 最后为动量修改权重
  18. self.ci = makeMatrix(self.ni, self.nh)
  19. self.co = makeMatrix(self.nh, self.no)

激活函数

  1. def activate(self, inputs):
  2. if len(inputs) != self.ni-1:
  3. print(inputs)
  4. raise ValueError('wrong number of inputs')
  5. # 输入激活
  6. for i in range(self.ni-1):
  7. self.ai[i] = inputs[i]
  8. # 隐层激活
  9. for j in range(self.nh):
  10. sum_h = 0.0
  11. for i in range(self.ni):
  12. sum_h += self.ai[i] * self.wi[i][j]
  13. self.ah[j] = sigmoid(sum_h)
  14. # 输出激活
  15. for k in range(self.no):
  16. sum_o = 0.0
  17. for j in range(self.nh):
  18. sum_o += self.ah[j] * self.wo[j][k]
  19. self.ao[k] = sigmoid(sum_o)
  20. return self.ao[:]

反向传播

  1. def backPropagate(self, targets, N, M):
  2. if len(targets) != self.no:
  3. print(targets)
  4. raise ValueError('wrong number of target values')
  5. # 为输出计算误差项
  6. output_deltas = np.zeros(self.no)
  7. for k in range(self.no):
  8. error = targets[k]-self.ao[k]
  9. output_deltas[k] = dsigmoid(self.ao[k]) * error
  10. # 为隐层计算误差项
  11. hidden_deltas = np.zeros(self.nh)
  12. for j in range(self.nh):
  13. error = 0.0
  14. for k in range(self.no):
  15. error += output_deltas[k]*self.wo[j][k]
  16. hidden_deltas[j] = dsigmoid(self.ah[j]) * error
  17. # 更新输出权重
  18. for j in range(self.nh):
  19. for k in range(self.no):
  20. change = output_deltas[k] * self.ah[j]
  21. self.wo[j][k] += N*change + M*self.co[j][k]
  23. self.co[j][k] = change
  24. # 更新输入权重
  25. for i in range(self.ni):
  26. for j in range(self.nh):
  27. change = hidden_deltas[j]*self.ai[i]
  28. self.wi[i][j] += N*change + M*self.ci[i][j]
  30. self.ci[i][j] = change
  31. # 计算误差
  32. error = 0.0
  33. for k in range(len(targets)):
  34. error += 0.5*(targets[k]-self.ao[k])**2
  35. return error
  36. ## 把所有东西放在一起
  37. class ANN:
  38. def __init__(self, ni, nh, no):
  39. # 输入,隐层和输出节点的数量
  40. self.ni = ni + 1 # +1 用于偏置节点
  41. self.nh = nh
  42. self.no = no
  43. # 节点的激活
  44. self.ai = [1.0]*self.ni
  45. self.ah = [1.0]*self.nh
  46. self.ao = [1.0]*self.no
  47. # 创建权重
  48. self.wi = makeMatrix(self.ni, self.nh)
  49. self.wo = makeMatrix(self.nh, self.no)
  50. # 将它们设为随机值
  51. for i in range(self.ni):
  52. for j in range(self.nh):
  53. self.wi[i][j] = rand(-0.2, 0.2)
  54. for j in range(self.nh):
  55. for k in range(self.no):
  56. self.wo[j][k] = rand(-2.0, 2.0)
  57. # 最后为动量修改权重
  58. self.ci = makeMatrix(self.ni, self.nh)
  59. self.co = makeMatrix(self.nh, self.no)
  60. def backPropagate(self, targets, N, M):
  61. if len(targets) != self.no:
  62. print(targets)
  63. raise ValueError('wrong number of target values')
  64. # 为输出计算误差项
  65. output_deltas = np.zeros(self.no)
  66. for k in range(self.no):
  67. error = targets[k]-self.ao[k]
  68. output_deltas[k] = dsigmoid(self.ao[k]) * error
  69. # 为隐层计算误差项
  70. hidden_deltas = np.zeros(self.nh)
  71. for j in range(self.nh):
  72. error = 0.0
  73. for k in range(self.no):
  74. error += output_deltas[k]*self.wo[j][k]
  75. hidden_deltas[j] = dsigmoid(self.ah[j]) * error
  76. # 更新输出权重
  77. for j in range(self.nh):
  78. for k in range(self.no):
  79. change = output_deltas[k] * self.ah[j]
  80. self.wo[j][k] += N*change + M*self.co[j][k]
  81. self.co[j][k] = change
  82. # 更新输入权重
  83. for i in range(self.ni):
  84. for j in range(self.nh):
  85. change = hidden_deltas[j]*self.ai[i]
  86. self.wi[i][j] += N*change + M*self.ci[i][j]
  87. self.ci[i][j] = change
  88. # 计算误差
  89. error = 0.0
  90. for k in range(len(targets)):
  91. error += 0.5*(targets[k]-self.ao[k])**2
  92. return error
  93. def test(self, patterns):
  94. self.predict = np.empty([len(patterns), self.no])
  95. for i, p in enumerate(patterns):
  96. self.predict[i] = self.activate(p)
  97. #self.predict[i] = self.activate(p[0])
  98. def activate(self, inputs):
  99. if len(inputs) != self.ni-1:
  100. print(inputs)
  101. raise ValueError('wrong number of inputs')
  102. # 输入激活
  103. for i in range(self.ni-1):
  104. self.ai[i] = inputs[i]
  105. # 隐层激活
  106. for j in range(self.nh):
  107. sum_h = 0.0
  108. for i in range(self.ni):
  109. sum_h += self.ai[i] * self.wi[i][j]
  110. self.ah[j] = sigmoid(sum_h)
  111. # 输出激活
  112. for k in range(self.no):
  113. sum_o = 0.0
  114. for j in range(self.nh):
  115. sum_o += self.ah[j] * self.wo[j][k]
  116. self.ao[k] = sigmoid(sum_o)
  117. return self.ao[:]
  118. def train(self, patterns, iterations=1000, N=0.5, M=0.1):
  119. # N: 学习率
  120. # M: 动量因子
  121. patterns = list(patterns)
  122. for i in range(iterations):
  123. error = 0.0
  124. for p in patterns:
  125. inputs = p[0]
  126. targets = p[1]
  127. self.activate(inputs)
  128. error += self.backPropagate([targets], N, M)
  129. if i % 5 == 0:
  130. print('error in interation %d : %-.5f' % (i,error))
  131. print('Final training error: %-.5f' % error)

在数据集上运行模型

  1. # 创建网络,带有两个输入,一个隐层,和一个输出节点
  2. ann = ANN(2, 1, 1)
  3. %timeit -n 1 -r 1 ann.train(zip(X,y), iterations=2)
  4. '''
  5. error in interation 0 : 53.62995
  6. Final training error: 53.62995
  7. Final training error: 47.35136
  8. 1 loop, best of 1: 97.6 ms per loop
  9. '''

预测训练数据集,并测量样本内准确率

  1. %timeit -n 1 -r 1 ann.test(X)
  2. # 1 loop, best of 1: 22.6 ms per loop
  3. prediction = pd.DataFrame(data=np.array([y, np.ravel(ann.predict)]).T,
  4. columns=["actual", "prediction"])
  5. prediction.head()
actual prediction
0 1.0 0.491100
1 1.0 0.495469
2 0.0 0.097362
3 0.0 0.400006
4 1.0 0.489664
  1. np.min(prediction.prediction)
  2. # 0.076553078113180129

让我们可视化并观察结果

  1. # 绘制决策边界的辅助函数
  2. # 它生成等高线图,来展示决策边界
  3. def plot_decision_boundary(nn_model):
  4. # 设置最大最小值并给它一些填充
  5. x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
  6. y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
  7. h = 0.01
  8. # 生成点的网格,它们之间距离为 h
  9. xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
  10. np.arange(y_min, y_max, h))
  11. # 为整个网格预测函数值
  12. nn_model.test(np.c_[xx.ravel(), yy.ravel()])
  13. Z = nn_model.predict
  14. Z[Z>=0.5] = 1
  15. Z[Z<0.5] = 0
  16. Z = Z.reshape(xx.shape)
  17. # 绘制等高线和训练样本
  18. plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
  19. plt.scatter(X[:, 0], X[:, 1], s=40, c=y, cmap=plt.cm.BuGn)
  20. plot_decision_boundary(ann)
  21. plt.title("Our initial model")
  22. # <matplotlib.text.Text at 0x110bdb940>

四、Keras - 图6

练习

在上面的代码中创建具有 10 个隐藏节点的神经网络。对准确率有什么影响?

  1. # 将你的代码放在这里
  2. # (或者如果你想作弊,加载答案)
  3. # %load solutions/sol_111.py
  4. ann = ANN(2, 10, 1)
  5. %timeit -n 1 -r 1 ann.train(zip(X,y), iterations=2)
  6. plot_decision_boundary(ann)
  7. plt.title("Our next model with 10 hidden units")
  8. '''
  9. error in interation 0 : 34.91394
  10. Final training error: 34.91394
  11. Final training error: 25.36183
  12. 1 loop, best of 1: 288 ms per loop
  13. <matplotlib.text.Text at 0x11151f630>
  14. '''

四、Keras - 图7

练习:

通过增加迭代来训练神经网络。对准确率有什么影响?

  1. # 把你的代码放在这里
  2. # %load solutions/sol_112.py
  3. ann = ANN(2, 10, 1)
  4. %timeit -n 1 -r 1 ann.train(zip(X,y), iterations=100)
  5. plot_decision_boundary(ann)
  6. plt.title("Our model with 10 hidden units and 100 iterations")
  7. '''
  8. error in interation 0 : 31.63185
  9. Final training error: 31.63185
  10. Final training error: 25.12319
  11. Final training error: 24.92547
  12. Final training error: 24.89692
  13. Final training error: 24.88124
  14. ...
  15. error in interation 95 : 7.50499
  16. Final training error: 7.50499
  17. Final training error: 7.46215
  18. Final training error: 7.42298
  19. Final training error: 7.38707
  20. Final training error: 7.35410
  21. 1 loop, best of 1: 14.5 s per loop
  22. <matplotlib.text.Text at 0x1115951d0>
  23. '''

四、Keras - 图8

附录

仓库中还有一个额外的笔记本,即“用于 MNIST 的 ANN 的简单实现”,其中实现了应用于 MNIST 数据集的 MLP 的 SGD 训练。它和 http://neuralnetworksanddeeplearning.com/ 配套。强烈推荐这本书。

4.2 Theano

致谢:派生于 Valerio Maggio 的 deep-learning-keras-tensorflow

语言中的语言。

处理权重矩阵和梯度可能很棘手,有时绝非易事。Theano 是处理向量,矩阵和高维张量代数的一个很好的框架。本教程的大部分内容都将引用 Theano,但 TensorFlow 是另一个很棒的框架,能够为复杂代数提供令人难以置信的抽象。TensorFlow 的更多信息请参阅下一章。

  1. import theano
  2. import theano.tensor as T

符号变量

Theano 拥有自己的变量和函数,定义如下:

  1. x = T.scalar()
  2. x

变量可以用在表达式中:

  1. y = 3*(x**2) + 1

y现在是一个表达式。结果也是符号:

  1. type(y)
  2. y.shape
  3. # Shape.0

打印

我们将要看到,正常的打印对于 theano 来说并不是最好的:

  1. print(y)
  2. # Elemwise{add,no_inplace}.0
  3. theano.pprint(y)
  4. # '((TensorConstant{3} * (<TensorType(float32, scalar)> ** TensorConstant{2})) + TensorConstant{1})'
  5. theano.printing.debugprint(y)
  6. '''
  7. Elemwise{add,no_inplace} [@A] ''
  8. |Elemwise{mul,no_inplace} [@B] ''
  9. | |TensorConstant{3} [@C]
  10. | |Elemwise{pow,no_inplace} [@D] ''
  11. | |<TensorType(float32, scalar)> [@E]
  12. | |TensorConstant{2} [@F]
  13. |TensorConstant{1} [@G]
  14. '''

表达式求值

提供将变量映射到值的dict

  1. y.eval({x: 2})
  2. # array(13.0, dtype=float32)

或者编译函数:

  1. f = theano.function([x], y)
  2. f(2)
  3. # array(13.0, dtype=float32)

其它张量类型

  1. X = T.vector()
  2. X = T.matrix()
  3. X = T.tensor3()
  4. X = T.tensor4()

自动求导

  • 梯度是自动的!
  1. x = T.scalar()
  2. y = T.log(x)
  3. gradient = T.grad(y, x)
  4. print(gradient)
  5. print(gradient.eval({x: 2}))
  6. print(2 * gradient)
  7. '''
  8. Elemwise{true_div}.0
  9. 0.5
  10. Elemwise{mul,no_inplace}.0
  11. '''

共享变量

  • 符号 + 存储
  1. import numpy as np
  2. x = theano.shared(np.zeros((2, 3), dtype=theano.config.floatX))
  3. x
  4. # <CudaNdarrayType(float32, matrix)>

我们可以获取和设置变量的值。

  1. values = x.get_value()
  2. print(values.shape)
  3. print(values)
  4. '''
  5. (2, 3)
  6. [[ 0. 0. 0.]
  7. [ 0. 0. 0.]]
  8. '''
  9. x.set_value(values)

共享变量也可以在表达式中使用:

  1. (x + 2) ** 2
  2. # Elemwise{pow,no_inplace}.0

在求值时,它们的值用作输入:

  1. ((x + 2) ** 2).eval()
  2. '''
  3. array([[ 4., 4., 4.],
  4. [ 4., 4., 4.]], dtype=float32)
  5. '''
  6. theano.function([], (x + 2) ** 2)()
  7. '''
  8. array([[ 4., 4., 4.],
  9. [ 4., 4., 4.]], dtype=float32)
  10. '''

更新

  • 储存函数求值的结果
  • dict将共享变量映射到新的值
  1. count = theano.shared(0)
  2. new_count = count + 1
  3. updates = {count: new_count}
  4. f = theano.function([], count, updates=updates)
  5. f()
  6. # array(0)
  7. f()
  8. # array(1)
  9. f()
  10. # array(2)

热身!逻辑回归

  1. %matplotlib inline
  2. import numpy as np
  3. import pandas as pd
  4. import theano
  5. import theano.tensor as T
  6. import matplotlib.pyplot as plt
  7. from sklearn.preprocessing import StandardScaler
  8. from sklearn.preprocessing import LabelEncoder
  9. from keras.utils import np_utils
  10. # Using Theano backend.

在本节中,我们将使用 Kaggle otto 挑战。如果你想关注它,请从 Kaggle 获取数据:https://www.kaggle.com/c/otto-group-product-classification-challenge/data

关于数据

奥托集团是世界上最大的电子商务公司之一,对产品性能的一致分析至关重要。 然而,由于全球基础设施多样化,许多相同的产品具有不同分类。在本次比赛中,我们提供了超过 200,000 种产品和 93 个特征的数据集。 目标是建立一个能够区分我们主要产品类别的预测模型。每行对应一个产品。 共有 93 个数字特征,代表不同事件的计数。 所有特征都已经过混淆,不再进一步定义。

https://www.kaggle.com/c/otto-group-product-classification-challenge/data

  1. def load_data(path, train=True):
  2. """从 CSV 文件加载数据
  3. 参数
  4. ----------
  5. path: str
  6. CSV 文件的路径
  7. train: bool (默认为 True)
  8. 决定数据是否是*训练数据*
  9. 如果为 True,执行一些打乱
  10. 返回值
  11. ------
  12. X: numpy.ndarray
  13. 作为浮点的多维数组的数据
  14. ids: numpy.ndarray
  15. 每个样本的 id 向量
  16. """
  17. df = pd.read_csv(path)
  18. X = df.values.copy()
  19. if train:
  20. np.random.shuffle(X) # https://youtu.be/uyUXoap67N8
  21. X, labels = X[:, 1:-1].astype(np.float32), X[:, -1]
  22. return X, labels
  23. else:
  24. X, ids = X[:, 1:].astype(np.float32), X[:, 0].astype(str)
  25. return X, ids
  26. def preprocess_data(X, scaler=None):
  27. """通过减去均值并缩放到单位方差
  28. 来标准化数据,来处理输入数据"""
  29. if not scaler:
  30. scaler = StandardScaler()
  31. scaler.fit(X)
  32. X = scaler.transform(X)
  33. return X, scaler
  34. def preprocess_labels(labels, encoder=None, categorical=True):
  35. """使用 0~`n-classes-1` 的值编码标签"""
  36. if not encoder:
  37. encoder = LabelEncoder()
  38. encoder.fit(labels)
  39. y = encoder.transform(labels).astype(np.int32)
  40. if categorical:
  41. y = np_utils.to_categorical(y)
  42. return y, encoder
  43. print("Loading data...")
  44. X, labels = load_data('train.csv', train=True)
  45. X, scaler = preprocess_data(X)
  46. Y, encoder = preprocess_labels(labels)
  47. X_test, ids = load_data('test.csv', train=False)
  48. X_test, ids = X_test[:1000], ids[:1000]
  49. # 绘制数据
  50. print(X_test[:1])
  51. X_test, _ = preprocess_data(X_test, scaler)
  52. nb_classes = Y.shape[1]
  53. print(nb_classes, 'classes')
  54. dims = X.shape[1]
  55. print(dims, 'dims')
  56. '''
  57. Loading data...
  58. [[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 3. 0. 0. 0. 3.
  59. 2. 1. 0. 0. 0. 0. 0. 0. 0. 5. 3. 1. 1. 0.
  60. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 1. 0. 0.
  61. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  62. 0. 0. 0. 0. 0. 0. 0. 3. 0. 0. 0. 0. 1. 1.
  63. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  64. 0. 11. 1. 20. 0. 0. 0. 0. 0.]]
  65. (9L, 'classes')
  66. (93L, 'dims')
  67. '''

现在让我们创建并训练逻辑回归模型

实战 - 逻辑回归

  1. # 基于来自 DeepLearning.net 的示例
  2. rng = np.random
  3. N = 400
  4. feats = 93
  5. training_steps = 1
  6. # 声明 Theano 符号变量
  7. x = T.matrix("x")
  8. y = T.vector("y")
  9. w = theano.shared(rng.randn(feats), name="w")
  10. b = theano.shared(0., name="b")
  11. # 构造 Theano 表达式图
  12. p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b)) # 目标为 1 的概率
  13. prediction = p_1 > 0.5 # 预测阈值
  14. xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # 交叉熵损失函数
  15. cost = xent.mean() + 0.01 * (w ** 2).sum()# 要最小化的损失
  16. gw, gb = T.grad(cost, [w, b]) # 计算损失的梯度
  17. # (我们将在这个教程的后面的章节中返回这里)
  18. # 编译
  19. train = theano.function(
  20. inputs=[x,y],
  21. outputs=[prediction, xent],
  22. updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)),
  23. allow_input_downcast=True)
  24. predict = theano.function(inputs=[x], outputs=prediction, allow_input_downcast=True)
  25. # class1 的变换
  26. y_class1 = []
  27. for i in Y:
  28. y_class1.append(i[0])
  29. y_class1 = np.array(y_class1)
  30. # 训练
  31. for i in range(training_steps):
  32. print('Epoch %s' % (i+1,))
  33. pred, err = train(X, y_class1)
  34. print("target values for Data:")
  35. print(y_class1)
  36. print("prediction on training set:")
  37. print(predict(X))
  38. '''
  39. Epoch 1
  40. target values for Data:
  41. [ 0. 0. 1. ..., 0. 0. 0.]
  42. prediction on training set:
  43. [0 0 0 ..., 0 0 0]
  44. '''

4.3 Keras

致谢:派生于 Valerio Maggio 的 deep-learning-keras-tensorflow

  1. %matplotlib inline
  2. import numpy as np
  3. import pandas as pd
  4. import theano
  5. import theano.tensor as T
  6. import matplotlib.pyplot as plt
  7. import keras
  8. from sklearn.preprocessing import StandardScaler
  9. from sklearn.preprocessing import LabelEncoder
  10. from keras.utils import np_utils
  11. from sklearn.cross_validation import train_test_split
  12. from keras.callbacks import EarlyStopping, ModelCheckpoint
  13. from keras.models import Sequential
  14. from keras.layers import Dense, Activation
  15. # Using Theano backend.

用于 Theano 和 TensorFlow 的深度学习库

Keras 是一个极简,高度模块化的神经网络库,用 Python 编写,能够在 TensorFlow 或 Theano 之上运行。 它的开发重点是实现快速实验。 能够在最短时间内将理念变成结果,是进行良好研究的关键。

参考:https://keras.io/

Keras,为什么是这个名字?

Keras(κέρας)在希腊语中的意思是号角。它指代古希腊和拉丁文学中的一个文学形象,最早见于《奥德赛》,其中梦灵(Oneiroi,单数为 Oneiros)被分为两种:一种用虚假异象欺骗人,它们通过象牙门到达人间;另一种宣示将要实现的未来,它们通过号角门到达。这是对单词 κέρας(号角)/κραίνω(履行)和 ἐλέφας(象牙)/ἐλεφαίρομαι(欺骗)的文字游戏。

Keras 最初作为项目 ONEIROS(开放式神经电子智能机器人操作系统)的研究工作的一部分而开发。

“Oneiroi are beyond our unravelling —who can be sure what tale they tell? Not all that men look for comes to pass. Two gates there are that give passage to fleeting Oneiroi; one is made of horn, one of ivory. The Oneiroi that pass through sawn ivory are deceitful, bearing a message that will not be fulfilled; those that come out through polished horn have truth behind them, to be accomplished for men who see them.”

Homer, Odyssey 19. 562 ff (Shewring translation).

实战 - Keras 逻辑回归

  1. dims = X.shape[1]
  2. print(dims, 'dims')
  3. print("Building model...")
  4. nb_classes = Y.shape[1]
  5. print(nb_classes, 'classes')
  6. model = Sequential()
  7. model.add(Dense(nb_classes, input_shape=(dims,)))
  8. model.add(Activation('softmax'))
  9. model.compile(optimizer='sgd', loss='categorical_crossentropy')
  10. model.fit(X, Y)
  11. '''
  12. (93L, 'dims')
  13. Building model...
  14. (9L, 'classes')
  15. Epoch 1/10
  16. 61878/61878 [==============================] - 1s - loss: 1.0574
  17. Epoch 2/10
  18. 61878/61878 [==============================] - 1s - loss: 0.7730
  19. Epoch 3/10
  20. 61878/61878 [==============================] - 1s - loss: 0.7297
  21. Epoch 4/10
  22. 61878/61878 [==============================] - 1s - loss: 0.7080
  23. Epoch 5/10
  24. 61878/61878 [==============================] - 1s - loss: 0.6948
  25. Epoch 6/10
  26. 61878/61878 [==============================] - 1s - loss: 0.6854
  27. Epoch 7/10
  28. 61878/61878 [==============================] - 1s - loss: 0.6787
  29. Epoch 8/10
  30. 61878/61878 [==============================] - 1s - loss: 0.6734
  31. Epoch 9/10
  32. 61878/61878 [==============================] - 1s - loss: 0.6691
  33. Epoch 10/10
  34. 61878/61878 [==============================] - 1s - loss: 0.6657
  35. '''
  36. # <keras.callbacks.History at 0x23d330f0>

如此简洁,是不是令人印象深刻?现在让我们来理解其中发生了什么:

Keras 的核心数据结构是模型,一种组织层的方法。主要的模型类型是顺序(Sequential)模型,即层的线性堆叠。

我们在这里做的是,从输入到输出堆叠可训练权重的全连接(密集)层,并在权重层顶部堆叠激活层。

密集层(Dense
  1. from keras.layers.core import Dense
  2. Dense(output_dim, init='glorot_uniform', activation='linear',
  3. weights=None, W_regularizer=None, b_regularizer=None,
  4. activity_regularizer=None, W_constraint=None,
  5. b_constraint=None, bias=True, input_dim=None)

激活(Activation
  1. from keras.layers.core import Activation
  2. Activation(activation)

优化器

如果需要,你可以进一步配置优化器。Keras 的核心原则是使事情变得相当简单,同时在需要的时候,允许用户完全控制(终极控制是源代码的易扩展性)。在这里,我们使用 SGD(随机梯度下降)作为我们可训练权重的优化算法。
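作为示意(基于 Keras 1.x 的 API,学习率等数值仅为演示,并非本例实际使用的配置),可以显式构造一个 SGD 优化器再传给 compile:

  from keras.optimizers import SGD

  # 显式配置 SGD:学习率、衰减、动量以及 Nesterov 动量(数值仅为示例)
  sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
  model.compile(optimizer=sgd, loss='categorical_crossentropy')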

对这个示例执行更多的“数据分析”

我们在这里做的很好,但是在现实世界中由于过拟合而无法使用。让我们尝试用交叉验证来解决它。

过拟合

过拟合指统计模型描述的是随机误差或噪声,而不是底层关系。当模型过于复杂时会发生过拟合,例如参数数量相对于观测数量过多。过拟合的模型预测表现较差,因为它对训练数据中的微小波动反应过度。

四、Keras - 图9

为了避免过拟合,我们将首先将数据拆分为训练集和测试集,并在测试集上测试模型。下一步:我们将使用两个 keras 的回调EarlyStoppingModelCheckpoint

  1. X, X_test, Y, Y_test = train_test_split(X, Y, test_size=0.15, random_state=42)
  2. fBestModel = 'best_model.h5'
  3. early_stop = EarlyStopping(monitor='val_loss', patience=4, verbose=1)
  4. best_model = ModelCheckpoint(fBestModel, verbose=0, save_best_only=True)
  5. model.fit(X, Y, validation_data = (X_test, Y_test), nb_epoch=20,
  6. batch_size=128, verbose=True, validation_split=0.15,
  7. callbacks=[best_model, early_stop])
  8. '''
  9. Train on 19835 samples, validate on 3501 samples
  10. Epoch 1/20
  11. 19835/19835 [==============================] - 0s - loss: 0.6391 - val_loss: 0.6680
  12. Epoch 2/20
  13. 19835/19835 [==============================] - 0s - loss: 0.6386 - val_loss: 0.6689
  14. Epoch 3/20
  15. 19835/19835 [==============================] - 0s - loss: 0.6384 - val_loss: 0.6695
  16. Epoch 4/20
  17. 19835/19835 [==============================] - 0s - loss: 0.6381 - val_loss: 0.6702
  18. Epoch 5/20
  19. 19835/19835 [==============================] - 0s - loss: 0.6378 - val_loss: 0.6709
  20. Epoch 6/20
  21. 19328/19835 [============================>.] - ETA: 0s - loss: 0.6380Epoch 00005: early stopping
  22. 19835/19835 [==============================] - 0s - loss: 0.6375 - val_loss: 0.6716
  23. '''
  24. # <keras.callbacks.History at 0x1d7245f8>

多层感知机和全连接

那么,用 keras 构建多层感知器有多难?它是一样的,只需添加更多层!

  1. model = Sequential()
  2. model.add(Dense(100, input_shape=(dims,)))
  3. model.add(Dense(nb_classes))
  4. model.add(Activation('softmax'))
  5. model.compile(optimizer='sgd', loss='categorical_crossentropy')
  6. model.fit(X, Y)

你的回合!

实战 - Keras 全连接

花几分钟时间尝试优化层数和层中的参数数量,来获得最佳效果。

  1. model = Sequential()
  2. model.add(Dense(100, input_shape=(dims,)))
  3. # ...
  4. # ...
  5. # 玩转它!按你的想法添加一些层!尝试获得更好的结果。
  6. model.add(Dense(nb_classes))
  7. model.add(Activation('softmax'))
  8. model.compile(optimizer='sgd', loss='categorical_crossentropy')
  9. model.fit(X, Y)

构建问答系统,图像分类模型,神经图灵机,word2vec 嵌入器或任何其他模型,都是同样快的。深度学习背后的想法很简单,那么为什么它们的实现会很痛苦呢?

深度的理论动机

有很多研究都是关于神经网络的深度。已经在数学上 [1] 和经验上证明,卷积神经网络从深度中获益!

[1] - On the Expressive Power of Deep Learning: A Tensor Analysis - Cohen, et al 2015

一个关于神经网络的常被引用的定理说明:

通用近似定理 [1] 表明,包含有限数量神经元的单隐层前馈网络(即多层感知机),在对激活函数的温和假设下,可以近似 $\mathbb{R}^n$ 的紧致子集上的连续函数。因此该定理表明,当给出适当的参数时,简单的神经网络可以表示各种有趣的函数;但是,它没有涉及这些参数的算法可学习性。

[1] - Approximation Capabilities of Multilayer Feedforward Networks - Kurt Hornik 1991

4.4 用于 MNIST 的 ANN 简单实现

致谢:派生于 Valerio Maggio 的 deep-learning-keras-tensorflow

代码取自:https://github.com/mnielsen/neural-networks-and-deep-learning

这一节与在线文本 http://neuralnetworksanddeeplearning.com/ 配套。强烈推荐这本书。

  1. # 导入库
  2. import random
  3. import numpy as np
  4. import keras
  5. from keras.datasets import mnist
  6. '''
  7. Using Theano backend.
  8. Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
  9. '''
  10. # 将完整路径设为 mnist.pkl.gz
  11. # 将其指向仓库里的数据文件夹
  12. path_to_dataset = "euroscipy2016_dl-tutorial/data/mnist.pkl.gz"
  13. !mkdir -p $HOME/.keras/datasets/euroscipy2016_dl-tutorial/data/
  14. # 加载数据集
  15. (X_train, y_train), (X_test, y_test) = mnist.load_data(path_to_dataset)
  16. '''
  17. Downloading data from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
  18. 15286272/15296311 [============================>.] - ETA: 0s
  19. '''
  20. print(X_train.shape, y_train.shape)
  21. print(X_test.shape, y_test.shape)
  22. '''
  23. (60000, 28, 28) (60000,)
  24. (10000, 28, 28) (10000,)
  25. '''
  26. """
  27. network.py
  28. ~~~~~~~~~~
  29. 为前馈神经网络实现随机梯度下降学习算法的模块。
  30. 使用反向传播计算梯度。
  31. 请注意,我专注于使代码简单,易读且易于修改。
  32. 它没有经过优化,省略了许多理想的特性。
  33. """
  34. #### 库
  35. # 标准库
  36. import random
  37. # 三方库
  38. import numpy as np
  39. class Network(object):
  40. def __init__(self, sizes):
  41. """列表``sizes``包含网络各层中的神经元数量。 例如,
  42. 如果列表是 [2,3,1] 那么它将是三层网络,第一层包含 2
  43. 个神经元,第二层 3 个神经元,第三层 1 个神经元。
  44. 网络的偏置和权重是随机初始化的,使用均值为 0 方差为 1
  45. 的高斯分布。注意,假设第一层是输入层,按照惯例,我们不会
  46. 为这些神经元设置任何偏置,因为偏差只用于计算后面的层的输出。"""
  47. self.num_layers = len(sizes)
  48. self.sizes = sizes
  49. self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
  50. self.weights = [np.random.randn(y, x)
  51. for x, y in zip(sizes[:-1], sizes[1:])]
  52. def feedforward(self, a):
  53. """如果输入``a``,则返回网络的输出。"""
  54. for b, w in zip(self.biases, self.weights):
  55. a = sigmoid(np.dot(w, a)+b)
  56. return a
  57. def SGD(self, training_data, epochs, mini_batch_size, eta,
  58. test_data=None):
  59. """使用小批量随机梯度下降训练神经网络。``training_data``
  60. 是``(x, y)``元组的列表,表示训练输入和所需输出。其他非可选
  61. 参数是不言自明的。如果提供``test_data``,那么将在每个
  62. 迭代之后对测试数据评估网络,并打印出部分进度。这对于
  63. 跟踪进度很有用,但会大大减慢速度。"""
  64. training_data = list(training_data)
  65. test_data = list(test_data)
  66. if test_data: n_test = len(test_data)
  67. n = len(training_data)
  68. for j in range(epochs):
  69. random.shuffle(training_data)
  70. mini_batches = [
  71. training_data[k:k+mini_batch_size]
  72. for k in range(0, n, mini_batch_size)]
  73. for mini_batch in mini_batches:
  74. self.update_mini_batch(mini_batch, eta)
  75. if test_data:
  76. print( "Epoch {0}: {1} / {2}".format(
  77. j, self.evaluate(test_data), n_test))
  78. else:
  79. print( "Epoch {0} complete".format(j))
  80. def update_mini_batch(self, mini_batch, eta):
  81. """通过使用反向传播,将梯度下降应用于
  82. 单个小批量,来更新网络的权重和偏差。
  83. ``mini_batch``是``(x, y)``元组列表,``eta``是学习率。"""
  84. nabla_b = [np.zeros(b.shape) for b in self.biases]
  85. nabla_w = [np.zeros(w.shape) for w in self.weights]
  86. for x, y in mini_batch:
  87. delta_nabla_b, delta_nabla_w = self.backprop(x, y)
  88. nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
  89. nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
  90. self.weights = [w-(eta/len(mini_batch))*nw
  91. for w, nw in zip(self.weights, nabla_w)]
  92. self.biases = [b-(eta/len(mini_batch))*nb
  93. for b, nb in zip(self.biases, nabla_b)]
  94. def backprop(self, x, y):
  95. """返回元组``(nabla_b, nabla_w),表示损失函数
  96. C_x 的梯度``。 ``nabla_b``和``nabla_w``是 numpy
  97. 数组的逐层列表,类似于``self.biases``和``self.weights``。"""
  98. nabla_b = [np.zeros(b.shape) for b in self.biases]
  99. nabla_w = [np.zeros(w.shape) for w in self.weights]
  100. # 前馈
  101. activation = x
  102. activations = [x] # 用于逐层储存所有激活的列表
  103. zs = [] # 用于逐层储存所有 z 向量的列表
  104. for b, w in zip(self.biases, self.weights):
  105. z = np.dot(w, activation)+b
  106. zs.append(z)
  107. activation = sigmoid(z)
  108. activations.append(activation)
  109. # 反向传播
  110. delta = self.cost_derivative(activations[-1], y) * \
  111. sigmoid_prime(zs[-1])
  112. nabla_b[-1] = delta
  113. nabla_w[-1] = np.dot(delta, activations[-2].transpose())
  114. # 请注意,下面循环中的变量`l`与本书第 2 章中的表示法略有不同。
  115. # 这里,`l = 1`表示最后一层神经元,`l = 2`表示倒数第二层,
  116. # 依此类推。 它是本书中方案的重新编号,
  117. # 利用了可以在 Python 列表中使用负数索引的事实。
  118. for l in range(2, self.num_layers):
  119. z = zs[-l]
  120. sp = sigmoid_prime(z)
  121. delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
  122. nabla_b[-l] = delta
  123. nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
  124. return (nabla_b, nabla_w)
  125. def evaluate(self, test_data):
  126. """返回神经网络输出正确结果的测试输入数。
  127. 注意,神经网络的输出被假定为,
  128. 具有最高激活的最终层中任何神经元的索引。"""
  129. test_results = [(np.argmax(self.feedforward(x)), y)
  130. for (x, y) in test_data]
  131. return sum(int(x == y) for (x, y) in test_results)
  132. def cost_derivative(self, output_activations, y):
  133. """为输出激活返回 C_x 对 a 的偏导数"""
  134. return (output_activations-y)
  135. #### 杂项函数
  136. def sigmoid(z):
  137. """sigmoid 函数"""
  138. return 1.0/(1.0+np.exp(-z))
  139. def sigmoid_prime(z):
  140. """sigmoid 函数的导数"""
  141. return sigmoid(z)*(1-sigmoid(z))
  142. def vectorized_result(j):
  143. """返回一个 10 维单位向量,其中第 j 个位置为 1.0,
  144. 其他位置为零。 这用于将数字 0~9 转换为
  145. 来自神经网络的对应的期望输出。"""
  146. e = np.zeros((10, 1))
  147. e[j] = 1.0
  148. return e
  149. net = Network([784, 30, 10])
  150. training_inputs = [np.reshape(x, (784, 1)) for x in X_train.copy()]
  151. training_results = [vectorized_result(y) for y in y_train.copy()]
  152. training_data = zip(training_inputs, training_results)
  153. test_inputs = [np.reshape(x, (784, 1)) for x in X_test.copy()]
  154. test_data = zip(test_inputs, y_test.copy())
  155. net.SGD(training_data, 10, 10, 3.0, test_data=test_data)
  156. '''
  157. Epoch 0: 1348 / 10000
  158. Epoch 1: 1939 / 10000
  159. Epoch 2: 2046 / 10000
  160. Epoch 3: 1422 / 10000
  161. Epoch 4: 1365 / 10000
  162. Epoch 5: 1351 / 10000
  163. Epoch 6: 1879 / 10000
  164. Epoch 7: 1806 / 10000
  165. Epoch 8: 1754 / 10000
  166. Epoch 9: 1974 / 10000
  167. '''
  168. net = Network([784, 10, 10])
  169. training_inputs = [np.reshape(x, (784, 1)) for x in X_train.copy()]
  170. training_results = [vectorized_result(y) for y in y_train.copy()]
  171. training_data = zip(training_inputs, training_results)
  172. test_inputs = [np.reshape(x, (784, 1)) for x in X_test.copy()]
  173. test_data = zip(test_inputs, y_test.copy())
  174. net.SGD(training_data, 10, 10, 1.0, test_data=test_data)
  175. '''
  176. Epoch 0: 3526 / 10000
  177. Epoch 1: 3062 / 10000
  178. Epoch 2: 2946 / 10000
  179. Epoch 3: 2462 / 10000
  180. Epoch 4: 3617 / 10000
  181. Epoch 5: 3773 / 10000
  182. Epoch 6: 3568 / 10000
  183. Epoch 7: 4459 / 10000
  184. Epoch 8: 3009 / 10000
  185. Epoch 9: 2660 / 10000
  186. '''

4.5 卷积神经网络

致谢:派生于 Valerio Maggio 的 deep-learning-keras-tensorflow

参考:

我使用的一些图片和内容来自这篇精彩的博客文章和这本非常棒的书,《神经网络和深度学习》,由 Michael Nielsen 撰写(强烈推荐)。

卷积神经网络(CNN,或 ConvNet)是一种前馈人工神经网络,其神经元之间的连接模式受到动物视觉皮层组织的启发。网络由多层小的神经元集合组成,每组神经元处理输入图像的一部分,称为感受域。然后将这些集合的输出平铺,使它们的输入区域重叠,从而获得原始图像的更好表示。对每个这样的层重复这一过程。

看起来是什么样呢?

四、Keras - 图10

来源:https://flickrcode.files.wordpress.com/2014/10/conv-net2.png

问题空间

图像分类

图像分类是一类任务:获取输入图像,输出类别(猫,狗等),或最能描述图像的类别的概率。对于人类而言,这种识别任务是我们从出生那一刻起就开始学习的第一批技能之一,也是成年人自然而轻松掌握的技能。快速识别模式、从已有知识泛化并适应不同图像环境的能力,正是我们与机器不同的地方。

输入和输出

四、Keras - 图11

来源:http://www.pawbuzz.com/wp-content/uploads/sites/551/2014/11/corgi-puppies-21.jpg

当计算机看到图像(接受图像作为输入)时,它将看到一个像素值数组。根据图像的分辨率和大小,它将看到一个 32 x 32 x 3 的数字数组(3 表示 RGB 值)。假设我们有 JPG 格式的彩色图像,其大小为 480 x 480。表示数组将为 480 x 480 x 3。这些数字中的每一个都提供 0 到 255 之间的值,该值描述了该点的像素强度。

目标

我们希望计算机做的是,能够区分给它的所有图像,并找出使狗成为狗或使猫成为猫的独特特征。当我们看一张狗的照片时,如果照片有可识别的特征,如爪子或四条腿,我们可以将它分类。以类似的方式,计算机应该能够通过查找低级特征(例如边和曲线),然后通过一系列卷积层构建更抽象的概念来执行图像分类。

CNN 的结构

更详细地说,CNN 所做的是:获取图像,将其传入一系列卷积层、非线性层、池化(下采样)层和全连接层,然后得到输出。如前所述,输出可以是单个类别,也可以是最能描述图像的类别的概率。

来源:[1]

卷积层

CNN 中的第一层始终是卷积层

四、Keras - 图12

卷积过滤器

像图像识别中的内核一样,卷积滤波器是一个小的矩阵,可用于模糊,锐化,浮雕,边缘检测等。这是通过内核和图像之间的卷积来实现的。另一个主要区别是,卷积核是学到的

四、Keras - 图13

当过滤器在输入图像上滑动或卷积时,它将过滤器中的值乘以图像的原始像素值(也称为计算逐元素乘法)。

四、Keras - 图14

现在,我们对输入图像上的每个位置重复此过程。(下一步是将过滤器向右移动 1 个单位,然后再向右移动 1,依此类推)。将过滤器滑过所有位置后,我们会留下一组数字,通常称为激活映射特征映射
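下面用 NumPy 给出一个极简的“valid”卷积示意(单通道、步长为 1;这是一个仅用于演示逐元素相乘与滑动求和的草图,过滤器的数值是随意取的):

  import numpy as np

  def conv2d_valid(image, kernel):
      """朴素的 2D 'valid' 卷积(严格说是互相关),返回激活映射/特征映射"""
      H, W = image.shape
      kH, kW = kernel.shape
      out = np.zeros((H - kH + 1, W - kW + 1))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              # 取出当前感受域,与过滤器逐元素相乘并求和
              out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
      return out

  img = np.arange(25, dtype=float).reshape(5, 5)   # 5x5 的“图像”
  k = np.array([[1., 0., -1.]] * 3)                # 一个简单的竖直边缘检测核
  print(conv2d_valid(img, k))                      # 3x3 的激活映射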

高阶视角

让我们从高层简单谈谈,这个卷积实际上做的事情。这些过滤器中的每一个都可以被认为是特征标识符(例如直边,简单颜色,曲线)。

四、Keras - 图15

感受域的可视化

四、Keras - 图16

四、Keras - 图17

四、Keras - 图18

(这次得到的)值要低得多!这是因为这部分图像中没有任何能让曲线检测过滤器产生响应的内容。请记住,此卷积层的输出是激活映射。

深入网络

现在,在传统的卷积神经网络架构中,还有其他层散布在这些卷积层之间。

四、Keras - 图19

ReLU(整流线性单元)层

在每个卷积层之后,通常立即应用非线性层(或激活层)。

这一层的目的是为系统引入非线性,因为卷积层中计算的基本上只是线性运算(只是逐元素乘法和加法)。过去人们使用 tanh 和 sigmoid 等非线性函数,但研究人员发现 ReLU 效果更好,因为网络能够训练得更快(由于计算效率更高),而准确率没有显著差异。

它还有助于缓解梯度消失问题,这是网络的较低层训练得非常缓慢的问题,因为通过各层的梯度呈指数下降。

(简而言之)梯度消失问题取决于激活函数的选择。许多常见的激活函数(例如 sigmoid 或 tanh)以非常“非线性”的方式,将输入压缩到非常小的输出范围内。例如,sigmoid 将实数映射到 [0,1] 这个“小”区间。结果,输入空间中的大片区域被映射到极小的范围。在这些区域中,即使输入发生很大的变化,输出也只有很小的变化,因此梯度很小。
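一个数值上的直觉(补充说明):sigmoid 的导数满足 $\sigma'(x) = \sigma(x)(1 - \sigma(x)) \le 0.25$,因此粗略地说(忽略权重的影响),每反向传播经过一层 sigmoid,梯度至多乘以 0.25,经过 $n$ 层后的上界约为 $0.25^n$,这就是靠前的层更新缓慢的原因。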

ReLu

ReLU 函数定义为 $f(x) = \max(0, x)$ [2]。整流器的平滑近似是解析函数 $f(x) = \ln(1 + e^x)$,它被称为 softplus 函数。softplus 的导数是 $f'(x) = e^x / (e^x + 1) = 1 / (1 + e^{-x})$,即逻辑函数。

[2] http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf by G. E. Hinton
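作为示意(一个使用 NumPy 的小草图),可以直接写出这两个函数,并用数值差分验证 softplus 的导数就是逻辑函数:

  import numpy as np

  def relu(x):
      return np.maximum(0.0, x)

  def softplus(x):
      return np.log1p(np.exp(x))          # ln(1 + e^x),ReLU 的平滑近似

  def logistic(x):
      return 1.0 / (1.0 + np.exp(-x))

  x, eps = 0.7, 1e-6
  numeric = (softplus(x + eps) - softplus(x - eps)) / (2 * eps)  # 数值导数
  print(numeric, logistic(x))             # 两者应当非常接近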

池化层

在一些 ReLU 层之后,通常应用池化层(也称为下采样层)。在这个类别中,还有几个层的选项,最大池化是最受欢迎的。

最大池化过滤器的示例:

四、Keras - 图20

池化层的其他选项是平均池化和 L2 范数池化。这个池化层背后的直觉是:一旦我们知道特定特征出现在原始输入中(激活值高的地方),它的确切位置就不如它相对于其他特征的大致位置那么重要。因此,该层极大地减小了输入的空间尺寸(长度和宽度,但不是深度)。

这有两个主要目的:减少参数的数量;控制过拟合。可以用一个例子来直观解释池化的作用:让我们假设我们有一个用于检测面部的过滤器。面部的确切像素位置,与面部“位于顶部某处”的事实相关性较小。
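下面是一个 2x2 最大池化(步长为 2)的 NumPy 示意,仅用于说明下采样如何工作(假设特征映射的边长为偶数):

  import numpy as np

  def max_pool_2x2(x):
      """对二维特征映射做 2x2、步长为 2 的最大池化"""
      H, W = x.shape
      return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

  fmap = np.array([[1, 3, 2, 0],
                   [4, 6, 1, 1],
                   [0, 2, 5, 7],
                   [1, 1, 3, 2]], dtype=float)
  print(max_pool_2x2(fmap))
  # [[ 6.  2.]
  #  [ 2.  7.]]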

丢弃层

丢弃(Dropout)层具有非常特殊的功能:在前向传递中,将该层中随机选取的一组激活置为零,从而把它们“丢弃”。就这么简单。它有助于避免过拟合,但只能在训练期间使用,而不能在测试期间使用。
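一个最小化的示意(假设丢弃率为 0.5;这里采用常见的 inverted dropout 写法,即训练时按 1/(1-p) 放大,测试时原样返回):

  import numpy as np

  def dropout(activations, p=0.5, training=True):
      """训练时以概率 p 随机把激活置零并缩放;测试时不做任何处理"""
      if not training:
          return activations
      mask = (np.random.rand(*activations.shape) >= p).astype(activations.dtype)
      return activations * mask / (1.0 - p)

  a = np.ones((2, 4))
  print(dropout(a, p=0.5))   # 大约一半的元素变为 0,其余被放大为 2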

全连接层

然而,最后一层是重要的层,即全连接层。基本上,FC 层会查看与特定类别相关度最强的高级特征,并且具有特定权重,以便在计算权重和上一层的乘积时,可以获得不同类别的正确概率。

四、Keras - 图21

Keras 中的 CNN

Keras 支持:

  • 1D 卷积层;
  • 2D 卷积层;
  • 3D 卷积层;

相应的keras包是keras.layers.convolutional

Convolution1D

  1. from keras.layers.convolutional import Convolution1D
  2. Convolution1D(nb_filter, filter_length, init='uniform',
  3. activation='linear', weights=None,
  4. border_mode='valid', subsample_length=1,
  5. W_regularizer=None, b_regularizer=None,
  6. activity_regularizer=None, W_constraint=None,
  7. b_constraint=None, bias=True, input_dim=None,
  8. input_length=None)

用于过滤一维输入的邻域的卷积算子。当使用此层作为模型中的第一层时,要么提供关键字参数input_dim(int,例如 128 表示 128 维向量的序列),要么提供input_shape(整数元组,例如(10, 128)表示 10 个 128 维向量的序列)。

示例

  1. # 在 10 个时间步骤的序列上应用
  2. # 带有 64 个输出过滤器的长度为 3 的一维卷积
  3. model = Sequential()
  4. model.add(Convolution1D(64, 3, border_mode='same', input_shape=(10, 32)))
  5. # 现在 model.output_shape == (None, 10, 64)
  6. # 在顶部添加一个新的 conv1d 层
  7. model.add(Convolution1D(32, 3, border_mode='same'))
  8. # 现在 model.output_shape == (None, 10, 32)

Convolution2D

  1. from keras.layers.convolutional import Convolution2D
  2. Convolution2D(nb_filter, nb_row, nb_col,
  3. init='glorot_uniform',
  4. activation='linear', weights=None,
  5. border_mode='valid', subsample=(1, 1),
  6. dim_ordering='default', W_regularizer=None,
  7. b_regularizer=None, activity_regularizer=None,
  8. W_constraint=None, b_constraint=None,
  9. bias=True)

示例

  1. # 在 256x256 图像上应用带有 64 个过滤器的 3x3 卷积
  2. model = Sequential()
  3. model.add(Convolution2D(64, 3, 3, border_mode='same',
  4. input_shape=(3, 256, 256)))
  5. # 现在 model.output_shape == (None, 64, 256, 256)
  6. # 在顶上添加 3x3 卷积,带有 32 个输出过滤器
  7. model.add(Convolution2D(32, 3, 3, border_mode='same'))
  8. # 现在 model.output_shape == (None, 32, 256, 256)

Keras 中的卷积过滤器的维度

ConvNets 的复杂结构可能使表示难以理解。当然,维度根据卷积滤波器的维度(例如 1D,2D)而变化

Convolution1D

输入形状

3D 张量,形状为:(samples, steps, input_dim)

输出形状

3D 张量,形状为:(samples, new_steps, nb_filter)

Convolution2D

输入形状

4D 张量,形状为:

  • (samples, channels, rows, cols),如果dim_ordering='th'
  • (samples, rows, cols, channels),如果dim_ordering='tf'

输出形状

4D 张量,形状为:

  • (samples, nb_filter, new_rows, new_cols),如果dim_ordering='th'
  • (samples, new_rows, new_cols, nb_filter),如果dim_ordering='tf'
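补充说明:new_rows 和 new_cols 的大小取决于 border_mode 和步长。在步长为 1 时,border_mode='valid' 给出 new_rows = rows - nb_row + 1,而 border_mode='same' 会保持 new_rows = rows(new_cols 同理)。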

4.6 Keras ConvNet 实战

致谢:派生于 Valerio Maggio 的 deep-learning-keras-tensorflow

问题定义

识别手写数字。

数据

MNIST 数据库有一个手写数字数据集。训练集有 60,000 个样本。测试集有 10,000 个样本。数字经过尺寸标准化,并在固定尺寸的图像中居中。数据页面描述了如何收集数据,还报告了测试数据集上各种算法的基准。

加载数据

数据存在于仓库的data文件夹中。让我们使用keras库加载它。现在,让我们加载数据并查看它的外观。

  1. import numpy as np
  2. import keras
  3. from keras.datasets import mnist
  4. '''
  5. Using Theano backend.
  6. Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
  7. '''
  8. !mkdir -p $HOME/.keras/datasets/euroscipy_2016_dl-keras/data/
  9. # 将完整路径设为 mnist.pkl.gz
  10. path_to_dataset = "euroscipy_2016_dl-keras/data/mnist.pkl.gz"
  11. # 加载数据集
  12. (X_train, y_train), (X_test, y_test) = mnist.load_data(path_to_dataset)
  13. '''
  14. Downloading data from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
  15. 15024128/15296311 [============================>.] - ETA: 0s
  16. '''

数据集上的基本数据分析

  1. # X_train 的类型是什么?
  2. # y_train 的类型是什么?
  3. # 寻找训练数据中的观测数
  4. # 寻找测试数据中的观测数
  5. # 展示 X_train 的前两个记录
  6. '''
  7. array([[[0, 0, 0, ..., 0, 0, 0],
  8. [0, 0, 0, ..., 0, 0, 0],
  9. [0, 0, 0, ..., 0, 0, 0],
  10. ...,
  11. [0, 0, 0, ..., 0, 0, 0],
  12. [0, 0, 0, ..., 0, 0, 0],
  13. [0, 0, 0, ..., 0, 0, 0]],
  14. [[0, 0, 0, ..., 0, 0, 0],
  15. [0, 0, 0, ..., 0, 0, 0],
  16. [0, 0, 0, ..., 0, 0, 0],
  17. ...,
  18. [0, 0, 0, ..., 0, 0, 0],
  19. [0, 0, 0, ..., 0, 0, 0],
  20. [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)
  21. '''
  22. # 展示 y_train 的前十个记录
  23. # array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=uint8)
  24. # 寻找 y_train 数据集中每个数字的观测数
  25. '''
  26. (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8),
  27. array([5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]))
  28. '''
  29. # 寻找 y_test 数据集中每个数字的观测数
  30. '''
  31. (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8),
  32. array([ 980, 1135, 1032, 1010, 982, 892, 958, 1028, 974, 1009]))
  33. '''
  34. # X_train 的维度是什么,这是什么意思?
  35. # (60000, 28, 28)
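作为参考(一种可能的做法,其输出与上面给出的结果相对应),这些探索性问题可以用如下代码回答:

  print(type(X_train), type(y_train))            # 都是 numpy.ndarray
  print(X_train.shape[0], X_test.shape[0])       # 60000 个训练观测,10000 个测试观测
  print(X_train[:2])                             # 前两个记录(28x28 的像素数组)
  print(y_train[:10])                            # 前十个标签
  print(np.unique(y_train, return_counts=True))  # y_train 中每个数字的观测数
  print(np.unique(y_test, return_counts=True))   # y_test 中每个数字的观测数
  print(X_train.shape)                           # (60000, 28, 28):60000 张 28x28 的图像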

展示图像

现在让我们展示一些图像并看看它们的外观。我们将使用matplotlib库来显示图像。

  1. from matplotlib import pyplot
  2. import matplotlib as mpl
  3. %matplotlib inline
  4. # 展示第一个训练数据
  5. fig = pyplot.figure()
  6. ax = fig.add_subplot(1,1,1)
  7. imgplot = ax.imshow(X_train[1], cmap=mpl.cm.Greys)
  8. imgplot.set_interpolation('nearest')
  9. ax.xaxis.set_ticks_position('top')
  10. ax.yaxis.set_ticks_position('left')
  11. pyplot.show()

四、Keras - 图22

  1. # 让我们展示第 11 个记录

四、Keras - 图23

4.7 用于 MNIST 的卷积网络

致谢:派生于 Valerio Maggio 的 deep-learning-keras-tensorflow

深度学习模型可能需要相当长的时间来运行,尤其是在不使用 GPU 的情况下。

为了节省时间,你可以采样一个观测子集:例如 1000 个你选择的特定数字(例如 6)的观测,以及 1000 个其他数字(即非 6)的观测。我们将使用它构建一个模型,并查看它在测试数据集上的表现。

  1. # 导入所需的库
  2. import numpy as np
  3. np.random.seed(1338)
  4. from keras.datasets import mnist
  5. '''
  6. Using Theano backend.
  7. Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
  8. '''
  9. from keras.models import Sequential
  10. from keras.layers.core import Dense, Dropout, Activation, Flatten
  11. from keras.layers.convolutional import Convolution2D
  12. from keras.layers.pooling import MaxPooling2D
  13. from keras.utils import np_utils
  14. from keras.optimizers import SGD

加载数据

  1. path_to_dataset = "euroscipy_2016_dl-keras/data/mnist.pkl.gz"
  2. # 加载训练和测试数据
  3. (X_train, y_train), (X_test, y_test) = mnist.load_data(path_to_dataset)
  4. X_test_orig = X_test

数据准备

  1. img_rows, img_cols = 28, 28
  2. X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
  3. X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
  4. X_train = X_train.astype('float32')
  5. X_test = X_test.astype('float32')
  6. X_train /= 255
  7. X_test /= 255
  8. # 用于复现的种子
  9. np.random.seed(1338)
  10. # 测试数据
  11. X_test = X_test.copy()
  12. Y = y_test.copy()
  13. # 将输出转换为二元分类(6 => 1,!6 => 0)
  14. Y_test = Y == 6
  15. Y_test = Y_test.astype(int)
  16. # 选择输出是 6 的 5918 个样本
  17. X_six = X_train[y_train == 6].copy()
  18. Y_six = y_train[y_train == 6].copy()
  19. # 选择输出不是 6 的样本
  20. X_not_six = X_train[y_train != 6].copy()
  21. Y_not_six = y_train[y_train != 6].copy()
  22. # 从输出不是 6 的数据中随机选取 6000 个样本
  23. random_rows = np.random.randint(0,X_six.shape[0],6000)
  24. X_not_six = X_not_six[random_rows]
  25. Y_not_six = Y_not_six[random_rows]
  26. # 附加输出是 6 的数据,和输出不是 6 的数据
  27. X_train = np.append(X_six,X_not_six)
  28. # 改变附加数据的形状
  29. X_train = X_train.reshape(X_six.shape[0] + X_not_six.shape[0],
  30. 1, img_rows, img_cols)
  31. # 附加标签,并将标签转换为二元分类(6 => 1,!6 => 0)
  32. Y_labels = np.append(Y_six,Y_not_six)
  33. Y_train = Y_labels == 6
  34. Y_train = Y_train.astype(int)
  35. print(X_train.shape, Y_labels.shape, X_test.shape, Y_test.shape)
  36. # (11918, 1, 28, 28) (11918,) (10000, 1, 28, 28) (10000, 2)
  37. # 将分类转换为二元类别形式
  38. nb_classes = 2
  39. Y_train = np_utils.to_categorical(Y_train, nb_classes)
  40. Y_test = np_utils.to_categorical(Y_test, nb_classes)

简单的 CNN

  1. # 为卷积神经网络初始化值
  2. nb_epoch = 2
  3. batch_size = 128
  4. # 要使用的卷积过滤器数量
  5. nb_filters = 32
  6. # 用于最大池化的池化区数量
  7. nb_pool = 2
  8. # 卷积核大小
  9. nb_conv = 3
  10. sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

第一步:模型定义

  1. model = Sequential()
  2. model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
  3. border_mode='valid',
  4. input_shape=(1, img_rows, img_cols)))
  5. model.add(Activation('relu'))
  6. model.add(Flatten())
  7. model.add(Dense(nb_classes))
  8. model.add(Activation('softmax'))

第二步:编译

  1. model.compile(loss='categorical_crossentropy',
  2. optimizer='sgd',
  3. metrics=['accuracy'])

第三步:拟合

  1. model.fit(X_train, Y_train, batch_size=batch_size,
  2. nb_epoch=nb_epoch,verbose=1,
  3. validation_data=(X_test, Y_test))
  4. '''
  5. Train on 11918 samples, validate on 10000 samples
  6. Epoch 1/2
  7. 11918/11918 [==============================] - 0s - loss: 0.2890 - acc: 0.9326 - val_loss: 0.1251 - val_acc: 0.9722
  8. Epoch 2/2
  9. 11918/11918 [==============================] - 0s - loss: 0.1341 - acc: 0.9612 - val_loss: 0.1298 - val_acc: 0.9599
  10. <keras.callbacks.History at 0x7f6ccb68f630>
  11. '''

第四步:评估

  1. # 在测试数据上评估模型
  2. score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
  3. print('Test score:', score)
  4. print('Test accuracy:', accuracy)
  5. '''
  6. Test score: 0.129807630396
  7. Test accuracy: 0.9599
  8. '''

让我们绘制模型预测

  1. import matplotlib.pyplot as plt
  2. %matplotlib inline
  3. slice = 15
  4. predicted = model.predict(X_test[:slice]).argmax(-1)
  5. plt.figure(figsize=(16,8))
  6. for i in range(slice):
  7. plt.subplot(1, slice, i+1)
  8. plt.imshow(X_test_orig[i], interpolation='nearest')
  9. plt.text(0, 0, predicted[i], color='black',
  10. bbox=dict(facecolor='white', alpha=1))
  11. plt.axis('off')

四、Keras - 图24

添加更多密集层

  1. model = Sequential()
  2. model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
  3. border_mode='valid',
  4. input_shape=(1, img_rows, img_cols)))
  5. model.add(Activation('relu'))
  6. model.add(Flatten())
  7. model.add(Dense(128))
  8. model.add(Activation('relu'))
  9. model.add(Dense(nb_classes))
  10. model.add(Activation('softmax'))
  11. model.compile(loss='categorical_crossentropy',
  12. optimizer='sgd',
  13. metrics=['accuracy'])
  14. model.fit(X_train, Y_train, batch_size=batch_size,
  15. nb_epoch=nb_epoch,verbose=1,
  16. validation_data=(X_test, Y_test))
  17. '''
  18. Train on 11918 samples, validate on 10000 samples
  19. Epoch 1/2
  20. 11918/11918 [==============================] - 0s - loss: 0.3044 - acc: 0.9379 - val_loss: 0.1469 - val_acc: 0.9625
  21. Epoch 2/2
  22. 11918/11918 [==============================] - 0s - loss: 0.1189 - acc: 0.9640 - val_loss: 0.1058 - val_acc: 0.9655
  23. <keras.callbacks.History at 0x7f6cf59f7358>
  24. '''
  25. # 在测试数据上评估模型
  26. score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
  27. print('Test score:', score)
  28. print('Test accuracy:', accuracy)
  29. '''
  30. Test score: 0.105762729073
  31. Test accuracy: 0.9655
  32. '''

添加丢弃

  1. model = Sequential()
  2. model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
  3. border_mode='valid',
  4. input_shape=(1, img_rows, img_cols)))
  5. model.add(Activation('relu'))
  6. model.add(Flatten())
  7. model.add(Dense(128))
  8. model.add(Activation('relu'))
  9. model.add(Dropout(0.5))
  10. model.add(Dense(nb_classes))
  11. model.add(Activation('softmax'))
  12. model.compile(loss='categorical_crossentropy',
  13. optimizer='sgd',
  14. metrics=['accuracy'])
  15. model.fit(X_train, Y_train, batch_size=batch_size,
  16. nb_epoch=nb_epoch,verbose=1,
  17. validation_data=(X_test, Y_test))
  18. '''
  19. Train on 11918 samples, validate on 10000 samples
  20. Epoch 1/2
  21. 11918/11918 [==============================] - 0s - loss: 0.3128 - acc: 0.9097 - val_loss: 0.1438 - val_acc: 0.9624
  22. Epoch 2/2
  23. 11918/11918 [==============================] - 0s - loss: 0.1362 - acc: 0.9580 - val_loss: 0.1145 - val_acc: 0.9628
  24. <keras.callbacks.History at 0x7f6ccb180208>
  25. '''
  26. # 在测试数据上评估模型
  27. score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
  28. print('Test score:', score)
  29. print('Test accuracy:', accuracy)
  30. '''
  31. Test score: 0.11448907243
  32. Test accuracy: 0.9628
  33. '''

添加更多卷积层

  1. model = Sequential()
  2. model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
  3. border_mode='valid',
  4. input_shape=(1, img_rows, img_cols)))
  5. model.add(Activation('relu'))
  6. model.add(Convolution2D(nb_filters, nb_conv, nb_conv))
  7. model.add(Activation('relu'))
  8. model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
  9. model.add(Dropout(0.25))
  10. model.add(Flatten())
  11. model.add(Dense(128))
  12. model.add(Activation('relu'))
  13. model.add(Dropout(0.5))
  14. model.add(Dense(nb_classes))
  15. model.add(Activation('softmax'))
  16. model.compile(loss='categorical_crossentropy',
  17. optimizer='sgd',
  18. metrics=['accuracy'])
  19. model.fit(X_train, Y_train, batch_size=batch_size,
  20. nb_epoch=nb_epoch,verbose=1,
  21. validation_data=(X_test, Y_test))
  22. '''
  23. Train on 11918 samples, validate on 10000 samples
  24. Epoch 1/2
  25. 11918/11918 [==============================] - 1s - loss: 0.4707 - acc: 0.8288 - val_loss: 0.2307 - val_acc: 0.9399
  26. Epoch 2/2
  27. 11918/11918 [==============================] - 1s - loss: 0.1882 - acc: 0.9383 - val_loss: 0.1195 - val_acc: 0.9621
  28. <keras.callbacks.History at 0x7f6cc97b8748>
  29. '''
  30. # 在测试数据上评估模型
  31. score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
  32. print('Test score:', score)
  33. print('Test accuracy:', accuracy)
  34. '''
  35. Test score: 0.11954063682
  36. Test accuracy: 0.9621
  37. '''

练习

上面的代码已经编写为函数。改变一些超参数并看看会发生什么。

  1. # 用于构造卷积神经网络的函数
  2. # 如果你希望的话,随便添加参数
  3. def build_model():
  4. """"""
  5. model = Sequential()
  6. model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
  7. border_mode='valid',
  8. input_shape=(1, img_rows, img_cols)))
  9. model.add(Activation('relu'))
  10. model.add(Convolution2D(nb_filters, nb_conv, nb_conv))
  11. model.add(Activation('relu'))
  12. model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
  13. model.add(Dropout(0.25))
  14. model.add(Flatten())
  15. model.add(Dense(128))
  16. model.add(Activation('relu'))
  17. model.add(Dropout(0.5))
  18. model.add(Dense(nb_classes))
  19. model.add(Activation('softmax'))
  20. model.compile(loss='categorical_crossentropy',
  21. optimizer='sgd',
  22. metrics=['accuracy'])
  23. model.fit(X_train, Y_train, batch_size=batch_size,
  24. nb_epoch=nb_epoch,verbose=1,
  25. validation_data=(X_test, Y_test))
  26. # 在测试集上评估模型
  27. score, accuracy = model.evaluate(X_test, Y_test, verbose=0)
  28. print('Test score:', score)
  29. print('Test accuracy:', accuracy)
  30. # 计算需要多久来构建模型并测试
  31. %timeit -n1 -r1 build_model()
  32. '''
  33. Train on 11918 samples, validate on 10000 samples
  34. Epoch 1/2
  35. 11918/11918 [==============================] - 1s - loss: 0.5634 - acc: 0.7860 - val_loss: 0.3574 - val_acc: 0.9363
  36. Epoch 2/2
  37. 11918/11918 [==============================] - 1s - loss: 0.2372 - acc: 0.9292 - val_loss: 0.2253 - val_acc: 0.9190
  38. Test score: 0.225333989978
  39. Test accuracy: 0.919
  40. 1 loop, best of 1: 5.45 s per loop
  41. '''

批量标准化

在每批中标准化前一层的激活,即应用一个变换,保持激活均值接近 0 且标准差接近 1。

如何在 Keras 中使用 BatchNormalization

  1. from keras.layers.normalization import BatchNormalization
  2. BatchNormalization(epsilon=1e-06, mode=0,
  3. axis=-1, momentum=0.99,
  4. weights=None, beta_init='zero',
  5. gamma_init='one')
  6. # 尝试向模型添加新的 BatchNormalization 层
  7. # (在 Dropout 层之后)
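一种可能的做法(仅为示意,沿用本节前面定义的模型结构和超参数,在 Dropout 层之后插入 BatchNormalization):

  from keras.layers.normalization import BatchNormalization

  model = Sequential()
  model.add(Convolution2D(nb_filters, nb_conv, nb_conv,
                          border_mode='valid',
                          input_shape=(1, img_rows, img_cols)))
  model.add(Activation('relu'))
  model.add(Flatten())
  model.add(Dense(128))
  model.add(Activation('relu'))
  model.add(Dropout(0.5))
  model.add(BatchNormalization())   # 在 Dropout 层之后添加批量标准化
  model.add(Dense(nb_classes))
  model.add(Activation('softmax'))
  model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])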

4.8 深度学习实战

致谢:派生于 Valerio Maggio 的 deep-learning-keras-tensorflow

从头开始构建和训练你自己的 ConvNet 可能很难并且是一项长期任务。在深度学习中使用的一个常见技巧是使用预训练的模型,并将其微调到它将用于的特定数据。

Keras 中的著名模型

此笔记本包含以下 Keras 模型的代码和参考(收集自 https://github.com/fchollet/deep-learning-models)。

  • VGG16
  • VGG19
  • ResNet50
  • Inception v3

参考

所有架构都兼容 TensorFlow 和 Theano,并且在实例化时,模型将根据 Keras 配置文件中设置的图像维度顺序构建,位于~/.keras/keras.json。例如,如果你设置了image_dim_ordering=tf,则根据 TensorFlow 维度顺序约定“Width-Height-Depth”,来构建从此仓库加载的任何模型。

Keras 配置文件

  1. !cat ~/.keras/keras.json
  2. '''
  3. {
  4. "image_dim_ordering": "th",
  5. "floatx": "float32",
  6. "epsilon": 1e-07,
  7. "backend": "theano"
  8. }
  9. '''
  10. !sed -i 's/theano/tensorflow/g' ~/.keras/keras.json
  11. !cat ~/.keras/keras.json
  12. '''
  13. {
  14. "image_dim_ordering": "th",
  15. "floatx": "float32",
  16. "epsilon": 1e-07,
  17. "backend": "tensorflow"
  18. }
  19. '''
  20. import keras
  21. # Using TensorFlow backend.
  22. import theano
  23. '''
  24. Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
  25. '''
  26. !sed -i 's/tensorflow/theano/g' ~/.keras/keras.json
  27. !cat ~/.keras/keras.json
  28. '''
  29. {
  30. "image_dim_ordering": "th",
  31. "backend": "theano",
  32. "floatx": "float32",
  33. "epsilon": 1e-07
  34. }
  35. '''
  36. import keras
  37. '''
  38. Using Theano backend.
  39. Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
  40. '''
  41. # %load deep_learning_models/imagenet_utils.py
  42. import numpy as np
  43. import json
  44. from keras.utils.data_utils import get_file
  45. from keras import backend as K
  46. CLASS_INDEX = None
  47. CLASS_INDEX_PATH = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json'
  48. def preprocess_input(x, dim_ordering='default'):
  49. if dim_ordering == 'default':
  50. dim_ordering = K.image_dim_ordering()
  51. assert dim_ordering in {'tf', 'th'}
  52. if dim_ordering == 'th':
  53. x[:, 0, :, :] -= 103.939
  54. x[:, 1, :, :] -= 116.779
  55. x[:, 2, :, :] -= 123.68
  56. # 'RGB'->'BGR'
  57. x = x[:, ::-1, :, :]
  58. else:
  59. x[:, :, :, 0] -= 103.939
  60. x[:, :, :, 1] -= 116.779
  61. x[:, :, :, 2] -= 123.68
  62. # 'RGB'->'BGR'
  63. x = x[:, :, :, ::-1]
  64. return x
  65. def decode_predictions(preds):
  66. global CLASS_INDEX
  67. assert len(preds.shape) == 2 and preds.shape[1] == 1000
  68. if CLASS_INDEX is None:
  69. fpath = get_file('imagenet_class_index.json',
  70. CLASS_INDEX_PATH,
  71. cache_subdir='models')
  72. CLASS_INDEX = json.load(open(fpath))
  73. indices = np.argmax(preds, axis=-1)
  74. results = []
  75. for i in indices:
  76. results.append(CLASS_INDEX[str(i)])
  77. return results
  78. '''
  79. Using Theano backend.
  80. Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
  81. '''
  82. IMAGENET_FOLDER = '../img/imagenet'  # 在仓库中

VGG16

  1. # %load deep_learning_models/vgg16.py
  2. '''
  3. 用于 Keras 的 VGG16 模型。
  4. # 参考:
  5. - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
  6. '''
  7. from __future__ import print_function
  8. import numpy as np
  9. import warnings
  10. from keras.models import Model
  11. from keras.layers import Flatten, Dense, Input
  12. from keras.layers import Convolution2D, MaxPooling2D
  13. from keras.preprocessing import image
  14. from keras.utils.layer_utils import convert_all_kernels_in_model
  15. from keras.utils.data_utils import get_file
  16. from keras import backend as K
  17. TH_WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_th_dim_ordering_th_kernels.h5'
  18. TF_WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
  19. TH_WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_th_dim_ordering_th_kernels_notop.h5'
  20. TF_WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
  21. def VGG16(include_top=True, weights='imagenet',
  22. input_tensor=None):
  23. '''实例化 VGG16 架构,可选择加载 ImageNet 上预先训练的权重。
  24. 请注意,使用 TensorFlow 时,为了获得最佳性能,你应该在
  25. `~/.keras/keras.json`的 Keras 配置中设置`image_dim_ordering='tf'`。
  26. 模型和权重与 TensorFlow 和 Theano 兼容。
  27. 模型使用的维度顺序约定是 Keras 配置文件中规定的约定。
  28. # 参数
  29. include_top: 是否在网络顶部包含三个全连接层
  30. weights: `None`(随机初始化)或 "imagenet"(ImageNet 上的预训练)
  31. input_tensor: 可选的 Keras 张量(也就是`layers.Input()`的输出),用作模型的图像输入
  32. # 返回值
  33. Keras 模型实例
  34. '''
  35. if weights not in {'imagenet', None}:
  36. raise ValueError('The `weights` argument should be either '
  37. '`None` (random initialization) or `imagenet` '
  38. '(pre-training on ImageNet).')
  39. # 确定合适的输入大小
  40. if K.image_dim_ordering() == 'th':
  41. if include_top:
  42. input_shape = (3, 224, 224)
  43. else:
  44. input_shape = (3, None, None)
  45. else:
  46. if include_top:
  47. input_shape = (224, 224, 3)
  48. else:
  49. input_shape = (None, None, 3)
  50. if input_tensor is None:
  51. img_input = Input(shape=input_shape)
  52. else:
  53. if not K.is_keras_tensor(input_tensor):
  54. img_input = Input(tensor=input_tensor)
  55. else:
  56. img_input = input_tensor
  57. # 块 1
  58. x = Convolution2D(64, 3, 3, activation='relu', border_mode='same', name='block1_conv1')(img_input)
  59. x = Convolution2D(64, 3, 3, activation='relu', border_mode='same', name='block1_conv2')(x)
  60. x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
  61. # 块 2
  62. x = Convolution2D(128, 3, 3, activation='relu', border_mode='same', name='block2_conv1')(x)
  63. x = Convolution2D(128, 3, 3, activation='relu', border_mode='same', name='block2_conv2')(x)
  64. x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
  65. # 块 3
  66. x = Convolution2D(256, 3, 3, activation='relu', border_mode='same', name='block3_conv1')(x)
  67. x = Convolution2D(256, 3, 3, activation='relu', border_mode='same', name='block3_conv2')(x)
  68. x = Convolution2D(256, 3, 3, activation='relu', border_mode='same', name='block3_conv3')(x)
  69. x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
  70. # 块 4
  71. x = Convolution2D(512, 3, 3, activation='relu', border_mode='same', name='block4_conv1')(x)
  72. x = Convolution2D(512, 3, 3, activation='relu', border_mode='same', name='block4_conv2')(x)
  73. x = Convolution2D(512, 3, 3, activation='relu', border_mode='same', name='block4_conv3')(x)
  74. x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)
  75. # 块 5
  76. x = Convolution2D(512, 3, 3, activation='relu', border_mode='same', name='block5_conv1')(x)
  77. x = Convolution2D(512, 3, 3, activation='relu', border_mode='same', name='block5_conv2')(x)
  78. x = Convolution2D(512, 3, 3, activation='relu', border_mode='same', name='block5_conv3')(x)
  79. x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)
  80. if include_top:
  81. # 分类块
  82. x = Flatten(name='flatten')(x)
  83. x = Dense(4096, activation='relu', name='fc1')(x)
  84. x = Dense(4096, activation='relu', name='fc2')(x)
  85. x = Dense(1000, activation='softmax', name='predictions')(x)
  86. # 创建模型
  87. model = Model(img_input, x)
  88. # 加载权重
  89. if weights == 'imagenet':
  90. print('K.image_dim_ordering:', K.image_dim_ordering())
  91. if K.image_dim_ordering() == 'th':
  92. if include_top:
  93. weights_path = get_file('vgg16_weights_th_dim_ordering_th_kernels.h5',
  94. TH_WEIGHTS_PATH,
  95. cache_subdir='models')
  96. else:
  97. weights_path = get_file('vgg16_weights_th_dim_ordering_th_kernels_notop.h5',
  98. TH_WEIGHTS_PATH_NO_TOP,
  99. cache_subdir='models')
  100. model.load_weights(weights_path)
  101. if K.backend() == 'tensorflow':
  102. warnings.warn('You are using the TensorFlow backend, yet you '
  103. 'are using the Theano '
  104. 'image dimension ordering convention '
  105. '(`image_dim_ordering="th"`). '
  106. 'For best performance, set '
  107. '`image_dim_ordering="tf"` in '
  108. 'your Keras config '
  109. 'at ~/.keras/keras.json.')
  110. convert_all_kernels_in_model(model)
  111. else:
  112. if include_top:
  113. weights_path = get_file('vgg16_weights_tf_dim_ordering_tf_kernels.h5',
  114. TF_WEIGHTS_PATH,
  115. cache_subdir='models')
  116. else:
  117. weights_path = get_file('vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
  118. TF_WEIGHTS_PATH_NO_TOP,
  119. cache_subdir='models')
  120. model.load_weights(weights_path)
  121. if K.backend() == 'theano':
  122. convert_all_kernels_in_model(model)
  123. return model
  124. import os
  125. model = VGG16(include_top=True, weights='imagenet')
  126. img_path = os.path.join(IMAGENET_FOLDER, 'strawberry_1157.jpeg')
  127. img = image.load_img(img_path, target_size=(224, 224))
  128. x = image.img_to_array(img)
  129. x = np.expand_dims(x, axis=0)
  130. x = preprocess_input(x)
  131. print('Input image shape:', x.shape)
  132. preds = model.predict(x)
  133. print('Predicted:', decode_predictions(preds))

K.image_dim_ordering: th
Input image shape: (1, 3, 224, 224)
Predicted: [['n07745940', 'strawberry']]

Fine Tuning of a Pre-Trained Model

  1. def VGG16_FT(weights_path = None,
  2. img_width = 224, img_height = 224,
  3. f_type = None, n_labels = None ):
  4. """调优基于 VGG16 的网络"""
  5. # 最后一层之前都是 VGG16!
  6. model = Sequential()
  7. model.add(ZeroPadding2D((1, 1),
  8. input_shape=(3,
  9. img_width, img_height)))
  10. model.add(Convolution2D(64, 3, 3, activation='relu',
  11. name='conv1_1'))
  12. model.add(ZeroPadding2D((1, 1)))
  13. model.add(Convolution2D(64, 3, 3, activation='relu',
  14. name='conv1_2'))
  15. model.add(MaxPooling2D((2, 2), strides=(2, 2)))
  16. model.add(ZeroPadding2D((1, 1)))
  17. model.add(Convolution2D(128, 3, 3, activation='relu',
  18. name='conv2_1'))
  19. model.add(ZeroPadding2D((1, 1)))
  20. model.add(Convolution2D(128, 3, 3, activation='relu',
  21. name='conv2_2'))
  22. model.add(MaxPooling2D((2, 2), strides=(2, 2)))
  23. model.add(ZeroPadding2D((1, 1)))
  24. model.add(Convolution2D(256, 3, 3, activation='relu',
  25. name='conv3_1'))
  26. model.add(ZeroPadding2D((1, 1)))
  27. model.add(Convolution2D(256, 3, 3, activation='relu',
  28. name='conv3_2'))
  29. model.add(ZeroPadding2D((1, 1)))
  30. model.add(Convolution2D(256, 3, 3, activation='relu',
  31. name='conv3_3'))
  32. model.add(MaxPooling2D((2, 2), strides=(2, 2)))
  33. model.add(ZeroPadding2D((1, 1)))
  34. model.add(Convolution2D(512, 3, 3, activation='relu',
  35. name='conv4_1'))
  36. model.add(ZeroPadding2D((1, 1)))
  37. model.add(Convolution2D(512, 3, 3, activation='relu',
  38. name='conv4_2'))
  39. model.add(ZeroPadding2D((1, 1)))
  40. model.add(Convolution2D(512, 3, 3, activation='relu',
  41. name='conv4_3'))
  42. model.add(MaxPooling2D((2, 2), strides=(2, 2)))
  43. model.add(ZeroPadding2D((1, 1)))
  44. model.add(Convolution2D(512, 3, 3, activation='relu',
  45. name='conv5_1'))
  46. model.add(ZeroPadding2D((1, 1)))
  47. model.add(Convolution2D(512, 3, 3, activation='relu',
  48. name='conv5_2'))
  49. model.add(ZeroPadding2D((1, 1)))
  50. model.add(Convolution2D(512, 3, 3, activation='relu',
  51. name='conv5_3'))
  52. model.add(MaxPooling2D((2, 2), strides=(2, 2)))
  53. model.add(Flatten())
  54. # insert the new layers
  55. model.add(Dense(768, activation='sigmoid'))
  56. model.add(Dropout(0.0))
  57. model.add(Dense(768, activation='sigmoid'))
  58. model.add(Dropout(0.0))
  59. last_layer = Dense(n_labels, activation='sigmoid')
  60. loss = 'categorical_crossentropy'
  61. optimizer = optimizers.Adam(lr=1e-4, epsilon=1e-08)
  62. batch_size = 128
  63. assert os.path.exists(weights_path), 'Model weights not found (see "weights_path" variable in script).'
  64. #model.load_weights(weights_path)
  65. f = h5py.File(weights_path)
  66. for k in range(len(f.attrs['layer_names'])):
  67. g = f[f.attrs['layer_names'][k]]
  68. weights = [g[g.attrs['weight_names'][p]]
  69. for p in range(len(g.attrs['weight_names']))]
  70. if k >= len(model.layers):
  71. break
  72. else:
  73. model.layers[k].set_weights(weights)
  74. f.close()
  75. print('Model loaded.')
  76. model.add(last_layer)
  77. # set the first 25 layers (up to the last conv block)
  78. # to non-trainable (their weights will not be updated)
  79. for layer in model.layers[:25]:
  80. layer.trainable = False
  81. # compile the model with an SGD/momentum optimizer and a very small learning rate
  82. model.compile(loss=loss,
  83. optimizer=optimizer,
  84. metrics=['accuracy'])
  85. return model, batch_size
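
VGG16_FT returns a compiled model together with a suggested batch size. A minimal usage sketch follows; the weights file name, the number of labels, and the arrays X_ft / y_ft (images shaped (N, 3, 224, 224) and one-hot labels shaped (N, n_labels)) are illustrative assumptions, not part of the tutorial code.

  1. # hypothetical call: fine-tune on a small 10-class dataset
  2. ft_model, ft_batch_size = VGG16_FT(weights_path='vgg16_weights.h5',
  3. img_width=224, img_height=224,
  4. n_labels=10)
  5. # only the layers above index 25 (the new fully-connected top) are updated
  6. ft_model.fit(X_ft, y_ft,
  7. batch_size=ft_batch_size,
  8. nb_epoch=5,
  9. validation_split=0.1)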

Hands on:

Try to do the same thing with other models.

  1. %load deep_learning_models/vgg19.py
  2. %load deep_learning_models/resnet50.py

4.9 Unsupervised Learning

Credits: forked from Valerio Maggio's deep-learning-keras-tensorflow

Autoencoders

An autoencoder is an artificial neural network used for learning efficient codings. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.

四、Keras - 图25

Unsupervised learning is the class of machine learning algorithms used to draw inferences from datasets consisting of input data without labels. The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in the data.

  1. # based on: https://blog.keras.io/building-autoencoders-in-keras.html
  2. encoding_dim = 32
  3. input_img = Input(shape=(784,))
  4. encoded = Dense(encoding_dim, activation='relu')(input_img)
  5. decoded = Dense(784, activation='sigmoid')(encoded)
  6. autoencoder = Model(input=input_img, output=decoded)
  7. encoder = Model(input=input_img, output=encoded)
  8. encoded_input = Input(shape=(encoding_dim,))
  9. decoder_layer = autoencoder.layers[-1]
  10. decoder = Model(input=encoded_input, output=decoder_layer(encoded_input))
  11. autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
  12. (x_train, _), (x_test, _) = mnist.load_data()
  13. x_train = x_train.astype('float32') / 255.
  14. x_test = x_test.astype('float32') / 255.
  15. x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
  16. x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
  17. #note: x_train, x_train :)
  18. autoencoder.fit(x_train, x_train,
  19. nb_epoch=50,
  20. batch_size=256,
  21. shuffle=True,
  22. validation_data=(x_test, x_test))
  23. '''
  24. Train on 60000 samples, validate on 10000 samples
  25. Epoch 1/50
  26. 60000/60000 [==============================] - 20s - loss: 0.3832 - val_loss: 0.2730
  27. Epoch 2/50
  28. 60000/60000 [==============================] - 19s - loss: 0.2660 - val_loss: 0.2557
  29. Epoch 3/50
  30. 60000/60000 [==============================] - 18s - loss: 0.2455 - val_loss: 0.2331
  31. Epoch 4/50
  32. 60000/60000 [==============================] - 19s - loss: 0.2254 - val_loss: 0.2152
  33. Epoch 5/50
  34. 60000/60000 [==============================] - 19s - loss: 0.2099 - val_loss: 0.2018
  35. ...
  36. Epoch 45/50
  37. 60000/60000 [==============================] - 19s - loss: 0.1075 - val_loss: 0.1057
  38. Epoch 46/50
  39. 60000/60000 [==============================] - 19s - loss: 0.1070 - val_loss: 0.1052
  40. Epoch 47/50
  41. 60000/60000 [==============================] - 20s - loss: 0.1065 - val_loss: 0.1047
  42. Epoch 48/50
  43. 60000/60000 [==============================] - 17s - loss: 0.1061 - val_loss: 0.1043
  44. Epoch 49/50
  45. 60000/60000 [==============================] - 29s - loss: 0.1056 - val_loss: 0.1039
  46. Epoch 50/50
  47. 60000/60000 [==============================] - 21s - loss: 0.1052 - val_loss: 0.1034
  48. <keras.callbacks.History at 0x285017b8>
  49. '''

Testing the Autoencoder

  1. encoded_imgs = encoder.predict(x_test)
  2. decoded_imgs = decoder.predict(encoded_imgs)
  3. n = 10
  4. plt.figure(figsize=(20, 4))
  5. for i in range(n):
  6. # original
  7. ax = plt.subplot(2, n, i + 1)
  8. plt.imshow(x_test[i].reshape(28, 28))
  9. plt.gray()
  10. ax.get_xaxis().set_visible(False)
  11. ax.get_yaxis().set_visible(False)
  12. # reconstruction
  13. ax = plt.subplot(2, n, i + 1 + n)
  14. plt.imshow(decoded_imgs[i].reshape(28, 28))
  15. plt.gray()
  16. ax.get_xaxis().set_visible(False)
  17. ax.get_yaxis().set_visible(False)
  18. plt.show()

四、Keras - 图26

Sample Generation with the Autoencoder

  1. encoded_imgs = np.random.rand(10,32)
  2. decoded_imgs = decoder.predict(encoded_imgs)
  3. n = 10
  4. plt.figure(figsize=(20, 4))
  5. for i in range(n):
  6. # generated
  7. ax = plt.subplot(2, n, i + 1 + n)
  8. plt.imshow(decoded_imgs[i].reshape(28, 28))
  9. plt.gray()
  10. ax.get_xaxis().set_visible(False)
  11. ax.get_yaxis().set_visible(False)
  12. plt.show()

四、Keras - 图27

Pre-trained Encoder

One of the strengths of an autoencoder is using the trained encoder to produce meaningful representations from feature vectors; a small sketch of this idea follows the snippet below.

  1. # use the encoder to pretrain a classifier
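
A minimal sketch of that idea, assuming the autoencoder trained above is still in memory: the encoder compresses each digit to a 32-dimensional code, and a small (hypothetical) dense classifier is trained on those codes. The classifier architecture and training settings are illustrative assumptions, not part of the tutorial code.

  1. # sketch: encode the MNIST digits with the trained encoder, then fit a small classifier on the 32-dim codes
  2. from keras.models import Sequential
  3. from keras.layers import Dense
  4. from keras.utils import np_utils
  5. (_, y_train), (_, y_test) = mnist.load_data()  # labels only; x_train / x_test were prepared above
  6. codes_train = encoder.predict(x_train)         # shape (60000, 32)
  7. codes_test = encoder.predict(x_test)
  8. clf = Sequential()
  9. clf.add(Dense(64, input_dim=encoding_dim, activation='relu'))
  10. clf.add(Dense(10, activation='softmax'))
  11. clf.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'])
  12. clf.fit(codes_train, np_utils.to_categorical(y_train, 10),
  13. nb_epoch=10, batch_size=256,
  14. validation_data=(codes_test, np_utils.to_categorical(y_test, 10)))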

Natural Language Processing using Artificial Neural Networks

“In God we trust. All others must bring data.” – W. Edwards Deming, statistician

Word Embeddings

What?

Convert words into vectors in a high-dimensional space, where each dimension denotes an aspect such as gender or the type of object / word. “Word embeddings” are a family of natural language processing techniques aiming at mapping semantic meaning into a geometric space. This is done by associating a numeric vector with every word in a dictionary, such that the distance between any two vectors (e.g. the L2 distance or, more commonly, the cosine distance) captures part of the semantic relationship between the two associated words. The geometric space formed by these vectors is called the embedding space.

Why?

By converting words into vectors we build relations between words. The more similar two words are along a dimension, the closer their scores will be.

Example

  1. W(green) = (1.2, 0.98, 0.05, ...)
  2. W(red) = (1.1, 0.2, 0.5, ...)

这里greenred的向量值在一个维度上非常相似,因为它们都是颜色。第二维的值非常不同,因为红色可能描绘了训练数据中的负面内容,而绿色则用于正面。通过向量化,我们间接在不同类型的词之间建立了关系。

word2vec Example Using gensim

  1. from gensim.models import word2vec
  2. from gensim.models.word2vec import Word2Vec
  3. '''
  4. Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
  5. '''

Reading blog posts from the data directory

  1. import os
  2. import pickle
  3. DATA_DIRECTORY = os.path.join(os.path.abspath(os.path.curdir), 'data')
  4. male_posts = []
  5. female_posts = []
  6. with open(os.path.join(DATA_DIRECTORY,"male_blog_list.txt"),"rb") as male_file:
  7. male_posts= pickle.load(male_file)
  8. with open(os.path.join(DATA_DIRECTORY,"female_blog_list.txt"),"rb") as female_file:
  9. female_posts = pickle.load(female_file)
  10. print(len(female_posts))
  11. print(len(male_posts))
  12. '''
  13. 2252
  14. 2611
  15. '''
  16. filtered_male_posts = list(filter(lambda p: len(p) > 0, male_posts))
  17. filtered_female_posts = list(filter(lambda p: len(p) > 0, female_posts))
  18. posts = filtered_female_posts + filtered_male_posts
  19. print(len(filtered_female_posts), len(filtered_male_posts), len(posts))
  20. # 2247 2595 4842

Word2Vec

  1. w2v = Word2Vec(size=200, min_count=1)
  2. w2v.build_vocab(map(lambda x: x.split(), posts[:100]), )
  3. w2v.vocab
  4. '''
  5. {'see.': <gensim.models.word2vec.Vocab at 0x7f61aa4f1908>,
  6. 'never.': <gensim.models.word2vec.Vocab at 0x7f61aa4f1dd8>,
  7. 'driving': <gensim.models.word2vec.Vocab at 0x7f61aa4f1e48>,
  8. 'buddy': <gensim.models.word2vec.Vocab at 0x7f61aa4f0240>,
  9. 'DEFENSE': <gensim.models.word2vec.Vocab at 0x7f61aa4f0438>,
  10. ...}
  11. '''
  12. w2v.similarity('I', 'My')
  13. # 0.082851942583535218
  14. print(posts[5])
  15. w2v.similarity('ring', 'husband')
  16. '''
  17. I've tried starting blog after blog and it just never feels right. Then I read today that it feels strange to most people, but the more you do it the better it gets (hmm, sounds suspiciously like something else!) so I decided to give it another try. My husband bought me a notepad at urlLink McNally (the best bookstore in Western Canada) with that title and a picture of a 50s housewife grinning desperately. Each page has something funny like "New curtains! Hurrah!". For some reason it struck me as absolutely hilarious and has stuck in my head ever since. What were those women thinking?
  18. 0.037229111896779618
  19. '''
  20. w2v.similarity('ring', 'housewife')
  21. # 0.11547398696865138
  22. w2v.similarity('women', 'housewife') # diversity friendly
  23. # -0.14627530812290576

Doc2Vec

The same technique used in word2vec can be extended to documents. Here we do everything that was done in word2vec, and we vectorize the documents as well; a gensim Doc2Vec sketch follows the snippet below.

  1. import numpy as np
  2. # 0 for male, 1 for female
  3. y_posts = np.concatenate((np.zeros(len(filtered_male_posts)),
  4. np.ones(len(filtered_female_posts))))
  5. len(y_posts)
  6. # 4842
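
A minimal Doc2Vec sketch with gensim that actually vectorizes the posts; the TaggedDocument / vector_size API assumes a reasonably recent gensim release, and the hyperparameters are illustrative, not tuned.

  1. from gensim.models.doc2vec import Doc2Vec, TaggedDocument
  2. # one TaggedDocument per post, tagged with its index
  3. docs = [TaggedDocument(words=post.split(), tags=[i])
  4. for i, post in enumerate(posts)]
  5. d2v = Doc2Vec(vector_size=200, min_count=1, epochs=10)
  6. d2v.build_vocab(docs)
  7. d2v.train(docs, total_examples=d2v.corpus_count, epochs=d2v.epochs)
  8. # one 200-dimensional vector per post, usable as features for a classifier
  9. post_vectors = np.array([d2v.docvecs[i] for i in range(len(docs))])
  10. print(post_vectors.shape)  # expected (4842, 200)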

Convolutional Neural Networks for Sentence Classification

Train a convolutional network for sentiment analysis, based on “Convolutional Neural Networks for Sentence Classification” by Yoon Kim.

'CNN-non-static' reaches 82.1% after 61 epochs with the following settings:

  1. embedding_dim = 20
  2. filter_sizes = (3, 4)
  3. num_filters = 3
  4. dropout_prob = (0.7, 0.8)
  5. hidden_dims = 100

'CNN-rand' reaches 78-79% after 7-8 epochs with the following settings:

  1. embedding_dim = 20
  2. filter_sizes = (3, 4)
  3. num_filters = 150
  4. dropout_prob = (0.25, 0.5)
  5. hidden_dims = 150

'CNN-static' reaches 75.4% after 7 epochs with the following settings:

  1. embedding_dim = 100
  2. filter_sizes = (3, 4)
  3. num_filters = 150
  4. dropout_prob = (0.25, 0.5)
  5. hidden_dims = 150

It turns out that such a small dataset as “Movie reviews with one sentence per review” (Pang and Lee, 2005) requires a much smaller network than the one introduced in the original article:

  • an embedding dimension of just 20 (instead of 300; about 100 is needed for 'CNN-static')
  • 2 filter sizes (instead of 3)
  • higher dropout probabilities, and
  • 3 filters per filter size are enough for 'CNN-non-static' (instead of 100)
  • embedding initialization does not require the prebuilt Google Word2Vec data

Training Word2Vec on the same “Movie reviews” data is enough to achieve the performance reported in the article (81.6%). Another distinct difference is the sliding max-pooling window of length 2, instead of max-pooling over the whole feature map as shown in the article; a small sketch contrasting the two follows.
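
A small sketch contrasting the two pooling choices, written in the Keras 1.x-style API used throughout this tutorial; the dimensions match the hyperparameters below (sequence length 56, embedding dimension 20), while the filter count and the shapes in the comments are illustrative expectations rather than measured output.

  1. from keras.models import Model
  2. from keras.layers import Input, Convolution1D, MaxPooling1D
  3. seq_len, emb_dim, fsz, n_filt = 56, 20, 3, 3
  4. inp = Input(shape=(seq_len, emb_dim))
  5. conv = Convolution1D(nb_filter=n_filt, filter_length=fsz, activation='relu')(inp)  # (None, 54, 3)
  6. sliding_pool = MaxPooling1D(pool_length=2)(conv)                 # window of length 2, as used below
  7. global_pool = MaxPooling1D(pool_length=seq_len - fsz + 1)(conv)  # over the whole feature map, as in the paper
  8. print(Model(input=inp, output=sliding_pool).output_shape)  # (None, 27, 3)
  9. print(Model(input=inp, output=global_pool).output_shape)   # (None, 1, 3)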

  1. import numpy as np
  2. import data_helpers
  3. from w2v import train_word2vec
  4. from keras.models import Sequential, Model
  5. from keras.layers import (Activation, Dense, Dropout, Embedding,
  6. Flatten, Input, Merge,
  7. Convolution1D, MaxPooling1D)
  8. np.random.seed(2)
  9. '''
  10. Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
  11. Using Theano backend.
  12. '''

Parameters

Model variation. See Section 3 of Kim Yoon's “Convolutional Neural Networks for Sentence Classification” for details.

  1. model_variation = 'CNN-rand' # CNN-rand | CNN-non-static | CNN-static
  2. print('Model variation is %s' % model_variation)
  3. # Model variation is CNN-rand
  4. # Model hyperparameters
  5. sequence_length = 56
  6. embedding_dim = 20
  7. filter_sizes = (3, 4)
  8. num_filters = 150
  9. dropout_prob = (0.25, 0.5)
  10. hidden_dims = 150
  11. # Training parameters
  12. batch_size = 32
  13. num_epochs = 100
  14. val_split = 0.1
  15. # Word2Vec parameters, see train_word2vec
  16. min_word_count = 1 # minimum word count
  17. context = 10 # context window size

Data Preparation

  1. # Load the data
  2. print("Loading data...")
  3. x, y, vocabulary, vocabulary_inv = data_helpers.load_data()
  4. if model_variation=='CNN-non-static' or model_variation=='CNN-static':
  5. embedding_weights = train_word2vec(x, vocabulary_inv,
  6. embedding_dim, min_word_count,
  7. context)
  8. if model_variation=='CNN-static':
  9. x = embedding_weights[0][x]
  10. elif model_variation=='CNN-rand':
  11. embedding_weights = None
  12. else:
  13. raise ValueError('Unknown model variation')
  14. # Loading data...
  15. # Shuffle the data
  16. shuffle_indices = np.random.permutation(np.arange(len(y)))
  17. x_shuffled = x[shuffle_indices]
  18. y_shuffled = y[shuffle_indices].argmax(axis=1)
  19. print("Vocabulary Size: {:d}".format(len(vocabulary)))
  20. # Vocabulary Size: 18765

Building the CNN Model

  1. graph_in = Input(shape=(sequence_length, embedding_dim))
  2. convs = []
  3. for fsz in filter_sizes:
  4. conv = Convolution1D(nb_filter=num_filters,
  5. filter_length=fsz,
  6. border_mode='valid',
  7. activation='relu',
  8. subsample_length=1)(graph_in)
  9. pool = MaxPooling1D(pool_length=2)(conv)
  10. flatten = Flatten()(pool)
  11. convs.append(flatten)
  12. if len(filter_sizes)>1:
  13. out = Merge(mode='concat')(convs)
  14. else:
  15. out = convs[0]
  16. graph = Model(input=graph_in, output=out)
  17. # main sequential model
  18. model = Sequential()
  19. if not model_variation=='CNN-static':
  20. model.add(Embedding(len(vocabulary), embedding_dim, input_length=sequence_length,
  21. weights=embedding_weights))
  22. model.add(Dropout(dropout_prob[0], input_shape=(sequence_length, embedding_dim)))
  23. model.add(graph)
  24. model.add(Dense(hidden_dims))
  25. model.add(Dropout(dropout_prob[1]))
  26. model.add(Activation('relu'))
  27. model.add(Dense(1))
  28. model.add(Activation('sigmoid'))
  29. model.compile(loss='binary_crossentropy', optimizer='rmsprop',
  30. metrics=['accuracy'])
  31. # Train the model
  32. # ==================================================
  33. model.fit(x_shuffled, y_shuffled, batch_size=batch_size,
  34. nb_epoch=num_epochs, validation_split=val_split, verbose=2)
  35. '''
  36. Train on 9595 samples, validate on 1067 samples
  37. Epoch 1/100
  38. 1s - loss: 0.6516 - acc: 0.6005 - val_loss: 0.5692 - val_acc: 0.7151
  39. Epoch 2/100
  40. 1s - loss: 0.4556 - acc: 0.7896 - val_loss: 0.5154 - val_acc: 0.7573
  41. Epoch 3/100
  42. 1s - loss: 0.3556 - acc: 0.8532 - val_loss: 0.5050 - val_acc: 0.7816
  43. Epoch 4/100
  44. 1s - loss: 0.2978 - acc: 0.8779 - val_loss: 0.5335 - val_acc: 0.7901
  45. Epoch 5/100
  46. 1s - loss: 0.2599 - acc: 0.8972 - val_loss: 0.5592 - val_acc: 0.7769
  47. ...
  48. Epoch 95/100
  49. 1s - loss: 0.0012 - acc: 0.9997 - val_loss: 2.9582 - val_acc: 0.7545
  50. Epoch 96/100
  51. 1s - loss: 0.0058 - acc: 0.9989 - val_loss: 2.8944 - val_acc: 0.7479
  52. Epoch 97/100
  53. 1s - loss: 0.0094 - acc: 0.9985 - val_loss: 2.7146 - val_acc: 0.7516
  54. Epoch 98/100
  55. 1s - loss: 0.0044 - acc: 0.9993 - val_loss: 2.9052 - val_acc: 0.7498
  56. Epoch 99/100
  57. 1s - loss: 0.0030 - acc: 0.9995 - val_loss: 3.1474 - val_acc: 0.7470
  58. Epoch 100/100
  59. 1s - loss: 0.0051 - acc: 0.9990 - val_loss: 3.1746 - val_acc: 0.7451
  60. <keras.callbacks.History at 0x7f78362ae400>
  61. '''

Another Example

Using Keras + GloVe - Global Vectors for Word Representation

Using pre-trained word embeddings in a Keras model

Reference: https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html

4.10 Recurrent Neural Networks

Credits: forked from Valerio Maggio's deep-learning-keras-tensorflow

RNN

四、Keras - 图28

A recurrent neural network (RNN) is a class of artificial neural network in which the connections between units form a directed cycle. This creates an internal state of the network, which allows it to exhibit dynamic temporal behavior.

  1. keras.layers.recurrent.SimpleRNN(output_dim,
  2. init='glorot_uniform', inner_init='orthogonal', activation='tanh',
  3. W_regularizer=None, U_regularizer=None, b_regularizer=None,
  4. dropout_W=0.0, dropout_U=0.0)

Backpropagation Through Time

Contrary to feed-forward neural networks, an RNN is characterized by the ability to encode longer-range past information, which makes it very suitable for sequential models. BPTT extends the ordinary BP algorithm to fit the recurrent neural architecture; a minimal numpy sketch of the underlying recurrence follows the figure below.

四、Keras - 图29
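
Before the Keras layers, a minimal numpy sketch of the forward recurrence that a SimpleRNN computes, h_t = tanh(W·x_t + U·h_{t-1} + b); BPTT backpropagates through exactly this unrolled loop. The weights here are random and purely illustrative.

  1. import numpy as np
  2. timesteps, input_dim, hidden_dim = 5, 3, 4
  3. x = np.random.randn(timesteps, input_dim)      # one input sequence
  4. W = np.random.randn(hidden_dim, input_dim)     # input-to-hidden weights
  5. U = np.random.randn(hidden_dim, hidden_dim)    # hidden-to-hidden (recurrent) weights
  6. b = np.zeros(hidden_dim)
  7. h = np.zeros(hidden_dim)                       # initial state
  8. for t in range(timesteps):
  9. h = np.tanh(W.dot(x[t]) + U.dot(h) + b)    # the state carries past information forward
  10. print(h)  # hidden state after seeing the whole sequence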

  1. %matplotlib inline
  2. import numpy as np
  3. import pandas as pd
  4. import theano
  5. import theano.tensor as T
  6. import keras
  7. from keras.models import Sequential
  8. from keras.layers import Dense, Activation
  9. from keras.preprocessing import image
  10. from __future__ import print_function
  11. import numpy as np
  12. import matplotlib.pyplot as plt
  13. from keras.datasets import imdb
  14. from keras.datasets import mnist
  15. from keras.models import Sequential
  16. from keras.layers import Dense, Dropout, Activation, Flatten
  17. from keras.layers import Convolution2D, MaxPooling2D
  18. from keras.utils import np_utils
  19. from keras.preprocessing import sequence
  20. from keras.layers.embeddings import Embedding
  21. from keras.layers.recurrent import LSTM, GRU, SimpleRNN
  22. from sklearn.preprocessing import LabelEncoder
  23. from sklearn.preprocessing import StandardScaler
  24. from sklearn.cross_validation import train_test_split
  25. from keras.layers.core import Activation, TimeDistributedDense, RepeatVector
  26. from keras.callbacks import EarlyStopping, ModelCheckpoint
  27. # Using Theano backend.

IMDB Sentiment Classification Task

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. IMDB provides 25,000 highly polar movie reviews for training and another 25,000 for testing. There is additional unlabeled data for use as well. Raw text and an already processed bag-of-words format are provided.

http://ai.stanford.edu/~amaas/data/sentiment/

Data Preparation - IMDB

  1. max_features = 20000
  2. maxlen = 100 # cut texts after this number of words (among the top max_features most common words)
  3. batch_size = 32
  4. print("Loading data...")
  5. (X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features, test_split=0.2)
  6. print(len(X_train), 'train sequences')
  7. print(len(X_test), 'test sequences')
  8. print('Example:')
  9. print(X_train[:1])
  10. print("Pad sequences (samples x time)")
  11. X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
  12. X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
  13. print('X_train shape:', X_train.shape)
  14. print('X_test shape:', X_test.shape)
  15. '''
  16. Loading data...
  17. 20000 train sequences
  18. 5000 test sequences
  19. Example:
  20. [ [1, 20, 28, 716, 48, 495, 79, 27, 493, 8, 5067, 7, 50, 5, 4682, 13075, 10, 5, 852, 157, 11, 5, 1716, 3351, 10, 5, 500, 7308, 6, 33, 256, 41, 13610, 7, 17, 23, 48, 1537, 3504, 26, 269, 929, 18, 2, 7, 2, 4284, 8, 105, 5, 2, 182, 314, 38, 98, 103, 7, 36, 2184, 246, 360, 7, 19, 396, 17, 26, 269, 929, 18, 1769, 493, 6, 116, 7, 105, 5, 575, 182, 27, 5, 1002, 1085, 130, 62, 17, 24, 89, 17, 13, 381, 1421, 8, 5167, 7, 5, 2723, 38, 325, 7, 17, 23, 93, 9, 156, 252, 19, 235, 20, 28, 5, 104, 76, 7, 17, 169, 35, 14764, 17, 23, 1460, 7, 36, 2184, 934, 56, 2134, 6, 17, 891, 214, 11, 5, 1552, 6, 92, 6, 33, 256, 82, 7]]
  21. Pad sequences (samples x time)
  22. X_train shape: (20000L, 100L)
  23. X_test shape: (5000L, 100L)
  24. '''

Model Building

  1. print('Build model...')
  2. model = Sequential()
  3. model.add(Embedding(max_features, 128, input_length=maxlen))
  4. model.add(SimpleRNN(128))
  5. model.add(Dropout(0.5))
  6. model.add(Dense(1))
  7. model.add(Activation('sigmoid'))
  8. # try using different optimizers and different optimizer configs
  9. model.compile(loss='binary_crossentropy', optimizer='adam', class_mode="binary")
  10. print("Train...")
  11. model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=1, validation_data=(X_test, y_test), show_accuracy=True)
  12. '''
  13. Build model...
  14. Train...
  15. Train on 20000 samples, validate on 5000 samples
  16. Epoch 1/1
  17. 20000/20000 [==============================] - 174s - loss: 0.7213 - val_loss: 0.6179
  18. <keras.callbacks.History at 0x20519860>
  19. '''

LSTM

An LSTM network is an artificial neural network that contains LSTM blocks instead of, or in addition to, regular network units. An LSTM block may be described as a “smart” network unit that can remember a value for an arbitrary length of time.

Unlike traditional RNNs, a Long Short-Term Memory network is well suited to learn from experience to classify, process and predict time series when there are very long lags of unknown size between important events.

四、Keras - 图30

  1. keras.layers.recurrent.LSTM(output_dim, init='glorot_uniform', inner_init='orthogonal',
  2. forget_bias_init='one', activation='tanh',
  3. inner_activation='hard_sigmoid',
  4. W_regularizer=None, U_regularizer=None, b_regularizer=None,
  5. dropout_W=0.0, dropout_U=0.0)

GRU

Gated recurrent units are a gating mechanism in recurrent neural networks. They are very similar to LSTMs, but have fewer parameters because they lack an output gate; a quick parameter-count sketch follows the constructor below.

  1. keras.layers.recurrent.GRU(output_dim, init='glorot_uniform', inner_init='orthogonal',
  2. activation='tanh', inner_activation='hard_sigmoid',
  3. W_regularizer=None, U_regularizer=None, b_regularizer=None,
  4. dropout_W=0.0, dropout_U=0.0)
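
To make the “fewer parameters” point concrete, a quick sketch that counts the trainable parameters of the three recurrent layers on the same (hypothetical) input size; the exact counts depend on the Keras version's bias layout, so treat them as indicative.

  1. from keras.models import Sequential
  2. from keras.layers.recurrent import SimpleRNN, GRU, LSTM
  3. for Layer in (SimpleRNN, GRU, LSTM):
  4. m = Sequential()
  5. m.add(Layer(128, input_shape=(100, 64)))  # 100 timesteps, 64 features
  6. print(Layer.__name__, m.count_params())   # SimpleRNN < GRU < LSTM (roughly 1 : 3 : 4 weight blocks)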

Your Turn! - Hands on RNN

  1. print('Build model...')
  2. model = Sequential()
  3. model.add(Embedding(max_features, 128, input_length=maxlen))
  4. # Play with them! Try to get better results!
  5. #model.add(SimpleRNN(128))
  6. #model.add(GRU(128))
  7. #model.add(LSTM(128))
  8. model.add(Dropout(0.5))
  9. model.add(Dense(1))
  10. model.add(Activation('sigmoid'))
  11. # try using different optimizers and different optimizer configs
  12. model.compile(loss='binary_crossentropy', optimizer='adam', class_mode="binary")
  13. print("Train...")
  14. model.fit(X_train, y_train, batch_size=batch_size,
  15. nb_epoch=4, validation_data=(X_test, y_test), show_accuracy=True)
  16. score, acc = model.evaluate(X_test, y_test, batch_size=batch_size, show_accuracy=True)
  17. print('Test score:', score)
  18. print('Test accuracy:', acc)

Sentence Generation using an RNN (LSTM)

  1. from keras.models import Sequential
  2. from keras.layers import Dense, Activation, Dropout
  3. from keras.layers import LSTM
  4. from keras.optimizers import RMSprop
  5. from keras.utils.data_utils import get_file
  6. import numpy as np
  7. import random
  8. import sys
  9. path = get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
  10. text = open(path).read().lower()
  11. print('corpus length:', len(text))
  12. chars = sorted(list(set(text)))
  13. print('total chars:', len(chars))
  14. char_indices = dict((c, i) for i, c in enumerate(chars))
  15. indices_char = dict((i, c) for i, c in enumerate(chars))
  16. # cut the text into semi-redundant sequences of maxlen characters
  17. maxlen = 40
  18. step = 3
  19. sentences = []
  20. next_chars = []
  21. for i in range(0, len(text) - maxlen, step):
  22. sentences.append(text[i: i + maxlen])
  23. next_chars.append(text[i + maxlen])
  24. print('nb sequences:', len(sentences))
  25. print('Vectorization...')
  26. X = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
  27. y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
  28. for i, sentence in enumerate(sentences):
  29. for t, char in enumerate(sentence):
  30. X[i, t, char_indices[char]] = 1
  31. y[i, char_indices[next_chars[i]]] = 1
  32. # build the model: a single LSTM
  33. print('Build model...')
  34. model = Sequential()
  35. model.add(LSTM(128, input_shape=(maxlen, len(chars))))
  36. model.add(Dense(len(chars)))
  37. model.add(Activation('softmax'))
  38. optimizer = RMSprop(lr=0.01)
  39. model.compile(loss='categorical_crossentropy', optimizer=optimizer)
  40. def sample(preds, temperature=1.0):
  41. # helper function to sample an index from a probability array
  42. preds = np.asarray(preds).astype('float64')
  43. preds = np.log(preds) / temperature
  44. exp_preds = np.exp(preds)
  45. preds = exp_preds / np.sum(exp_preds)
  46. probas = np.random.multinomial(1, preds, 1)
  47. return np.argmax(probas)
  48. # train the model, output generated text after each iteration
  49. for iteration in range(1, 60):
  50. print()
  51. print('-' * 50)
  52. print('Iteration', iteration)
  53. model.fit(X, y, batch_size=128, nb_epoch=1)
  54. start_index = random.randint(0, len(text) - maxlen - 1)
  55. for diversity in [0.2, 0.5, 1.0, 1.2]:
  56. print()
  57. print('----- diversity:', diversity)
  58. generated = ''
  59. sentence = text[start_index: start_index + maxlen]
  60. generated += sentence
  61. print('----- Generating with seed: "' + sentence + '"')
  62. sys.stdout.write(generated)
  63. for i in range(400):
  64. x = np.zeros((1, maxlen, len(chars)))
  65. for t, char in enumerate(sentence):
  66. x[0, t, char_indices[char]] = 1.
  67. preds = model.predict(x, verbose=0)[0]
  68. next_index = sample(preds, diversity)
  69. next_char = indices_char[next_index]
  70. generated += next_char
  71. sentence = sentence[1:] + next_char
  72. sys.stdout.write(next_char)
  73. sys.stdout.flush()
  74. print()
  75. '''
  76. Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
  77. 598016/600901 [============================>.] - ETA: 0s('corpus length:', 600901)
  78. ('total chars:', 59)
  79. ('nb sequences:', 200287)
  80. Vectorization...
  81. Build model...
  82. ()
  83. --------------------------------------------------
  84. ('Iteration', 1)
  85. Epoch 1/1
  86. 200287/200287 [==============================] - 1367s - loss: 1.9977
  87. ()
  88. ('----- diversity:', 0.2)
  89. ----- Generating with seed: "nd the frenzied
  90. speeches of the prophets"
  91. nd the frenzied
  92. speeches of the prophets and the present and and the preases and the soul to the sense of the morals and the some the consequence of the most and one only the some of the proment and interent of the some devertal to the self-consertion of the some deverent of the some distiness and the sense of the some of the morality of the most proves and the some of the some in the seem of the self-conception of the sees of the sense()
  93. ()
  94. ('----- diversity:', 0.5)
  95. ----- Generating with seed: "nd the frenzied
  96. speeches of the prophets"
  97. nd the frenzied
  98. speeches of the prophets of the preat weak to the master of man who onow in interervain of even which who with it is the isitaial conception of the some live the contented the one who exilfacied in the sees to raters, and the passe expecience the inte that the persented in the pass, in the experious of the soulity of the waith the morally distanding of the some of the most interman only and as a period of the sense and o()
  99. ()
  100. ('----- diversity:', 1.0)
  101. ----- Generating with seed: "nd the frenzied
  102. speeches of the prophets"
  103. nd the frenzied
  104. speeches of the prophets of
  105. ar self now no ecerspoped ivent so not,
  106. that itsed undiswerbatarlials. what it is altrenively evok
  107. now be scotnew
  108. prigardiness intagualds, and coumond-grow to
  109. the respence you as penires never wand be
  110. natuented ost ablinice to love worts an who itnopeancew be than mrank againribl
  111. some something lines in the estlenbtupenies of korils divenowry apmains, curte, were,
  112. ind "feulness. a will, natur()
  113. ()
  114. ('----- diversity:', 1.2)
  115. ----- Generating with seed: "nd the frenzied
  116. speeches of the prophets"
  117. nd the frenzied
  118. speeches of the prophets, ind someaterting will stroour hast-fards and lofe beausold, in souby in ruarest, we withquus. "the capinistin and it a mode what it be
  119. my oc, to th[se condectay
  120. of ymo fre
  121. dunt and so asexthersess renieved concecunaulies tound"), from glubiakeitiouals kenty am feelitafouer deceanw or sumpind, and by afolod peall--phasoos of sole
  122. iy copprajakias
  123. in
  124. in adcyont-mean to prives apf-rigionall thust wi()
  125. ()
  126. --------------------------------------------------
  127. ('Iteration', 2)
  128. Epoch 1/1
  129. 40576/200287 [=====>........................] - ETA: 1064s - loss: 1.6878
  130. '''

4.11 RNN Using LSTM

Credits: forked from Valerio Maggio's deep-learning-keras-tensorflow

四、Keras - 图31

四、Keras - 图32

四、Keras - 图33

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs

  1. from keras.optimizers import SGD
  2. from keras.preprocessing.text import one_hot, text_to_word_sequence, base_filter
  3. from keras.utils import np_utils
  4. from keras.models import Sequential
  5. from keras.layers.core import Dense, Dropout, Activation
  6. from keras.layers.embeddings import Embedding
  7. from keras.layers.recurrent import LSTM, GRU
  8. from keras.preprocessing import sequence

Reading blog posts from the data directory

  1. import os
  2. import pickle
  3. import numpy as np
  4. DATA_DIRECTORY = os.path.join(os.path.abspath(os.path.curdir), 'data')
  5. print(DATA_DIRECTORY)
  6. # /home/valerio/deep-learning-keras-euroscipy2016/data
  7. male_posts = []
  8. female_posts = []
  9. with open(os.path.join(DATA_DIRECTORY,"male_blog_list.txt"),"rb") as male_file:
  10. male_posts= pickle.load(male_file)
  11. with open(os.path.join(DATA_DIRECTORY,"female_blog_list.txt"),"rb") as female_file:
  12. female_posts = pickle.load(female_file)
  13. filtered_male_posts = list(filter(lambda p: len(p) > 0, male_posts))
  14. filtered_female_posts = list(filter(lambda p: len(p) > 0, female_posts))
  15. # text processing - build the word index with one_hot
  16. male_one_hot = []
  17. female_one_hot = []
  18. n = 30000
  19. for post in filtered_male_posts:
  20. try:
  21. male_one_hot.append(one_hot(post, n, split=" ", filters=base_filter(), lower=True))
  22. except:
  23. continue
  24. for post in filtered_female_posts:
  25. try:
  26. female_one_hot.append(one_hot(post,n,split=" ",filters=base_filter(),lower=True))
  27. except:
  28. continue
  29. # 0 for male, 1 for female
  30. concatenate_array_rnn = np.concatenate((np.zeros(len(male_one_hot)),
  31. np.ones(len(female_one_hot))))
  32. from sklearn.cross_validation import train_test_split
  33. X_train_rnn, X_test_rnn, y_train_rnn, y_test_rnn = train_test_split(np.concatenate((female_one_hot,male_one_hot)),
  34. concatenate_array_rnn,
  35. test_size=0.2)
  36. maxlen = 100
  37. X_train_rnn = sequence.pad_sequences(X_train_rnn, maxlen=maxlen)
  38. X_test_rnn = sequence.pad_sequences(X_test_rnn, maxlen=maxlen)
  39. print('X_train_rnn shape:', X_train_rnn.shape, y_train_rnn.shape)
  40. print('X_test_rnn shape:', X_test_rnn.shape, y_test_rnn.shape)
  41. '''
  42. X_train_rnn shape: (3873, 100) (3873,)
  43. X_test_rnn shape: (969, 100) (969,)
  44. '''
  45. max_features = 30000
  46. dimension = 128
  47. output_dimension = 128
  48. model = Sequential()
  49. model.add(Embedding(max_features, dimension))
  50. model.add(LSTM(output_dimension))
  51. model.add(Dropout(0.5))
  52. model.add(Dense(1))
  53. model.add(Activation('sigmoid'))
  54. model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
  55. model.fit(X_train_rnn, y_train_rnn, batch_size=32,
  56. nb_epoch=4, validation_data=(X_test_rnn, y_test_rnn))
  57. '''
  58. Train on 3873 samples, validate on 969 samples
  59. Epoch 1/4
  60. 3873/3873 [==============================] - 3s - loss: 0.2487 - acc: 0.5378 - val_loss: 0.2506 - val_acc: 0.5191
  61. Epoch 2/4
  62. 3873/3873 [==============================] - 3s - loss: 0.2486 - acc: 0.5401 - val_loss: 0.2508 - val_acc: 0.5191
  63. Epoch 3/4
  64. 3873/3873 [==============================] - 3s - loss: 0.2484 - acc: 0.5417 - val_loss: 0.2496 - val_acc: 0.5191
  65. Epoch 4/4
  66. 3873/3873 [==============================] - 3s - loss: 0.2484 - acc: 0.5399 - val_loss: 0.2502 - val_acc: 0.5191
  67. <keras.callbacks.History at 0x7fa1e96ac4e0>
  68. '''
  69. score, acc = model.evaluate(X_test_rnn, y_test_rnn, batch_size=32)
  70. # 969/969 [==============================] - 0s
  71. print(score, acc)
  72. # 0.250189056399 0.519091847357

Using a TF-IDF Vectorizer as Input Instead of One-Hot Encoding

  1. from sklearn.feature_extraction.text import TfidfVectorizer
  2. vectorizer = TfidfVectorizer(decode_error='ignore', norm='l2', min_df=5)
  3. vectorizer.fit(filtered_male_posts + filtered_female_posts) # fit one shared vocabulary for both classes
  4. tfidf_male = vectorizer.transform(filtered_male_posts)
  5. tfidf_female = vectorizer.transform(filtered_female_posts)
  6. flattened_array_tfidf_male, flattened_array_tfidf_female = tfidf_male.toarray(), tfidf_female.toarray()
  7. y_rnn = np.concatenate((np.zeros(len(flattened_array_tfidf_male)),
  8. np.ones(len(flattened_array_tfidf_female))))
  9. X_train_rnn, X_test_rnn, y_train_rnn, y_test_rnn = train_test_split(np.concatenate((flattened_array_tfidf_male,
  10. flattened_array_tfidf_female)),
  11. y_rnn,test_size=0.2)
  12. maxlen = 100
  13. X_train_rnn = sequence.pad_sequences(X_train_rnn, maxlen=maxlen)
  14. X_test_rnn = sequence.pad_sequences(X_test_rnn, maxlen=maxlen)
  15. print('X_train_rnn shape:', X_train_rnn.shape, y_train_rnn.shape)
  16. print('X_test_rnn shape:', X_test_rnn.shape, y_test_rnn.shape)
  17. '''
  18. X_train_rnn shape: (4152, 100) (4152,)
  19. X_test_rnn shape: (1038, 100) (1038,)
  20. '''
  21. max_features = 30000
  22. model = Sequential()
  23. model.add(Embedding(max_features, dimension))
  24. model.add(LSTM(output_dimension))
  25. model.add(Dropout(0.5))
  26. model.add(Dense(1))
  27. model.add(Activation('sigmoid'))
  28. model.compile(loss='mean_squared_error',optimizer='sgd', metrics=['accuracy'])
  29. model.fit(X_train_rnn, y_train_rnn,
  30. batch_size=32, nb_epoch=4,
  31. validation_data=(X_test_rnn, y_test_rnn))
  32. '''
  33. Train on 4152 samples, validate on 1038 samples
  34. Epoch 1/4
  35. 4152/4152 [==============================] - 3s - loss: 0.2502 - acc: 0.4988 - val_loss: 0.2503 - val_acc: 0.4865
  36. Epoch 2/4
  37. 4152/4152 [==============================] - 3s - loss: 0.2507 - acc: 0.4843 - val_loss: 0.2500 - val_acc: 0.4865
  38. Epoch 3/4
  39. 4152/4152 [==============================] - 3s - loss: 0.2504 - acc: 0.4952 - val_loss: 0.2501 - val_acc: 0.4865
  40. Epoch 4/4
  41. 4152/4152 [==============================] - 3s - loss: 0.2506 - acc: 0.4913 - val_loss: 0.2500 - val_acc: 0.5135
  42. <keras.callbacks.History at 0x7fa1f466f278>
  43. '''
  44. score,acc = model.evaluate(X_test_rnn, y_test_rnn,
  45. batch_size=32)
  46. '''
  47. 1038/1038 [==============================] - 0s
  48. '''
  49. print(score, acc)
  50. '''
  51. 0.249981284572 0.513487476145
  52. '''

Sentence Generation using LSTM

  1. # read all the male text data into a single string
  2. male_post = ' '.join(filtered_male_posts)
  3. # build a character set for the male content
  4. character_set_male = set(male_post)
  5. # build two dictionaries - a mapping from characters to indices, and from indices to characters
  6. char_indices = dict((c, i) for i, c in enumerate(character_set_male))
  7. indices_char = dict((i, c) for i, c in enumerate(character_set_male))
  8. # cut the text into semi-redundant sequences of maxlen characters
  9. maxlen = 20
  10. step = 1
  11. sentences = []
  12. next_chars = []
  13. for i in range(0, len(male_post) - maxlen, step):
  14. sentences.append(male_post[i : i + maxlen])
  15. next_chars.append(male_post[i + maxlen])
  16. # vectorize the input
  17. x_male = np.zeros((len(male_post), maxlen, len(character_set_male)), dtype=np.bool)
  18. y_male = np.zeros((len(male_post), len(character_set_male)), dtype=np.bool)
  19. print(x_male.shape, y_male.shape)
  20. for i, sentence in enumerate(sentences):
  21. for t, char in enumerate(sentence):
  22. x_male[i, t, char_indices[char]] = 1
  23. y_male[i, char_indices[next_chars[i]]] = 1
  24. print(x_male.shape, y_male.shape)
  25. '''
  26. (2552476, 20, 152) (2552476, 152)
  27. (2552476, 20, 152) (2552476, 152)
  28. '''
  29. # build the model: a single LSTM
  30. print('Build model...')
  31. model = Sequential()
  32. model.add(LSTM(128, input_shape=(maxlen, len(character_set_male))))
  33. model.add(Dense(len(character_set_male)))
  34. model.add(Activation('softmax'))
  35. optimizer = RMSprop(lr=0.01)
  36. model.compile(loss='categorical_crossentropy', optimizer=optimizer)
  37. # Build model...
  38. auto_text_generating_male_model.compile(loss='mean_squared_error',optimizer='sgd')
  39. import random, sys
  40. # helper function to sample an index from a probability array
  41. def sample(a, diversity=0.75):
  42. if random.random() > diversity:
  43. return np.argmax(a)
  44. while 1:
  45. i = random.randint(0, len(a)-1)
  46. if a[i] > random.random():
  47. return i
  48. # train the model, output generated text after each iteration
  49. for iteration in range(1,10):
  50. print()
  51. print('-' * 50)
  52. print('Iteration', iteration)
  53. model.fit(x_male, y_male, batch_size=128, nb_epoch=1)
  54. start_index = random.randint(0, len(male_post) - maxlen - 1)
  55. for diversity in [0.2, 0.4, 0.6, 0.8]:
  56. print()
  57. print('----- diversity:', diversity)
  58. generated = ''
  59. sentence = male_post[start_index : start_index + maxlen]
  60. generated += sentence
  61. print('----- Generating with seed: "' + sentence + '"')
  62. for iteration in range(400):
  63. try:
  64. x = np.zeros((1, maxlen, len(character_set_male)))
  65. for t, char in enumerate(sentence):
  66. x[0, t, char_indices[char]] = 1.
  67. preds = model.predict(x, verbose=0)[0]
  68. next_index = sample(preds, diversity)
  69. next_char = indices_char[next_index]
  70. generated += next_char
  71. sentence = sentence[1:] + next_char
  72. except:
  73. continue
  74. print(sentence)
  75. print()
  76. '''
  77. --------------------------------------------------
  78. Iteration 1
  79. Epoch 1/1
  80. 2552476/2552476 [==============================] - 226s - loss: 1.8022
  81. ----- diversity: 0.2
  82. ----- Generating with seed: "p from the lack of "
  83. sense of the search
  84. ----- diversity: 0.4
  85. ----- Generating with seed: "p from the lack of "
  86. through that possibl
  87. ----- diversity: 0.6
  88. ----- Generating with seed: "p from the lack of "
  89. . This is a " by p
  90. ----- diversity: 0.8
  91. ----- Generating with seed: "p from the lack of "
  92. d he latermal ta we
  93. ...
  94. --------------------------------------------------
  95. Iteration 9
  96. Epoch 1/1
  97. 2552476/2552476 [==============================] - 228s - loss: 8.7874
  98. ----- diversity: 0.2
  99. ----- Generating with seed: " I’ve always looked "
  100. ea e ton ann n ffee
  101. ----- diversity: 0.4
  102. ----- Generating with seed: " I’ve always looked "
  103. o tire n a anV sia a
  104. ----- diversity: 0.6
  105. ----- Generating with seed: " I’ve always looked "
  106. r i jooe Vag o en
  107. ----- diversity: 0.8
  108. ----- Generating with seed: " I’ve always looked "
  109. ao at ge ena oro o
  110. '''