I. Algorithm Idea

Build a model from the current time point and the two points before it; the model can be expressed as a sum of several ridge functions (smooth curves).
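Written as a formula (the standard form of a PPR model, using the same α, β, f notation as the steps below), the prediction is a sum of r ridge functions, each applied to a one-dimensional projection of the input:

\hat{y}(x) = \sum_{j=1}^{r} \beta_j \, f_j(\alpha_j^{\top} x)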

II. Modeling Process

1. Basic data preparation: arrange the consecutive measured power values in the following form, where the first two rows are the input x and the last row is the output y.
(Figure 1)
2. Standardize the output y to zero mean and unit variance, i.e. y ← (y − mean(y)) / std(y) (this is the (y[i] - ave) / std step in the fit code below).
3. Find the projection direction α
(Figure 2)
The projection matrix has one row per ridge function and one column per input dimension of x.
Find the first projection direction (a1, a2). This direction is not random: the weights are determined by the relationship between the covariance of the input x with the output y (the x-y relationship) and the variance of x (the x-x relationship); those weights are the coefficients of the projection direction, which yields the first direction. (In the github implementation, α is initialized randomly.)
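Schematically (a simplified reading of find_projection in the rewritten code below, whose index bookkeeping follows a C# port): each component of the direction is a covariance-to-variance ratio, after which the vector is normalized to unit length:

\alpha_i = \mathrm{Cov}(x_i, y) \Big/ \sum_j \sigma_{ij}, \qquad \alpha \leftarrow \alpha / \lVert \alpha \rVert_2

where the \sigma_{ij} are the variance terms of x.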
4. Fit the ridge function f
Once the first projection direction (a1, a2) is found, take the dot product of every input with it and use the results as abscissas, paired one-to-one with y; this gives many scatter points (αx, y), and fitting them is the process of finding the ridge function f. Fitting starts from a window: every window contains the same number of points, tied to the smoothing ratio by points per window = total points × 0.5 × smoothing ratio + 0.5, rounded down. The window slides one window-width at a time, and however many points a window contains, that many smoothed points are produced; the final ridge function f is simply this set of smoothed points.
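A minimal sketch of this step (illustration only: it swaps the ported super-smoother used in the code below for a plain centered moving average, and the helper name fit_ridge_points is invented here):

import numpy as np

def fit_ridge_points(X, y, alpha, smooth_ratio=0.1):
    p = X.dot(alpha)                       # dot every input with the projection direction
    order = np.argsort(p)                  # sort the scatter points (alpha.x, y) by abscissa
    p_sorted, y_sorted = p[order], y[order]
    n = len(p_sorted)
    # points per window = floor(total points * 0.5 * smoothing ratio + 0.5), at least 2
    n_points = max(int(0.5 * smooth_ratio * n + 0.5), 2)
    # the smoothed ordinates ARE the ridge function f, evaluated at p_sorted
    smo = np.array([y_sorted[max(0, i - n_points): i + n_points + 1].mean() for i in range(n)])
    return p_sorted, smo
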
5. Compute the weight coefficient β
It equals the covariance sum divided by the variance sum: the variance sum is the sum of squares of the ridge function's fitted values, and the covariance sum is the sum of the products of the ridge function's fitted values with y.
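In formulas (this matches get_ridge_weight in the rewritten code below):

\beta = \frac{\sum_i y_i \, f(\alpha^{\top} x_i)}{\sum_i f(\alpha^{\top} x_i)^2}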
6. Find α, β, f for the second ridge function
The deviation of each standardized actual value from the ridge function's fitted value ŷ in the first round serves as the output y for the next round.
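That is, writing the k-th round's output as y^{(k)}:

y_i^{(k+1)} = y_i^{(k)} - \beta_k \, f_k(\alpha_k^{\top} x_i)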
7. Stopping conditions
(1) The number of iterations exceeds the number of ridge functions.
(2) Residual convergence: the previous round's residual is smaller than this round's, i.e. the loss has stopped decreasing.
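Putting steps 2-7 together, the training loop looks roughly as follows. This is a sketch of the standard stagewise formulation, reusing the invented helper fit_ridge_points from the sketch above and assuming a find_projection that returns the direction as a NumPy array; the actual rewrite below differs in its bookkeeping (it tracks the per-sample order with enumerate instead of interpolating):

import numpy as np

def ppr_fit(X, y, r, smooth_ratio=0.1):
    mean, std = y.mean(), y.std()
    resid = (y - mean) / std                         # step 2: standardize the output
    stages, prev_loss = [], np.inf
    for k in range(r):                               # condition (1): at most r ridge functions
        alpha = find_projection(X, resid)            # step 3: projection direction
        p_sorted, smo = fit_ridge_points(X, resid, alpha, smooth_ratio)  # step 4: ridge function
        f_vals = np.interp(X.dot(alpha), p_sorted, smo)  # ridge value at each sample
        beta = (resid * f_vals).sum() / (f_vals ** 2).sum()  # step 5: weight
        loss = ((resid - beta * f_vals) ** 2).sum()
        if loss > prev_loss:                         # condition (2): the residual grew, stop
            break
        stages.append((alpha, p_sorted, smo, beta))
        resid = resid - beta * f_vals                # step 6: the residual becomes the next round's y
        prev_loss = loss
    return stages, mean, std
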

III. Wind Farm Tests

(Figure 3)
Dafeng Runlong, Yancheng, Jiangsu
20 days: 0.579; 30 days: 0.592; current algorithm: 0.564; Deng's PPR (earlier): 0.704; Deng's PPR: 0.315; smoothed: 0.527; with forecast power added: 0.654
Comparison of May harmonic-mean accuracy (models built on 20 days of data)

Wind farm                 ppr     current
Jiangsu Dafeng Runlong    0.572   0.581
Muxutai                   0.606   0.688
Wangyangtai               0.593   0.639
Huadian Caohu Farm 2      0.487   0.559

IV. Follow-up Plan

Treatment                20 days   30 days   Predict point 8 directly   Add short-term forecast power   Smoothing
Harmonic-mean accuracy   0.579     0.592     0.515                      0.653                           0.514

Improvements built on the reproduction of Deng's PPR algorithm:
(1) Parameter tests, in particular the modeling window length and the input dimensionality, both of which affect ultra-short-term accuracy: some improvement, but not by much.
(2) Predicting the 8th point directly: works poorly, worse than modeling the 1st point directly.
(3) Comparison with purely time-series ultra-short-term algorithms, to see whether PPR is the best of them: hard to do, because the existing algorithms all mix in the short-term forecast power and are not purely time-series.
(4) Letting the forecast power take part in modeling, by adding it to the inputs: a sizable improvement.
(5) Data preprocessing 1: gap filling, handled simply as the mean of the two neighboring points.
(6) Data preprocessing 2: smoothing the measured data so that highly volatile samples do not disturb the model: accuracy came out lower than without smoothing, possibly because the smoothing method itself works poorly; currently convolution smoothing (np.convolve) is used, as illustrated right after this list.
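
For reference, the smoothing in question is the same box filter as smooth_data in the code below. With mode='same' the first and last few outputs are averaged against implicit zero padding, which distorts the edges; that is why the code copies the raw values back over y[:5] and y[-5:], and it may be part of why smoothing hurts accuracy:

import numpy as np

x = np.array([0., 0., 10., 12., 11., 30., 28., 9., 10., 11., 12., 10.])  # toy power series
box = np.ones(10) / 10            # length-10 moving-average kernel
y = np.convolve(x, box, mode='same')
y[:5], y[-5:] = x[:5], x[-5:]     # restore the distorted edge points
print(y)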

Improvements to the github PPR algorithm:
(7) Test the github PPR algorithm with iterative training and prediction, and with the short-term forecast power added.
(8) Analyze abnormal harmonic-mean accuracy values: Huadian Caohu Farm 2 (652112) scored 0.088 on May 12, against 0.272 for the current algorithm.

The code from github

# -*- coding: utf-8 -*-
import numpy
from scipy.interpolate import UnivariateSpline
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin
from sklearn.utils.validation import check_X_y, check_is_fitted
from sklearn.utils import as_float_array, check_random_state, check_array
from matplotlib import pyplot
import numpy as np
import MySQLdb
import pandas as pd
import datetime
import time


def mysql_conn(db='exportdb'):  # database connection settings
    user = 'dbadmin'
    pwd = 'Goldwind@123'
    ip = '10.12.17.97'
    port = 3306
    conn_modeling = MySQLdb.connect(host=ip, db=db, user=user, passwd=pwd, port=port, charset="utf8")
    return conn_modeling


def get_data(start_, end_):  # fetch the measured data
    export_data = pd.read_sql("select rectime, power, predict_power from `%s` where rectime>='%s' and rectime<='%s'" %
                              ('factory_' + wfid, start_, end_), mysql_conn())
    return export_data


def get_expand_data(data, dimension_):  # expand the 1-D series into multiple columns; the last column is the output, the rest are inputs
    refactor_sam = []
    for i in range(dimension_ - 1, len(data)):
        block = dimension_ * [0]
        for j in range(dimension_):
            block[j] = data[i]
            i -= 1
        block = list(reversed(block))
        refactor_sam.append(block)
    col = []
    for i in range(dimension_):
        if i != dimension_ - 1:
            col.append(data.name + '_%s' % i)
        else:
            col.append('output')
    expand_df = pd.DataFrame(refactor_sam, columns=col)
    return expand_df


def update_input(predict_result_, array_input_):  # update the input vector
    array_input_ = np.delete(array_input_, 0)  # drop the first value of the numpy.ndarray
    array_input_ = np.append(array_input_, predict_result_)  # append the predicted value
    return array_input_


def rmse(real, pred, cap_=49500.0):  # RMSE-based accuracy
    s = 1 - np.sqrt(np.mean((real - pred) ** 2)) / cap_
    return s * 100


def harmonic_mean(sc, pre, cap_):  # harmonic-mean accuracy for the ultra-short-term forecast
    index = (sc < 0.03 * cap_) & (pre < 0.03 * cap_)
    sc = sc[~index]
    pre = pre[~index]
    if len(sc) > 0:
        a = np.abs(sc / (sc + pre) - 0.5)
        b = abs(sc - pre) / sum(abs(sc - pre))
        th = 1 - 2 * sum(a * b)
    else:
        th = 1
    return th


def get_predict_data(ptime):
    predict_power = pd.read_sql("select predict_power from `%s` where rectime='%s'" % ('factory_' + wfid, ptime),
                                mysql_conn()).iloc[0, 0]
    return predict_power


def smooth_data(df):  # smooth the data
    x = df['power'].values
    box = np.ones(10) / 10
    y = np.convolve(x, box, mode='same')  # the two arrays may differ in length; mode='same' returns the length of the longer one
    y[:5], y[-5:] = x[:5], x[-5:]
    df['new_power'] = y
    return df


def compensate_data(df, start, end):  # fill gaps in the data
    time_list = pd.date_range(start, end, freq='15min').tolist()
    time_df = pd.DataFrame({'rectime': time_list})  # first build a df with the complete time index
    merge_df = pd.merge(time_df, df, on='rectime', how='outer')  # outer merge
    merge_df = merge_df.reset_index(drop=True)
    for m in range(len(merge_df)):
        # pd.isnull, not == np.nan (which is always False)
        if m == 0 and pd.isnull(merge_df.iloc[0, 1]):
            merge_df.iloc[0, 1] = (merge_df.iloc[1, 1] + merge_df.iloc[2, 1]) / 2
        elif m == len(merge_df) - 1 and pd.isnull(merge_df.iloc[-1, 1]):
            merge_df.iloc[-1, 1] = (merge_df.iloc[-2, 1] + merge_df.iloc[-3, 1]) / 2
        elif pd.isnull(merge_df.iloc[m, 1]):
            merge_df.iloc[m, 1] = (merge_df.iloc[m + 1, 1] + merge_df.iloc[m - 1, 1]) / 2
    # merge_df = merge_df.fillna(merge_df.mean())  # alternatively, fill NaN with the mean
    return merge_df
class ProjectionPursuitRegressor(BaseEstimator, TransformerMixin, RegressorMixin):
    def __init__(self, r=10, fit_type='polyfit', degree=3, opt_level='high',
                 example_weights='uniform', out_dim_weights='inverse-variance',
                 eps_stage=0.0001, eps_backfit=0.01, stage_maxiter=100,
                 backfit_maxiter=10, random_state=None, show_plots=False,
                 plot_epoch=50):
        if not isinstance(r, int) or r < 1:  # raise if r is not an int or r < 1
            raise ValueError('r must be an int >= 1.')
        if fit_type not in ['polyfit', 'spline']:
            raise NotImplementedError('fit_type ' + fit_type + ' not supported')
        if not isinstance(degree, int) or degree < 1:
            raise ValueError('degree must be >= 1.')
        if opt_level not in ['low', 'medium', 'high']:
            raise ValueError('opt_level must be either low, medium, or high.')
        if not (isinstance(example_weights, str) and example_weights == 'uniform'):
            try:
                example_weights = as_float_array(example_weights)
            except (TypeError, ValueError) as error:
                raise ValueError('example_weights must be `uniform`, or array-like.')
            if numpy.any(example_weights < 0):
                raise ValueError('example_weights can not contain negatives.')
        if not (isinstance(out_dim_weights, str) and out_dim_weights in
                ['inverse-variance', 'uniform']):
            try:
                out_dim_weights = as_float_array(out_dim_weights)
            except (TypeError, ValueError) as error:
                raise ValueError('out_dim_weights must be either ' +
                                 'inverse-variance, uniform, or array-like.')
            if numpy.any(out_dim_weights < 0):
                raise ValueError('out_dim_weights can not contain negatives.')
        if eps_stage <= 0 or eps_backfit <= 0:
            raise ValueError('Epsilons must be > 0.')
        if not isinstance(stage_maxiter, int) or stage_maxiter <= 0 or \
                not isinstance(backfit_maxiter, int) or backfit_maxiter <= 0:
            raise ValueError('Maximum iteration settings must be ints > 0.')
        # Save parameters to the object
        params = locals()
        for k, v in params.items():
            if k != 'self':
                setattr(self, k, v)

    def transform(self, X):
        check_is_fitted(self, '_alpha')
        X = check_array(X)
        return numpy.dot(X, self._alpha)

    def predict(self, X):
        P = self.transform(X)
        Y = sum([numpy.outer(self._f[j](P[:, j]), self._beta[:, j]) for j in range(self.r)])
        return Y if Y.shape[1] != 1 else Y[:, 0]

    def fit(self, X, Y):
        X, Y = check_X_y(X, Y, multi_output=True)  # standard estimator input validation: checks X and y agree in length, forces X to 2D and y to 1D
        if Y.ndim == 1:  # standardize Y as 2D so the below always works
            Y = Y.reshape((-1, 1))  # reshape returns a view to existing data
        self._random = check_random_state(self.random_state)  # random state for generating random numbers
        if isinstance(self.example_weights, str) and self.example_weights == 'uniform':
            self._example_weights = numpy.ones(X.shape[0])
        elif isinstance(self.example_weights, numpy.ndarray):
            if X.shape[0] != self.example_weights.shape[0]:
                raise ValueError('example_weights provided to the constructor' +
                                 ' have dimension ' + str(self.example_weights.shape[0]) +
                                 ', which disagrees with the size of X: ' + str(X.shape[0]))
            else:
                self._example_weights = self.example_weights
        if isinstance(self.out_dim_weights, str) and self.out_dim_weights == 'inverse-variance':
            variances = Y.var(axis=0)
            # There is a problem if a variance for any column is zero, because
            # its relative scale will appear infinite.
            if max(variances) == 0:  # if all zeros, don't readjust weights
                variances = numpy.ones(Y.shape[1])
            else:
                # Fill zeros with the max of variances, so the corresponding
                # columns have small weight and are not major determiners of loss.
                variances[variances == 0] = max(variances)
            self._out_dim_weights = 1. / variances
        elif isinstance(self.out_dim_weights, str) and \
                self.out_dim_weights == 'uniform':
            self._out_dim_weights = numpy.ones(Y.shape[1])
        elif isinstance(self.out_dim_weights, numpy.ndarray):
            if Y.shape[1] != self.out_dim_weights.shape[0]:
                raise ValueError('out_dim_weights provided to the constructor' +
                                 ' have dimension ' + str(self.out_dim_weights.shape[0]) +
                                 ', which disagrees with the width of Y: ' + str(Y.shape[1]))
            else:
                self._out_dim_weights = self.out_dim_weights
        self._alpha = self._random.randn(X.shape[1], self.r)  # p x r
        self._beta = self._random.randn(Y.shape[1], self.r)  # d x r
        self._f = [lambda x: x * 0 for j in range(self.r)]  # zero functions
        self._df = [None for j in range(self.r)]  # no derivatives yet
        for j in range(self.r):  # for each term in the additive model
            self._fit_stage(X, Y, j, True)
            if self.opt_level == 'high':
                self._backfit(X, Y, j, True)
            elif self.opt_level == 'medium':
                self._backfit(X, Y, j, False)
        return self

    def _fit_stage(self, X, Y, j, fit_weights):
        P = self.transform(X)  # the n x r projections matrix
        R_j = Y - sum([numpy.outer(self._f[t](P[:, t]), self._beta[:, t].T) for t
                       in range(self.r) if t is not j])  # the n x d residuals matrix
        # main alternating optimization loop
        itr = 0  # iteration counter
        prev_loss = -numpy.inf
        loss = numpy.inf
        p_j = P[:, j]  # n x 1, the jth column of the projections matrix
        # loop until the loss change falls below eps_stage or stage_maxiter iterations are reached
        while abs(prev_loss - loss) > self.eps_stage and itr < self.stage_maxiter:
            # To understand how to optimize each set of parameters assuming the
            # others remain constant, see math.pdf section 3.
            # find the f_j
            beta_j_w = self._out_dim_weights * self._beta[:, j]  # weighted beta
            targets = numpy.dot(R_j, beta_j_w) / (
                numpy.inner(self._beta[:, j], beta_j_w) + 1e-9)
            # Fit the targets against projections.
            self._f[j], self._df[j] = self._fit_2d(p_j, targets, j, itr)
            # find beta_j
            f = self._f[j](p_j)  # Find the n x 1 vector of function outputs.
            f_w = self._example_weights * f  # f weighted by examples
            self._beta[:, j] = numpy.dot(R_j.T, f_w) / (numpy.inner(f, f_w) + 1e-9)
            # find alpha_j
            if fit_weights:
                # Find the part of the Jacobians that is common to all
                J = -(self._df[j](p_j) * numpy.sqrt(self._example_weights) * X.T).T
                JTJ = numpy.dot(J.T, J)
                A = sum([self._out_dim_weights[k] * (self._beta[k, j] ** 2) * JTJ
                         for k in range(Y.shape[1])])
                # Collect all g_jk vectors in to a convenient matrix G_j
                G_j = R_j - numpy.outer(self._f[j](p_j), self._beta[:, j].T)
                b = -sum([self._out_dim_weights[k] * self._beta[k, j] *
                          numpy.dot(J.T, G_j[:, k]) for k in range(Y.shape[1])])
                delta = numpy.linalg.lstsq(A.astype(numpy.float64, copy=False),
                                           b.astype(numpy.float64, copy=False), rcond=-1)[0]
                # TODO implement halving step if the loss doesn't decrease with
                # this update.
                alpha = self._alpha[:, j] + delta
                # normalize to avoid numerical drift
                self._alpha[:, j] = alpha / numpy.linalg.norm(alpha)
            # Recalculate the jth projection with new f_j and alpha_j
            p_j = numpy.dot(X, self._alpha[:, j])
            # Calculate mean squared error for this iteration
            prev_loss = loss
            # Subtract updated contribution of the jth term to get the
            # difference between Y and the predictions.
            diff = R_j - numpy.outer(self._f[j](p_j), self._beta[:, j].T)
            # square the difference, multiply rows by example weights, multiply
            # columns by their weights, and sum to get the final loss
            loss = numpy.dot(numpy.dot(self._example_weights, diff ** 2),
                             self._out_dim_weights)
            itr += 1

    def _backfit(self, X, Y, j, fit_weights):
        itr = 0
        prev_loss = -numpy.inf
        loss = numpy.inf
        while abs(prev_loss - loss) > self.eps_backfit and itr < self.backfit_maxiter:
            for t in range(j):
                self._fit_stage(X, Y, t, fit_weights)
            prev_loss = loss
            diff = Y - self.predict(X).reshape(Y.shape)
            loss = numpy.dot(numpy.dot(self._example_weights, diff ** 2), self._out_dim_weights)
            itr += 1

    def _fit_2d(self, x, y, j, itr):
        if self.fit_type == 'polyfit':
            coeffs = numpy.polyfit(x.astype(numpy.float64, copy=False),
                                   y.astype(numpy.float64, copy=False), deg=self.degree,
                                   w=self._example_weights)
            fit = numpy.poly1d(coeffs)
            deriv = fit.deriv(m=1)
        elif self.fit_type == 'spline':
            order = numpy.argsort(x)
            fit = UnivariateSpline(x[order], y[order], w=self._example_weights, k=self.degree, s=len(y) * y.var(),
                                   ext=0)
            deriv = fit.derivative(1)
        if self.show_plots and (itr % self.plot_epoch == 0):
            pyplot.scatter(x, y)
            pyplot.title('stage ' + str(j) + ' iteration ' + str(itr))
            pyplot.xlabel('projections')
            pyplot.ylabel('residuals')
            xx = numpy.linspace(min(x), max(x), 100)
            yy = fit(xx)
            pyplot.plot(xx, yy, 'g', linewidth=1)
            pyplot.show()
        return fit, deriv
if __name__ == '__main__':
    # the following 6 variables need to be configured
    wfid = '320907'
    ridge_number = 20  # number of ridge functions
    dimension = 5  # total data dimensionality; e.g. with 2nd-order inputs this would be 2+8=10, the last 8 being outputs
    train_data_sets = 96 * 30  # number of input/output pairs used for training
    predict_start = '2019-05-01 00:00:00'  # start time of the prediction
    predict_points = 96 * 2  # number of points to predict (96 points per day)
    start_time = time.time()
    cap = pd.read_sql("select powercap from `wind_farm_info` where wfid='%s'"
                      % wfid, mysql_conn('WindFarmdb')).iloc[0, 0]
    train_end = datetime.datetime.strptime(str(predict_start), '%Y-%m-%d %H:%M:%S') + \
        datetime.timedelta(hours=-2)  # the ultra-short-term forecast predicts the present from two hours earlier
    # train_start follows from train_end, train_data_sets and dimension, guaranteeing 192 input/output pairs
    train_start = datetime.datetime.strptime(str(train_end), '%Y-%m-%d %H:%M:%S') + \
        datetime.timedelta(minutes=-15 * (train_data_sets + dimension - 2))
    predict_result_dict = {}
    for i in range(predict_points):  # train and predict one point per pass
        new_train_start = datetime.datetime.strptime(str(train_start), '%Y-%m-%d %H:%M:%S') + \
            datetime.timedelta(minutes=15 * i)
        new_train_end = datetime.datetime.strptime(str(train_end), '%Y-%m-%d %H:%M:%S') + datetime.timedelta(
            minutes=15 * i)
        predict_result_list = []
        for j in range(1, 9):  # 8 iterative train/predict rounds per time point (16 before; reduced to save time)
            predict_power_time_t = datetime.datetime.strptime(str(new_train_end), '%Y-%m-%d %H:%M:%S') + \
                datetime.timedelta(minutes=15 * (j - 1))  # timestamp of the forecast power used in training
            predict_power_time_p = datetime.datetime.strptime(str(new_train_end), '%Y-%m-%d %H:%M:%S') + \
                datetime.timedelta(minutes=15 * j)  # timestamp of the forecast power used in prediction
            if j == 1:  # train on and predict the 1st point
                train_data = get_data(new_train_start, new_train_end)
                expanded_real_data = get_expand_data(train_data['power'], dimension)  # take the measured power and expand it to dimension-1 input columns
                predict_data = train_data['predict_power'].iloc[dimension - 1:].reset_index(
                    drop=True)  # the forecast power timestamps align with the output timestamps
                expanded_data = pd.merge(expanded_real_data, predict_data.to_frame(), left_index=True,
                                         right_index=True)
                expanded_data.insert(dimension - 1, 'p_power', expanded_data['predict_power'],
                                     allow_duplicates=False)
                expanded_data = expanded_data.drop(['predict_power'], axis=1)
                train_input = expanded_data.iloc[:, 0: -1]  # the leading columns are the training inputs, the last column the output
                train_output = expanded_data.iloc[:, -1]
                estimator = ProjectionPursuitRegressor(ridge_number)
                estimator.fit(train_input.values, train_output.values)  # build the model
                init_predict_input = expanded_data.iloc[-1, np.r_[1:dimension - 1, dimension]].values.tolist()
                predict_power = get_predict_data(predict_power_time_p)  # the forecast-power part of the prediction input
                init_predict_input.append(predict_power)
                predict_input_array = np.array(init_predict_input).reshape(1, -1)  # convert to an array and reshape
                predict_result = estimator.predict(predict_input_array)
                predict_result_list.append(predict_result[0])
            else:  # train on and predict points 2~16
                if j == 2:  # the first time, take the last row of expanded_data from j=1
                    last_col = {}
                    for k in range(dimension):  # build the dict first
                        if 0 <= k < dimension - 2:
                            last_col['power_%s' % k] = expanded_data.iloc[-1, k + 1]
                        elif k == dimension - 2:
                            last_col['power_%s' % (dimension - 2)] = expanded_data.iloc[-1, -1]
                        else:
                            last_col['output'] = predict_result
                    predict_power = get_predict_data(predict_power_time_t)
                    last_col['p_power'] = predict_power
                    columns = ['power_%s' % i for i in range(dimension - 1)]
                    columns.append('p_power')  # columns of the appended row
                    columns.append('output')
                    last_df = pd.DataFrame(last_col, columns=columns, index=[0])
                    new_expanded_data = pd.concat([expanded_data, last_df])  # join the newly built row to the previous expanded_data
                    new_expanded_data = new_expanded_data.iloc[1:, :].reset_index(drop=True)  # drop the first row and reindex
                else:
                    last_col = {}
                    for k in range(dimension):  # build the dict first
                        if 0 <= k < dimension - 2:
                            last_col['power_%s' % k] = new_expanded_data.iloc[-1, k + 1]
                        elif k == dimension - 2:
                            last_col['power_%s' % (dimension - 2)] = new_expanded_data.iloc[-1, -1]
                        else:
                            last_col['output'] = predict_result
                    predict_power = get_predict_data(predict_power_time_t)
                    last_col['p_power'] = predict_power
                    columns = ['power_%s' % i for i in range(dimension - 1)]
                    columns.append('p_power')  # columns of the appended row
                    columns.append('output')
                    last_df = pd.DataFrame(last_col, columns=columns, index=[0])
                    new_expanded_data = pd.concat([new_expanded_data, last_df])  # join the newly built row to the previous expanded_data
                    new_expanded_data = new_expanded_data.iloc[1:, :].reset_index(drop=True)  # drop the first row and reindex
                train_input = new_expanded_data.iloc[:, 0: -1]  # the leading columns are the training inputs, the last column the output
                train_output = new_expanded_data.iloc[:, -1]
                estimator = ProjectionPursuitRegressor(ridge_number)
                estimator.fit(train_input.values, train_output.values)  # build the model
                predict_input = new_expanded_data.iloc[-1, np.r_[1:dimension - 1, dimension]].values.tolist()
                predict_power = get_predict_data(predict_power_time_p)  # the forecast-power part of the prediction input
                predict_input.append(predict_power)
                predict_input_array = np.array(predict_input).reshape(1, -1)
                predict_result = estimator.predict(predict_input_array)
                predict_result_list.append(predict_result[0])
        for p in range(len(predict_result_list)):  # finally clip values below 0 or above cap
            if predict_result_list[p] > cap:
                predict_result_list[p] = cap
            elif predict_result_list[p] < 0:
                predict_result_list[p] = 0
        predict_result_dict[new_train_end + datetime.timedelta(hours=2)] = predict_result_list
        print predict_result_dict
    key_list = []
    predict_list_1, predict_list_2, predict_list_3, predict_list_4, predict_list_5, predict_list_6, predict_list_7, \
        predict_list_8 = [], [], [], [], [], [], [], []
    for k, v in predict_result_dict.items():
        key_list.append(str(k))
        predict_list_1.append(v[0])
        predict_list_2.append(v[1])
        predict_list_3.append(v[2])
        predict_list_4.append(v[3])
        predict_list_5.append(v[4])
        predict_list_6.append(v[5])
        predict_list_7.append(v[6])
        predict_list_8.append(v[7])
    df = pd.DataFrame({'time': key_list, 'predict_1': predict_list_1, 'predict_2': predict_list_2,
                       'predict_3': predict_list_3, 'predict_4': predict_list_4,
                       'predict_5': predict_list_5, 'predict_6': predict_list_6,
                       'predict_7': predict_list_7, 'predict_8': predict_list_8})
    df = df[['time', 'predict_1', 'predict_2', 'predict_3', 'predict_4', 'predict_5',
             'predict_6', 'predict_7', 'predict_8']].sort_values('time').reset_index(drop=True)
    df.to_csv(r"C:\Users\35760\PycharmProjects\LocalProject\PPR\datas\320907\predict_df_%s.csv" %
              datetime.datetime.now().strftime('%m_%d_%H_%M'))
    # df.to_csv(r"predict_df_%s.csv" % wfid)  # server deployment
    df['time'] = pd.to_datetime(df['time'], format="%Y-%m-%d %H:%M:%S")
    df['day'] = df['time'].apply(lambda x: '%s-%02d-%02d' % (x.year, x.month, x.day))
    df = df.rename(columns={'predict_8': 'predict_power'})
    harmonic_acc_list = []
    day_list = []
    rmse_list = []
    data = pd.read_sql("select rectime, power from `%s` where rectime>='2019-05-01' and rectime<='2019-05-31'" %
                       ('factory_' + wfid), mysql_conn('exportdb'))
    data = data.rename(columns={'rectime': 'time'})
    merge_data = pd.merge(data, df, on='time', how='inner')
    print merge_data
    for day, df_ in merge_data.groupby('day'):  # compute the daily harmonic-mean accuracy
        harmonic_acc = harmonic_mean(df_['power'], df_['predict_power'], cap)
        accuracy = rmse(df_['power'], df_['predict_power'], cap)
        harmonic_acc_list.append(harmonic_acc)
        rmse_list.append(accuracy)
        day_list.append(day)
    df_harv = pd.DataFrame({'day': day_list, 'harv': harmonic_acc_list, 'rmse': rmse_list})
    df_harv.to_csv(r"C:\Users\35760\PycharmProjects\LocalProject\PPR\datas\320907\ppr_result_%s.csv" %
                   datetime.datetime.now().strftime('%m_%d_%H_%M'))
    # df_harv.to_csv(r"ppr_result_%s.csv" % wfid)
    end_time = time.time()
    print 'time:%s' % (end_time - start_time)  # measure the runtime; if it is too long, consider optimizing the code

Rewrite of Deng's PPR algorithm

# -*- coding: utf-8 -*-
"""
@author: Haojie Shu
@time:
@description: the PPR algorithm with wind-speed features added
"""
import numpy as np
import MySQLdb
import pandas as pd
import datetime
import math
import time
from sklearn.metrics import make_scorer
from sklearn.metrics import r2_score
from sklearn import svm
from sklearn.model_selection import GridSearchCV


def mysql_conn(db='exportdb'):  # database connection settings
    user = 'dbadmin'
    pwd = 'Goldwind@123'
    ip = '10.12.17.97'
    port = 3306
    conn_modeling = MySQLdb.connect(host=ip, db=db, user=user, passwd=pwd, port=port, charset="utf8")
    return conn_modeling


def get_data(start_, end_):  # fetch the measured data
    export_data = pd.read_sql("select rectime, power, predict_power from `%s` where rectime>='%s' and rectime<='%s'" %
                              ('factory_' + wfid, start_, end_), mysql_conn())
    return export_data


def get_expand_data(data, dimension_):  # expand the 1-D series into multiple columns; the last column is the output, the rest are inputs
    refactor_sam = []
    for i in range(dimension_ - 1, len(data)):
        block = dimension_ * [0]
        for j in range(dimension_):
            block[j] = data[i]
            i -= 1
        block = list(reversed(block))
        refactor_sam.append(block)
    col = []
    for i in range(dimension_):
        if i != dimension_ - 1:
            col.append(data.name + '_%s' % i)
        else:
            col.append('output')
    expand_df = pd.DataFrame(refactor_sam, columns=col)
    return expand_df


def update_input(predict_result_, array_input_):  # update the input vector
    array_input_ = np.delete(array_input_, 0)  # drop the first value of the numpy.ndarray
    array_input_ = np.append(array_input_, predict_result_)  # append the predicted value
    return array_input_


def rmse(real, pred, cap_=49500.0):  # RMSE-based accuracy
    s = 1 - np.sqrt(np.mean((real - pred) ** 2)) / cap_
    return s * 100


def harmonic_mean(sc, pre, cap_):  # harmonic-mean accuracy for the ultra-short-term forecast
    index = (sc < 0.03 * cap_) & (pre < 0.03 * cap_)
    sc = sc[~index]
    pre = pre[~index]
    if len(sc) > 0:
        a = np.abs(sc / (sc + pre) - 0.5)
        b = abs(sc - pre) / sum(abs(sc - pre))
        th = 1 - 2 * sum(a * b)
    else:
        th = 1
    return th


def get_predict_data(ptime):
    predict_power = pd.read_sql("select predict_power from `%s` where rectime='%s'" % ('factory_' + wfid, ptime),
                                mysql_conn()).iloc[0, 0]
    return predict_power


def smooth_data(df):  # smooth the data
    x = df['power'].values
    box = np.ones(10) / 10
    y = np.convolve(x, box, mode='same')  # the two arrays may differ in length; mode='same' returns the length of the longer one
    y[:5], y[-5:] = x[:5], x[-5:]
    df['new_power'] = y
    return df


def compensate_data(df, start, end):  # fill gaps in the data
    time_list = pd.date_range(start, end, freq='15min').tolist()
    time_df = pd.DataFrame({'rectime': time_list})
    merge_df = pd.merge(time_df, df, on='rectime', how='outer')
    merge_df = merge_df.reset_index(drop=True)
    merge_df = merge_df.fillna(merge_df.mean())  # fill NaN values with the column mean
    return merge_df


class PPR:
    def __init__(self, r):
        """
        r: number of ridge functions, int
        """
        self.r = r
        if not isinstance(self.r, int) or self.r < 1:
            raise ValueError('r>1')
        self.smooth_ratio = 0.1  # smoothing ratio used when fitting the ridge functions

    def fit(self, x, y):  # training
        """
        1. x: input array, numpy.ndarray
        2. y: output array, numpy.ndarray
        """
        ave = np.average(y)
        std = np.std(y)
        y_list = []
        for i in range(y.shape[0]):  # standardize the output
            new_y = (y[i] - ave) / std
            y_list.append(new_y)
        y = np.array(y_list)
        # projections, ridge models, weights, losses, fitted values, x lists carrying the original indices
        projection_list, ridge_weight_list, loss_list, smo_list, enu_x_list, x_list_list = [], [], [], [], [], []
        projection = self.find_projection(x, y)  # the first projection direction
        projection_list.append(projection)
        x_list, y_list, smo, loss, enu_x = self.create_one_f(x, y)  # smo: fitted values (list); enu_x: list carrying the original order
        loss_list.append(loss)
        smo_list.append(smo)  # smo_list is a list of lists
        enu_x_list.append(enu_x)  # enu_x_list is also a list of lists
        x_list_list.append(x_list)
        ridge_weight = self.get_ridge_weight(x, y)  # the first weight
        ridge_weight_list.append(ridge_weight)
        for i in range(1, self.r):  # loop from the second ridge function on (this loop shape does not work with 4 ridge functions)
            new_y_list = []
            for j in range(y.shape[0]):
                for k in range(len(enu_x_list[i - 1])):
                    if enu_x_list[i - 1][k][0] == j:  # k locates the original index
                        new_y = y[j] - ridge_weight_list[i - 1] * smo_list[i - 1][k]  # residual = y - weight * smo
                        new_y_list.append(new_y)  # one new y per row
            new_y_array = np.array(new_y_list)  # to a 1-D array
            projection = self.find_projection(x, new_y_array)  # the residual becomes the next round's y
            projection_list.append(projection)
            x_list, y_list, smo, loss, enu_x = self.create_one_f(x, new_y_array)
            loss_list.append(loss)
            smo_list.append(smo)
            enu_x_list.append(enu_x)
            x_list_list.append(x_list)
            ridge_weight = self.get_ridge_weight(x, new_y_array)
            ridge_weight_list.append(ridge_weight)
            if loss_list[i] > loss_list[i - 1]:  # stop when this round's residual exceeds the previous round's
                break
        return projection_list, ridge_weight_list, ave, std, smo_list, x_list_list

    def predict(self, x, y, input_data_):  # the predict step of the PPR model
        projection_list, ridge_weight_list, ave, std, new_smo_list, x_list_list = self.fit(x, y)
        # determine which condition stopped the iteration
        if len(x_list_list) == self.r:  # number of ridge functions == r
            output_list = []
            for j in range(len(x_list_list)):  # loop over every ridge function
                input_sum = 0
                for k in range(input_data_.shape[1]):
                    input_sum += input_data_[0][k] * projection_list[j][k]  # projection * input
                # print 'input_sum:%s' % input_sum
                min_x, max_x = min(x_list_list[j]), max(x_list_list[j])
                # every case is interpolated/extrapolated proportionally
                if input_sum > max_x:
                    if x_list_list[j][-1] == x_list_list[j][-2]:  # the two largest values happen to be equal (avoid a zero denominator)
                        output = new_smo_list[j][-1]
                        new_output = output * ridge_weight_list[j]
                        output_list.append(new_output)
                    else:
                        ratio = (input_sum - x_list_list[j][-2]) / (x_list_list[j][-1] - x_list_list[j][-2])
                        output = ratio * (new_smo_list[j][-1] - new_smo_list[j][-2]) + new_smo_list[j][-2]
                        # print 'max_output:%s' % output
                        new_output = output * ridge_weight_list[j]
                        output_list.append(new_output)
                elif input_sum < min_x:
                    if x_list_list[j][1] == x_list_list[j][0]:
                        output = new_smo_list[j][0]
                        new_output = output * ridge_weight_list[j]
                        output_list.append(new_output)
                    else:
                        ratio = (x_list_list[j][1] - input_sum) / (x_list_list[j][1] - x_list_list[j][0])
                        output = new_smo_list[j][1] - (new_smo_list[j][1] - new_smo_list[j][0]) * ratio
                        # print 'min_output:%s' % output
                        new_output = output * ridge_weight_list[j]
                        output_list.append(new_output)
                else:  # between min and max
                    for h in range(len(x_list_list[j])):
                        if x_list_list[j][h] == input_sum:  # input_sum exactly equals some value in x_list_list
                            output_list.append(new_smo_list[j][h] * ridge_weight_list[j])
                            # print 'mid_output:%s' % ridge_weight_list[j]
                            break  # there may be more than one equal value; the first one is enough
                        elif x_list_list[j][h] < input_sum < x_list_list[j][h + 1]:  # if not equal, it must lie between two values
                            ratio = (input_sum - x_list_list[j][h]) / (x_list_list[j][h + 1] - x_list_list[j][h])
                            output = (new_smo_list[j][h + 1] - new_smo_list[j][h]) * ratio + new_smo_list[j][h]
                            # print 'mid_output:%s' % output
                            new_output = output * ridge_weight_list[j]
                            output_list.append(new_output)
                        else:
                            pass
        else:
            output_list = []
            for j in range(len(x_list_list)):  # loop over every ridge function
                input_sum = 0
                for k in range(input_data_.shape[1]):
                    input_sum += input_data_[0][k] * projection_list[j][k]  # projection * input
                min_x, max_x = min(x_list_list[j]), max(x_list_list[j])
                # every case is interpolated/extrapolated proportionally
                if input_sum > max_x:
                    if x_list_list[j][-1] == x_list_list[j][-2]:
                        output = new_smo_list[j][-1]
                        new_output = output * ridge_weight_list[j]
                        output_list.append(new_output)
                    else:
                        ratio = (input_sum - x_list_list[j][-2]) / (x_list_list[j][-1] - x_list_list[j][-2])
                        output = ratio * (new_smo_list[j][-1] - new_smo_list[j][-2]) + new_smo_list[j][-2]
                        new_output = output * ridge_weight_list[j]
                        output_list.append(new_output)
                elif input_sum < min_x:
                    ratio = (x_list_list[j][0] - input_sum) / (x_list_list[j][1] - input_sum)
                    output = (ratio * new_smo_list[j][1] - new_smo_list[j][0]) / ratio - 1
                    new_output = output * ridge_weight_list[j]
                    output_list.append(new_output)
                else:  # between min and max
                    for h in range(len(x_list_list[j])):
                        if x_list_list[j][h] == input_sum:  # input_sum exactly equals some value in x_list_list
                            output_list.append(new_smo_list[j][h] * ridge_weight_list[j])
                        elif x_list_list[j][h] < input_sum < x_list_list[j][h + 1]:  # if not equal, it must lie between two values
                            ratio = (input_sum - x_list_list[j][h]) / (x_list_list[j][h + 1] - x_list_list[j][h])
                            output = (new_smo_list[j][h + 1] - new_smo_list[j][h]) * ratio + new_smo_list[j][h]
                            new_output = output * ridge_weight_list[j]
                            output_list.append(new_output)
                        else:
                            pass
        ult_output = sum(output_list) * std + ave
        return ult_output

    @staticmethod
    def cal_var(x, y):
        avg_x = np.mean(x, axis=0)  # column means of the array x
        cov_x_y_list = []
        for p in range(x.shape[1]):
            s = 0.0
            for q in range(x.shape[0]):
                s = s + y[q] * (x[q, p] - avg_x[p])
            cov_x_y_list.append([p, s / x.shape[0]])  # x/y covariance
        var_x_x_list = []
        for i in range(x.shape[1]):
            for j in range(i, x.shape[1]):
                s = 0.0
                for k in range(x.shape[0]):
                    tt = (x[k, j] - avg_x[j]) * (x[k, j] - avg_x[i])
                    s = s + tt
                var_x_x_list.append([i, j, s / x.shape[0]])
        return cov_x_y_list, var_x_x_list

    def find_projection(self, x, y):  # find the projection direction (the weights) from the variances and covariances
        cov_list, var_list = self.cal_var(x, y)
        project_dict = {}  # dict holding the projection direction
        for i in range(x.shape[1]):
            project_dict[i] = 1
        for i in range(x.shape[1]):
            k1 = i + (i * (i + 1) / 2)  # index into the variance list, following the C# code
            sum_var = project_dict[i] * var_list[k1][2]
            for j in range(i + 1, x.shape[1]):
                k2 = i + (j * (j + 1) / 2)
                sum_var += project_dict[i] * var_list[k2][2]
            for k in range(i):
                k3 = i + (k * (k + 1) / 2)
                sum_var += project_dict[i] * var_list[k3][2]
            project_dict[i] = cov_list[i][1] / sum_var
        '''
        After the direction is computed from the variances and covariances,
        one more simple step follows: normalize it to unit length.
        '''
        sum_project = 0
        new_projection_dict = {}
        for v in project_dict.values():
            sum_project += v * v
        sum_project = 1.0 / math.sqrt(sum_project)
        for k, v in project_dict.items():
            new_projection = v * sum_project
            new_projection_dict[k] = new_projection
        return new_projection_dict

    def get_project_label(self, x, y):  # get the projected coordinates; x and y stay paired
        project_dict = self.find_projection(x, y)
        x_list, y_list = [], []
        for i in range(x.shape[0]):
            x_label = 0
            for j in range(x.shape[1]):
                x_label += x[i][j] * project_dict[j]  # multiply the projection direction with the input; the result is the abscissa
            x_list.append(x_label)
            y_list.append(y[i])
        return x_list, y_list

    @staticmethod
    def performance_metric(y_true, y_predict):
        score = r2_score(y_true, y_predict)
        return score

    def fit_curve(self, x, y):  # fit a curve through the smoothed points
        x_list, y_list, smo_list, loss, index_list = self.create_one_f(x, y)
        x_array = np.array(x_list)
        smo_array = np.array(smo_list)
        reshape_x_array = x_array.reshape(x_array.shape[0], -1)
        scorig_fnc = make_scorer(self.performance_metric)
        params = {'C': [10], 'gamma': [0.1]}
        svc = svm.SVR()
        grid = GridSearchCV(svc, param_grid=params, scoring=scorig_fnc, cv=5)  # grid search
        model = grid.fit(reshape_x_array, smo_array)
        return model, loss

    def create_one_f_back(self, x, y):  # build one ridge function (earlier version)
        x_list, y_list = self.get_project_label(x, y)
        enu_x = list(enumerate(x_list))
        df = pd.DataFrame({'x': x_list, 'y': y_list, 'enu_x': enu_x})
        sorted_df = df.sort_values(by='x')  # sort ascending by x
        new_x_list = sorted_df['x'].tolist()
        new_y_list = sorted_df['y'].tolist()
        new_enu_x = sorted_df['enu_x'].tolist()
        smo_list = self.super_filter(new_x_list, new_y_list)  # smoothed ordinates of the ridge function
        loss_sum = 0
        for j in range(len(smo_list)):
            loss = math.pow((new_y_list[j] - smo_list[j]), 2)
            loss_sum += loss  # total error
        return new_x_list, new_y_list, smo_list, loss_sum, new_enu_x

    def create_one_f(self, x, y):  # build one ridge function
        x_list, y_list = self.get_project_label(x, y)
        enu_x = list(enumerate(x_list))
        df = pd.DataFrame({'x': x_list, 'y': y_list, 'enu_x': enu_x})
        sorted_df = df.sort_values(by='x')  # sort ascending by x
        new_x_list = sorted_df['x'].tolist()
        new_y_list = sorted_df['y'].tolist()
        new_enu_x = sorted_df['enu_x'].tolist()
        smo_list = self.super_filter(new_x_list, new_y_list)  # smoothed ordinates of the ridge function
        loss_sum = 0
        for j in range(len(smo_list)):
            loss = math.pow((new_y_list[j] - smo_list[j]), 2)
            loss_sum += loss  # total error
        avg_smo_list = sum(smo_list) / len(smo_list)
        v = 0.0
        for k in range(len(smo_list)):
            smo_list[k] = smo_list[k] - avg_smo_list
            v += smo_list[k] * smo_list[k]
        if v > 0:
            v = 1 / math.sqrt(v / len(smo_list))
            for s in range(len(smo_list)):
                smo_list[s] = smo_list[s] * v
        return new_x_list, new_y_list, smo_list, loss_sum, new_enu_x
    def super_filter(self, x_list, y_list):  # super smoother
        """
        :param x_list: sorted list of projections (the projection direction dotted with the inputs x)
        :param y_list: paired one-to-one with x_list
        (ported from the C# implementation)
        :return:
        """
        len_list = len(x_list)
        scale = x_list[3 * len_list / 4] - x_list[len_list / 4]
        vsm = scale * scale
        smo_list = self.smooth_window(x_list, y_list, vsm)
        new_y_list = []
        for i in range(len_list):
            new_y_list.append(smo_list[i])
        new_smo_list = self.smooth_window(x_list, new_y_list, vsm)  # applied twice to increase smoothness
        return new_smo_list

    def smooth_window(self, x_list, y_list, vsm):  # smooth the points inside a sliding window
        """
        :param x_list: sorted list of projections (the projection direction dotted with the inputs x)
        :param y_list: paired one-to-one with x_list
        :param vsm: sliding-distance parameter
        Note: in the C# version smooth_window also takes smo_list and acvr_list
        parameters, but they have no effect.
        :return:
        """
        win_x, win_y, fbw, xvar, cvar = 0.0, 0.0, 0.0, 0.0, 0.0  # initial values for the sliding window
        n_points = int(0.5 * self.smooth_ratio * len(x_list) + 0.5)  # number of points each window slide covers
        if n_points < 2:  # a window must contain at least 2 points
            n_points = 2
        n_iter = int(2 * n_points + 1)  # number of loop iterations
        if n_iter > len(x_list):  # cannot exceed the total number of abscissas
            n_iter = len(x_list)
        for i in range(n_iter):  # first filtering pass: compute the window boundary parameters
            list_index = i  # an index into x_list / y_list
            xti = x_list[list_index]
            if list_index < 1:
                list_index = len(x_list) - 1 + list_index
                xti = x_list[list_index] - 1.0
            fbo = fbw
            fbw = fbw + 1
            if fbw > 0:
                win_x = (fbo * win_x + xti) / fbw
                win_y = (fbo * win_y + y_list[list_index]) / fbw
            tmp = 0.0
            if fbo > 0:
                tmp = fbw * (xti - win_x) / fbo
            xvar = xvar + tmp * (xti - win_x)
            cvar = cvar + tmp * (y_list[list_index] - win_y)
        smo_list, acvr_list = [], []
        for p in range(len(x_list)):  # second filtering pass: compute SMO[0..N] and ACVR[0..N]
            out = p - n_points - 1
            inp = p + n_points
            if out >= 1 and inp <= len(x_list):
                if out < 1:
                    out = len(x_list) + out
                    xto = x_list[out] - 1.0
                    xti = x_list[inp]
                else:
                    if inp > len(x_list) - 1:
                        inp = inp - len(x_list)
                        xti = x_list[inp] + 1.0
                        xto = x_list[out]
                    else:
                        xto = x_list[out]
                        xti = x_list[inp]
                fbo = fbw
                fbw = fbw - 1
                tmp = 0.0
                if fbw > 0.0:
                    tmp = fbo * (xto - win_x) / fbw
                xvar = xvar - tmp * (xto - win_x)
                cvar = cvar - tmp * (y_list[out] - win_y)
                if fbw > 0.0:
                    win_x = (fbo * win_x - xto) / fbw
                    win_y = (fbo * win_y - y_list[out]) / fbw
                fbo = fbw
                fbw = fbw + 1
                if fbw > 0.0:
                    win_x = (fbo * win_x + xti) / fbw
                    win_y = (fbo * win_y + y_list[inp]) / fbw
                tmp = 0.0
                if fbo > 0.0:
                    tmp = fbw * (xti - win_x) / fbo
                xvar = xvar + tmp * (xti - win_x)
                cvar = cvar + tmp * (y_list[inp] - win_y)
            a = 0.0
            if xvar > vsm:
                a = cvar / xvar
            smo = a * (x_list[p] - win_x) + win_y
            smo_list.append(smo)
            h = 0.0
            if fbw > 0.0:
                h = 1.0 / fbw
            if xvar > vsm:
                h = h + (x_list[p] - win_x) * (x_list[p] - win_x) / xvar
            acvr = 0
            a = 1.0 - h
            if a > 0.0:
                acvr = abs(y_list[p] - smo_list[p]) / a
            else:
                if a > 1:
                    acvr = acvr_list[p - 1]
            acvr_list.append(acvr)
        jj = 0
        while jj <= len(x_list) - 1:
            jo = jj
            sy = smo_list[jj]
            j = jj
            fbw = 1
            while True:
                j = j + 1
                if j >= len(x_list) - 1:
                    break
                if x_list[j - 1] <= x_list[j]:
                    break
                sy = sy + smo_list[j - 1]
                fbw = fbw + 1
            if j > jo:
                a = 0.0
                if fbw > 0.0:
                    a = sy / fbw
                for q in range(j):
                    smo_list[q] = a
            jj = jj + 1
        return smo_list
    def get_ridge_weight(self, x, y):  # weight of the ridge function, i.e. the beta parameter
        x_list, y_list, smo_list, loss, index_list = self.create_one_f(x, y)
        var_sum, cov_sum = 0.0, 0.0
        for i in range(len(smo_list)):
            cov_sum += y_list[i] * smo_list[i]  # covariance sum
            var_sum += smo_list[i] * smo_list[i]  # variance sum
        ridge_weight = cov_sum / var_sum
        return ridge_weight


if __name__ == '__main__':
    # the following 6 variables need to be configured
    wfid = '320907'
    ridge_number = 4  # number of ridge functions
    dimension = 5  # total data dimensionality; e.g. with 2nd-order inputs this would be 2+8=10, the last 8 being outputs
    train_data_sets = 96 * 30  # number of input/output pairs used for training
    predict_start = '2019-05-01 00:00:00'  # start time of the prediction
    predict_points = 96 * 1  # number of points to predict (96 points per day)
    start_time = time.time()
    cap = pd.read_sql("select powercap from `wind_farm_info` where wfid='%s'"
                      % wfid, mysql_conn('WindFarmdb')).iloc[0, 0]
    train_end = datetime.datetime.strptime(str(predict_start), '%Y-%m-%d %H:%M:%S') + \
        datetime.timedelta(hours=-2)  # the ultra-short-term forecast predicts the present from two hours earlier
    # train_start follows from train_end, train_data_sets and dimension, guaranteeing 192 input/output pairs
    train_start = datetime.datetime.strptime(str(train_end), '%Y-%m-%d %H:%M:%S') + \
        datetime.timedelta(minutes=-15 * (train_data_sets + dimension - 2))
    predict_result_dict = {}
    for i in range(predict_points):  # train and predict one point per pass
        new_train_start = datetime.datetime.strptime(str(train_start), '%Y-%m-%d %H:%M:%S') + \
            datetime.timedelta(minutes=15 * i)
        new_train_end = datetime.datetime.strptime(str(train_end), '%Y-%m-%d %H:%M:%S') + datetime.timedelta(
            minutes=15 * i)
        predict_result_list = []
        for j in range(1, 9):  # 8 iterative train/predict rounds per time point (16 before; reduced to save time)
            predict_power_time_t = datetime.datetime.strptime(str(new_train_end), '%Y-%m-%d %H:%M:%S') + \
                datetime.timedelta(minutes=15 * (j - 1))  # timestamp of the forecast power used in training
            predict_power_time_p = datetime.datetime.strptime(str(new_train_end), '%Y-%m-%d %H:%M:%S') + \
                datetime.timedelta(minutes=15 * j)  # timestamp of the forecast power used in prediction
            if j == 1:  # train on and predict the 1st point
                train_data = get_data(new_train_start, new_train_end)
                expanded_real_data = get_expand_data(train_data['power'], dimension)  # take the measured power and expand it to dimension-1 input columns
                predict_data = train_data['predict_power'].iloc[dimension - 1:].reset_index(drop=True)  # the forecast power timestamps align with the output timestamps
                expanded_data = pd.merge(expanded_real_data, predict_data.to_frame(), left_index=True, right_index=True)
                expanded_data.insert(dimension - 1, 'p_power', expanded_data['predict_power'], allow_duplicates=False)
                expanded_data = expanded_data.drop(['predict_power'], axis=1)
                train_input = expanded_data.iloc[:, 0: -1]  # the leading columns are the training inputs, the last column the output
                train_output = expanded_data.iloc[:, -1]
                estimator = PPR(ridge_number)
                estimator.fit(train_input.values, train_output.values)  # build the model
                init_predict_input = expanded_data.iloc[-1, np.r_[1:dimension - 1, dimension]].values.tolist()
                predict_power = get_predict_data(predict_power_time_p)  # the forecast-power part of the prediction input
                init_predict_input.append(predict_power)
                predict_input_array = np.array(init_predict_input).reshape(1, -1)  # convert to an array and reshape
                predict_result = estimator.predict(train_input.values, train_output.values, predict_input_array)
                predict_result_list.append(predict_result)
            else:  # train on and predict points 2~16
                if j == 2:  # the first time, take the last row of expanded_data from j=1
                    last_col = {}
                    for k in range(dimension):  # build the dict first
                        if 0 <= k < dimension - 2:
                            last_col['power_%s' % k] = expanded_data.iloc[-1, k + 1]
                        elif k == dimension - 2:
                            last_col['power_%s' % (dimension - 2)] = expanded_data.iloc[-1, -1]
                        else:
                            last_col['output'] = predict_result
                    predict_power = get_predict_data(predict_power_time_t)
                    last_col['p_power'] = predict_power
                    columns = ['power_%s' % i for i in range(dimension - 1)]
                    columns.append('p_power')  # columns of the appended row
                    columns.append('output')
                    last_df = pd.DataFrame(last_col, columns=columns, index=[0])
                    new_expanded_data = pd.concat([expanded_data, last_df])  # join the newly built row to the previous expanded_data
                    new_expanded_data = new_expanded_data.iloc[1:, :].reset_index(drop=True)  # drop the first row and reindex
                else:
                    last_col = {}
                    for k in range(dimension):  # build the dict first
                        if 0 <= k < dimension - 2:
                            last_col['power_%s' % k] = new_expanded_data.iloc[-1, k + 1]
                        elif k == dimension - 2:
                            last_col['power_%s' % (dimension - 2)] = new_expanded_data.iloc[-1, -1]
                        else:
                            last_col['output'] = predict_result
                    predict_power = get_predict_data(predict_power_time_t)
                    last_col['p_power'] = predict_power
                    columns = ['power_%s' % i for i in range(dimension - 1)]
                    columns.append('p_power')  # columns of the appended row
                    columns.append('output')
                    last_df = pd.DataFrame(last_col, columns=columns, index=[0])
                    new_expanded_data = pd.concat([new_expanded_data, last_df])  # join the newly built row to the previous expanded_data
                    new_expanded_data = new_expanded_data.iloc[1:, :].reset_index(drop=True)  # drop the first row and reindex
                train_input = new_expanded_data.iloc[:, 0: -1]  # the leading columns are the training inputs, the last column the output
                train_output = new_expanded_data.iloc[:, -1]
                estimator = PPR(ridge_number)
                estimator.fit(train_input.values, train_output.values)  # build the model
                predict_input = new_expanded_data.iloc[-1, np.r_[1:dimension - 1, dimension]].values.tolist()
                predict_power = get_predict_data(predict_power_time_p)  # the forecast-power part of the prediction input
                predict_input.append(predict_power)
                predict_input_array = np.array(predict_input).reshape(1, -1)
                predict_result = estimator.predict(train_input.values, train_output.values, predict_input_array)
                predict_result_list.append(predict_result)
        for p in range(len(predict_result_list)):  # finally clip values below 0 or above cap
            if predict_result_list[p] > cap:
                predict_result_list[p] = cap
            elif predict_result_list[p] < 0:
                predict_result_list[p] = 0
        predict_result_dict[new_train_end + datetime.timedelta(hours=2)] = predict_result_list
    """
    Compute the harmonic-mean accuracy
    """
    key_list = []
    predict_list_1, predict_list_2, predict_list_3, predict_list_4, predict_list_5, predict_list_6, predict_list_7, \
        predict_list_8 = [], [], [], [], [], [], [], []
    for k, v in predict_result_dict.items():
        key_list.append(str(k))
        predict_list_1.append(v[0])
        predict_list_2.append(v[1])
        predict_list_3.append(v[2])
        predict_list_4.append(v[3])
        predict_list_5.append(v[4])
        predict_list_6.append(v[5])
        predict_list_7.append(v[6])
        predict_list_8.append(v[7])
    df = pd.DataFrame({'time': key_list, 'predict_1': predict_list_1, 'predict_2': predict_list_2,
                       'predict_3': predict_list_3, 'predict_4': predict_list_4,
                       'predict_5': predict_list_5, 'predict_6': predict_list_6,
                       'predict_7': predict_list_7, 'predict_8': predict_list_8})
    df = df[['time', 'predict_1', 'predict_2', 'predict_3', 'predict_4', 'predict_5',
             'predict_6', 'predict_7', 'predict_8']].sort_values('time').reset_index(drop=True)  # reorder the columns and sort by time
    # df.to_csv(r"C:\Users\35760\PycharmProjects\LocalProject\PPR\datas\320907\predict_df_%s.csv" %
    #           datetime.datetime.now().strftime('%m_%d_%H_%M'))
    df.to_csv(r"predict_df_%s.csv" % wfid)  # server deployment
    df['time'] = pd.to_datetime(df['time'], format="%Y-%m-%d %H:%M:%S")
    df['day'] = df['time'].apply(lambda x: '%s-%02d-%02d' % (x.year, x.month, x.day))
    df = df.rename(columns={'predict_8': 'predict_power'})
    harmonic_acc_list = []
    day_list = []
    rmse_list = []
    data = pd.read_sql("select rectime, power from `%s` where rectime>='2019-05-01' and rectime<='2019-05-31'" %
                       ('factory_' + wfid), mysql_conn('exportdb'))
    data = data.rename(columns={'rectime': 'time'})
    merge_data = pd.merge(data, df, on='time', how='inner')
    for day, df_ in merge_data.groupby('day'):  # compute the daily harmonic-mean accuracy
        harmonic_acc = harmonic_mean(df_['power'], df_['predict_power'], cap)
        accuracy = rmse(df_['power'], df_['predict_power'], cap)
        harmonic_acc_list.append(harmonic_acc)
        rmse_list.append(accuracy)
        day_list.append(day)
    df_harv = pd.DataFrame({'day': day_list, 'harv': harmonic_acc_list, 'rmse': rmse_list})
    # df_harv.to_csv(r"C:\Users\35760\PycharmProjects\LocalProject\PPR\datas\320907\ppr_result_%s.csv" %
    #                datetime.datetime.now().strftime('%m_%d_%H_%M'))
    df_harv.to_csv(r"ppr_result_%s.csv" % wfid)
    end_time = time.time()
    print 'time:%s' % (end_time - start_time)  # measure the runtime; if it is too long, consider optimizing the code