Java class name: com.alibaba.alink.operator.batch.timeseries.HoltWintersBatchOp
Python class name: HoltWintersBatchOp

## Description

Given a grouping column, applies Holt-Winters time-series forecasting to the data of each group.

## Usage

See the tutorial: https://www.yuque.com/pinshu/alink_guide/xbp5ky
## Algorithm

Holt-Winters is the triple exponential smoothing algorithm proposed by Holt and Winters.
For a detailed introduction, see https://en.wikipedia.org/wiki/Exponential_smoothing

Holt-Winters supports two seasonal types: additive and multiplicative.

- Additive seasonal Holt-Winters:
  - $l_t = \alpha (y_t - s_{t-p}) + (1 - \alpha)(l_{t-1} + b_{t-1})$
  - $b_t = \beta (l_t - l_{t-1}) + (1 - \beta) b_{t-1}$
  - $s_t = \gamma (y_t - l_t) + (1 - \gamma) s_{t-p}$
  - $\hat{y}_{t+h} = l_t + h b_t + s_{t-p+1+((h-1) \bmod p)}$
- Multiplicative seasonal Holt-Winters:
  - $l_t = \alpha (y_t / s_{t-p}) + (1 - \alpha)(l_{t-1} + b_{t-1})$
  - $b_t = \beta (l_t - l_{t-1}) + (1 - \beta) b_{t-1}$
  - $s_t = \gamma (y_t / l_t) + (1 - \gamma) s_{t-p}$
  - $\hat{y}_{t+h} = (l_t + h b_t) \, s_{t-p+1+((h-1) \bmod p)}$

where:

- the smoothed values $l$, $b$, $s$ are the level, trend, and seasonal components
- the smoothing parameters $\alpha$, $\beta$, $\gamma$ are alpha, beta, and gamma
- $t$ is the current time step and $h$ is the number of steps to forecast ahead
- $p$ is the period (frequency) of the time series
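The recurrences above can be sketched in plain Python. This is an illustrative toy, not Alink's implementation: the naive first-period initialization and fixed smoothing parameters below are assumptions made for clarity.

```python
# Toy Holt-Winters smoother covering both seasonal types.
# Initialization is deliberately naive (level = y[0], zero trend,
# first-period seasonal estimates); Alink's internals differ.
def holt_winters(y, p, alpha=0.3, beta=0.1, gamma=0.1, h=1,
                 seasonal_type="ADDITIVE"):
    """Smooth series y with period p and return h forecasts."""
    add = seasonal_type == "ADDITIVE"
    level, trend = y[0], 0.0
    # Seasonal components estimated from the first period.
    season = [(y[i] - y[0]) if add else (y[i] / y[0]) for i in range(p)]
    for t in range(p, len(y)):
        prev = level
        # Remove the old seasonal effect, then update level and trend.
        deseason = (y[t] - season[t % p]) if add else (y[t] / season[t % p])
        level = alpha * deseason + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
        # Update the seasonal component for this phase of the period.
        resid = (y[t] - level) if add else (y[t] / level)
        season[t % p] = gamma * resid + (1 - gamma) * season[t % p]
    n = len(y)
    if add:
        return [level + k * trend + season[(n + k - 1) % p]
                for k in range(1, h + 1)]
    return [(level + k * trend) * season[(n + k - 1) % p]
            for k in range(1, h + 1)]
```

For a constant series, both seasonal types forecast that same constant, which is a quick sanity check on the recurrences.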
## Usage Steps

- Step 1: aggregate each group's data (the time column and the value column) into an MTable:

  ```python
  GroupByBatchOp() \
      .setGroupByPredicate("id") \
      .setSelectClause("id, mtable_agg(ts, val) as data")
  ```

- Step 2: run the time-series forecast; the prediction result is also an MTable.
- Step 3: use FlattenMTableBatchOp to expand the prediction MTable back into columns:

  ```python
  FlattenMTableBatchOp() \
      .setReservedCols(["id", "predict"]) \
      .setSelectedCol("predict") \
      .setSchemaStr("ts timestamp, val double")
  ```
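The aggregation in step 1 can be pictured outside Alink with plain pandas. This is only a conceptual analogue: `mtable_agg` produces an Alink MTable per group, whereas the sketch below collects each group's rows into a plain dict.

```python
import pandas as pd

# Conceptual analogue of step 1 (not Alink): collapse each id's
# (ts, val) rows into one nested object per group.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "ts": [1, 2, 1],
    "val": [10.0, 11.0, 7.0],
})

grouped = (df.groupby("id")[["ts", "val"]]
             .apply(lambda g: g.to_dict("list"))
             .rename("data")
             .reset_index())
# `grouped` now has one row per id, with a nested "data" object,
# mirroring the one-MTable-per-group shape the forecaster expects.
```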
## Parameters

| Name | Description | Type | Required? | Valid Values | Default |
| --- | --- | --- | --- | --- | --- |
| predictionCol | Prediction result column name | String | ✓ | | |
| valueCol | Value column; its type must be MTable | String | ✓ | Columns of type [M_TABLE] | |
| alpha | alpha | Double | | [0.0, 1.0] | 0.3 |
| beta | beta | Double | | [0.0, 1.0] | 0.1 |
| doSeasonal | Whether the series has a seasonal component | Boolean | | | false |
| doTrend | Whether the series has a trend component | Boolean | | | false |
| frequency | Frequency (period) of the time series | Integer | | [1, +inf) | 10 |
| gamma | gamma | Double | | [0.0, 1.0] | 0.1 |
| levelStart | Initial level value | Double | | | |
| predictNum | Number of values to forecast | Integer | | | 1 |
| predictionDetailCol | Column name for prediction details | String | | | |
| reservedCols | Columns passed through by the algorithm | String[] | | | null |
| seasonalStart | Initial seasonal values | double[] | | | |
| seasonalType | Seasonal type | String | | "MULTIPLICATIVE", "ADDITIVE" | "ADDITIVE" |
| trendStart | Initial trend value | Double | | | |
| numThreads | Number of threads used by the operator | Integer | | | 1 |
## Code Example

### Python Code

```python
import datetime

import pandas as pd
from pyalink.alink import *

useLocalEnv(1)

data = pd.DataFrame([
    [1, datetime.datetime.fromtimestamp(1), 10.0],
    [1, datetime.datetime.fromtimestamp(2), 11.0],
    [1, datetime.datetime.fromtimestamp(3), 12.0],
    [1, datetime.datetime.fromtimestamp(4), 13.0],
    [1, datetime.datetime.fromtimestamp(5), 14.0],
    [1, datetime.datetime.fromtimestamp(6), 15.0],
    [1, datetime.datetime.fromtimestamp(7), 16.0],
    [1, datetime.datetime.fromtimestamp(8), 17.0],
    [1, datetime.datetime.fromtimestamp(9), 18.0],
    [1, datetime.datetime.fromtimestamp(10), 19.0]
])

source = dataframeToOperator(data, schemaStr='id int, ts timestamp, val double', op_type='batch')

source.link(
    GroupByBatchOp()
        .setGroupByPredicate("id")
        .setSelectClause("id, mtable_agg(ts, val) as data")
).link(
    HoltWintersBatchOp()
        .setValueCol("data")
        .setPredictionCol("pred")
        .setPredictNum(12)
).print()
```
### Java Code

```java
package com.alibaba.alink.operator.batch.timeseries;

import org.apache.flink.types.Row;

import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
import com.alibaba.alink.operator.batch.sql.GroupByBatchOp;
import org.junit.Test;

import java.sql.Timestamp;
import java.util.Arrays;
import java.util.List;

public class HoltWintersBatchOpTest {

    @Test
    public void test() throws Exception {
        List <Row> mTableData = Arrays.asList(
            Row.of(1, new Timestamp(1), 10.0),
            Row.of(1, new Timestamp(2), 11.0),
            Row.of(1, new Timestamp(3), 12.0),
            Row.of(1, new Timestamp(4), 13.0),
            Row.of(1, new Timestamp(5), 14.0),
            Row.of(1, new Timestamp(6), 15.0),
            Row.of(1, new Timestamp(7), 16.0),
            Row.of(1, new Timestamp(8), 17.0),
            Row.of(1, new Timestamp(9), 18.0),
            Row.of(1, new Timestamp(10), 19.0)
        );

        MemSourceBatchOp source = new MemSourceBatchOp(mTableData, new String[] {"id", "ts", "val"});

        source.link(
            new GroupByBatchOp()
                .setGroupByPredicate("id")
                .setSelectClause("id, mtable_agg(ts, val) as data")
        ).link(new HoltWintersBatchOp()
            .setValueCol("data")
            .setPredictionCol("pred")
            .setPredictNum(12)
        ).print();
    }
}
```
## Results

| id | data | pred |
| --- | --- | --- |
| 1 | {"data":{"ts":["1970-01-01 08:00:00.001","1970-01-01 08:00:00.002","1970-01-01 08:00:00.003","1970-01-01 08:00:00.004","1970-01-01 08:00:00.005","1970-01-01 08:00:00.006","1970-01-01 08:00:00.007","1970-01-01 08:00:00.008","1970-01-01 08:00:00.009","1970-01-01 08:00:00.01"],"val":[10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0]},"schema":"ts TIMESTAMP,val DOUBLE"} | {"data":{"ts":["1970-01-01 08:00:00.011","1970-01-01 08:00:00.012","1970-01-01 08:00:00.013","1970-01-01 08:00:00.014","1970-01-01 08:00:00.015","1970-01-01 08:00:00.016","1970-01-01 08:00:00.017","1970-01-01 08:00:00.018","1970-01-01 08:00:00.019","1970-01-01 08:00:00.02","1970-01-01 08:00:00.021","1970-01-01 08:00:00.022"],"val":[19.0,19.0,19.0,19.0,19.0,19.0,19.0,19.0,19.0,19.0,19.0,19.0]},"schema":"ts TIMESTAMP,val DOUBLE"} |
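Note that every forecast value above is the same constant. With doTrend and doSeasonal left at their defaults (false), Holt-Winters reduces to simple exponential smoothing, whose h-step-ahead forecast is just the final smoothed level, repeated. The sketch below illustrates that flat-forecast behavior; it is not Alink's implementation, and the particular level Alink reports suggests its initialization or parameter fitting differs from this naive version.

```python
# Simple exponential smoothing: what Holt-Winters degenerates to when
# both the trend and seasonal components are disabled.
def ses_forecast(y, alpha=0.3, h=12):
    """Smooth y, then return h copies of the final level (flat forecast)."""
    level = y[0]
    for v in y[1:]:
        level = alpha * v + (1 - alpha) * level
    return [level] * h

# Same series as the example above: 10.0, 11.0, ..., 19.0.
preds = ses_forecast([10.0 + i for i in range(10)], h=12)
```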