Hourly Time Series Forecasting using Facebook’s Prophet
Zillow Prize: Zillow’s Home Value Prediction (Zestimate)">Zillow Prize: Zillow’s Home Value Prediction (Zestimate)

Hourly Time Series Forecasting using Facebook’s Prophet

基于Facebook的Prophet算法完成时间序列预测
项目URL：https://www.kaggle.com/robikscube/time-series-forecasting-with-prophet
算法具体参数设置：https://blog.csdn.net/wjskeepmaking/article/details/64905745
算法逻辑：https://blog.csdn.net/a358463121/article/details/70194279
底层算法逻辑：https://blog.csdn.net/weixin_39445556/article/details/84502260

基本模型
y(t)=g(t)+s(s)+h(t)+ϵt

这里，模型将时间序列分成3个部分的叠加，其中g(t)g(t)表示增长函数，用来拟合非周期性变化的。s(t)s(t)用来表示周期性变化，比如说每周，每年，季节等，h(t)h(t)表示假期，节日等特殊原因等造成的变化，最后ϵtϵt为噪声项，用他来表示随机无法预测的波动，我们假设ϵtϵt是高斯的。事实上，这是generalized additive model(GAM)模型的特例，但我们这里只用到了时间作为拟合的参数。

数据集：只有历史时间序列数据以及对应时间产生的用电量

Datetime	PJME_MW
2002-12-31 01:00:00	26498.0
2002-12-31 02:00:00	25147.0
2002-12-31 03:00:00	24574.0
2002-12-31 04:00:00	24393.0
2002-12-31 05:00:00	24860.0

# Setup and train model
model = Prophet()
model.fit(pjme_train.resetindex().rename(columns={‘Datetime’:’ds’,’PJME_MW’:’y’}))
#Predict on training set with model_
pjme_test_fcst = model.predict(df=pjme_test.reset_index().rename(columns={‘Datetime’:’ds’}))

随着预测时间的加长，预测精度的波动范围也逐渐扩大

趋势性，日期维度、周维度、月维度

可以增加节假日特征来预测

Zillow Prize: Zillow’s Home Value Prediction (Zestimate)

房价预测
Kaggle URL：https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-zillow-prize

缺失值检查
missing_df = prop_df.isnull().sum(axis=0).reset_index()
missing_df.columns = [‘column_name’, ‘missing_count’]
missing_df = missing_df.ix[missing_df[‘missing_count’]>0]
missing_df = missing_df.sort_values(by=’missing_count’)

ind = np.arange(missing_df.shape[0])
width = 0.9
fig, ax = plt.subplots(figsize=(12,18))
rects = ax.barh(ind, missing_df.missing_count.values, color=’blue’)
ax.set_yticks(ind)
ax.set_yticklabels(missing_df.column_name.values, rotation=’horizontal’)
ax.set_xlabel(“Count of missing values”)
ax.set_title(“Number of missing values in each column”)
plt.show()

各特征和目标变量之间的相关性
# Now let us look at the correlation coefficient of each of these variables #
x_cols = [col for col in train_df_new.columns if col not in [‘logerror’] if train_df_new[col].dtype==’float64’]

labels = []
values = []
for col in xcols:
labels.append(col)
values.append(np.corrcoef(train_df_new[col].values, train_df_new.logerror.values)[0,1])
corr_df = pd.DataFrame({‘col_labels’:labels, ‘corr_values’:values})
corr_df = corr_df.sort_values(by=’corr_values’)

ind = np.arange(len(labels))
width = 0.9
fig, ax = plt.subplots(figsize=(12,40))
rects = ax.barh(ind, np.array(corr_df.corr_values.values), color=’y’)
ax.set_yticks(ind)
ax.set_yticklabels(corr_df.col_labels.values, rotation=’horizontal’)
ax.set_xlabel(“Correlation coefficient”)
ax.set_title(“Correlation coefficient of the variables”)
#autolabel(rects)_
plt.show()

万象局

2020-01-06

Hourly Time Series Forecasting using Facebook’s Prophet

Zillow Prize: Zillow’s Home Value Prediction (Zestimate)