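Time Series Forecasting Methods and spark-timeseries
Time Series Forecasting Method
What is the time series forecasting method?
The time series forecasting method is a form of historical extrapolation, also known as the historical extension method. It takes the development process and regularities of a socioeconomic phenomenon, as reflected in a time series, and extrapolates them to predict the trend of future development.
A time series, also called a chronological series, historical series, or dynamic series, is the sequence formed by arranging the values of a statistical indicator in chronological order. Time series forecasting compiles and analyzes such a series and, based on the development process, direction, and trend it reflects, extends it by analogy to predict the level likely to be reached in the next period or over the coming years. The work involves: collecting and organizing historical data on a social phenomenon; checking and validating the data and arranging them into a series; analyzing the series to find how the phenomenon changes over time and to derive a model; and using that model to predict the phenomenon's future state.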
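Steps of the time series forecasting method
Step 1: Collect and organize historical data, compile them into a time series, and plot the series as a statistical chart. Time series analysis usually classifies the factors that may be at work. The traditional classification groups them, by their characteristics or effects, into four categories: (1) long-term trend; (2) seasonal variation; (3) cyclical variation; (4) irregular variation.
Step 2: Analyze the time series. The value in each period is the combined result of many different factors acting at the same time.
Step 3: Estimate the long-term trend (T), seasonal variation (S), and irregular variation (I) of the series, and choose approximate mathematical models to represent them. Estimate the unknown parameters of these models with a suitable technique.
Step 4: Once the models for the long-term trend, seasonal variation, and irregular variation have been fitted from the data, use them to predict the future trend value T and seasonal value S and, where possible, the irregular value I. Then compute the forecast Y of the series with one of the following models:
Additive model: Y = T + S + I
Multiplicative model: Y = T × S × I
If the irregular variation is hard to predict, forecast only the trend and seasonal components and take their product or sum as the forecast. If the phenomenon has no seasonal variation, or quarterly or monthly forecasts are not needed, the trend forecast is the forecast of the series, i.e. Y = T. Note that such a forecast reflects only the future direction of development: even a very accurate trend line acts, in essence, only as an average over the chronological observations, and actual values will fluctuate around it.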
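The two combination rules can be sketched in a few lines of Scala. This is only an illustration: the object and function names are hypothetical, and the default arguments encode the case where the irregular component cannot be predicted (0 for the additive model, 1 for the multiplicative model).

```scala
// Sketch: recombining forecast components under the additive and multiplicative models.
object DecompositionSketch {
  /** Additive model: Y = T + S + I. I defaults to 0 when it cannot be predicted. */
  def additive(t: Double, s: Double, i: Double = 0.0): Double = t + s + i

  /** Multiplicative model: Y = T * S * I. S and I are ratios around 1.0. */
  def multiplicative(t: Double, s: Double, i: Double = 1.0): Double = t * s * i

  def main(args: Array[String]): Unit = {
    // Hypothetical values: a trend of 200 units with a seasonal effect of +30 units or +25%.
    println(additive(200.0, 30.0))
    println(multiplicative(200.0, 1.25))
  }
}
```

Note that the seasonal component has different units in the two models: an absolute offset in the additive case, a dimensionless ratio in the multiplicative case.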
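Basic characteristics of time series analysis
1. Time series analysis predicts future development from past trends; its premise is that the past carries over into the future.
Time series analysis relies on the continuity of how things develop: it applies statistical analysis to past data to infer the future trend. The assumption that the past carries over into the future has two implications: first, there are no sudden jumps, and change proceeds in relatively small steps; second, past and current phenomena may indicate the direction of present and future activity. As a result, time series analysis generally works well for short-term and near-term forecasts, but shows serious limitations when extended further into the future: forecasts may deviate substantially from reality and lead to poor decisions.
2. Time series data show both regular and irregular variation
The size of each observation in a time series is the combined result of the various influencing factors acting at the same moment. Judged by how the magnitude and direction of these factors change over time, the variation they produce in the series falls into four types.
(1) Trend: a variable shows a relatively slow, long-lasting tendency to rise, fall, or stay level as time advances or as the independent variable changes, though the size of the change may vary.
(2) Periodicity: a factor shows regular peaks and troughs as external influences follow the natural seasons.
(3) Randomness: individual movements are random, while the whole follows statistical regularities.
(4) Composite: the actual variation is a superposition or combination of several kinds of movement. Forecasting tries to filter out the irregular variation and bring out the trend and periodic components.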
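Classification of time series forecasting methods
Time series forecasting can be used for short-term, medium-term, and long-term forecasts. By method of analysis, it can be divided into: the simple sequential average method, weighted sequential average method, moving average method, weighted moving average method, trend forecasting method, exponential smoothing method, seasonal trend forecasting method, market life cycle forecasting method, and others.
Simple sequential average method: also called the arithmetic mean method. It takes the statistics of several past periods as observations and uses their arithmetic mean as the forecast for the next period. The method assumes that "what held in the past will hold in the future" and treats recent and distant data as equal, so it is suitable only when conditions change little. If the data show an upward or downward trend, this method should not be used.
Weighted sequential average method: weights the historical data of each period by the influence of recent versus distant periods, and uses the weighted mean as the forecast for the next period.
Simple moving average method: computes the arithmetic mean of the most recent several periods, moving forward one period at a time, and uses it as the forecast for the next period.
Weighted moving average method: applies weights to the simple moving average. Recent observations should receive larger weights and distant observations smaller ones.
Although these methods are simple and yield forecasts quickly, they ignore new developments in the wider economy and the influence of other factors, so their accuracy is poor. Forecasts should be adjusted as needed in light of new information.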
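The averaging methods above can be sketched as follows. The names are hypothetical, and the weights are assumed to be ordered from oldest to newest and normalized by their sum; other weighting conventions are equally valid.

```scala
// Sketch: simple and weighted moving averages used as next-period forecasts.
object MovingAverageSketch {
  /** Mean of the last `window` observations; window = history.size gives the simple sequential average. */
  def simple(history: Seq[Double], window: Int): Double = {
    require(window > 0 && history.size >= window, "not enough observations")
    history.takeRight(window).sum / window
  }

  /** Weighted mean of the most recent observations; `weights` run oldest to newest. */
  def weighted(history: Seq[Double], weights: Seq[Double]): Double = {
    require(weights.nonEmpty && history.size >= weights.size, "not enough observations")
    val recent = history.takeRight(weights.size)
    recent.zip(weights).map { case (x, w) => x * w }.sum / weights.sum
  }
}
```

For the rising history 10, 12, 14, 16, the three-period simple average gives 14, while weights 1, 2, 3 give 88 / 6 ≈ 14.67. The weighted version sits closer to the latest value but still lags the trend, which is why these methods are discouraged for trending data.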
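Exponential smoothing method: forecasts by exponentially weighting the previous period's actual value and forecast. It is essentially derived from the weighted moving average method. Its advantage is that the next forecast can be computed from just the previous actual value and the previous forecast, which saves data and processing time, reduces storage, and keeps the method simple. It is a short-term forecasting method widely used abroad.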
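A minimal sketch of simple exponential smoothing, assuming the usual recursion F(t+1) = α·x(t) + (1 − α)·F(t). The names are hypothetical, and seeding the first forecast with the first observation is one common convention, not something the text prescribes.

```scala
// Sketch: simple exponential smoothing; only the last actual and the last forecast are needed per step.
object ExponentialSmoothingSketch {
  /** One update: next forecast from the latest actual value and the previous forecast. */
  def next(alpha: Double, actual: Double, prevForecast: Double): Double = {
    require(alpha >= 0.0 && alpha <= 1.0, "alpha must be in [0, 1]")
    alpha * actual + (1 - alpha) * prevForecast
  }

  /** Runs the recursion over a history, seeding with the first observation (an assumption). */
  def forecast(history: Seq[Double], alpha: Double): Double = {
    require(history.nonEmpty, "history must not be empty")
    history.tail.foldLeft(history.head)((f, x) => next(alpha, x, f))
  }
}
```

With history 10, 20, 30 and α = 0.5 the running forecast goes 10, 15, 22.5. A larger α tracks recent values more closely; a smaller α smooths more heavily.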
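Seasonal trend forecasting method: predicts the seasonal movement from the seasonal indices of the periodic variation that recurs every year. Seasonal indices can be estimated in different ways; the two common ones are: a. Quarterly (monthly) average method: average the values of each quarter (or month) across years and divide by the overall average of all quarters (or months) to obtain the index for each quarter (month). This method can be used to analyze seasonal variation in production, sales, raw material inventory, expected working capital requirements, and similar economic quantities; b. Moving average method: compute ratios to a moving average to obtain typical seasonal indices.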
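The quarterly (monthly) average method can be sketched as below. The input layout, one row per year with one value per season, and all names are assumptions made for the example.

```scala
// Sketch: seasonal indices by the per-season averaging method.
object SeasonalIndexSketch {
  /** `years` holds one row per year; every row has the same number of seasons (e.g. 4 quarters). */
  def indices(years: Seq[Seq[Double]]): Seq[Double] = {
    require(years.nonEmpty && years.forall(_.size == years.head.size), "rows must have equal length")
    val seasons = years.head.size
    // Average of each season across all years.
    val seasonMeans = (0 until seasons).map(s => years.map(_(s)).sum / years.size)
    // Overall average across all seasons and years.
    val grandMean = seasonMeans.sum / seasons
    seasonMeans.map(_ / grandMean)
  }
}
```

For two hypothetical years of quarterly data, (50, 100, 150, 100) and (70, 100, 130, 100), the per-quarter means are 60, 100, 140, 100, the overall mean is 100, and the indices are 0.6, 1.0, 1.4, 1.0.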
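Market life cycle forecasting method: analyzes the market life cycle of a product. For example, to forecast sales of a product in its growth stage, the most common approach is to plot the statistics as a time series curve and extend the curve to obtain the future sales trend. The simplest extension is straight-line extrapolation, which is suitable for durable consumer goods. The method is simple, intuitive, and easy to learn.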
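Time series forecasting models
1. Stepwise autoregressive (StepAR) model: a class of models for data with trend and seasonal factors.
2. Winters Method-Additive model: combines a time trend with additive seasonal factors to account for regular seasonal fluctuations in the series.
3. ARIMA model: a class of models for data with trend, seasonal factors, and a stationary random component [3].
4. Winters Method-Multiplicative model: combines a time trend with multiplicative seasonal factors to account for regular seasonal fluctuations in the series. The time trend model can be corrected according to these regular seasonal fluctuations. To capture seasonality, the trend model includes one seasonal parameter for each season, and the seasonal factors are multiplicative. For a random time series, here monthly data obtained by aggregating historical data for various types of insurance, the Winters Method-Multiplicative model is written as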
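$$x_t = (a + bt)\,s(t) + \varepsilon_t \tag{1}$$
where $a$ and $b$ are the trend parameters and $s(t)$ is the seasonal parameter selected for the season corresponding to time $t$. In the standard Winters formulation the update equations are
$$a_t = \omega_1 \frac{x_t}{s_{t-1}(t)} + (1 - \omega_1)(a_{t-1} + b_{t-1}),$$
$$b_t = \omega_2 (a_t - a_{t-1}) + (1 - \omega_2)\,b_{t-1} \tag{2}$$
where $x_t$, $a_t$, and $b_t$ are the observed value, smoothed level, and smoothed trend of the series at time $t$, and $s_{t-1}(t)$ is the past value of the seasonal factor for time $t$, before that factor is updated.
In this updating scheme the trend polynomial is always centered at the current period, so that for forecasts beyond time $t$ the intercept of the trend polynomial is always the updated intercept $a_t$. The forecast $\tau$ periods ahead is
$$x_{t+\tau} = (a_t + b_t \tau)\, s_t(t + \tau) \tag{3}$$
The seasonal parameters are updated when the season changes in the data, using the average of the ratios of observed to forecast values.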
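One update step of the multiplicative Winters method can be sketched as below. The level and trend updates follow the usual form of equation (2) and the forecast follows equation (3). The seasonal update with a third weight w3 is the common textbook variant and is an assumption here, since the text describes the seasonal update only in words. All names are hypothetical.

```scala
// Sketch: one multiplicative Winters update followed by a tau-step-ahead forecast.
object WintersStepSketch {
  /** Level a_t, trend b_t, and the seasonal factor belonging to the season of time t. */
  final case class State(level: Double, trend: Double, season: Double)

  /** `prev.season` plays the role of s_{t-1}(t): the factor for t's season before it is updated. */
  def update(x: Double, prev: State, w1: Double, w2: Double, w3: Double): State = {
    val level = w1 * x / prev.season + (1 - w1) * (prev.level + prev.trend)
    val trend = w2 * (level - prev.level) + (1 - w2) * prev.trend
    // Assumed textbook seasonal update; the source text only describes it verbally.
    val season = w3 * x / level + (1 - w3) * prev.season
    State(level, trend, season)
  }

  /** Equation (3): (a_t + b_t * tau) * s_t(t + tau). */
  def forecast(state: State, tau: Int, seasonAtTarget: Double): Double =
    (state.level + state.trend * tau) * seasonAtTarget
}
```

Starting from level 100, trend 2, and seasonal factor 1.0, an observation of 110 with all weights at 0.5 moves the level to 106 and the trend to 4, so the one-step forecast for a season with factor 1.0 is 110.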
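5. GARCH (ARCH) model
The regression model with autocorrelated disturbances can be written as
$$x_t = \xi_t'\beta + v_t,$$
$$v_t = \rho_1 v_{t-1} + \cdots + \rho_n v_{t-n} + \varepsilon_t,$$
$$\varepsilon_t \sim N(0, \sigma^2) \tag{4}$$
where $x_t$ is the dependent variable; $\xi_t$ is the column vector of regressors; $\beta$ is the column vector of structural parameters; and $\varepsilon_t$ is an independent, identically distributed normal random variable with mean 0 and variance $\sigma^2$.
For a series $x_t$ that follows a GARCH process, the conditional distribution at time $t$ is written
$$x_t \mid \phi_{t-1} \sim N(0, h_t) \tag{5}$$
where $\phi_{t-1}$ denotes all information available up to time $t-1$, and the conditional variance is
$$h_t = \omega + \sum_{i=1}^{q} \alpha_i x_{t-i}^2 + \sum_{j=1}^{p} \gamma_j h_{t-j} \tag{6}$$
where $p \ge 0$ and $q > 0$. When $p = 0$, the GARCH(p, q) model reduces to the ARCH(q) model. At least one ARCH parameter must be nonzero.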
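The conditional variance recursion is easiest to see in the GARCH(1,1) case, h(t) = ω + α·x(t−1)² + γ·h(t−1). The sketch below assumes that form. The starting variance h0 and all names are choices made for the example, not part of the text.

```scala
// Sketch: conditional variances of a zero-mean GARCH(1,1) series.
object GarchVarianceSketch {
  /**
   * Returns h_0, h_1, ..., h_n for observations x_0 .. x_{n-1}, where
   * h_t = omega + alpha * x_{t-1}^2 + gamma * h_{t-1}.
   * The last element is the one-step-ahead variance. `h0` is an assumed starting value.
   */
  def conditionalVariances(x: Seq[Double], omega: Double, alpha: Double,
                           gamma: Double, h0: Double): Seq[Double] = {
    require(omega > 0 && alpha >= 0 && gamma >= 0, "invalid parameters")
    // scanLeft carries h_{t-1} forward; each step consumes x_{t-1}.
    x.scanLeft(h0)((hPrev, xPrev) => omega + alpha * xPrev * xPrev + gamma * hPrev)
  }
}
```

With observations 1.0 and 2.0, ω = 0.5, α = 0.5, γ = 0.25, and h0 = 1.0, the variances are 1.0, 1.25, 2.8125. Setting γ = 0 removes the dependence on past variances and leaves an ARCH(1) recursion.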
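The GARCH regression model can be written as
$$x_t = \xi_t'\beta + \varepsilon_t,$$
$$\varepsilon_t = \sqrt{h_t}\,e_t, \qquad h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \gamma_j h_{t-j},$$
$$e_t \sim N(0, 1) \tag{7}$$
One can also consider disturbances that follow an autoregressive process, that is, a model with GARCH errors, AR(n)-GARCH(p, q):
$$x_t = \xi_t'\beta + v_t,$$
$$v_t = \rho_1 v_{t-1} + \cdots + \rho_n v_{t-n} + \varepsilon_t,$$
$$\varepsilon_t = \sqrt{h_t}\,e_t, \qquad h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \gamma_j h_{t-j} \tag{8}$$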
For triple exponential smoothing (Holt-Winters), see http://www.dataguru.cn/article-3235-1.html
For a detailed treatment of this section, see
http://wiki.mbalib.com/wiki/%E6%97%B6%E9%97%B4%E5%BA%8F%E5%88%97%E9%A2%84%E6%B5%8B%E6%B3%95
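spark-timeseries
Spark's own libraries include no time series algorithms, but a third-party library already implements them. Its GitHub address is: https://github.com/sryza/spark-timeseries
Introduction to spark-timeseries: https://yq.aliyun.com/articles/70292
Example: http://blog.csdn.net/qq_30232405/article/details/70622400
Practical application
Data format (after preprocessing): one value every 5 minutes.
Forecasting code: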
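/**
* Created by ${lyp} on 2017/6/21.
*/
//imports below assume spark-timeseries (com.cloudera.sparkts) 0.4.x on the classpath
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.time.{ZoneId, ZonedDateTime}
import java.util.Calendar
import com.cloudera.sparkts.{DateTimeIndex, MinuteFrequency, TimeSeriesRDD, UniformDateTimeIndex}
import com.cloudera.sparkts.models.{ARIMA, HoltWinters}
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType, TimestampType}
import scala.collection.mutable.ArrayBuffer
case class PV(time:String,key:String,ct: Double)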
object RunModel {
val starttime="2017-01-01 00:00:00"
val endtime= "2017-05-31 23:55:00"
val predictedN=288
val outputTableName=""
val modelName="holtwinters"
val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
val hiveColumnName=List("time","key","ct")
def main(args: Array[String]): Unit = {
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
val conf= new SparkConf().setAppName("timeserial").setMaster("local")
val sc= new SparkContext(conf)
val sqlContext=new SQLContext(sc)
import sqlContext.implicits._
//create dataframe
val trainData=getData(sc,sqlContext,"src/main/resource/data.csv")
//val vertifyData=getData(sc,sqlContext,"src/main/resource/data2.csv")
trainData.show()
//vertifyData.show()
//create DateTimeIndex
val zone = ZoneId.systemDefault()
var dtIndex:UniformDateTimeIndex=DateTimeIndex.uniformFromInterval(
ZonedDateTime.of(2017, 1, 1, 0, 0, 0, 0, zone),
ZonedDateTime.of(2017, 5, 31, 23, 55, 0, 0, zone),
new MinuteFrequency(5)
)
//create TimeSeriesRDD
val trainTsrdd=TimeSeriesRDD.timeSeriesRDDFromObservations(dtIndex,trainData,hiveColumnName(0),hiveColumnName(1), hiveColumnName(2))
trainTsrdd.foreach(println(_))
//cache
trainTsrdd.cache()
//fill missing values; supported methods: "linear", "nearest", "next", or "previous"
val filledTrainTsrdd=trainTsrdd.fill("linear")
//create model
val timeSeriesModel= new TimeSeriesModel(predictedN)
//train model
val forecastResult=modelName match {
case "arima"=>{
println("begin train")
val (forecast,coefficients)=timeSeriesModel.arimaModelTrain(filledTrainTsrdd)
forecast
}
case "holtwinters"=>{
//seasonal period in observations (12 or 4 for monthly or quarterly data); here 288 five-minute points per day * 30 days
val period=288*30
//Holt-Winters model type: "additive" or "Multiplicative"
val holtWintersModelType="Multiplicative"
val (forecast,sse)=timeSeriesModel.holtWintersModelTrain(filledTrainTsrdd,period,holtWintersModelType)
forecast
}
case _=>throw new UnsupportedOperationException("Currently only supports 'arima' and 'holtwinters'")
}
//timestamps of the predicted points (computed here but not written out below)
val time=timeSeriesModel.productStartDatePredictDate(predictedN,endtime,endtime)
//write one forecast value per line
forecastResult.flatMap{
case (key,values)=>values.toArray.map(_.toString)
}.saveAsTextFile("src/main/resource/30Multiplicative")
}
def getTrainData(sqlContext:SQLContext):DataFrame={
val data=sqlContext.sql(
s"""
|select time, 'key' as key, ct from tmp_music.tmp_lyp_nginx_result_ct2 where time between '${starttime}' and '${endtime}'
""".stripMargin)
data
}
def getData(sparkContext: SparkContext,sqlContext:SQLContext,path:String):DataFrame={
val data=sparkContext.textFile(path).map(line=>line.split(",")).map{
line=>
val time =sdf.parse(line(0))
val timestamp= new Timestamp(time.getTime)
Row(timestamp,line(1),line(2).toDouble)
}
val field=Seq(
StructField(hiveColumnName(0), TimestampType, true),
StructField(hiveColumnName(1), StringType, true),
StructField(hiveColumnName(2), DoubleType, true)
)
val schema=StructType(field)
val zonedDateDataDf=sqlContext.createDataFrame(data,schema)
zonedDateDataDf
}
}
/**
* Time series model wrapper
* Created by Administrator on 2017/4/19.
*/
class TimeSeriesModel extends Serializable{
//number of future values to predict
private var predictedN=1
//name of the output table
private var outputTableName="timeseries_output"
def this(predictedN:Int){
this()
this.predictedN=predictedN
}
case class Coefficient(coefficients: String,p: String,d: String,q:String,logLikelihoodCSS:String,arc:String);
/**
* ARIMA model:
* outputs the fitted p, d, q parameters
* outputs the predictedN forecast values
* @param trainTsrdd
*/
def arimaModelTrain(trainTsrdd:TimeSeriesRDD[String]): (RDD[(String,Vector)],RDD[(String,Coefficient)])={
val predictedN=this.predictedN
//train model
val arimaAndVectorRdd=trainTsrdd.map{line=>
line match {
case (key,denseVector)=>
(key,ARIMA.autoFit(denseVector),denseVector)
}
}
/** Output parameters: the fitted p, d, q, the coefficients, the CSS log-likelihood, and the approximate AIC **/
val coefficients=arimaAndVectorRdd.map{line=>
line match{
case (key,arimaModel,denseVector)=>{
val coefficients=arimaModel.coefficients.mkString(",")
val p=arimaModel.p.toString
val d=arimaModel.d.toString
val q=arimaModel.q.toString
val logLikelihoodCSS=arimaModel.logLikelihoodCSS(denseVector).toString
val arc=arimaModel.approxAIC(denseVector).toString
(key,Coefficient(coefficients,p,d,q,logLikelihoodCSS,arc))
}
}
}
//print coefficients
coefficients.collect().map{f=>
val key=f._1
val coefficients=f._2
println(key+" coefficients:"+coefficients.coefficients+"=>"+"(p="+coefficients.p+",d="+coefficients.d+",q="+coefficients.q+")"
+"logLikelihoodCSS:"+coefficients.logLikelihoodCSS+" arc:"+coefficients.arc)
}
//predict
val forecast= arimaAndVectorRdd.map{
row=>
val key=row._1
val model=row._2
val denseVector=row._3
(key,model.forecast(denseVector,predictedN))
}
//keep only the last predictedN values, which are the forecasts
val forecastValue=forecast.map{
case (key,value)=>(key,Vectors.dense(value.toArray.takeRight(predictedN)))
}
println("Arima forecast of next "+predictedN+" observations:")
forecastValue.foreach(println)
//return forecastValue & coefficients
(forecastValue,coefficients)
}
/**
* Fits a Holt-Winters model to each series and forecasts predictedN values
* @param trainTsrdd
*/
def holtWintersModelTrain(trainTsrdd:TimeSeriesRDD[String],period:Int,holtWintersModelType:String): (RDD[(String,Vector)],RDD[(String,Double)]) ={
//set parms
val predictedN=this.predictedN
//create model
val holtWintersAndVectorRdd=trainTsrdd.map{
row=>
val key=row._1
val denseVector=row._2
//ts: Vector, period: Int, modelType: String = "additive", method: String = "BOBYQA"
val model=HoltWinters.fitModel(denseVector,period,holtWintersModelType)
(key,model,denseVector)
}
//predict; forecast() writes into the destination vector, so allocate a fresh one per series
//to make sure results for different keys never share a buffer
val forecast=holtWintersAndVectorRdd.map{
case (key,model,denseVector)=>
val dest=Vectors.dense(new Array[Double](predictedN))
(key,model.forecast(denseVector,dest))
}
//print predict
println("HoltWinters forecast of next "+predictedN+" observations:")
forecast.foreach(println)
//sse- to get sum of squared errors
val sse=holtWintersAndVectorRdd.map{
row=>
val key=row._1
val model=row._2
val vector=row._3
(key,model.sse(vector))
}
(forecast,sse)
}
/**
* Generates a batch of timestamps (down to minutes and seconds) for saving
* Format: yyyy-MM-dd HH:mm:ss
* @param predictedN
* @param startTime
* @param endTime
*/
def productStartDatePredictDate(predictedN:Int,startTime:String,endTime:String): ArrayBuffer[String] ={
//build the timestamps from the start time through the predicted points
val dateArrayBuffer=new ArrayBuffer[String]()
val dateFormat= new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
val cal1 = Calendar.getInstance()
val cal2 = Calendar.getInstance()
val st=dateFormat.parse(startTime)
val et=dateFormat.parse(endTime)
//set the start and end of the training data; setTime avoids the deprecated Date getters
//(getYear is offset by 1900 and getDay returns the day of the week, not the day of the month)
cal1.setTime(st)
cal2.setTime(et)
//number of 5-minute steps between start and end, plus the predicted points
val stepCount=(cal2.getTimeInMillis-cal1.getTimeInMillis)/ (1000 * 60 * 5)+predictedN
var step = 0L
//strictly less than, so that exactly stepCount timestamps are produced
while(step<stepCount){
cal1.add(Calendar.MINUTE,5)
//store the timestamp
dateArrayBuffer+=dateFormat.format(cal1.getTime)
step=step+1
}
dateArrayBuffer
}
}
Holt-Winters implementation
/**
* Copyright (c) 2015, Cloudera, Inc. All Rights Reserved.
*
* Cloudera, Inc. licenses this file to you under the Apache License,
* Version 2.0 (the "License"). You may not use this file except in
* compliance with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* This software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
* CONDITIONS OF ANY KIND, either express or implied. See the License for
* the specific language governing permissions and limitations under the
* License.
*/
package com.cloudera.sparkts.models
import org.apache.commons.math3.analysis.MultivariateFunction
import org.apache.spark.mllib.linalg._
import org.apache.commons.math3.optim.MaxIter
import org.apache.commons.math3.optim.nonlinear.scalar.ObjectiveFunction
import org.apache.commons.math3.optim.MaxEval
import org.apache.commons.math3.optim.SimpleBounds
import org.apache.commons.math3.optim.nonlinear.scalar.noderiv.BOBYQAOptimizer
import org.apache.commons.math3.optim.InitialGuess
import org.apache.commons.math3.optim.nonlinear.scalar.GoalType
/**
* Triple exponential smoothing takes into account seasonal changes as well as trends.
* Seasonality is defined to be the tendency of time-series data to exhibit behavior that repeats
* itself every L periods, much like any harmonic function.
*
* The Holt-Winters method is a popular and effective approach to forecasting seasonal time series
*
* See https://en.wikipedia.org/wiki/Exponential_smoothing#Triple_exponential_smoothing
* for more information on Triple Exponential Smoothing
* See https://www.otexts.org/fpp/7/5 and
* https://stat.ethz.ch/R-manual/R-devel/library/stats/html/HoltWinters.html
* for more information on Holt Winter Method.
*/
object HoltWinters {
/**
* Fit HoltWinter model to a given time series. Holt Winter Model has three parameters
* level, trend and season component of time series.
* We use BOBYQA optimizer which is used to calculate minimum of a function with
* bounded constraints and without using derivatives.
* See http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf for more details.
*
* @param ts Time Series for which we want to fit HoltWinter Model
* @param period Seasonality of data i.e period of time before behavior begins to repeat itself
* @param modelType Two variations differ in the nature of the seasonal component.
* Additive method is preferred when seasonal variations are roughly constant through the series,
* Multiplicative method is preferred when the seasonal variations are changing
* proportional to the level of the series.
* @param method: Currently only BOBYQA is supported.
*/
def fitModel(ts: Vector, period: Int, modelType: String = "additive", method: String = "BOBYQA")
: HoltWintersModel = {
method match {
case "BOBYQA" => fitModelWithBOBYQA(ts, period, modelType)
case _ => throw new UnsupportedOperationException("Currently only supports 'BOBYQA'")
}
}
def fitModelWithBOBYQA(ts: Vector, period: Int, modelType:String): HoltWintersModel = {
val optimizer = new BOBYQAOptimizer(7)
val objectiveFunction = new ObjectiveFunction(new MultivariateFunction() {
def value(params: Array[Double]): Double = {
new HoltWintersModel(modelType, period, params(0), params(1), params(2)).sse(ts)
}
})
// The starting guesses in R's stats:HoltWinters
val initGuess = new InitialGuess(Array(0.3, 0.1, 0.1))
val maxIter = new MaxIter(30000)
val maxEval = new MaxEval(30000)
val goal = GoalType.MINIMIZE
val bounds = new SimpleBounds(Array(0.0, 0.0, 0.0), Array(1.0, 1.0, 1.0))
val optimal = optimizer.optimize(objectiveFunction, goal, bounds,initGuess, maxIter, maxEval)
val params = optimal.getPoint
new HoltWintersModel(modelType, period, params(0), params(1), params(2))
}
}
class HoltWintersModel(
val modelType: String,
val period: Int,
val alpha: Double,
val beta: Double,
val gamma: Double) extends TimeSeriesModel {
if (!modelType.equalsIgnoreCase("additive") && !modelType.equalsIgnoreCase("multiplicative")) {
throw new IllegalArgumentException("Invalid model type: " + modelType)
}
val additive = modelType.equalsIgnoreCase("additive")
/**
* Calculates sum of squared errors, used to estimate the alpha and beta parameters
*
* @param ts A time series for which we want to calculate the SSE, given the current parameters
* @return SSE
*/
def sse(ts: Vector): Double = {
val n = ts.size
val smoothed = new DenseVector(Array.fill(n)(0.0))
addTimeDependentEffects(ts, smoothed)
var error = 0.0
var sqrErrors = 0.0
// We predict only from period by using the first period - 1 elements.
for(i <- period to (n - 1)) {
error = ts(i) - smoothed(i)
sqrErrors += error * error
}
sqrErrors
}
/**
* {@inheritDoc}
*/
override def removeTimeDependentEffects(ts: Vector, dest: Vector = null): Vector = {
throw new UnsupportedOperationException("not yet implemented")
}
/**
* {@inheritDoc}
*/
override def addTimeDependentEffects(ts: Vector, dest: Vector): Vector = {
val destArr = dest.toArray
val fitted = getHoltWintersComponents(ts)._1
for (i <- 0 to (dest.size - 1)) {
destArr(i) = fitted(i)
}
dest
}
/**
* Final prediction Value is sum of level trend and season
* But in R's stats:HoltWinters additional weight is given for trend
*
* @param ts
* @param dest
*/
def forecast(ts: Vector, dest: Vector): Vector = {
val destArr = dest.toArray
val (_, level, trend, season) = getHoltWintersComponents(ts)
val n = ts.size
val finalLevel = level(n - period)
val finalTrend = trend(n - period)
val finalSeason = new Array[Double](period)
for (i <- 0 until period) {
finalSeason(i) = season(i + n - period)
}
for (i <- 0 until dest.size) {
destArr(i) = if (additive) {
(finalLevel + (i + 1) * finalTrend) + finalSeason(i % period)
} else {
(finalLevel + (i + 1) * finalTrend) * finalSeason(i % period)
}
}
dest
}
/**
* Start from the initial parameters and then iterate to find the final parameters
* using the equations of HoltWinter Method.
* See https://www.otexts.org/fpp/7/5 and
* https://stat.ethz.ch/R-manual/R-devel/library/stats/html/HoltWinters.html
* for more information on Holt Winter Method equations.
*
* @param ts A time series for which we want the HoltWinter parameters level,trend and season.
* @return (level trend season). Final vectors of level trend and season are returned.
*/
def getHoltWintersComponents(ts: Vector): (Vector, Vector, Vector, Vector) = {
val n = ts.size
require(n >= 2, "Requires length of at least 2")
val dest = new Array[Double](n)
val level = new Array[Double](n)
val trend = new Array[Double](n)
val season = new Array[Double](n)
val (initLevel, initTrend, initSeason) = initHoltWinters(ts)
level(0) = initLevel
trend(0) = initTrend
for (i <- 0 until initSeason.size){
season(i) = initSeason(i)
}
for (i <- 0 to (n - period - 1)) {
dest(i + period) = level(i) + trend(i)
// Add the seasonal factor for additive and multiply for multiplicative model.
if (additive) {
dest(i + period) += season(i)
} else {
dest(i + period) *= season(i)
}
val levelWeight = if (additive) {
ts(i + period) - season(i)
} else {
ts(i + period) / season(i)
}
level(i + 1) = alpha * levelWeight + (1 - alpha) * (level(i) + trend(i))
trend(i + 1) = beta * (level(i + 1) - level(i)) + (1 - beta) * trend(i)
val seasonWeight = if (additive) {
ts(i + period) - level(i + 1)
} else {
ts(i + period) / level(i + 1)
}
season(i + period) = gamma * seasonWeight + (1 - gamma) * season(i)
}
(Vectors.dense(dest), Vectors.dense(level), Vectors.dense(trend), Vectors.dense(season))
}
def getKernel(): (Array[Double]) = {
if (period % 2 == 0){
val kernel = Array.fill(period + 1)(1.0 / period)
kernel(0) = 0.5 / period
kernel(period) = 0.5 / period
kernel
} else {
Array.fill(period)(1.0 / period)
}
}
/**
* Function to calculate the Weighted moving average/convolution using above kernel/weights
* for input data.
* See http://robjhyndman.com/papers/movingaverage.pdf for more information
* @param inData Series on which you want to do moving average
* @param kernel Weight vector for weighted moving average
*/
def convolve(inData: Array[Double], kernel: Array[Double]): (Array[Double]) = {
val kernelSize = kernel.size
val dataSize = inData.size
val outData = new Array[Double](dataSize - kernelSize + 1)
var end = 0
while (end <= (dataSize - kernelSize)) {
var sum = 0.0
for (i <- 0 until kernelSize) {
sum += kernel(i) * inData(end + i)
}
outData(end) = sum
end += 1
}
outData
}
/**
* Function to get the initial level, trend and season using method suggested in
* http://robjhyndman.com/hyndsight/hw-initialization/
* @param ts
*/
def initHoltWinters(ts: Vector): (Double, Double, Array[Double]) = {
val arrTs = ts.toArray
// Decompose a window of time series into level trend and seasonal using convolution
val kernel = getKernel()
val kernelSize = kernel.size
val trend = convolve(arrTs.take(period * 2), kernel)
// Remove the trend from time series. Subtract for additive and divide for multiplicative
val n = (kernelSize -1) / 2
val removeTrend = arrTs.take(period * 2).zip(
Array.fill(n)(0.0) ++ trend ++ Array.fill(n)(0.0)).map{
case (a, t) =>
if (t != 0){
if (additive) {
(a - t)
} else {
(a / t)
}
} else{
0
}
}
// seasonal mean is sum of mean of all season values of that period
val seasonalMean = removeTrend.splitAt(period).zipped.map { case (prevx, x) =>
if (prevx == 0 || x == 0) (x + prevx) else (x + prevx) / 2
}
val meanOfFigures = seasonalMean.sum / period
// The seasonal mean is then centered and removed to get season.
// Subtract for additive and divide for multiplicative.
val initSeason = if (additive) {
seasonalMean.map(_ - meanOfFigures )
} else {
seasonalMean.map(_ / meanOfFigures )
}
// Do Simple Linear Regression to find the initial level and trend
val indices = 1 to trend.size
val xbar = (indices.sum: Double) / indices.size
val ybar = trend.sum / trend.size
val xxbar = indices.map( x => (x - xbar) * (x - xbar) ).sum
val xybar = indices.zip(trend).map {
case (x, y) => (x - xbar) * (y - ybar)
}.sum
val initTrend = xybar / xxbar
val initLevel = ybar - (initTrend * xbar)
(initLevel, initTrend, initSeason)
}
}
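Forecast results:
Line chart:
actural: actual values on June 1
Hotwind7*MUTI: forecast (one-week period, multiplicative)
Hotwind30*MUTI: forecast (one-month period, multiplicative)
Others: differences between the forecasts and the actual values
Conclusion:
With a 5-minute interval, the smallest seasonal period is one day, i.e. period = 288. The error with this period is slightly larger than with the other two. Choosing one week (288 * 7) or one month (288 * 30) gives the results shown in the chart, which are acceptable. How this value should be chosen, however, remains an open question, even when judged by the sum of squared errors.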
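One possible way to approach the open question of choosing the period is to rank candidate periods by error on held-out data rather than by the in-sample sum of squared errors. The sketch below only illustrates the idea: it uses a seasonal-naive forecast as a stand-in for the Holt-Winters model, and every name in it is hypothetical. In the real pipeline the forecast function would be replaced by the fitted model's forecast.

```scala
// Sketch: ranking candidate seasonal periods by holdout error.
object PeriodSelectionSketch {
  /** Seasonal-naive stand-in model: repeat the last full season of `train` for `horizon` steps. */
  def seasonalNaive(train: Seq[Double], period: Int, horizon: Int): Seq[Double] = {
    require(period > 0 && train.size >= period, "need at least one full season")
    val lastSeason = train.takeRight(period).toIndexedSeq
    (0 until horizon).map(i => lastSeason(i % period))
  }

  /** Mean absolute error between two sequences of equal length. */
  def mae(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.size == predicted.size, "sequences must match")
    actual.zip(predicted).map { case (a, p) => math.abs(a - p) }.sum / actual.size
  }

  /** Holds out the last `horizon` points and returns (period, MAE) pairs, best first. */
  def rank(series: Seq[Double], candidates: Seq[Int], horizon: Int): Seq[(Int, Double)] = {
    val (train, test) = series.splitAt(series.size - horizon)
    candidates.map(p => p -> mae(test, seasonalNaive(train, p, horizon))).sortBy(_._2)
  }
}
```

For the data in this post, the candidates would be 288, 288 * 7, and 288 * 30, with a horizon of 288 points (one day). The comparison is only meaningful if the training part still contains at least as many full seasons of the longest candidate period as the model needs.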