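CEEMD-Elman neural network prediction of AQI based on data decomposition
WU Man-man1,2, XU Jian-xin2,3*, WANG Qin1
(1. Quality Development Institute, Kunming University of Science and Technology, Kunming 650093, Yunnan, China; 2. State Key Laboratory of Complex Nonferrous Metal Resources Clean Utilization, Kunming 650093, Yunnan, China; 3. Faculty of Metallurgical and Energy Engineering, Kunming University of Science and Technology, Kunming 650093, Yunnan, China)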
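When an Elman neural network is used to predict the air quality index (AQI), the non-stationarity of the data tends to produce forecasts that follow the trend well but have low accuracy. To address this, a CEEMD-Elman model based on complementary ensemble empirical mode decomposition (CEEMD) is proposed. CEEMD decomposes the AQI series into intrinsic mode function (IMF) components at different time scales plus a residual component, so that, for the first time, the prediction of the non-stationary AQI series is turned into the prediction of several stationary IMF components. The model is compared experimentally with a single Elman model, an EMD-Elman model, a single BP model, and a CEEMD-BP model. The results show that the mean square error, mean absolute error, and mean absolute percentage error of the proposed model are 4.80, 0.71, and 1.84%, respectively, all lower than those of the other models, and that the air quality grade is forecast correctly on 94.12% of days. The model effectively reduces the influence of non-stationarity on the predictions and forecasts the air quality grade accurately. The study provides a sound basis for further prediction of AQI trends and a fuller reference for government decision-making and for management departments drawing up air pollution control measures.
air quality index; complementary ensemble empirical mode decomposition; partial autocorrelation function; Elman neural network; air quality grade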
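With the rapid progress of urbanization and industrialization, air pollution has become a severe challenge for human society. The air quality index (AQI) is an important measure of air quality and is closely tied to human health [1-2]. The AQI combines several pollutants (PM2.5, PM10, CO, O3, SO2, NO2) into a single composite index that describes air quality numerically, so that the public can readily judge how good or bad the air is. According to the ambient air quality standard (GB 3095-2012) [3] and the associated effects on human health, the AQI is divided into six categories. Accurate AQI prediction therefore matters for controlling air pollution and for the sustainable development of society. Analysing and predicting the AQI is nevertheless very difficult, mainly because the atmospheric environment is inherently a dynamic, nonlinear, non-stationary, and noisy system [4-5] whose behaviour is easily affected by many factors, including economic variables, ambient pollutant concentrations, and meteorological elements [6-7]. How to predict air quality more accurately has consequently attracted wide attention.
Current air pollution prediction methods fall into three groups: statistical models, numerical models, and intelligent prediction. Common statistical models include the autoregressive integrated moving average (ARIMA) model [8], the multiple linear regression (MLR) model [9], and the grey model [10]. Their marked drawback is that prediction accuracy depends on how well a linear mapping can represent a nonlinear process [11], so their results on nonlinear problems are usually unstable. Common numerical models include CMAQ [12], CAMx [13], and WRF-Chem [14]. These models place high demands on the initial parameters, and atmospheric chemical evolution is very complex, which introduces uncertainty into the prediction process and lowers accuracy. Intelligent prediction commonly relies on artificial neural networks [15-16], which, thanks to their strong nonlinear approximation ability and their adaptive and self-learning properties, have particular advantages for air pollution prediction.
Most neural network predictions currently use static feedforward networks trained with the back propagation (BP) algorithm [17-18]. However, as the system order increases, or when it is unknown, the rapidly growing network structure slows the convergence of learning and makes the network prone to being trapped in local minima [19]. Dynamic recurrent neural networks offer a promising alternative because they reflect the dynamic characteristics of a system more vividly and directly. The Elman recurrent neural network is a typical dynamic network that can adapt to time-varying characteristics, so, given the dynamic nature of the AQI series, it can be used for prediction. Existing work has several limitations, however. Most studies start from pollutant indicators and meteorological elements, first identifying the key indicators that affect air quality and then predicting the AQI series; this involves a heavy workload and is easily affected by other factors. Few domestic studies predict the AQI series directly, and the theoretical framework for AQI prediction is still immature. Finally, although neural network models perform better, in most practical cases the AQI series is non-stationary or chaotic, so modelling it directly with an Elman neural network gives low prediction accuracy [20-21].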
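Complementary ensemble empirical mode decomposition (CEEMD) belongs to the empirical mode decomposition family and evolved from EMD (Empirical Mode Decomposition) and EEMD (Ensemble Empirical Mode Decomposition). EMD, proposed by Huang et al. [22], uses a "sifting" procedure to decompose a non-stationary signal into a finite number of relatively stationary intrinsic mode function (IMF) components and a residual component. It is well suited to nonlinear signals and has considerable advantages in nonlinear signal processing. The algorithm is regarded as a major breakthrough over traditional methods, but its results often suffer from mode mixing. To solve this problem, Wu and Huang proposed EEMD [23], which markedly improves the stability of EMD; however, a sufficiently large ensemble takes a long time to compute, and the added white noise cannot be cancelled completely. Yeh et al. [24] introduced CEEMD to overcome the problems of EMD and EEMD: it not only resolves the mode mixing of EMD effectively but also almost completely eliminates the residual white noise and improves computational efficiency. To improve the accuracy and stability of the prediction model, CEEMD is used here to remove noise from the raw data and to decompose the AQI series into several IMF components and a residual component, which then serve as inputs to the Elman neural network.
Studies have confirmed that combining empirical mode decomposition with neural networks gives good results in time series prediction. For example, Wang et al. [25] combined EMD with an Elman network to build a new decomposition-based model for wind speed trend prediction, and the results showed that the model has small errors and is suitable for wind speed prediction. Yang et al. [26] combined EEMD with an Elman neural network to build a new air quality monitoring and early warning system, and the hybrid model showed high prediction accuracy and stability. Yu et al. [27] fed the CEEMD-decomposed series directly into an ensemble extreme learning machine to predict crude oil prices, and the experiments showed that the model performs better in terms of both statistical measures and prediction accuracy.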
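Starting from the AQI series itself, this paper combines CEEMD, a decomposition technique for time-frequency signals, with the Elman neural network in order to improve the low accuracy of the Elman network in air pollution prediction. First, CEEMD decomposes the AQI series into several IMF components and a residual component. Second, the partial autocorrelation function is used to find the lag order of the AQI, which determines the input and output variables of the Elman network. Third, a CEEMD-Elman prediction model is built for each component. Finally, the component predictions are summed to give the final prediction. The mean square error, mean absolute error, and mean absolute percentage error of the method are compared with those of a single Elman neural network, a single BP network, an EMD-Elman model, and a CEEMD-BP model.
The experiments were run on Windows 7, with MATLAB and Eviews as the processing tools.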
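The Elman neural network, proposed by Elman, is a typical locally recurrent network. Its main structure is a feedforward connection consisting of an input layer, a hidden layer, an output layer, and a context layer [28]; the network is sketched in Figure 1. In general, the input layer passes the signal on, the output-layer units form a linear weighted sum, the hidden-layer transfer function may be linear or nonlinear, and the context layer stores the hidden-layer output from the previous time step. The nonlinear state-space expression of the Elman network is given in [28].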
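A commonly used form of this state-space expression is sketched below. The symbols are assumptions introduced for illustration and may differ from the notation in [28]: $u$ is the input, $x$ the hidden-layer output, $x_c$ the context-layer output, $y$ the network output, $w^1, w^2, w^3$ the context-to-hidden, input-to-hidden, and hidden-to-output weight matrices, and $f$, $g$ the hidden-layer and output-layer transfer functions.

```latex
\begin{aligned}
y(k)   &= g\left(w^{3}\, x(k)\right),\\
x(k)   &= f\left(w^{1}\, x_c(k) + w^{2}\, u(k-1)\right),\\
x_c(k) &= x(k-1).
\end{aligned}
```

The third line is what distinguishes the Elman network from a plain feedforward network: the context layer feeds the previous hidden state back into the hidden layer.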
Figure 1 Schematic of the Elman neural network
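CEEMD (Complementary Ensemble Empirical Mode Decomposition) [24] evolved from EMD and EEMD. It addresses the mode mixing of EMD and the long decomposition time of EEMD. The decomposition proceeds as follows:
② Compute the mean curve from the upper and lower envelopes:
⑥ Repeat steps ① to ⑤ on the residual series to obtain further IMFs until the termination condition is met:
(3) The ensemble mean of the two groups of results, IMF1 and IMF2, is the final decomposition result IMF.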
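The complementary-pair averaging in step (3) can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: `ceemd`, `toy_decompose`, `n_pairs`, and `noise_std` are hypothetical names, the EMD routine itself is passed in as `decompose`, and the sketch assumes that routine returns the same number of components on every call.

```python
import random


def ceemd(signal, decompose, n_pairs=50, noise_std=0.2, seed=0):
    """Average the components of complementary noise-added copies of `signal`.

    `decompose` is a caller-supplied EMD routine (hypothetical here) mapping a
    list of floats to a list of components, each as long as the input.
    """
    rng = random.Random(seed)
    n = len(signal)
    total = None
    for _ in range(n_pairs):
        noise = [rng.gauss(0.0, noise_std) for _ in range(n)]
        for sign in (1.0, -1.0):  # the +noise and -noise copies form one pair
            noisy = [s + sign * w for s, w in zip(signal, noise)]
            comps = decompose(noisy)
            if total is None:
                total = [[0.0] * n for _ in comps]
            for acc, comp in zip(total, comps):
                for i, v in enumerate(comp):
                    acc[i] += v
    runs = 2 * n_pairs
    return [[v / runs for v in acc] for acc in total]


def toy_decompose(x):
    """Linear stand-in for EMD: a fast part plus a 3-point moving average."""
    n = len(x)
    slow = [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]
    fast = [a - b for a, b in zip(x, slow)]
    return [fast, slow]
```

With the linear stand-in, the paired noise cancels exactly, so the result equals `toy_decompose(signal)` up to rounding. With a real EMD routine the cancellation is only approximate, which is the residual-noise reduction the method relies on.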
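The partial autocorrelation function (PACF) is a common tool for identifying the lag order of an autoregressive moving average model ARMA(p, q). It measures the correlation between two variables of a time series after the influence of the intermediate variables has been removed [29]. This paper uses the PACF to determine the lag order of each component and hence the number of input-layer neurons and the output variable of the Elman neural network.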
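One way to compute the PACF is the Durbin-Levinson recursion on the sample autocorrelations, sketched below. The function names and the use of the approximate 95% bound 1.96/sqrt(n) to flag significant lags are assumptions for illustration; the source does not say how the lag order was read from the PACF.

```python
import math


def autocorr(x, max_lag):
    """Sample autocorrelations r[0..max_lag] of the series x."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag via Durbin-Levinson."""
    r = autocorr(x, max_lag)
    phi_prev = []  # AR coefficients of order k-1
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(x, max_lag):
    """Lags whose partial autocorrelation exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(x))
    return [k for k, v in enumerate(pacf(x, max_lag), start=1)
            if abs(v) > bound]
```

A lag order could then be taken, for example, as the largest significant lag, which is one plausible reading of how a value such as 6 might be chosen.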
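1.4.1 Flow chart The CEEMD-Elman prediction model works as follows (see Figure 2 for the detailed flow). First, CEEMD decomposes the AQI series into several IMF components of different frequencies and a residual component, each of which serves as an input variable of the CEEMD-Elman model. Second, the partial autocorrelation function gives the lag order of the AQI, which fixes the number of input-layer neurons in each sub-model. Third, a model is built for each component, together forming the CEEMD-Elman prediction model, and a prediction is obtained for each component. Finally, the predictions of the sub-models are summed to give the final result. The steps are:
① Apply CEEMD to the Shanghai AQI series to obtain several IMF components and a residual component;
② Use the PACF to find the lag order of the AQI series and thus the input variables of the sub-model; then determine the number of hidden-layer neurons by training repeatedly and comparing the network errors, and finally determine the output variable of the model;
④ Repeat steps ② and ③ to obtain the predictions for the other components and the residual component;
⑤ Sum the predictions of all components and the residual component.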
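The decompose, predict, and sum structure of these steps can be sketched in a few lines. `combine_forecasts` and `forecast_component` are hypothetical names; `forecast_component` stands in for a trained per-component model and is assumed to return a forecast of the same length for every component.

```python
def combine_forecasts(components, forecast_component):
    """Forecast each component separately and sum the forecasts pointwise.

    `components` holds the IMF series plus the residual series;
    `forecast_component` maps one series to a list of forecast values.
    """
    per_component = [forecast_component(c) for c in components]
    horizon = len(per_component[0])
    return [sum(f[t] for f in per_component) for t in range(horizon)]
```

For example, with a naive persistence forecaster, `combine_forecasts([[1.0, 2.0], [10.0, 20.0]], lambda c: [c[-1], c[-1]])` returns `[22.0, 22.0]`.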
Figure 2 Flow chart of the CEEMD-Elman prediction model
1.4.2 Setting of key parameters During training and prediction, the weights and thresholds are generated randomly, so the convergence speed of the network differs from run to run. After repeated comparison, an initial weight range of (-0.1, 0.1) was found to work well. In the learning process, the hidden-layer transfer function is tansig, the output-layer transfer function is purelin, and the training function is traingdx, with a maximum of 5000 iterations, a maximum of 5 validation failures, and an error goal of 0.00001.
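To make these settings concrete, the sketch below shows a forward pass of an Elman network with a tanh hidden layer (mathematically the same function as MATLAB's tansig), a linear output (purelin), and weights drawn uniformly from the range given above. It is an illustrative Python stand-in, not the authors' MATLAB setup: the names `init_elman` and `elman_forward` and the dictionary layout are assumptions, and training (traingdx) is omitted.

```python
import math
import random


def init_elman(n_in, n_hidden, seed=0, bound=0.1):
    """Draw all weights and biases uniformly from [-bound, bound]."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-bound, bound)

    return {
        "w_in": [[u() for _ in range(n_in)] for _ in range(n_hidden)],
        "w_ctx": [[u() for _ in range(n_hidden)] for _ in range(n_hidden)],
        "b_h": [u() for _ in range(n_hidden)],
        "w_out": [u() for _ in range(n_hidden)],
        "b_out": u(),
    }


def elman_forward(params, inputs):
    """Run the network over a sequence of input vectors, one output per step."""
    n_hidden = len(params["b_h"])
    context = [0.0] * n_hidden  # context layer starts empty
    outputs = []
    for x in inputs:
        hidden = []
        for j in range(n_hidden):
            z = params["b_h"][j]
            z += sum(w * v for w, v in zip(params["w_in"][j], x))
            z += sum(w * c for w, c in zip(params["w_ctx"][j], context))
            hidden.append(math.tanh(z))  # tansig-equivalent hidden activation
        y = params["b_out"] + sum(w * h for w, h in zip(params["w_out"], hidden))
        outputs.append(y)  # linear (purelin-like) output
        context = hidden  # remember the hidden state for the next step
    return outputs
```

Because the context layer carries the previous hidden state, feeding the same input vector twice in a row generally gives two different outputs, which is how the network encodes temporal dependence.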
The number of hidden-layer neurons is another key parameter of a neural network model. In theory, a neural network with enough hidden-layer nodes can approximate any continuous function, but there is still no definite rule for computing that number, and it is mostly set by experience. One study suggests that the hidden layer of a three-layer network should have 75% as many neurons as the input layer [21], while another recommends 1.5 to 3 times the number of input neurons [22]. Combining these two rules of thumb, the experimental range for the number of hidden-layer neurons in each model was set to 75%~300% of its number of input neurons. The CEEMD-Elman model was trained repeatedly, and the best number of hidden neurons was chosen by comparing the network errors. The resulting numbers of hidden-layer neurons for the sub-models of the AQI series are listed in Table 2.
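The search range described above can be written out explicitly. In this sketch the rounding (up at the lower end, down at the upper end) and the names `hidden_candidates`, `best_hidden`, and `error_fn` are assumptions; `error_fn` stands in for "train a network with h hidden neurons and return its error".

```python
import math


def hidden_candidates(n_inputs, low=0.75, high=3.0):
    """Candidate hidden-layer sizes between 75% and 300% of the input count."""
    start = max(1, math.ceil(low * n_inputs))
    stop = math.floor(high * n_inputs)
    return list(range(start, stop + 1))


def best_hidden(n_inputs, error_fn):
    """Pick the candidate size with the smallest error reported by error_fn."""
    return min(hidden_candidates(n_inputs), key=error_fn)
```

For a sub-model with 6 inputs this gives candidate sizes from 5 to 18 neurons.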
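To assess the results more precisely, three indicators are used: mean absolute error, mean square error, and mean absolute percentage error. The mean absolute error characterizes the dispersion of the prediction errors; the smaller the value, the better the prediction. The mean square error reflects the actual size of the prediction errors and the degree of variation in the data; again, smaller is better. The mean absolute percentage error measures how far the predictions deviate from the original values in relative terms; the smaller the value, the better the prediction.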
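The three indicators have standard definitions, sketched here in plain Python. The function names are illustrative, and `mape` assumes that no actual value is zero, which holds for AQI data.

```python
def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)
```

Since the mean of squares is never smaller than the square of the mean, MSE is always at least MAE squared, which is a handy sanity check on reported values.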
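The daily air quality index of Shanghai from 10 March 2017 to 10 March 2019 (http://www.semc.com.cn/aqi/Home/Index), 712 valid samples in total, was selected as the study data. Table 1 lists the basic statistics of the sample. The standard deviation is 34.84, which indicates large fluctuations. The skewness and kurtosis are 1.19 and 4.68, the Jarque-Bera (J-B) statistic is 251.64, and the p-value is 0.00, so the null hypothesis that the series does not differ significantly from a normal distribution is rejected; that is, the AQI series is not normally distributed. A stationarity test gives an augmented Dickey-Fuller (ADF) statistic of -0.98, which is greater than the critical value of -1.94, so the unit-root hypothesis cannot be rejected and the AQI series is treated as non-stationary. Given these tests, methods that strictly require a stationary series are not appropriate, whereas the Elman neural network can be applied to the non-stationary, nonlinear AQI series.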
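As a rough check of the reported Jarque-Bera value, the statistic can be recomputed from the rounded figures above using its standard definition, where $n$ is the sample size, $S$ the skewness, and $K$ the kurtosis (these symbols are introduced here and are not from the source):

```latex
JB = \frac{n}{6}\left[S^{2} + \frac{(K-3)^{2}}{4}\right]
   = \frac{712}{6}\left[1.19^{2} + \frac{(4.68-3)^{2}}{4}\right]
   \approx 118.67 \times (1.4161 + 0.7056)
   \approx 251.8
```

This is close to the reported 251.64; the small gap is plausibly due to the rounding of $S$ and $K$. Under the null hypothesis of normality the statistic is asymptotically chi-squared with 2 degrees of freedom, whose 5% critical value is about 5.99, so the rejection is clear-cut.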
Table 1 Basic statistical characteristics of the sample data
Table 2 Number of hidden-layer neurons in each component model
Figure 3 AQI series of Shanghai
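When building the CEEMD-Elman model for the Shanghai AQI series, the partial autocorrelation function gives a lag order of 6; that is, the AQI of the previous 6 days is used to predict the AQI of the 7th day, and 6 is taken as the number of input-layer neurons of the Elman model. The number of hidden-layer neurons, chosen by comparing network errors, is given in Table 2.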
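Turning a series into "previous 6 days in, 7th day out" training pairs is a sliding-window operation, sketched below. The function name `make_lagged_samples` is illustrative and not from the source.

```python
def make_lagged_samples(series, lag=6):
    """Build (inputs, targets): each input is `lag` consecutive values,
    and its target is the value that immediately follows them."""
    count = len(series) - lag
    inputs = [series[i:i + lag] for i in range(count)]
    targets = [series[i + lag] for i in range(count)]
    return inputs, targets
```

A series of 712 observations with a lag of 6 yields 706 input-output pairs. This would match the 570 training samples plus 136 test samples reported in the paper, although that reading is an inference rather than something the source states.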
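Figure 3 shows the Shanghai AQI data (10 March 2017 to 10 March 2019). The series fluctuates strongly and has some trend but no clear regular pattern, so accurate prediction is difficult. CEEMD is an effective method for processing non-stationary signals and is applicable in many fields. Figure 4 shows the CEEMD decomposition of the Shanghai AQI series, which yields 9 IMF components; the study of the non-stationary AQI can thus be turned into the study of 9 stationary IMF components. As Figure 4 shows, the IMF components differ in frequency content and in amplitude at each time point, but their non-stationarity and nonlinearity are reduced.
Figure 4 CEEMD decomposition of the Shanghai AQI data
A total of 712 valid samples of the Shanghai AQI were used. The 570 data points from 10 March 2017 to 19 October 2018 were used to build the CEEMD-Elman prediction model, and the 136 valid data points from 20 October 2018 to 10 March 2019 were used as test data for prediction. A CEEMD-Elman model was trained for each IMF component and for the residual component; the component predictions are shown in Figure 5.
To show the performance of the CEEMD-Elman model more directly, the sub-model predictions were summed and compared with the single Elman model, the single BP model, the EMD-Elman model, and the CEEMD-BP model. Figure 6 compares the predicted and measured AQI values for each model, and Figure 7 compares the errors.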
According to the Technical Regulation on Ambient Air Quality Index (on trial) (HJ 633-2012), the AQI is divided into six bands, 0~50, 51~100, 101~150, 151~200, 201~300, and above 300, corresponding to six air quality grades. To compute the forecast accuracy, the Shanghai AQI predicted by each model was converted into the corresponding air quality grade and compared with the grade of the true AQI; accuracy is expressed as the frequency of days on which the grade was forecast correctly. The resulting accuracies are given in Table 3.
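The grade conversion and the accuracy measure can be sketched as follows. The band limits come from the text above; rounding a real-valued prediction to the nearest integer before grading is an assumption, as are the names `aqi_grade` and `grade_accuracy`.

```python
import bisect

# Upper limits of grades 1 to 5; anything above 300 is grade 6.
GRADE_UPPER_BOUNDS = [50, 100, 150, 200, 300]


def aqi_grade(aqi):
    """Map an AQI value to its air quality grade (1 to 6)."""
    value = int(round(aqi))  # assumption: predictions are rounded first
    return bisect.bisect_left(GRADE_UPPER_BOUNDS, value) + 1


def grade_accuracy(actual, predicted):
    """Fraction of days on which the predicted grade equals the actual grade."""
    hits = sum(aqi_grade(a) == aqi_grade(p) for a, p in zip(actual, predicted))
    return hits / len(actual)
```

Note that a forecast can be numerically close and still fall in the wrong grade when the true AQI sits near a band limit, for example 101 predicted as 99.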
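Figure 6 Predicted versus measured values for each model
Figure 7 Error curves of the different models
Table 3 Frequency of correctly forecast air quality grades for each model
Note: "-" marks the true values, for which accuracy and false alarm rate do not apply.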
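Figure 5 shows that, except for IMF1, the CEEMD-Elman predictions of the components agree closely with the actual values; that is, as the nonlinearity and non-stationarity of the components decrease, the error between predicted and actual values becomes smaller.
Figure 6 shows that: (1) compared with the single neural networks, all 3 hybrid models predict the trend of the Shanghai AQI better and track its fluctuations well; (2) the Elman network follows the trend better than the BP network, so it is the more suitable choice for the dynamic, nonlinear, non-stationary Shanghai AQI; (3) the predictions of the CEEMD-Elman model almost coincide with the trend of the Shanghai AQI and are the best of all models.
Figure 7 shows that the error curve of the CEEMD-Elman model fluctuates around zero with the smallest amplitude, which indicates that this model tracks the series most closely, gives predictions nearest to the true values, and has the highest accuracy.
Table 3 shows that, in the original AQI series, the proportions of days graded excellent, good, lightly polluted, and moderately polluted are 25%, 55.88%, 17.65%, and 1.47%, respectively. Among the models, CEEMD-Elman comes closest to the true values, with predicted proportions of 22.79%, 56.62%, 19.85%, and 0.74%, and the grade is forecast correctly on 94.12% of days.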
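Assuming these percentages refer to the 136 test days described earlier, they correspond to whole-day counts, which gives a quick consistency check (the counts are inferred here, not reported in the source):

```latex
\begin{aligned}
\text{actual: }    & \tfrac{34}{136} = 25\%,\quad \tfrac{76}{136} \approx 55.88\%,\quad \tfrac{24}{136} \approx 17.65\%,\quad \tfrac{2}{136} \approx 1.47\%,\\
\text{predicted: } & \tfrac{31}{136} \approx 22.79\%,\quad \tfrac{77}{136} \approx 56.62\%,\quad \tfrac{27}{136} \approx 19.85\%,\quad \tfrac{1}{136} \approx 0.74\%,\\
\text{accuracy: }  & \tfrac{128}{136} \approx 94.12\%.
\end{aligned}
```

On this reading, the grade would have been missed on 8 of the 136 days.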
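To compare the accuracy of the models further, the prediction errors of each model were computed, as shown in Table 4.
Table 4 Comparison of error indicators for each model
Table 4 shows that:
(1) Single network models. The mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of the BP model are 61.31%, 39.40%, and 34.04% higher, respectively, than those of the Elman model. The Elman predictions therefore deviate less, and for time series prediction the Elman model tracks the series better than the BP model.
(2) Hybrid models. The MSE, MAE, and MAPE of the CEEMD-Elman model are 93.80%, 52.24%, and 83.33% lower, respectively, than those of the EMD-Elman model, and 96.89%, 92.46%, and 86.83% lower than those of the CEEMD-BP model.
Taken together, these error indicators show that the CEEMD-Elman model has the highest prediction accuracy and that turning the prediction of the AQI series into the prediction of several IMF components and a residual component is feasible. When data decomposition is combined with the Elman neural network for air pollution prediction, CEEMD greatly reduces the errors caused by non-stationarity and noise in the AQI series. In addition, finding the lag order of the AQI series with the PACF before modelling takes into account the influence of the previous day or previous few days on the AQI value. Compared with the other models, the CEEMD-Elman model is therefore better suited to predicting the AQI series and forecasting the air quality grade.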
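3.1 CEEMD decomposes the nonlinear, non-stationary AQI series into a set of relatively stationary IMF components and a residual component, so that predicting the AQI series becomes a matter of predicting several IMF components, on which the models are then built. This shows that the model can handle the non-stationary AQI series.
3.2 The method was compared with the Elman network model, the BP network model, the EMD-Elman model, and the CEEMD-BP model on the Shanghai AQI series. The mean square error, mean absolute error, and mean absolute percentage error between the CEEMD-Elman predictions and the actual values are 4.80, 0.71, and 1.84%, respectively, all better than those of the other 4 models, which shows that the prediction accuracy is markedly improved. The air quality grade is forecast correctly on 94.12% of days, which demonstrates the effectiveness of the method.
3.3 Comparison of the predicted and actual values shows that the method gives fairly accurate predicted trends and values, so the model is suitable for predicting air quality in Shanghai, which supports its applicability in practice.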
[1] Gao Q X, Liu J R, Li W T, et al. Comparative analysis and inspiration of air quality index between China and America [J]. Environmental Science, 2015,36(4):1141-1147.
[2] Zhang Q J, Benoi L, Fanny V L, et al. An air quality forecasting system in Beijing-application to the study of dust storm events in China in May 2008 [J]. Journal of Environmental Sciences, 2012,24(1): 102-111.
[3] GB 3095-2012, Ambient air quality standard [S].
[4] Zhao M. Research on atmospheric environment prediction based on data mining technology [D]. Beijing: Beijing Jiaotong University, 2017.
[5] Kurt A, Oktay A B. Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks [J]. Expert Systems with Applications, 2010,37(12):7986-7992.
[6] Zhang J Q, Wang Y Q, Gao S, et al. Study on the relationship between meteorological elements and air pollution at different time scales based on KZ filtering [J]. China Environmental Science, 2018,38(10):3662-3672.
[7] Jiang Q, Wang F, Zhang H D, et al. Analysis of temporal variation characteristics and meteorological conditions of reactive gas and PM2.5 in Beijing [J]. China Environmental Science, 2017,37(3):829-837.
[8] Konovalov I B, Beekman M, Meleux, et al. Combining deterministic and statistical approaches for PM10 forecasting in Europe [J]. Atmospheric Environment, 2009,43:6425-6434.
[9] Zhang H, Zhang W, Palazoglu, et al. Prediction of ozone levels using a hidden Markov model (HMM) with gamma distribution [J]. Atmospheric Environment, 2012,62:64-73.
[10] Xiao M, Li W M, Liu D F, et al. Prediction of algal bloom variation in backwater areas of tributaries in Three-Gorges based on multiple optimized grey model [J]. Acta Scientiae Circumstantiae, 2017,37(3):1153-1161.
[11] Song Y, Qin S, Qu J, et al. The forecasting research of early warning systems for atmospheric pollutants: A case in Yangtze River Delta region [J]. Atmospheric Environment, 2015,118:58–69.
[12] Wang L T, Jang C, Zhang Y, et al. Assessment of air quality benefits from national air pollution control policies in China [J]. Atmospheric Environment, 2010,44(28):3449-3457.
[13] Huang Q, Cheng S Y, Perozzi R E, et al. Use of a MM5-CAMx-PSAT modeling system to study SO2 source apportionment in the Beijing metropolitan region [J]. Environmental Modeling & Assessment, 2012,17(5):527-538.
[14] Egan S D, Stuefer M, Webley P W, et al. WRF-Chem modeling of sulfur dioxide emissions from the 2008 Kasatochi Volcano [J]. Annals of Geophysics, 2015,57:1593-5213.DOI:10.4401/ag-6626.
[15] Zhu S L, Lian X Y, Liu H X, et al. Daily air quality index forecasting with hybrid models: A case in China [J]. Environmental Pollution, 2017,231(2):1232-1244.
[16] Li X, Peng L, Yao X J, et al. Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation [J]. Environmental Pollution, 2017,231(1):997-1004.
[17] Bai Y, Li Y, Wan X X, et al. Air pollutants concentrations forecasting using back propagation neural network based on wavelet decomposition with meteorological conditions [J]. Atmospheric Pollution Research, 2016,7(3):557-566.
[18] Zhang H D, Zhang T Y, Li T, et al. Forecast of air quality pollutants concentrations based on BP neural network multi-model ensemble method [J]. China Environmental Science, 2018,38(4):1243-1256.
[19] White H. Economic prediction using neural networks: The case of IBM daily stock returns [C]//International Conference on Neural Networks, 1988,2(2):451-458.
[20] Wang P, Liu Y, Qin Z, et al. A novel hybrid forecasting model for PM10 and SO2 daily concentrations [J]. Science of the Total Environment, 2015,505(505C):1202-1212.
[21] Azman A, Hafizan J, Mohd E T, et al. Prediction of the level of air pollution using principal component analysis and artificial neural network techniques: A case study in Malaysia [J]. Water, Air & Soil Pollution, 2014,225(8):0049-6979. DOI:10.1007/s11270-014-2063-1.
[22] Huang N E, Shen Z, Long S R, et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis [J]. Proceedings A, 1998,454(1971):903-995.
[23] Wu Z, Huang N E. Ensemble empirical mode decomposition: A noise-assisted data analysis method [J]. Advances in Adaptive Data Analysis, 2009,1(1):1-41.
[24] Yeh J R, Shieh J S, Huang N E. Complementary ensemble empirical mode decomposition: A novel noise enhanced data analysis method [J]. Advances in Adaptive Data Analysis, 2010,2(2):135-156.
[25] Wang J J, Zhang W Y, Li Y N, et al. Forecasting wind speed using empirical mode decomposition and Elman neural network [J]. Applied Soft Computing, 2014,23(Complete):452-459.
[26] Yang Z, Wang J. A new air quality monitoring and early warning system: air quality assessment and air pollutant concentration prediction [J]. Environmental Research, 2017,158:105-117.
[27] Yu L, Dai W, Tang L. A novel decomposition ensemble model with extended extreme learning machine for crude oil price forecasting [J]. Engineering Applications of Artificial Intelligence, 2016,47:110-121.
[28] Elman J L. Finding structure in time [J]. Cognitive Science, 1990,14(2):179-211.
[29] Song J H, Yang C J, Zhou Z, et al. Application of improved EMD-Elman neural network to predict silicon content in hot metal [J]. CIESC Journal, 2016,67(3):729-735.
AQI prediction of CEEMD-Elman neural network based on data decomposition.
WU Man-man1,2, XU Jian-xin2,3*, WANG Qin1
(1. Quality Development Institute, Kunming University of Science and Technology, Kunming 650093, China; 2. State Key Laboratory of Complex Nonferrous Metal Resources Clean Utilization, Kunming 650093, China; 3. Faculty of Metallurgical and Energy Engineering, Kunming University of Science and Technology, Kunming 650093, China)., 2019, 39(11):4580~4588
The Elman neural network (ENN) is susceptible to the non-stationarity of data when it is used to predict the Air Quality Index (AQI), resulting in a good forecasting trend but low accuracy. Based on complementary ensemble empirical mode decomposition (CEEMD), a new hybrid model related to ENN was proposed in this paper. Firstly, CEEMD was employed to decompose the AQI sequence into a finite number of intrinsic mode functions (IMFs) at different time scales and one residue. Secondly, the partial autocorrelation function was used to calculate the lag periods of the input variables of each IMF in ENN. Finally, the predicted values of each IMF were summed up to obtain the final predicted result. The study of the non-stationary AQI sequence was thus transformed into the study of stationary IMFs. The experimental results show that the mean square error, the mean absolute error, and the mean absolute percentage error were respectively 4.80, 0.71, and 1.84%, which were all less than those of the single Elman network, EMD-Elman model, BP network and CEEMD-BP model. Furthermore, the frequency of the correct forecast for the corresponding air quality grade was 94.12%. It is concluded that the new model could reduce the impact of the volatility of real AQI data and effectively predict the air quality grade. This study not only provides effective evidence for further predicting the trend of AQI, but also provides a better reference for government decision-making and for management departments formulating pollution control measures.
air quality index (AQI); complementary ensemble empirical mode decomposition; partial autocorrelation function; Elman neural network; air quality grade
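CLC number: X823
Document code: A
Article ID: 1000-6923(2019)11-4580-09
About the author: WU Man-man (born 1994), female, from Fuyang, Anhui; master's student at Kunming University of Science and Technology, working mainly on machine learning and time series prediction; author of 2 published papers.
Received: 2019-04-02
Funding: Yunnan Province High-Level Talent Introduction Project (50578020)
* Corresponding author, professor, xujianxina@163.com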