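Agricultural internet of things data reconstruction based on a K-nearest neighbor algorithm improved by regularization and spatio-temporal constraints
Wu Huarui1,2, Li Qingxue1,2, Miao Yisheng1,2, Song Yuling3
(1. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China; 2. Beijing Research Center for Information Technology in Agriculture, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; 3. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling 712100, China)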
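To address the loss and anomalies of sensing data that readily occur in complex agricultural environments, this paper proposes a K-nearest neighbor data reconstruction method based on a regularization penalty (K nearest neighbor-regularization penalty, KNN-RP). Ridge regression is used to regularize the least-squares term of the nearest-neighbor method, and the choice of norm for the penalty term is discussed. The temporal and spatial constraint matrices are defined from an analysis of the spatio-temporal stability and correlation of agricultural IoT sensing data. Cross-validation on greenhouse data samples shows that, under the point loss model, KNN-RP outperforms KNN, inverse-distance-weighted KNN and the DT algorithm; under the block loss model it outperforms KNN and inverse-distance-weighted KNN and is slightly inferior to DT. The method improves the quality of sensing data in the agricultural IoT and can serve as a reference for agricultural production decisions based on IoT data.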
algorithms; models; agricultural internet of things; data reconstruction; cluster regression
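The agricultural Internet of Things (IoT) is an important data source for agricultural environment sensing, production decision management and scientific analysis, and the accuracy and quality of its data strongly affect research and decision outcomes. Because hardware and software faults in sensors, network links and acquisition nodes are hard to avoid, agricultural IoT data suffer from errors and missing values, which degrades sensing data quality [1-4]. The harsh and complex environment, complex channel conditions and limited network energy of agricultural production monitoring all raise the probability of data anomalies. Moreover, failures become more frequent as the network grows [2]. An effective data reconstruction method is therefore a key problem to be solved in order to improve the completeness and quality of agricultural IoT monitoring data.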
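Missing-data reconstruction plays an important role in many fields [5-7]. Common interpolation and reconstruction methods include linear interpolation, moving average, machine-learning-based reconstruction and compressed-sensing-based reconstruction. Linear interpolation and moving average are suitable only for data with high linearity, and their reconstruction accuracy is low for the highly nonlinear data of agricultural environments. Multiple regression can fit nonlinear data accurately, but as the nonlinearity of the data increases its number of variables grows rapidly and its algorithmic complexity rises exponentially. Machine-learning-based reconstruction algorithms include K-nearest neighbor (KNN), Delaunay triangulation (DT) and multi-channel singular spectrum analysis. These methods are generally suitable only when few values are missing, and their performance degrades markedly when many values are missing [2].
Pan et al. [8] combined a temporal estimation method with a spatial estimation method (multiple regression, MR) to estimate missing sensing values in wireless sensor networks; their results showed that a purely spatio-temporal estimation algorithm is more accurate for relatively stationary signals. Kong et al. [2] proposed an improved compressed sensing and reconstruction method for high data loss rates and analyzed its reconstruction accuracy for single and multiple parameters. Sun et al. [5] proposed a block-oriented sparse Bayesian learning algorithm that uses the block property and inherent structure of the data to reconstruct the CS sparse coefficients in the transform domain and thereby recover the original signal. Eldar et al. [9] derived an uncertainty relation for block-sparse signals from a block-coherence measure and proposed a K-sparse signal reconstruction method based on orthogonal matching pursuit, which achieves better reconstruction performance by exploiting block sparsity.
These studies show that interpolation and reconstruction based on spatio-temporal correlation in a single dimension have been studied fairly thoroughly, whereas reconstruction based on relationships among multiple parameters or on block sparsity is the current research focus [10-12]. Compressed-sensing-based reconstruction has also been widely studied: traditional compressed sensing for static data can hardly reflect the dynamic behavior of agricultural IoT data, while the high complexity of dynamic compressed sensing makes it hard to apply in resource-constrained agricultural wireless sensor networks [1,5,13-16].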
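In summary, drawing on the temporal, spatial and inter-parameter correlations of agricultural environment data, this paper proposes an improved KNN method based on a regularization penalty and spatio-temporal constraints, with the aim of improving the reconstruction accuracy of agricultural IoT monitoring data.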
Agricultural IoT monitoring applications mostly use wireless sensor network (WSN) technology. In WSN data reconstruction scenarios, the data before and after reconstruction are generally represented in matrix form. The environment matrix (EM) [17-19] holds one entry per node and time point, so each entry is indexed by a node and by a time point.
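Agricultural WSNs are prone to data loss or anomalies caused by hardware or software faults or by channel link problems. Anomalous data can be deleted once an anomaly detection algorithm has flagged them, and then handled together with the lost data [18]. The EM then contains zero-valued entries, and a data missing matrix (DMM) is defined to indicate which entries are lost [2,20-21].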
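The data actually collected by the WSN can then be expressed as the perception matrix (PM)
$$P = B \mathbin{.\!\times} Y \qquad (3)$$
where $.\!\times$ denotes element-wise multiplication of matrices.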
The goal of agricultural WSN data reconstruction is to recover, from the collected data matrix $P$, a data matrix that is as close as possible to the original data matrix $Y$ [22-24].
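The relation between the three matrices can be illustrated with a few lines of Python. This is a minimal sketch under assumed conventions: rows are nodes, columns are time points, the DMM holds 0 for a lost entry and 1 otherwise, and the function name `perception_matrix` and the sample readings are hypothetical.

```python
def perception_matrix(Y, B):
    """Element-wise product P = B .x Y for two nested lists of equal shape."""
    return [[b * y for b, y in zip(b_row, y_row)]
            for b_row, y_row in zip(B, Y)]

# Hypothetical temperature readings: 2 nodes x 3 time points
Y = [[21.5, 21.7, 22.0],
     [20.9, 21.1, 21.4]]
# Assumed DMM convention: 0 marks a lost entry, 1 a received entry
B = [[1, 0, 1],
     [1, 1, 0]]

P = perception_matrix(Y, B)  # lost entries show up as zeros in P
```

A reconstruction algorithm only sees `P` and `B`; `Y` is what it tries to approximate.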
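The data loss models of agricultural WSN data acquisition mainly fall into the following types [2,25]:
1) Single-point random loss model
This is the simplest data loss model. Data in the matrix are dropped independently and at random, that is, the lost data points are randomly distributed in the perception matrix PM. Signal noise and node access collisions in the WSN are usually the root cause of this pattern.
2) Block random loss model
In the block random loss model, some adjacent data in the perception matrix PM are lost together. Depending on the dimension along which the lost data are adjacent, it can be divided into spatial-sequence block loss, time-sequence block loss and parameter-sequence block loss.
In time-sequence block loss, the data of a node are lost frequently along the time series, either continuously or intermittently. Unreliable links are common in agricultural WSN applications, and when link quality is poor the sensing data are prone to time-sequence block loss.
In spatial-sequence block loss, the data of adjacent nodes are lost together at a given time point. Network congestion in agricultural WSNs is the main cause of data loss across densely deployed sensor nodes.
In parameter-sequence block loss, several parameters of a node are lost at the same time. Sensor hardware failure at the node is the main cause.
3) Mixed loss model
In practice, losses are usually caused by several factors at once. Because the mixed model is complex, it is generally decomposed into the first two models for analysis.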
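The first two loss models can be simulated by generating a data missing matrix. The sketch below is illustrative only: it assumes a DMM with 0 for lost entries, rows as nodes and columns as time points, and the function names, the seed and the block parameters are hypothetical rather than taken from the paper's experiments.

```python
import random

def point_loss_mask(n_nodes, n_times, loss_rate, seed=0):
    """Single-point random loss: drop a fixed share of entries uniformly at random."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n_nodes) for j in range(n_times)]
    lost = set(rng.sample(cells, round(loss_rate * len(cells))))
    return [[0 if (i, j) in lost else 1 for j in range(n_times)]
            for i in range(n_nodes)]

def time_block_loss_mask(n_nodes, n_times, node, start, length):
    """Time-sequence block loss: one node loses a run of consecutive time points."""
    B = [[1] * n_times for _ in range(n_nodes)]
    for j in range(start, min(start + length, n_times)):
        B[node][j] = 0
    return B

point_mask = point_loss_mask(4, 10, 0.25)
block_mask = time_block_loss_mask(3, 8, node=1, start=2, length=4)
```

A spatial-sequence block can be produced the same way by zeroing a run of adjacent rows in one column, and a mixed mask by multiplying several masks element by element.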
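This paper proposes an improved K-nearest neighbor regression algorithm to reconstruct missing data in agricultural WSN scenarios. Traditional methods mostly estimate missing values from temporal and spatial correlation. Besides these correlations, some parameters in agricultural WSNs also show marked inter-parameter correlation and periodicity. The proposed method therefore focuses on improving the KNN algorithm through the second-order correlation among parameters. The optimization objective is to make the recovered matrix as close as possible to the original matrix.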
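Because agricultural environmental parameters are continuous in time and space, agricultural IoT data show clear correlation along both the temporal and the spatial dimension. Fig. 1 shows that data from nodes in different areas follow similar trends, which indicates a high spatial correlation of parameters among WSN nodes.
Fig. 1 Line chart of ambient temperature at nodes in different areas over the same period
Fig. 2 shows that agricultural WSN data are clearly periodic along the time axis, and that temperature, humidity and light intensity vary with almost the same period, which indicates a high temporal correlation of parameters among WSN nodes.
Fig. 2 Periodic variation curves of multiple parameters at the same node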
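A least-squares loss function is introduced into the KNN algorithm, giving the least-squares model of Eq. (5).
For the least-squares loss, when the data matrix is not of full column rank, or when some of its columns are strongly linearly correlated, the determinant of the product of its transpose with itself is close to 0. That product is then nearly singular, its inverse is computed with large error, and a unique optimal solution cannot be guaranteed. Ridge regression adds a penalty term to least squares; it gives up unbiasedness but gains numerical stability and computational accuracy. Specifically, a constant is added to every main-diagonal element of that product, which makes the matrix full rank and satisfies the condition for solving for the optimum. Ridge regression with a regularization penalty works well when training data are scarce, which leads to Eq. (6) [29-30].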
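Compared with Eq. (5), Eq. (6) has one additional term, the regularization term, whose coefficient is greater than zero and controls the strength of the penalty. Using the 2-norm as the penalty guarantees a unique optimal solution, but the solution is not necessarily sparse, which affects the choice of K in the KNN algorithm and hence the stability and reliability of the result. This paper therefore replaces it with the 2,1-norm, Eq. (7).
The 2,1-norm combines the sparsity of the 1-norm with the ability of the 2-norm to keep the loss function from overfitting, and is well suited to noisy, high-dimensional agricultural WSN data. Replacing the 2-norm in Eq. (6) with Eq. (7) gives Eq. (8).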
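For orientation, the textbook forms behind this passage can be written out as follows. The symbols $X$ (data matrix), $y$ (target vector), $w$ (weight vector), $W$ (weight matrix with rows $w^{i}$) and $\lambda$ (penalty coefficient) are assumed notation chosen for illustration. They are not recovered from the paper's Eqs. (5) to (8), which may arrange the matrices differently.

```latex
% Ordinary least squares and its normal-equation solution
\min_{w}\ \lVert y - Xw \rVert_2^2,
\qquad \hat{w} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}y

% Ridge regression: a constant \lambda > 0 is added to the main diagonal
\min_{w}\ \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2,
\qquad \hat{w} = (X^{\mathsf T}X + \lambda I)^{-1}X^{\mathsf T}y

% 2,1-norm of W: the sum of the 2-norms of its rows
\lVert W \rVert_{2,1} = \sum_{i} \lVert w^{i} \rVert_2
                      = \sum_{i} \sqrt{\sum_{j} W_{ij}^{2}}
```

Because the 2,1-norm sums unsquared row norms, it tends to drive whole rows of $W$ to zero, which is the source of the sparsity mentioned above.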
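Because Eq. (8) is convex, it can be differentiated with respect to each component of the weight vector; setting the derivative to zero gives Eq. (9).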
Rearranging Eq. (9) gives Eq. (10).
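The KNN algorithm with the 2,1-norm regularization penalty proceeds as follows:
1) normalize the input sample data;
2) iterate the initial matrix according to Eq. (10) until it no longer changes, at which point it is the optimal solution;
3) evaluate the training samples with Eq. (8) to obtain the best K value;
4) using the K value from step 3), evaluate the test samples with Eq. (8) to obtain the estimates of the missing samples.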
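The fixed-point iteration in step 2) can be illustrated with the standard iteratively reweighted scheme for a 2,1-norm penalty. This is a minimal sketch and not the paper's Eq. (10): it assumes a single output column, in which case the 2,1-norm reduces to the sum of absolute weights, and the names `solve`, `l21_regression`, `rho` and `eps` as well as the sample data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def l21_regression(X, y, rho=0.1, iters=50, eps=1e-8):
    """Minimise ||y - Xw||^2 + rho * sum|w_i| by iterative reweighting."""
    d = len(X[0])
    # Gram matrix X^T X and moment vector X^T y
    G = [[sum(r[a] * r[b] for r in X) for b in range(d)] for a in range(d)]
    g = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(d)]
    w = [1.0] * d
    for _ in range(iters):
        A = [row[:] for row in G]
        for a in range(d):
            # Reweighted diagonal term: small weights are penalised more
            A[a][a] += rho / (2.0 * abs(w[a]) + eps)
        w_new = solve(A, g)
        converged = max(abs(p - q) for p, q in zip(w_new, w)) < 1e-10
        w = w_new
        if converged:
            break
    return w

# Hypothetical data: the target depends only on the first feature
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.8], [4.0, -0.6], [5.0, 0.1]]
y = [2.0 * row[0] for row in X]
w = l21_regression(X, y)
```

Each pass solves a ridge-like system whose diagonal penalty depends on the previous weights, so the irrelevant second weight is pushed toward zero while the first stays close to 2.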
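As noted above, the low-rank property and the temporal, spatial and inter-parameter correlations of agricultural IoT data provide constraints for data reconstruction and thus further improve prediction accuracy. By the low-rank property, the agricultural IoT data matrix satisfies the decomposition of Eq. (11), in which both outer factors are unitary matrices. The optimization objective can then be converted into finding a factorization that satisfies Eq. (3) and the condition of Eq. (12), in which both factors are elementary matrices.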
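Furthermore, the agricultural production environment changes gradually, so its parameters vary fairly steadily in time and space; that is, environmental parameters change little between adjacent time points or adjacent nodes [31-32]. This paper describes the stability of a data sequence by the normalized deviation between adjacent data points in the sequence. Taking ambient temperature and humidity data as an example, Fig. 3 shows the distribution of the normalized deviation between adjacent data in their time series. For the temperature sequence, more than 60% of the adjacent normalized deviations are below 0.02 and more than 90% are below 0.04; for the humidity sequence, more than 95% are below 0.02. Agricultural environment data sequences are therefore highly stable.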
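A stability check of this kind can be sketched in a few lines. The paper does not state how the deviation is normalized, so the sketch assumes division by the range (maximum minus minimum) of the series; the function name and the sample series are hypothetical.

```python
def normalized_deviation_share(series, threshold):
    """Share of adjacent-sample deviations, normalised by the series range,
    that fall below the given threshold."""
    span = max(series) - min(series)
    if span == 0:
        raise ValueError("series is constant; the range normalisation is undefined")
    diffs = [abs(b - a) / span for a, b in zip(series, series[1:])]
    return sum(d < threshold for d in diffs) / len(diffs)

# Hypothetical temperature samples with one abrupt jump at the end
temps = [20.0, 20.1, 20.3, 21.0, 25.0]
share = normalized_deviation_share(temps, 0.05)
```

Here two of the four adjacent deviations fall below the 0.05 threshold, so `share` is 0.5; a real greenhouse series would be much longer.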
Based on this stability of agricultural environment data sequences, this paper adopts a (011) matrix as the temporal constraint matrix. It captures temporal stability by limiting the change between two consecutive time slots to a small range.
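The effect of a temporal constraint matrix can be illustrated as follows. The exact matrix is not recoverable from the text here, so this sketch assumes a first-difference matrix of Toeplitz form, with 1 on the main diagonal and -1 on the first upper diagonal, a form commonly used for temporal stability constraints in WSN reconstruction. The names `temporal_constraint` and `matmul` and the sample row are hypothetical.

```python
def temporal_constraint(t):
    """Assumed form: 1 on the main diagonal, -1 on the first upper diagonal."""
    return [[1 if j == i else -1 if j == i + 1 else 0 for j in range(t)]
            for i in range(t)]

def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# One node, four consecutive time slots (hypothetical temperatures)
Y = [[20.0, 20.5, 21.0, 21.0]]
T = temporal_constraint(4)
D = matmul(Y, T)  # columns 1.. hold differences between consecutive slots
```

Apart from the first column, which keeps the initial reading, every entry of `D` is the change between two consecutive time slots, so penalizing the size of `D` favors reconstructions that vary slowly in time.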
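The spatial constraint matrix captures spatial similarity: it characterizes the correlation constraint among the values of one-hop neighbor nodes in the network. It is obtained by row normalization, in which N is the number of neighbor nodes of the node concerned.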
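A row-normalized spatial constraint can be sketched in the same spirit. The paper's definition is not recoverable here, so the sketch assumes one common form: 1 on the diagonal and -1/N for each of the N one-hop neighbors of a node, so that each row measures how far a node deviates from the mean of its neighbors. The function name and the three-node topology are hypothetical.

```python
def spatial_constraint(neighbors, n):
    """Assumed form: H[i][i] = 1 and H[i][j] = -1/N_i for each neighbour j of i."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = 1.0
        for j in neighbors[i]:
            H[i][j] = -1.0 / len(neighbors[i])
    return H

# Hypothetical chain of three nodes: 0 - 1 - 2
neighbors = {0: [1], 1: [0, 2], 2: [1]}
H = spatial_constraint(neighbors, 3)

# Readings of the three nodes at one time point
readings = [20.0, 21.0, 22.0]
deviation = [sum(h * r for h, r in zip(row, readings)) for row in H]
```

The middle node equals the mean of its two neighbors, so its deviation is zero, while the two end nodes deviate by one degree from their single neighbor.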
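Substituting the temporal and spatial constraint matrices of the agricultural IoT into Eq. (12) gives Eq. (15), in which a constraint balance coefficient weights the constraint terms.
Substituting Eqs. (15) and (11) into Eq. (10) yields the regularized-regression KNN method based on the spatio-temporal constraints of the agricultural environment.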
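The performance of the algorithm was verified in MATLAB. Environmental data from a greenhouse were taken as modeling samples and cross-validation was used, with a training-to-test ratio of 4:1. The test set was processed with a data loss model and then used as the observation matrix of the reconstruction algorithms, from which each algorithm recovered an estimate of the data matrix. The regression metric is the reconstruction error ratio (ER) defined in reference [2]. In that definition, the condition that the DMM entry equals 0 means that only the reconstruction error on lost data is counted.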
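An error ratio of this kind can be computed as below. The formula itself is missing from the text here, so the sketch assumes the form commonly attributed to reference [2]: the root of the summed squared error over lost entries divided by the root of the summed squared true values over the same entries. The function name and the sample matrices are hypothetical.

```python
import math

def error_ratio(Y, Y_hat, B):
    """Assumed ER: sqrt(sum (y - y_hat)^2) / sqrt(sum y^2), over entries with B == 0."""
    num = den = 0.0
    for y_row, h_row, b_row in zip(Y, Y_hat, B):
        for y, h, b in zip(y_row, h_row, b_row):
            if b == 0:  # only lost entries contribute
                num += (y - h) ** 2
                den += y ** 2
    if den == 0:
        raise ValueError("no lost entries with non-zero true value")
    return math.sqrt(num) / math.sqrt(den)

Y = [[3.0, 4.0, 7.0]]        # true values
Y_hat = [[1.5, 2.0, 99.0]]   # reconstruction; the last entry was never lost
B = [[0, 0, 1]]              # DMM: first two entries were lost
er = error_ratio(Y, Y_hat, B)
```

The third entry is ignored because it was observed, so a poor value there does not affect the ratio.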
Node collisions and network congestion occur frequently in the agricultural IoT and readily cause high data loss rates over short periods. To show the performance trend at high loss rates, the data loss rate in the validation ranges from 10% to 90%.
Under the single-point random loss model, the proportion of lost data was varied to obtain the reconstruction results of the different algorithms, as shown in Fig. 4.
Fig. 4a shows the results for ambient temperature. At a data loss rate of 10% the reconstruction error ratios of all 4 algorithms are small, within about 1%. The error ratio rises with the loss rate: the KNN error curve rises fastest, followed by KNN-inverse, and the KNN-RP curve rises most slowly. When the loss rate reaches about 40% to 50% the algorithms begin to differ clearly, and the gap widens further as the loss rate increases. At a 90% loss rate the error ratios are about 70% for KNN, 55% for KNN-inverse, 35% for DT and 20% for KNN-RP.
Fig. 4b shows the results for ambient humidity. The overall trend resembles Fig. 4a, except that the error ratio at low loss rates is higher than for temperature, while at high loss rates it is similar. At a 90% loss rate the error ratios are about 80% for KNN, 50% for KNN-inverse, 35% for DT and 18% for KNN-RP. The error ratio of the DT algorithm rises markedly around a loss rate of 50% to 60%, whereas above 60% it grows more gently.
Fig. 4c shows the results for ambient light. Because light data are 0 for long periods at night, the consecutive zero-valued night data were deleted when building the light data set so that reconstruction performance is reflected objectively. The 4 algorithms already differ clearly at the low loss rate of 10%, where the error ratios are about 5% for KNN, 14% for KNN-inverse, 8% for DT and 2% for KNN-RP. As the loss rate increases, the error ratio of KNN rises rapidly while those of the other 3 algorithms rise more slowly; DT and KNN-RP even show a slight decrease around a loss rate of 60% to 70%. At a 90% loss rate the error ratios are about 80% for KNN, 50% for KNN-inverse, 35% for DT and 20% for KNN-RP.
Fig. 4 Reconstruction error of different algorithms under the single-point random loss model
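Overall, under the single-point random loss model the reconstruction error of every algorithm rises with the data loss rate; at high loss rates KNN performs worst and KNN-RP performs best. At low loss rates, however, the algorithms rank differently for different environmental parameters. A possible reason is that nearest-neighbor regression works well for parameters with smooth curves, whereas for parameters with frequent local changes the nearest-neighbor approach adds uncertainty.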
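The two nearest-neighbor baselines in this comparison can be sketched as follows. This is a simplified illustration, not the experimental code: it assumes Euclidean distance between feature vectors, treats KNN-inverse as inverse-distance weighting of the K neighbors, and uses hypothetical names and toy data.

```python
import math

def knn_estimate(train_X, train_y, query, k, inverse_distance=False):
    """Estimate a missing value from the k nearest complete records."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))[:k]
    if not inverse_distance:
        # Plain KNN: unweighted mean of the neighbours' values
        return sum(y for _, y in ranked) / len(ranked)
    # Inverse-distance weighting; the small constant avoids division by zero
    weights = [1.0 / (math.dist(x, query) + 1e-12) for x, _ in ranked]
    return sum(w * y for w, (_, y) in zip(weights, ranked)) / sum(weights)

# Toy data: one feature, the target equals the feature
train_X = [[0.0], [1.0], [2.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 10.0]

plain = knn_estimate(train_X, train_y, [1.2], k=2)
weighted = knn_estimate(train_X, train_y, [1.2], k=2, inverse_distance=True)
```

With the query at 1.2, the plain mean of the two nearest targets is 1.5, while the inverse-distance estimate is pulled toward the closer neighbor and lands near 1.2.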
The block random loss model was simulated in the same way as in Section 3.1. Because the length and position of the lost blocks are random, the overall data loss rate was varied to compare the reconstruction performance of the algorithms, as shown in Fig. 5.
Fig. 5a shows the results for ambient temperature. At a data loss rate of 10% the error ratios of all 4 algorithms are below 10%. The error rises with the loss rate: the KNN error ratio rises fastest, followed by KNN-inverse, and the DT error ratio rises most slowly. At a 90% loss rate the error ratios are about 90% for KNN, about 70% for both KNN-inverse and KNN-RP, and about 60% for DT.
Fig. 5b shows the results for ambient humidity. The error ratios of all 4 algorithms increase monotonically with the loss rate. At a 10% loss rate the error ratios of KNN-inverse and KNN-RP are about 10%, while those of KNN and DT are below 10%. At a 90% loss rate they are about 90% for KNN, 72% for KNN-inverse, 58% for DT and 61% for KNN-RP. Overall, for humidity under block random loss, KNN has the highest error ratio, followed by KNN-inverse; KNN-RP and DT perform comparably, with the error ratio of KNN-RP slightly above that of DT.
Fig. 5c shows the results for ambient light, with zero values handled as for the single-point loss model in Section 3.1. At a 10% loss rate, KNN-inverse has the highest error ratio at close to 20%, followed by DT at about 10% and KNN at about 5%, with KNN-RP slightly below KNN. As the loss rate rises, the error ratios of KNN and KNN-RP increase monotonically, that of KNN-inverse changes in steps, and that of DT fluctuates once the loss rate exceeds 50%. At a 90% loss rate the error ratios are about 78% for KNN, 47% for KNN-inverse, 36% for DT and 60% for KNN-RP.
Overall, for the same data loss rate, reconstruction under block loss is worse than under single-point random loss. KNN and its variants rely on the information of the most strongly correlated nodes, and block loss increases the share of missing highly correlated data points. DT performs better than the other 3 algorithms under block random loss.
In terms of complexity, the time complexity of KNN is $O(n)$ [3,5,8]. KNN-inverse only changes the distance computation of KNN, so its time complexity is also $O(n)$. DT improves accuracy through incremental computation, which raises its time complexity to $O(n \lg n)$ [2-5]. For KNN-RP, Eq. (10) involves matrix multiplication and inversion, so its complexity is $O(n^3)$.
Fig. 5 Reconstruction error of different algorithms under the block random loss model
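To further analyze how the choice of K affects reconstruction performance, the KNN-RP algorithm was run under the single-point random loss model at a data loss rate of 40% with different K values, and the reconstruction errors were compared. As Fig. 6 shows, the reconstruction error of KNN-RP first decreases and then increases as K grows.
Fig. 6 Effect of the K value on the reconstruction error of KNN-RP under the single-point random loss model at a 40% data loss rate
For ambient temperature, the reconstruction error changes little with K overall. It is about 7% at K = 2, decreases as K grows, reaches its minimum of about 4% at K = 8, and then gradually increases to about 5% at K = 14. For ambient humidity, the error is about 19% at K = 2, decreases as K grows, reaches its minimum of about 6% at K = 8, and then stays roughly stable with a slight increase. For ambient light, the error is about 8% at K = 2, decreases as K grows, reaches its minimum of about 4% at K = 6, and then increases rapidly to about 15% at K = 14. The results show that K has a significant effect on KNN-RP. For ambient temperature, which changes steadily, the reconstruction error is relatively insensitive to K; for humidity and light, which change more markedly, it is more sensitive. Overall, for the 3 environmental parameters of temperature, humidity and light, the optimal K lies between 6 and 8.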
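Choosing K from such a sweep amounts to taking the value with the smallest validation error. The sketch below is illustrative: the function name is hypothetical, and the dictionaries hold only the approximate error values quoted in the text for a few K values, not the full curves of Fig. 6.

```python
def select_k(error_by_k):
    """Return the K whose validation reconstruction error is smallest."""
    return min(error_by_k, key=error_by_k.get)

# Approximate errors quoted in the text (40% single-point loss)
temperature_errors = {2: 0.07, 8: 0.04, 14: 0.05}
light_errors = {2: 0.08, 6: 0.04, 14: 0.15}

best_k_temperature = select_k(temperature_errors)
best_k_light = select_k(light_errors)
```

In practice the dictionary would be filled by running the reconstruction on the training split for each candidate K and recording its error ratio.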
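This paper proposes a KNN reconstruction method based on a regularization penalty, which uses the spatio-temporal stability and correlation of agricultural IoT data to build correlation constraints that correct the loss function. Reconstruction tests were carried out on agricultural IoT monitoring data under different data loss models. The results show that the proposed method reconstructs well under the single-point random loss model, while under the block random loss model at high data loss rates its performance is below that of the DT algorithm. Overall, the method is accurate and stable, reconstructs anomalous IoT data effectively in complex agricultural environments, and improves data quality and credibility.
To address the poor reconstruction performance of KNN-RP under the block loss model at loss rates above 60%, future work will consider combining it with a long short-term memory model to keep the correlation constraints among data stable. In addition, this paper considers only missing and erroneous data as anomalies and does not include data noise in the analysis; future work will introduce data noise so that the results are closer to real data conditions. Reducing the time complexity of KNN-RP, for example through matrix dimensionality reduction, is another problem for future study.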
[1] Jesus G, Casimiro A, Oliveira A. A survey on data quality for dependable monitoring in wireless sensor networks[J]. Sensors, 2017, 17(9): 2010.
[2] Kong L, Xia M, Liu X Y, et al. Data loss and reconstruction in wireless sensor networks[J]. IEEE Transactions on Parallel & Distributed Systems, 2014, 25(11): 2818-2828.
[3] Duan Qingling, Xiao Xiaoyan, Liu Yiran, et al. Data fusion method of livestock and poultry breeding internet of things based on improved support function[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(Supp.1): 239-245. (in Chinese with English abstract)
[4] Chen S, Zhao C, Wu M, et al. Compressive network coding for wireless sensor networks: Spatio-temporal coding and optimization design[J]. Computer Networks, 2016, 108: 345-356.
[5] Sun J, Yu Y, Wen J. Compressed-sensing reconstruction based on block sparse Bayesian learning in bearing-condition monitoring[J]. Sensors, 2017, 17(6): 1454.
[6] Wu H, Suo M, Wang J, et al. A holistic approach to reconstruct data in ocean sensor network using compression sensing[J]. IEEE Access, 2018, 6(99): 280-286.
[7] Jayawardhana M, Zhu X, Liyanapathirana R, et al. Compressive sensing for efficient health monitoring and effective damage detection of structures[J]. Mechanical Systems & Signal Processing, 2017, 84: 414-430.
[8] Pan Liqiang, Li Jianzhong, Luo Jizhou. A temporal and spatial correlation based missing values imputation algorithm in wireless sensor networks[J]. Chinese Journal of Computers, 2010, 33(1): 1-11. (in Chinese with English abstract)
[9] Eldar Y C, Kuppinger P, Bolcskei H. Block-sparse signals: uncertainty relations and efficient recovery[J]. IEEE Transactions on Signal Processing, 2010, 58(6): 3042-3054.
[10] Morell A, Correa A, Barceló M, et al. Data aggregation and principal component analysis in WSNs[J]. IEEE Transactions on Wireless Communications, 2016, 15(6): 3908-3919.
[11] Ghazanfari-Rad S, Labeau F. Formulation and analysis of LMS adaptive networks for distributed estimation in the presence of transmission errors[J]. IEEE Internet of Things Journal, 2017, 3(2): 146-160.
[12] Tan L, Wu M. Data reduction in wireless sensor networks: A hierarchical LMS prediction approach[J]. IEEE Sensors Journal, 2016, 16(6): 1708-1715.
[13] Argyriou A, Alay Ö. Distributed estimation in wireless sensor networks with an interference canceling fusion center[J]. IEEE Transactions on Wireless Communications, 2016, 15(3): 2205-2214.
[14] Wu M, Tan L, Xiong N. Data Prediction, Compression, and Recovery in Clustered Wireless Sensor Networks for Environmental Monitoring Applications[M]. New York: Elsevier Science Inc. 2016.
[15] Miranda K, Ramos V. Improving data aggregation in wireless sensor networks with time series estimation[J]. IEEE Latin America Transactions, 2016, 14(5): 2425-2432.
[16] Jiang Bing, Mao Tian, Tang Dawei, et al. Clustering routing algorithm based on farmland wireless sensor network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(16): 182-187. (in Chinese with English abstract)
[17] Xu X. Data Approximation for time series data in wireless sensor networks[J]. International Journal of Data Warehousing and Mining, 2016, 12(3): 1-13.
[18] Morell A, Correa A, Barceló M, et al. Data aggregation and principal component analysis in WSNs[J]. IEEE Transactions on Wireless Communications, 2016, 15(6): 3908-3919.
[19] Panigrahi T, Panda M, Panda G. Fault tolerant distributed estimation in wireless sensor networks[J]. Journal of Network & Computer Applications, 2016, 69(C): 27-39.
[20] Li X, Tao X, Mao G. Unbalanced expander based compressive data gathering in clustered wireless sensor networks[J]. IEEE Access, 2017, 5(99): 7553-7566.
[21] Yan W, Dong Y, Zhang S, et al. An optimal CDG framework for energy efficient WSNs[J]. Chinese Journal of Electronics, 2017, 26(1): 137-144.
[22] Klis R, Chatzi E N. Vibration monitoring via spectro-temporal compressive sensing for wireless sensor networks[J]. Structure & Infrastructure Engineering, 2016, 13(1): 195-209.
[23] Chen X, Yin X, Yu B, et al. Communication channel reconstruction for transmission line differential protection: System arrangement and routing protocol[J]. Energies, 2016, 9(12): 893.
[24] Wang T Y, Yang M H, Wu J Y. Distributed detection of dynamic event regions in sensor networks with a Gibbs field distribution and Gaussian corrupted measurements[J]. IEEE Transactions on Communications, 2016, 64(9): 3932-3945.
[25] Zhu X F, Huang Z, Yang Y, et al. Self-taught dimensionality reduction on the high-dimensional small-sized data[J]. Pattern Recognition, 2013, 46(1): 215-229.
[26] Gong Yonghong, Zong Ming, Zhu Yonghua, et al. kNN regression based on mixed-norm reconstruction[J]. Computer Applications & Software, 2016(2): 232-236. (in Chinese with English abstract)
[27] Geeta D D, Nalini N, Biradar R C. Fault tolerance in wireless sensor network using hand-off and dynamic power adjustment approach[J]. Journal of Network & Computer Applications, 2013, 36(4): 1174-1185.
[28] Qaisar S, Bilal R M, Iqbal W, et al. Compressive sensing: From theory to applications, a survey[J]. Journal of Communications & Networks, 2013, 15(5): 443-456.
[29] Park H, Kim B S, Kim K H, et al. A tree based broadcast scheme for (m,k)-firm real-time stream in wireless sensor networks[J]. Sensors, 2017, 17(11): 2578.
[30] Park J, Bok K, Seong D, et al. A data gathering method based on a mobile sink for minimizing the data loss in wireless sensor networks[J]. International Journal of Distributed Sensor Networks, 2014, 2014(5): 242.
[31] Nguyen N T, Pham V T, Pham V T, et al. On maximizing the lifetime for data aggregation in wireless sensor networks using virtual data aggregation trees[J]. Computer Networks the International Journal of Computer & Telecommunications Networking, 2016, 105(C): 99-110.
[32] Zhu L, Huang Z, Liu Y, et al. The Nonparametric Bayesian dictionary learning based interpolation method for WSNs missing data[J]. AEU-International Journal of Electronics and Communications, 2017, 79: 267-274.
Agricultural internet of things data reconstruction based on K-nearest neighbor reconstruction algorithm improved by regularization penalty and spatio-temporal constraints
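Wu Huarui1,2, Li Qingxue1,2, Miao Yisheng1,2, Song Yuling3
(1. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China; 2. Beijing Research Center for Information Technology in Agriculture, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; 3. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling 712100, China)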
Internet of things (IoT) technology has been widely applied in agricultural production monitoring, and accurate decisions and environmental regulation can be made from the monitoring results. However, data loss is common in agricultural wireless sensor networks because of noise, collisions, unreliable links and unexpected damage, which greatly reduces the quality of data acquisition and in turn affects the results of decision analysis. To solve this problem, this paper proposes a data reconstruction method based on K nearest neighbor with regularization penalty constraints (KNN-RP). Firstly, ridge regression was used to regularize the least-squares term. Secondly, when the data matrix is not of full column rank, numerical error makes it difficult to obtain a unique solution; this was improved by introducing a penalty term into the method. The combination of the 1-norm and the 2-norm ensures the sparsity of the matrix and prevents the loss function from over-fitting, which suits the reconstruction of noisy, high-dimensional agricultural WSN (wireless sensor network) data. Furthermore, the temporal and spatial constraint matrices were defined according to the temporal and spatial stability of sensing data in the agricultural IoT. Finally, the K value was determined by model training to achieve better reconstruction performance. A cross-validation experiment on greenhouse data samples was carried out to evaluate the algorithm, with KNN (K nearest neighbor), KNN-inverse and DT (Delaunay triangulation) chosen for comparison. In the element random loss case, the overall reconstruction error rate of the 4 algorithms increased with the data loss rate. KNN and KNN-inverse had higher error rates than the other 2 algorithms when the data loss rate was above 60%. In addition, KNN-RP was superior to the DT algorithm at both high and low data loss rates. In the block loss case, the reconstruction error rates of the 4 algorithms were close to those of the element random loss case at low loss rates, but increased faster as the data loss rate increased. In the block loss case, the overall performance of KNN-RP was better than that of KNN and KNN-inverse, but lower than that of the DT algorithm when the data loss rate was above 60%. The K value had a significant influence on the performance of KNN-RP: the reconstruction error of KNN-RP first decreased and then increased as K increased. For a stable parameter such as temperature, the reconstruction error rate was less affected by K. By contrast, the reconstruction error rates of humidity and light data were more affected by K, possibly because humidity and light change faster than temperature. Considering all 3 parameters (temperature, humidity and light), the optimal K value was between 6 and 8. In summary, the KNN-RP algorithm could effectively reconstruct missing data in the agricultural IoT, especially in the element random loss case. The proposed algorithm improves the quality of sensing data in agricultural IoT monitoring and may provide a reference for agricultural production decision-making.
algorithms; models; agricultural internet of things; data reconfiguration; cluster regression
2018-11-29
2019-06-20
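National Natural Science Foundation of China (61871041, 61571051); Beijing Natural Science Foundation (4172024, 4172026); Open Project of the Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs (2018AIOT-06)
Wu Huarui, research fellow, mainly engaged in research on agricultural intelligent systems and the internet of things. Email: wuhr@nercita.org.cn
Li Qingxue, assistant research fellow, mainly engaged in research on the agricultural internet of things and intelligent systems. Email: liqx@nercita.org.cn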
10.11975/j.issn.1002-6819.2019.14.023
TN919
A
1002-6819(2019)-14-0183-07
Wu Huarui, Li Qingxue, Miao Yisheng, Song Yuling. Agricultural internet of things data reconstruction based on K-nearest neighbor reconstruction algorithm improved by regularization penalty and spatio-temporal constraints[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(14): 183-189. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2019.14.023 http://www.tcsae.org