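Inverting wheat leaf area index based on HJ-CCD data and random forest algorithm
Wang Liai1, Zhou Xudong2, Zhu Xinkai1, Guo Wenshan1※
(1. Key Laboratory of Crop Genetics and Physiology of Jiangsu Province, Yangzhou University, Yangzhou 225009, China; 2. Information Engineering College of Yangzhou University, Yangzhou 225127, China)
Abstract: To provide technical support for remote sensing monitoring of wheat growth, this paper uses the random forest (RF) regression algorithm to build remote sensing inversion models of wheat leaf area index (LAI). First, from HJ-CCD imagery of China's environment and disaster reduction satellites acquired over wheat in Jiangsu during 2010-2013, satellite vegetation indices were extracted for three growth stages: jointing, booting, and anthesis. Then, from the vegetation indices of each stage and the corresponding measured LAI, the RF algorithm was used to construct a wheat LAI inversion model for each stage, and an artificial neural network (ANN) model served as the reference model for comparing prediction accuracy. The results show that the RF models predicted better than the ANN models of the same stage at all three stages. At jointing, booting, and anthesis, the R2 between RF-predicted and ground-measured values was 0.79, 0.67, and 0.59, with corresponding RMSE of 0.57, 0.90, and 0.78; for the ANN models, R2 was 0.67, 0.31, and 0.30, with corresponding RMSE of 0.82, 1.94, and 1.43. These results provide a technique and method for improving the accuracy of remote sensing prediction of wheat LAI at the field scale.
Keywords: vegetation; neural networks; algorithms; random forest; machine learning; leaf area index; wheat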
Wang Liai, Zhou Xudong, Zhu Xinkai, Guo Wenshan. Inverting wheat leaf area index based on HJ-CCD remote sensing data and random forest algorithm[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(3): 149-154. (in Chinese with English abstract) doi: 10.11975/j.issn.1002-6819.2016.03.021 http://www.tcsae.org
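Leaf area index (LAI) reflects both individual and population characteristics of vegetation growth and is a key ecological parameter for monitoring crop growth[1]. In recent years, with the application of remote sensing in agriculture, many researchers have studied remote sensing inversion of crop LAI in depth[2-5]. Inversion of LAI from vegetation indices is an especially important research direction[5-10]. Tavakoli et al.[5] showed that several RGB-based indices correlate well with wheat LAI, so that a digital camera can be used to estimate crop LAI. Zhao Juan et al.[6] showed that RVI (ratio vegetation index) extracted from ASD spectrometer data is suited to inverting winter wheat LAI in the middle growth period (jointing to before heading) in their study area, whereas NDVI (normalized difference vegetation index) is suited to the late growth period (heading to maturity). Vegetation indices can be extracted from different remote sensing data sources, and earlier work has also studied LAI inversion from satellite-based vegetation indices: He Yajuan et al.[7] used SPOT data to build an NDVI-based quadratic function model to invert sugarcane LAI over the whole growth period; Liu et al.[8] extracted four Landsat 5/7 vegetation indices for wheat, maize, and soybean, compared the accuracy of these indices for inverting the LAI of each crop, and identified EVI (enhanced vegetation index) as the index with the best inversion ability; Guo Lin et al.[9], using HJ-CCD data from China's independently developed environment and disaster reduction satellites, established the relationship between NDVI and LAI with a support vector machine to invert sugarcane LAI; Chen Xueyang et al.[10] compared the relationships between four HJ-CCD vegetation indices and winter wheat LAI and determined RVI to be the optimal index for LAI inversion. Most existing studies invert crop LAI from a single vegetation index, but a single index saturates to varying degrees, and each index carries information from only some of the bands. Because the artificial neural network (ANN) algorithm can use multiple vegetation indices simultaneously and fits nonlinear problems well, it has been widely used in recent years to build remote sensing inversion models of agronomic parameters[11-13]. Although ANN models achieve a certain prediction accuracy, they have too many parameters, and model construction is complex.
Like ANN, the emerging random forest (RF) is a multi-factor machine learning algorithm that can use multiple vegetation indices. As one of the most accurate prediction methods currently available, RF has been widely applied to classification problems in remote sensing[14-16], where it has outperformed ANN, and its model construction is simpler than that of ANN. To date, however, only a few publications report the use of this algorithm in remote sensing monitoring and prediction[17-18], and to our knowledge no study has used RF for remote sensing inversion of wheat LAI. This paper therefore uses the RF algorithm, for the first time, together with multiple vegetation indices to build multi-factor remote sensing inversion models of wheat LAI, aiming to provide a new technique for improving the accuracy of quantitative remote sensing inversion of wheat LAI at the field scale.
Considering wheat cultivation practice in the middle and lower reaches of the Yangtze River, this paper used field experiment data and HJ-CCD imagery from 2010-2013 to obtain the measured wheat LAI and the 15 remote sensing vegetation indices of the corresponding period for three growth stages (jointing, booting, and anthesis). With wheat LAI as the dependent variable and the vegetation indices as independent variables, RF was used to build a separate remote sensing inversion model of LAI for each of the three stages. The LAI values inverted by each model were fitted against ground-measured LAI, accuracy was assessed with the coefficient of determination (R2) and the root mean square error (RMSE), and the results were compared with those of ANN models.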
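1.1 Study area
The 2010-2013 experiments of this study were carried out in Jiangsu Province. Table 1 lists the test regions selected each year, all of which are major winter wheat producing areas of Jiangsu Province.
Table 1 Test regions in this study
1.2 LAI data acquisition
The wheat cultivars grown in the test regions were Yangmai 13, Yangmai 15, Yangmai 16, and Yangfumai 2. Samples were taken at the jointing, booting, and anthesis stages. In each county, 15-20 representative sampling sites were set up, each with a sampling area of 50 cm × 4 rows (row spacing 15-20 cm). At each growth stage, 15 uniformly growing plants were collected, sealed, and taken back to the laboratory, where LAI was determined by the specific leaf weight method. A Juno ST handheld GPS (Trimble, USA) was used to record the longitude and latitude of each sampling site. HJ-CCD images quasi-synchronous with the jointing, booting, and anthesis stages were downloaded from the website of the China Centre for Resources Satellite Data and Application.
For each growth stage, the data of the 4 years were pooled and randomly divided into two parts (75% and 25%): the 75% part served as training samples for building the model, and the 25% part as test samples for evaluating it. The numbers of training samples at jointing, booting, and anthesis were 174, 174, and 147; the numbers of test samples at the three stages were 58, 58, and 49.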
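The random 75%/25% split can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_train_test`, the fixed seed, and the use of integer sample identifiers are assumptions. The pooled sample counts (232 and 196) are simply the sums of the training and test counts reported above.

```python
import random

def split_train_test(samples, train_fraction=0.75, seed=0):
    """Randomly split samples into a training list and a test list."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Pooled counts implied by the text: 174 + 58 = 232 and 147 + 49 = 196.
for stage, n in [("jointing", 232), ("booting", 232), ("anthesis", 196)]:
    train, test = split_train_test(range(n))
    print(stage, len(train), len(test))
```

With these pooled counts, the split sizes match the training and test sample numbers given in the text.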
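1.3 Image data and preprocessing
The remote sensing data used here come from China's independently developed small satellite constellation for environment and disaster monitoring and forecasting, which comprises two satellites, HJ-A and HJ-B. Each satellite carries CCD (charge-coupled device) cameras with a spatial resolution of 30 m and four bands: blue B1 (430-520 nm), green B2 (520-600 nm), red B3 (630-690 nm), and near-infrared B4 (760-900 nm).
All images underwent radiometric calibration, atmospheric correction, and geometric correction. Radiometric calibration converted the DN values of every image to radiance using the radiometric calibration parameters of the HJ CCD cameras; atmospheric correction was performed with the FLAASH module of ENVI 4.7; geometric correction first applied a coarse correction against the 1:100 000 topographic map of the Jiangsu region and then a fine correction using ground-measured GPS control points, bringing the image registration error to less than 1 pixel.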
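As an illustration of the radiometric calibration step, the sketch below converts a DN value to radiance. It assumes a linear calibration of the form radiance = DN / gain + offset; the paper does not state the formula, and the gain and offset values used here are made-up placeholders, not real HJ-CCD calibration coefficients.

```python
def dn_to_radiance(dn, gain, offset):
    """Linear calibration (assumed form): radiance = DN / gain + offset."""
    if gain == 0:
        raise ValueError("gain must be non-zero")
    return dn / gain + offset

# Placeholder coefficients for illustration only.
example_gain = 0.5
example_offset = 2.0
print(dn_to_radiance(100, example_gain, example_offset))
```

In practice the per-band coefficients would be taken from the calibration parameters distributed with the imagery.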
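2.1 Vegetation indices
Crop LAI is strongly correlated with the visible and near-infrared bands of the vegetation spectrum[19]. Vegetation indices built from these characteristic bands can be used to estimate LAI: they respond sensitively to LAI while weakening interference from environmental factors. From the four bands of the HJ-CCD camera, this study constructed 15 remote sensing vegetation indices (Table 2) that are widely accepted and invert LAI well[3, 20-21].
Table 2 Formulas of remote sensing vegetation indices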
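Three of the indices named in this paper (NDVI, RVI, and EVI) can be computed from band reflectances as sketched below. The formulas are the commonly used textbook definitions and are an assumption here, since the contents of Table 2 are not reproduced in this text; the reflectance values are invented for illustration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index (standard definition)."""
    return (nir - red) / (nir + red)

def rvi(nir, red):
    """Ratio vegetation index (standard definition)."""
    return nir / red

def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical surface reflectances for HJ-CCD bands B1 (blue), B3 (red), B4 (NIR).
blue, red, nir = 0.04, 0.05, 0.45
print(ndvi(nir, red), rvi(nir, red), evi(nir, red, blue))
```

Each function takes reflectance after atmospheric correction, not raw DN values.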
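2.2 Random forest algorithm
RF is an ensemble learning method proposed by Breiman in 2001[22]; it combines multiple decision trees to improve on the performance of a single classification or regression tree. In RF regression, a decision tree represents a set of constraints that are organized hierarchically and applied in sequence from the root to the leaves. The main idea of the RF algorithm is as follows: bootstrap sampling draws from the original sample set the ntree subsets needed to build ntree trees; when each tree is grown, mtry variables are randomly selected from the set of p independent variables (mtry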
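2.3 Artificial neural network algorithm
Among machine learning algorithms, ANN is the one most commonly used to develop nonlinear regression models[23]. Training an ANN requires choosing the network structure (number of hidden layers and number of nodes per layer), weights, learning rate, and training algorithm. In this study, cross-validation was used to arrive at a two-layer back-propagation neural network trained with the Levenberg-Marquardt algorithm, with a tan-sigmoid function in the hidden layer and a log-sigmoid function in the output layer. Signal propagation in this network consists of a forward part and a backward part. The basic idea is as follows: first, the signal is propagated forward from the input layer to the hidden layer, where it is processed into an intermediate signal, which is then propagated to the output layer to give the actual output; if this output does not match the expected output, the error between the two is propagated backward from the output layer toward the input layer along the original connections and processed accordingly; forward and backward propagation alternate until the actual output reaches the expected output or the learning process reaches a preset number of iterations.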
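The forward pass of such a network can be sketched as follows: a tan-sigmoid (tanh) hidden layer followed by a log-sigmoid (logistic) output node. This is only an illustration of the forward direction; the weights are arbitrary placeholders, the number of hidden nodes is an assumption, and Levenberg-Marquardt training is not shown.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: tanh hidden layer, logistic (log-sigmoid) output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))

# Two inputs, three hidden nodes, one output; all weights are placeholders.
x = [0.8, 0.3]
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.7, -0.5, 0.2]
b_out = 0.05
print(forward(x, w_hidden, b_hidden, w_out, b_out))
```

Because a log-sigmoid output lies strictly between 0 and 1, target values such as LAI would have to be scaled to that range before training and rescaled afterwards; the text does not describe this step, so it is mentioned here only as an assumption.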
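3.1 Construction of the wheat LAI inversion models
Using the training sets of the jointing, booting, and anthesis stages, wheat LAI remote sensing inversion models were built with the RF and ANN algorithms. In every model of each stage, the 15 vegetation indices in Table 2 were the independent variables and wheat LAI was the dependent variable. To build the RF models, the algorithm was first implemented as a computer program; the number of regression trees ntree and the number of variables used to split a node mtry were then determined; finally the program was run to build the model. The resulting model has no explicit mathematical formula. Based on experience and repeated trials, ntree was set to 2000 and mtry to 3 for the RF algorithm at all three stages. Using the out-of-bag (OOB) data, the RF models report the importance of the 15 vegetation indices (Fig. 1), which helps in understanding the influence of each index on the model; the larger the RMSE value associated with an index, the more important the index. As Fig. 1 shows, at jointing the RMSE values of the 14 indices other than EVI were all about 0.4, indicating that these 14 indices have similar influence on LAI; at booting the RMSE values of NRI and MTVI2 were clearly higher than those of the other 13 indices, indicating that both have a strong influence on LAI; at anthesis NRI and NLI had weaker influence on LAI than the other 13 indices.
Fig. 1 Importance of vegetation indices in RF models for estimating LAI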
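The two sources of randomness in RF and the origin of the out-of-bag data can be sketched as below, using ntree = 2000, mtry = 3, and p = 15 from the text and the jointing training-set size of 174. This is not the authors' program: the function names and the seed are assumptions, and tree growing itself is omitted.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag

def candidate_features(p, mtry, rng):
    """Pick mtry distinct predictors out of p as split candidates for a node."""
    return rng.sample(range(p), mtry)

rng = random.Random(42)
n_samples, p, mtry, ntree = 174, 15, 3, 2000

oob_fractions = []
for _ in range(ntree):
    in_bag, oob = bootstrap_indices(n_samples, rng)
    oob_fractions.append(len(oob) / n_samples)

print("mean OOB fraction:", sum(oob_fractions) / ntree)
print("example split candidates:", candidate_features(p, mtry, rng))
```

On average roughly 37% of the samples are left out of each bootstrap sample; these are the OOB data on which the error and the variable importance of each tree can be evaluated without a separate validation set.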
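3.2 Model evaluation
Using the test set of each growth stage, which is independent of the training set, the LAI inverted by the two models of each stage was compared with the measured data of that stage to analyze the prediction accuracy of the models. Predicted values were regressed on measured values, R2 and RMSE were used as evaluation metrics, and 1:1 plots of predicted versus measured values were drawn (Fig. 2). At all three stages the RF model outperformed the ANN model of the same stage: R2 was higher than that of the ANN model by 0.12, 0.36, and 0.29, and RMSE was lower by 0.25, 1.04, and 0.65, respectively. These comparisons show that building wheat LAI inversion models with the RF algorithm is feasible and gives high monitoring accuracy. In this study, the inversion accuracy of the RF models at booting and anthesis was lower than at jointing, possibly because at booting the young ears make up an increasing proportion of the canopy, and at anthesis the awns and other ear structures also account for a certain proportion of the canopy.
Fig. 2 Relationship between measured and predicted wheat LAI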
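The two evaluation metrics can be computed as sketched below. R2 is taken here as the squared Pearson correlation between measured and predicted values, which equals the coefficient of determination of a simple linear regression between them; that reading of "regression analysis" is an assumption, and the data are invented for illustration.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Squared Pearson correlation, i.e. R2 of a simple linear regression."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)

# Invented LAI values: every prediction is 0.5 too high.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(r_squared(measured, predicted), rmse(measured, predicted))
```

The example shows why both metrics are reported: a constant bias leaves the regression R2 at 1 while the RMSE is 0.5, so R2 alone would hide the offset from the 1:1 line.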
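3.3 Discussion
Crop canopy spectral reflectance obtained by remote sensing provides information on crop growth status, but it is easily affected by soil background, canopy structure, atmospheric conditions, and other factors, which is why vegetation indices were proposed for estimating crop agronomic parameters. Previous vegetation-index-based remote sensing monitoring of crop LAI has three limitations. First, the effect of different phenological stages on the crop was seldom considered[8-9]. Second, data from different years were rarely combined for modeling and validation. Third, most studies relied on a single vegetation index[24-25]; a single index saturates to varying degrees and carries information from only some of the bands, which may impair the extrapolation ability of the model[26-27]. This study covers three wheat growth stages (jointing, booting, and anthesis) across 2010-2013 and, for each stage, uses the RF algorithm to build a multi-factor model with 15 vegetation indices as independent variables. Each RF model identified the vegetation index of marked importance for LAI inversion (Fig. 1): EVI at jointing, MTVI2 at booting, and MSR at anthesis. This indicates that growth stage affects the performance of vegetation indices in estimating crop physiological parameters, which is consistent with previous findings[28-29]. In addition, reference [10] also inverted wheat LAI from vegetation indices extracted from HJ-CCD imagery, but that study inverted the LAI of only one growth stage of winter wheat (heading) from a single vegetation index (RVI), and both the modeling and validation sets came from a single year (2009), so the general applicability of that model over time remains to be verified.
The RF models in this study gave better inversion results than the ANN models. The reason is that RF is an ensemble learning algorithm, which combines weak learners into a strong learner, and the two sources of randomness it introduces (bootstrap sampling to generate multiple random sub-sample sets, and random selection of a subset of the independent variables for splitting tree nodes) give RF good tolerance to noise and make it less prone to overfitting. When an ANN is trained, by contrast, its learning capacity may be so strong that the resulting model no longer reflects the regularities underlying the samples, which ultimately weakens its generalization ability.
In fact, most of the 15 vegetation indices used here are multicollinear, but RF is insensitive to collinearity[30]. This is valuable for model building, particularly for complex and nonlinear systems, because when two or more variables are collinear it is usually hard to decide which one to drop.
Both machine learning algorithms have their own parameters. ANN requires several to be set (network structure, number of nodes, training function, learning function, learning rate, etc.), whereas RF requires only two (ntree and mtry), which clearly makes RF more convenient to apply.
The choice of modeling algorithm strongly affects the accuracy of quantitative remote sensing inversion. Building on our earlier work on RF inversion of wheat chlorophyll[31], this study used RF for remote sensing inversion of wheat LAI, and the results show that the algorithm has good predictive performance. Future work will examine whether RF can be applied to remote sensing inversion of biomass, nitrogen content, and other key agronomic parameters for diagnosing the growth status of wheat or other crops, in order to increase the value of the RF algorithm in remote sensing monitoring of crop growth.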
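This study inverted wheat LAI by remote sensing from HJ-CCD data with the RF algorithm and compared its predictive performance with that of ANN. The conclusions are as follows:
1) The RF algorithm can be used to invert wheat LAI, and the prediction accuracy of the model is higher than that of the ANN model used in previous work: at jointing, booting, and anthesis, the R2 between RF-predicted and ground-measured values was 0.79, 0.67, and 0.59, with corresponding RMSE of 0.57, 0.90, and 0.78.
2) Compared with building an ANN model, building a model with the RF algorithm is simpler, usually requiring optimization of only two parameters of the algorithm (ntree and mtry). This advantage favors wide application of this regression algorithm in remote sensing monitoring and forecasting of crop growth. As an ensemble learning algorithm, RF combines multiple weak learners into a strong learner, which helps the model achieve good predictive performance.
HJ-CCD data are freely available to users and are timely, so differences and changes in wheat LAI can be analyzed from HJ-CCD data of different periods.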
[References]
[1] Wang Jihua, Zhao Chunjiang, Huang Wenjiang. Foundations and Applications of Quantitative Remote Sensing in Agriculture[M]. Beijing: Science Press, 2010. (in Chinese)
[2] Wu Mingquan, Wu Chaoyang, Huang Wenjiang, et al. High-resolution leaf area index estimation from synthetic Landsat data generated by a spatial and temporal data fusion model[J]. Computers and Electronics in Agriculture, 2015, 115: 1-11.
[3] Kross A, McNairn H, Lapen D, et al. Assessment of RapidEye vegetation indices for estimation of leaf area index and biomass in corn and soybean crops[J]. International Journal of Applied Earth Observation and Geoinformation, 2015, 34: 235-248.
[4] Fontanelli G, Paloscia S, Zribi M, et al. Sensitivity analysis of X-band SAR to wheat and barley leaf area index in the Merguellil Basin[J]. Remote Sensing Letters, 2013, 4(11): 1107-1116.
[5] Tavakoli H, Mohtasebi S S, Alimardani R, et al. Evaluation of different sensing approaches concerning to nondestructive estimation of leaf area index for winter wheat[J]. International Journal on Smart Sensing and Intelligent Systems, 2014, 7(1): 337-359.
[6] Zhao Juan, Huang Wenjiang, Zhang Yaohong, et al. Inversion of leaf area index during different growth stages in winter wheat[J]. Spectroscopy and Spectral Analysis, 2013, 33(9): 2546-2552. (in Chinese with English abstract)
[7] He Yajuan, Pan Xuebiao, Pei Zhiyuan, et al. Estimation of LAI and yield of sugarcane based on SPOT remote sensing data[J]. Transactions of the Chinese Society for Agricultural Machinery, 2013, 44(5): 226-231. (in Chinese with English abstract)
[8] Liu Jiangui, Pattey E, Jégo G. Assessment of vegetation indices for regional crop green LAI estimation from Landsat images over multiple growing seasons[J]. Remote Sensing of Environment, 2012, 123: 347-358.
[9] Guo Lin, Pei Zhiyuan, Zhang Songling, et al. Estimation method of sugarcane leaf area index using HJ CCD images[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2010, 26(10): 201-205. (in Chinese with English abstract)
[10] Chen Xueyang, Meng Jihua, Du Xin, et al. The monitoring of the winter wheat leaf area index based on HJ-1 CCD data[J]. Remote Sensing for Land and Resources, 2010, 22(2): 55-62. (in Chinese with English abstract)
[11] Chen Bangqian, Wu Zhixiang, Wang Jikun, et al. Spatio-temporal prediction of leaf area index of rubber plantation using HJ-1A/1B CCD images and recurrent neural network[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2015, 102: 148-160.
[12] Verrelst J, Munoz J, Alonso L, et al. Machine learning regression algorithms for biophysical parameter retrieval: Opportunities for Sentinel-2 and -3[J]. Remote Sensing of Environment, 2012, 118(4): 127-139.
[13] Xia Tian, Wu Wenbin, Zhou Qingbo, et al. Comparison of two inversion methods for winter wheat leaf area index based on hyperspectral remote sensing[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(3): 139-147. (in Chinese with English abstract)
[14] Jhonnerie R, Siregar V P, Nababan B, et al. Random forest classification for mangrove land cover mapping using Landsat5 TM and Alos Palsar imageries[J]. Procedia Environmental Sciences, 2015, 24: 215-221.
[15] Nitze I, Barrett B, Cawkwell F. Temporal optimization of image acquisition for land cover classification with Random forest and MODIS time-series[J]. International Journal of Applied Earth Observation and Geoinformation, 2015, 34: 136-146.
[16] Gislason P O, Benediktsson J A, Sveinsson J R. Random Forests for land cover classification[J]. Pattern Recognition Letters, 2006, 27(4): 294-300.
[17] Liu Meiling, Liu Xiangnan, Liu Da, et al. Multivariable integration method for estimating sea surface salinity in coastal waters from in situ data and remotely sensed data using random forest algorithm[J]. Computers and Geosciences, 2015, 75: 44-56.
[18] Mutanga O, Adam E, Azong Cho M. High density biomass estimation for wetland vegetation using WorldView-2 imagery and random forest regression algorithm[J]. International Journal of Applied Earth Observation and Geoinformation, 2012, 18(1): 399-406.
[19] Jensen. Introduction to Remote Sensing Digital Image Processing[M]. Beijing: China Machine Press, 2007: 288-296. (in Chinese)
[20] Nguy-Robertson A, Gitelson A, Peng Y, et al. Green leaf area index estimation in maize and soybean: Combining vegetation indices to achieve maximal sensitivity[J]. Agronomy Journal, 2012, 104(5): 1336-1347.
[21] Liu Jiangui, Pattey E, Jego G. Assessment of vegetation indices for regional crop green LAI estimation from Landsat images over multiple growing seasons[J]. Remote Sensing of Environment, 2012, 123: 347-358.
[22] Breiman L. Random forests[J]. Machine Learning, 2001, 45(1): 5-32.
[23] Haykin S. Neural Networks: A Comprehensive Foundation[M]. 2nd Ed. Prentice Hall, New Jersey, America, 1999.
[24] Xia Tian, Zhou Qingbo, Chen Zhongxin, et al. Monitoring winter wheat SPAD based on HJ-1 CCD[J]. Chinese Journal of Agricultural Resources and Regional Planning, 2012, 33(6): 38-44. (in Chinese with English abstract)
[25] Wang Laigang, Wang Beizhan, Feng Wei, et al. Comparative analysis of monitoring winter wheat nitrogen with SPOT 5 and HJ image[J]. Journal of Triticeae Crops, 2011, 31(2): 143-148. (in Chinese with English abstract)
[26] Liang Dong, Guan Qingsong, Huang Wenjiang, et al. Remote sensing inversion of leaf area index based on support vector machine regression in winter wheat[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(7): 117-123. (in Chinese with English abstract)
[27] Wang Dacheng, Wang Jihua, Jin Ning, et al. ANN-based wheat biomass estimation using canopy hyperspectral vegetation indices[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2008, 24(2): 196-201. (in Chinese with English abstract)
[28] Li Fei, Mistele B, Hu Yuncai, et al. Remotely estimating aerial N status of phenologically differing winter wheat cultivars grown in contrasting climatic and geographic zones in China and Germany[J]. Field Crops Research, 2012, 138(3): 21-32.
[29] Hatfield J L, Prueger J H. Value of using different vegetative indices to quantify agricultural crop characteristics at different growth stages under varying management practices[J]. Remote Sensing, 2010, 2(2): 562-578.
[30] Cutler R D, Edwards T C, Beard K H, et al. Random forests for classification in ecology[J]. Ecology, 2007, 88(11): 2783-2792.
[31] Wang Liai, Ma Chang, Zhou Xudong, et al. Estimation of wheat leaf SPAD value using RF algorithmic model and remote sensing data[J]. Transactions of the Chinese Society for Agricultural Machinery, 2015, 46(1): 259-265. (in Chinese with English abstract)
Inverting wheat leaf area index based on HJ-CCD remote sensing data and random forest algorithm
Wang Liai1, Zhou Xudong2, Zhu Xinkai1, Guo Wenshan1※
(1. Key Laboratory of Crop Genetics and Physiology of Jiangsu Province, Yangzhou University, Yangzhou 225009, China; 2. Information Engineering College of Yangzhou University, Yangzhou 225127, China)
Abstract: The leaf area index (LAI) of crops is an important parameter for crop monitoring. With the application of remote sensing in agriculture, inversion of crop LAI from remote sensing data has been widely studied. In these studies, vegetation indices are widely used because they reduce the effect of background noise on the spectral reflectance of plant canopies. Besides the vegetation indices, the modeling algorithm also plays an important role in improving the accuracy of remote estimation of crop LAI. The emerging Random Forest (RF) machine-learning algorithm is regarded as one of the most precise prediction methods for regression. In this paper, we estimated wheat LAI using the RF algorithm and vegetation indices. First, based on China's environmental satellite charge-coupled device (HJ-CCD) image data of wheat (Triticum aestivum) from test sites in Jiangsu Province, China, during 2010-2013, fifteen previously reported vegetation indices related to LAI were calculated at the jointing, booting, and anthesis stages. Then, using the RF algorithm, an LAI inversion model was established for each stage from its vegetation indices and the corresponding in situ wheat LAI measured at the time of HJ-CCD data acquisition. For each stage, the pooled data from 2010-2013 were randomly divided into a training dataset and an independent validation dataset (75% and 25% of the pooled data, respectively). The training dataset contained 174 samples at jointing, 174 at booting, and 147 at anthesis. The validation dataset contained 58 samples at jointing, 58 at booting, and 49 at anthesis. The training dataset was used to establish models to predict wheat LAI at each growth stage, and the validation dataset was used to test the quality of each prediction model.
The RF model of each stage for estimating wheat LAI was established with the 15 vegetation indices as the independent variables and wheat LAI as the dependent variable. In addition, for each stage, a model based on the artificial neural network (ANN) machine-learning algorithm, which had been used successfully to invert crop LAI in previous studies, was employed as a reference model. To evaluate the estimation accuracy of each model and to compare the performance of the two models at each stage, the coefficient of determination (R2) and the corresponding root mean square error (RMSE) of estimated versus measured LAI were calculated on the validation data. The results indicated that RF outperformed ANN at each stage. For the RF models, the R2 of estimated versus measured LAI at the three stages was 0.79, 0.67, and 0.59, respectively, and the corresponding RMSE was 0.57, 0.90, and 0.78. For the ANN models, the R2 at the three stages was 0.67, 0.31, and 0.30, respectively, and the corresponding RMSE was 1.43 at anthesis, 1.94 at booting, and 0.82 at jointing. Furthermore, RF identified the vegetation index that contributed most noticeably to LAI estimation at each stage (EVI at jointing, MTVI2 at booting, and MSR at anthesis). Thus, the RF algorithm provides an effective way to improve the prediction accuracy of wheat LAI at the field scale.
Keywords: vegetation; neural networks; algorithms; random forest; machine learning; leaf area index; wheat
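Corresponding author: ※Guo Wenshan, male, from Jiangsu, professor and doctoral supervisor; research interests: crop cultivation physiology and information agriculture. Key Laboratory of Crop Genetics and Physiology of Jiangsu Province, Yangzhou University, Yangzhou 225009. Email: guows@yzu.edu.cn
Author biography: Wang Liai, female, from Shanxi, PhD; research interest: applications of agricultural remote sensing. Key Laboratory of Crop Genetics and Physiology of Jiangsu Province, Yangzhou University, Yangzhou 225009. Email: wla001@163.com
Funding: National Natural Science Foundation of China (31271642); Natural Science Foundation of the Jiangsu Higher Education Institutions (12KJB520018); Key Project of International Science and Technology Cooperation and Expert Recruitment for Provincial Universities; "Six Talent Peaks" High-level Talent Project (2011-NY039); Excellent Science and Technology Innovation Team Project of Jiangsu Higher Education Institutions.
Received: 2015-07-28
Revised: 2015-12-23
CLC number: S127; TP79
Document code: A
Article ID: 1002-6819(2016)-03-0149-06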
doi:10.11975/j.issn.1002-6819.2016.03.021