Recognition of apple targets before fruit thinning by robot based on R-FCN deep convolutional neural network
Wang Dandan, He Dongjian※
(1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China; 2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling 712100, China; 3. Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China)
Before fruit thinning, the complex background, varying illumination, fruit overlap and occlusion, and especially the close color similarity between the fruit and the background foliage make apple target recognition very difficult. To recognize apple targets before fruit thinning, a recognition method based on the region-based fully convolutional network (R-FCN) was proposed. Building on a study of the structures and recognition results of the ResNet-50 and ResNet-101 based R-FCNs, an improved R-FCN based on ResNet-44 was designed to raise recognition accuracy while simplifying the network. The network consists of a ResNet-44 fully convolutional network, a region proposal network (RPN), and a region of interest (RoI) sub-network. The ResNet-44 fully convolutional network, as the backbone, extracts image features; the RPN generates RoIs from these features; and the RoI sub-network then recognizes and locates apple targets using the ResNet-44 features and the RoIs output by the RPN. After augmentation of the captured images, 23 591 images were randomly selected as the training set and 4 739 images as the validation set to train the network and optimize its parameters. Experiments on a test set of 332 images showed that the proposed method can effectively recognize overlapped, occluded, blurred, and shadowed apple targets, with a recall of 85.7%, a precision of 95.1%, a false recognition rate of 4.9%, and an average speed of 0.187 s per image. In comparative experiments with 3 other methods, the F1 score of the proposed method was 16.4, 0.7, and 0.7 percentage points higher than those of Faster R-CNN, ResNet-50 based R-FCN, and ResNet-101 based R-FCN, respectively, and the recognition time per image was 0.010 and 0.041 s shorter than those of the ResNet-50 and ResNet-101 based R-FCNs. The method achieves recognition of pre-thinning apple targets, which traditional methods cannot, and can also be widely applied to other small targets whose color is close to the background.
Keywords: image processing; algorithms; image recognition; small apple; target recognition; deep learning; R-FCN
Accurate recognition of fruit targets in the natural growing environment is important for automated and intelligent orchard management. In particular, recognizing small fruit targets before thinning matters for automated thinning, variable-rate spraying of pesticide, water, and fertilizer, and monitoring of fruit growth. However, the complex background, varying illumination, fruit overlap and occlusion, and especially the close color similarity between pre-thinning fruit and the background foliage make the recognition of small pre-thinning fruit very difficult [1].
Fruit recognition has been studied extensively at home and abroad [2-9]. The main methods include color difference [10-11], K-means clustering [12], fuzzy C-means [13], K-nearest neighbor (KNN) [14], artificial neural networks (ANN) [15], and support vector machines (SVM) [16-17]. Although these methods can recognize fruit targets in images, they all rely on color, shape, or texture features, so recognition accuracy drops markedly when illumination or other natural factors leave the fruit surface unevenly colored, shadowed, or occluded.
In recent years, with the development of deep convolutional neural networks (DCNN), researchers have begun applying them to target recognition and localization. A DCNN takes the raw image directly as input and extracts image features automatically, avoiding complex preprocessing and hand-crafted feature extraction [18]. It integrates feature learning with classification, learns from data, and is an efficient recognition method [19]; DCNNs are now gradually being applied to fruit recognition and localization [20-23]. To count apples and oranges accurately, Chen et al. [24] designed 2 deep networks: a fully convolutional network based on blob detection to extract candidate fruit regions, and a convolutional neural network to count the fruit. Bargoti et al. [25] first segmented apple images with a multi-scale multi-layer perceptron and a convolutional neural network to extract apple targets, then recognized and counted them with watershed segmentation and the circular Hough transform. Rahnemoonfar et al. [26] designed a DCNN for tomato recognition and counting that accurately counted tomatoes that were shadowed, occluded by foliage, or overlapping. Liu et al. [27] proposed a method for recognizing and counting visible oranges and apples in image sequences: a fully convolutional network segments the images, the Hungarian algorithm tracks the fruit across frames, and a structure-from-motion algorithm estimates the 3D position and size of the fruit and removes false positives. For mango yield estimation, Stein et al. [28] used a multi-view method to recognize, track, locate, and count mango targets, with a Faster R-CNN [29] recognizing fruit in the images; it accurately counted occluded mangoes. These studies target mature fruit or immature fruit similar in color to the background; recognition of small pre-thinning fruit has not yet been reported. Pre-thinning apples are very small, extremely similar in color to the background, and mostly grow in clusters, which makes their recognition difficult.
The region-based fully convolutional network (R-FCN) performs target classification with a fully convolutional network [30] and suits the recognition of small targets that are common in complex backgrounds [31]. R-FCN shares computation across the whole image, reducing parameter redundancy, and uses position-sensitive score maps to resolve the conflict between translation invariance in image classification and translation variance in object detection. It achieved good recognition and classification results on ImageNet and is now widely used in target recognition and localization [31-33].
Accordingly, this paper proposes an R-FCN based method for recognizing apple targets before fruit thinning. A deep neural network extracts apple features and recognizes and locates apple regions, aiming to overcome interference from complex backgrounds, fruit overlap, occlusion, and surface shadow, and thereby recognize the apples in an image efficiently and accurately, laying a foundation for automated orchard growth monitoring, fruit thinning, and variable-rate pesticide spraying.
Images were captured before apple thinning in the orchard of the College of Horticulture, Northwest A&F University, on May 1 (sunny), May 2 (sunny), and May 4 (cloudy), 2018, from 9:00 to 11:30 and from 14:30 to 18:30, when the transverse diameter of the apples was below 25 mm. To ensure sample diversity, canopy images of naturally growing apple trees were captured under both direct sunlight and backlight. The capture device was an iPhone 7 Plus; images were 3 024×3 024 pixels in JPEG format, taken 0.5-1.2 m from each canopy from 4 directions (east, south, west, and north).
A total of 3 165 apple images were captured. To reduce subsequent running time, the images were first resized to 500×500 pixels and then annotated manually: each apple was labeled with its minimum bounding rectangle, so that every box contains exactly one apple target and as few background pixels as possible.
After annotation, 332 images captured under different weather and illumination conditions were selected as the test set, and the remaining 2 833 images were used for network training. Details of the 332 selected images are given in Table 1.
To enrich the training set, extract image features better, and avoid overfitting, data augmentation was applied. Because illumination direction, weather, and other uncertain factors made lighting conditions at capture time complex, 8 transformations were applied to improve the generalization ability of the trained model: enhancement and reduction of brightness, chroma, contrast, and sharpness. Brightness, chroma, and contrast were enhanced to 1.2 times the original and sharpness to 2 times; brightness, chroma, contrast, and sharpness were reduced to 60%, 60%, 60%, and 10% of the original, respectively. In addition, Gaussian noise with variance 0.01 was added to simulate noise that the device might introduce during capture. The original annotations remain valid after augmentation. The 9 augmentation results for the original image in Fig.1a are shown in Fig.1b-1j. The 28 330 augmented images were used for network training and parameter optimization: 4 739 were randomly selected as the validation set and the remaining 23 591 formed the training set, with no overlap between the training and test sets.
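As an illustration, the nine augmentations described above can be sketched with Pillow and NumPy. This is a hypothetical sketch, not the authors' code; the function name `augment` is ours, and the enhancement factors map directly to the multipliers stated in the text (1.2× enhancement, 60%/10% reduction, Gaussian noise with variance 0.01, i.e. standard deviation 0.1 on a [0,1] intensity scale).

```python
import numpy as np
from PIL import Image, ImageEnhance

def augment(img: Image.Image) -> dict:
    """Return the nine augmented variants of one source image."""
    out = {}
    for name, enh in [("brightness", ImageEnhance.Brightness),
                      ("chroma", ImageEnhance.Color),
                      ("contrast", ImageEnhance.Contrast)]:
        out[name + "_up"] = enh(img).enhance(1.2)    # enhance to 1.2x
        out[name + "_down"] = enh(img).enhance(0.6)  # reduce to 60%
    out["sharpness_up"] = ImageEnhance.Sharpness(img).enhance(2.0)    # 2x
    out["sharpness_down"] = ImageEnhance.Sharpness(img).enhance(0.1)  # 10%
    # additive Gaussian noise, variance 0.01 (std 0.1) on a [0,1] scale
    arr = np.asarray(img, dtype=np.float32) / 255.0
    noisy = np.clip(arr + np.random.normal(0.0, 0.1, arr.shape), 0.0, 1.0)
    out["gaussian_noise"] = Image.fromarray((noisy * 255).astype(np.uint8))
    return out
```

Because every transform is photometric, the bounding-box annotations of the source image carry over to all nine variants unchanged.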
Table 1 Sample distribution of the test set images
Fig.1 Image augmentation results
The main goal of this work is to recognize apple targets in an image captured in a natural orchard. Convolutional neural networks (CNN) learn target features well, so a CNN can recognize apples, but a plain CNN cannot locate the apple regions. Region-based CNNs attach a region of interest (RoI) pooling layer after the convolutional layers so that the network can locate target RoIs; this work therefore uses a region-based CNN. Region-based CNNs such as SPPNet, Fast R-CNN, and Faster R-CNN place fully connected layers after the RoI pooling layer, and the per-RoI computation in those layers is not shared, causing heavy parameter redundancy. The region-based fully convolutional network (R-FCN) of Dai et al. [30] removes the fully connected layers so that all RoIs share parameters across the whole network, reducing redundancy and greatly increasing speed. R-FCN can recognize the small targets that abound in complex backgrounds [32] and can also locate them. Moreover, because the position of an apple in an image is uncertain, the recognition network needs strong translation invariance, while detection requires translation variance; R-FCN resolves this conflict with position-sensitive score maps, so it was adopted for apple target recognition.
An R-FCN consists of a backbone fully convolutional network (FCN), a region proposal network (RPN), and an RoI sub-network. The backbone FCN extracts image features; the RPN generates RoIs from these features; and the RoI sub-network recognizes and locates target regions using the FCN features and the RoIs output by the RPN. The structure of the R-FCN is shown in Fig.2.
With the development of deep learning, many deep network models have appeared, notably AlexNet [34], ZF [35], VGG [36], GoogLeNet [37], and ResNet [38], each of which can be built at different depths by choosing the number of weight layers. A deeper network may bring higher accuracy but slows training and detection. Because the residual structure adds no model parameters yet effectively alleviates vanishing gradients and training degradation in deep networks, improving convergence, a ResNet-based FCN was used as the backbone.
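The identity-shortcut property credited to ResNet above can be illustrated in a few lines. This is a generic sketch, not the paper's network: `residual_block` and its single linear map are our stand-ins for a convolutional residual branch.

```python
import numpy as np

def residual_block(x, weight):
    """y = F(x) + x: F is a single linear map + ReLU for illustration."""
    fx = np.maximum(weight @ x, 0.0)  # F(x): stand-in for conv + ReLU
    return fx + x                     # identity shortcut adds no parameters
```

When the residual branch contributes nothing (all-zero weights), the block passes its input through unchanged, which is the property that eases optimization of very deep stacks.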
Note: C denotes the number of target classes (C=1 in this study, i.e., apple); k² denotes the number of spatial positions into which the RoI pooling layer divides each RoI.
The input to the RPN is the convolutional feature map, and 9 anchor windows are used to search each region. Because the natural growing environment of apples is complex, apples at the same growth stage differ in size, and the apple targets in the images are small, 3 area scales (32×32, 64×64, 128×128) were designed according to the pixel count of a single fruit when the RPN generates regions of interest. The bounding box of a single fruit (Fig.3a) has an aspect ratio of about 1:1, while the boxes of some occluded fruit (Fig.3b) have aspect ratios between 0.5 and 2, so 3 aspect ratios (0.5, 1, 2) were designed. The 3 scales and 3 ratios combine into 9 anchor windows, which predict the positions of windows containing targets so that the output RoIs are more accurate, supporting accurate apple recognition.
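A minimal sketch (not the authors' code) of how the nine anchor windows follow from the three scales and three aspect ratios given above; `make_anchors` and the area-preserving convention (each ratio keeps the anchor area of its scale) are our assumptions, matching common RPN implementations.

```python
import itertools
import math

def make_anchors(scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return 9 (w, h) anchor sizes; each ratio r = h/w keeps the area s*s."""
    anchors = []
    for s, r in itertools.product(scales, ratios):
        w = math.sqrt(s * s / r)  # solve w*h = s*s with h = r*w
        h = w * r
        anchors.append((round(w), round(h)))
    return anchors
```

For ratio 1 this reproduces the square windows 32×32, 64×64, and 128×128 named in the text; ratios 0.5 and 2 give the wide and tall windows that cover occluded fruit.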
To extract apple RoIs accurately, an RoI sub-network is attached after the backbone FCN. It regresses and refines the position parameters of the apple regions, enabling accurate recognition of apple targets.
Fig.3 Examples of occluded fruit with different aspect ratios
The R-FCN first generates feature maps with the backbone FCN, whose last convolutional layer produces k²(C+1) position-sensitive score maps (k² is the number of spatial positions into which the RoI pooling layer divides an RoI; C is the number of target classes, and C+1 adds the background class). For a w×h RoI produced by the RPN, the RoI is first divided into k×k bins, each of size about (w/k)×(h/k); the pooled response of the (i, j)-th bin (0 ≤ i, j ≤ k−1) for the c-th class is

r_c(i, j | Θ) = Σ_{(x,y)∈bin(i,j)} z_{i,j,c}(x + x₀, y + y₀ | Θ) / n

where z_{i,j,c} is the score map assigned to that bin and class, (x₀, y₀) is the top-left corner of the RoI, n is the number of pixels in the bin, and Θ denotes the learnable network parameters.
After computing the pooled response of every bin, the k² position-sensitive scores vote on the RoI, yielding the score of the RoI for each class (a (C+1)-dimensional vector); a softmax over these scores then gives the probability that the RoI belongs to each class.
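The pooling-and-voting step above can be sketched in NumPy. This is an illustrative sketch after Dai et al. [30], not the paper's Caffe implementation: the function name, the score-map memory layout, and averaging as the vote are our assumptions.

```python
import numpy as np

def psroi_vote(score_maps, roi, k=3, C=1):
    """score_maps: (k*k*(C+1), H, W); roi: (x0, y0, w, h) in map coordinates.
    Average-pools each of the k x k bins from its own score map, votes
    (averages) over bins, and returns softmax class probabilities."""
    x0, y0, w, h = roi
    scores = np.zeros(C + 1)
    for c in range(C + 1):
        for i in range(k):           # bin row
            for j in range(k):       # bin column
                m = score_maps[(i * k + j) * (C + 1) + c]
                ys, ye = y0 + i * h // k, y0 + (i + 1) * h // k
                xs, xe = x0 + j * w // k, x0 + (j + 1) * w // k
                scores[c] += m[ys:ye, xs:xe].mean()
    scores /= k * k                  # vote: average over the k*k bins
    e = np.exp(scores - scores.max())
    return e / e.sum()               # probability per class (apple, background)
```

Note that each bin reads from its own dedicated score map, which is what makes the pooling position-sensitive: a shifted RoI samples different maps and so scores differently, restoring translation variance for detection.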
After the RoI sub-network, the class of each RoI is determined along with its position parameters t = (t_x, t_y, t_w, t_h), where (t_x, t_y) are the coordinates of the top-left corner of the apple region and t_w and t_h are its width and height.
The goal of apple recognition is to minimize the error between the predicted apple region and the ground-truth annotation. The network measures this with a loss function composed of a classification loss L_cls and a localization loss L_reg, defined as in reference [30].
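As a hedged sketch of the two-part loss referred to above: cross-entropy for L_cls and smooth-L1 for L_reg follow Dai et al. [30], but the function names and the balancing weight `lam = 1` are our assumptions.

```python
import numpy as np

def smooth_l1(t, t_star):
    """Smooth-L1 regression term over the box parameters (tx, ty, tw, th)."""
    d = np.abs(np.asarray(t) - np.asarray(t_star))
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def detection_loss(p_apple, is_apple, t, t_star, lam=1.0):
    """L = L_cls + lam * L_reg; the box term applies only to positives."""
    cls = -np.log(p_apple if is_apple else 1.0 - p_apple)  # cross-entropy
    reg = smooth_l1(t, t_star) if is_apple else 0.0
    return cls + lam * reg
```

A perfectly localized positive with predicted probability 0.5 thus incurs only the classification term, −ln 0.5 ≈ 0.693.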
Experiments were run under Ubuntu 16.04 on a platform with an i7-7700HQ CPU (16 GB RAM) and an NVIDIA GTX 1070 GPU (8 GB video memory). The Caffe deep learning framework was used, and the training and testing of the apple recognition model were implemented in Python.
The network was trained jointly and end-to-end with stochastic gradient descent, and online hard example mining (OHEM) was used to improve training efficiency. Parameters were initialized with a model pre-trained on ImageNet; the initial learning rate was set to 0.001, the weight decay to 0.000 5, the momentum to 0.9, and the validation interval to 5 000, i.e., the accuracy of the model was tested on the validation set every 5 000 iterations and training stopped once accuracy converged; the maximum number of iterations was 50 000. During training, 300 RoIs per image were computed in the forward pass and 128 were selected for back-propagation. After training, the model was saved and verified on the test set. The final output is the recognized targets and their probabilities of being apples; only regions with probability above 0.8 were kept.
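The hyperparameters listed above can be collected into a Caffe-style configuration. The first five keys follow Caffe's `solver.prototxt` naming; the last three are our own labels for the RoI and threshold settings, since the authors' actual configuration files are not given.

```python
# Training configuration as described in the text (a sketch, not the
# authors' solver file).
solver = {
    "base_lr": 0.001,        # initial learning rate
    "weight_decay": 0.0005,  # weight decay
    "momentum": 0.9,         # momentum factor
    "test_interval": 5000,   # validate every 5 000 iterations
    "max_iter": 50000,       # at most 50 000 iterations
    "rois_per_image": 300,   # RoIs scored in the forward pass
    "rois_backward": 128,    # hard examples kept by OHEM for backprop
    "score_threshold": 0.8,  # keep detections with P(apple) > 0.8
}
```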
Because both recall and precision must be considered in apple recognition, the F1 score [32] was used to evaluate the results:

F1 = 2PR / (P + R), P = TP / (TP + FP), R = TP / (TP + FN)

where P denotes the precision and R the recall; TP is the number of apples recognized by the algorithm, FP the number of background regions misrecognized as apples, and FN the number of apples not recognized.
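As a worked check of these formulas, using the counts reported in the results section (3 630 correct detections among 3 816 detected targets, out of 4 234 ground-truth apples):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positives and false negatives."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# Counts from the test-set results: TP = 3630, FP = 3816 - 3630, FN = 4234 - 3630
p, r, f1 = precision_recall_f1(tp=3630, fp=3816 - 3630, fn=4234 - 3630)
```

This reproduces the reported precision of 95.1% and recall of 85.7%.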
In developing the R-FCN for apple recognition, the R-FCNs based on ResNet-50 and ResNet-101 were studied first. Analysis of the 2 networks shows that they differ only in the conv4_x block, as shown in Fig.4a and 4b: the conv4_x block of the ResNet-101 based R-FCN has 51 more layers than that of the ResNet-50 based R-FCN, yet the fruit recognition rate is not significantly higher (Table 4). Therefore, to simplify the network and seek a higher recognition rate, the conv4_x block of the ResNet-50 based R-FCN was reduced by 3, 6, and 9 layers, yielding R-FCNs based on ResNet-47, ResNet-44, and ResNet-41, respectively. To select the best model, the recognition results of the 3 networks on the 332-image test set were compared (Table 2); the ResNet-44 based R-FCN achieved the highest F1 score and was therefore selected as the optimal model.
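The layer arithmetic behind these variants can be checked directly. This sketch assumes the standard ResNet-50 configuration (three-layer bottleneck blocks counted 3+4+6+3 across conv2_x to conv5_x, plus the input convolution and one final layer); removing bottleneck blocks from conv4_x then gives the depths named in the text.

```python
def resnet_depth(conv4_blocks):
    """Weight-layer count for a ResNet-50-style network with a modified conv4_x."""
    blocks = 3 + 4 + conv4_blocks + 3  # bottleneck blocks, conv2_x..conv5_x
    return 1 + 3 * blocks + 1          # conv1 + 3 layers per block + final layer

depths = {b: resnet_depth(b) for b in (3, 4, 5, 6)}
```

With 6 conv4_x blocks this gives 50 layers; dropping 1, 2, or 3 blocks (3, 6, or 9 layers) gives 47, 44, and 41, matching ResNet-47, ResNet-44, and ResNet-41.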
To verify the performance of the improved ResNet-44 based R-FCN, its results on the 332-image test set were analyzed further. The test set contains 4 234 apple targets in total; the method detected 3 816 targets, of which 3 630 were apples, giving a recall of 85.7%, a precision of 95.1%, and a false recognition rate of 4.9%. Example recognition results and detailed results are shown in Fig.5 and Table 3, respectively.
Table 3 and Fig.5 show that the method works not only on evenly lit images captured on cloudy days but also on images captured under strong sunlight, and performs well under both direct sunlight and backlight: it recognized 87.4%, 89.1%, 83.0%, and 84.1% of the apples in images captured under sunny direct light, sunny backlight, cloudy direct light, and cloudy backlight, respectively. It effectively recognizes apples with strong light or shadow on the surface, with recalls of 93.2% and 84.6% for these 2 cases (Table 3). As Fig.5b and 5e show, the method also works for apples divided into several parts by a petiole, with a recall of 75.1%. Blurred apples (Fig.5c) are also recognized accurately: 86.7% of the blurred apples in the images were identified. Backlit images captured under strong sunlight can leave the fruit surface so dark that even the human eye can hardly pick the fruit out, yet the method still works on such images (Fig.5d and 5f), with a recall of 81.9%. It likewise handles severely occluded fruit (Fig.5e), with a recall of 78.8%. Because pre-thinning fruit are extremely similar to the background, some false recognitions (Fig.5a) and misses (Fig.5e) still occur.
Note: 1, 3, and 5 mark recognition results for apples with very weak light, shadow, and strong light on the surface, respectively; 2 and 9 mark a false recognition and a miss, respectively; 4, 6, 7, and 8 mark results for occluded apples, overlapped apples, apples divided into parts by branches or petioles, and blurred apples, respectively.
Table 3 Apple recognition results
Note: SD: sunny direct sunlight condition; SB: sunny backlight condition; CD: cloudy direct sunlight condition; CB: cloudy backlight condition; OC: occluded apples; OL: overlapped apples; SI: apples with strong illumination on the surface; WI: apples with weak illumination on the surface; SA: apples with shadows on the surface; DP: apples divided into parts by branches or petioles; BA: blurred apples; IA: independent apples.
The analysis above shows that, despite some false and missed recognitions, the method recognizes the apple targets in an image fairly accurately.
To analyze network performance further, the proposed network was compared with Faster R-CNN [29], the ResNet-50 based R-FCN, and the ResNet-101 based R-FCN; the Faster R-CNN used ZF [35] as its backbone. The recognition results and parameter counts of the networks are listed in Table 4.
Table 4 Performance comparison of the 4 networks
Table 4 shows that the proposed ResNet-44 based R-FCN achieves the highest F1 score, 16.4 percentage points above Faster R-CNN and 0.7 percentage points above both the ResNet-50 and ResNet-101 based R-FCNs, indicating that it recognizes apples most accurately. Deeper residual networks often achieve higher accuracy, but on this apple test set the deeper networks, the ResNet-50 and ResNet-101 based R-FCNs, may overfit to some degree, slightly lowering recognition accuracy. The deeper the network, the longer each image takes to process; with fewer layers than the ResNet-50 and ResNet-101 based R-FCNs, the ResNet-44 based R-FCN has the shortest average time per image, 0.010 and 0.041 s less than the two, respectively, though slightly longer than Faster R-CNN, relative to which it recognizes apples more accurately. Its parameter count is 2.9×10⁷, i.e., 93.5% and 58.0% of those of the ResNet-50 and ResNet-101 based R-FCNs, respectively, showing that the network is effectively simplified while recognition accuracy is maintained.
The proposed method recognizes apple targets in images with complex backgrounds and promises application in apple growth monitoring and automated fruit thinning. Compared with traditional fruit recognition, the task tackled here is more challenging: pre-thinning apples are small, and their color closely resembles the background foliage, so traditional methods can no longer recognize the fruit in such images effectively and accurately. Deep learning, which extracts image features automatically, makes the recognition of pre-thinning apple targets possible and is an effective recognition approach.
When studying the R-FCN, the ResNet-50 and ResNet-101 based R-FCNs were examined first. After analyzing their structures, recognition results, and training times (Table 4), the ResNet-44 based R-FCN was designed to simplify the network and seek a higher recognition rate. Although the ResNet-50 and ResNet-101 based R-FCNs achieved excellent recognition and classification results on ImageNet, the ResNet-44 based R-FCN recognizes the apple targets in this study more accurately.
The proposed ResNet-44 based R-FCN is deep enough to extract more abstract image features and can therefore recognize apples effectively against complex backgrounds. OHEM was used during R-FCN training, so hard examples prone to misrecognition were learned online, keeping the false recognition rate low. Although the ResNet-44 based R-FCN outperforms the other 3 networks, misses still occur in the results (Fig.5e); further research on optimizing the network structure and parameters is needed to reduce them.
Taking pre-thinning apples as the research object, this study proposed an R-FCN based apple target recognition method.
1) An R-FCN network based on ResNet-44 was designed that simplifies the network while maintaining apple recognition accuracy.
2) The proposed method achieves a recall of 85.7%, a precision of 95.1%, a false recognition rate of 4.9%, and an average speed of 0.187 s per image; it recognizes the small apples in an image fairly accurately and with good real-time performance.
3) Compared with Faster R-CNN and the ResNet-50 and ResNet-101 based R-FCNs, the F1 score of the proposed method is 16.4, 0.7, and 0.7 percentage points higher, respectively.
Although the recognition results are much improved, the model is still large and the recognition rate can be raised further; future work will study ways to simplify the network structure and improve recognition accuracy.
[1] Gongal A, Amatya S, Karkee M, et al. Sensors and systems for fruit detection and localization: A review[J]. Computers & Electronics in Agriculture, 2015, 116:8-19.
[2] Jiang G Q, Zhao C J. Apple recognition based on machine vision[C]// International Conference on Machine Learning and Cybernetics. IEEE, 2012: 1148-1151.
[3] Lu J, Sang N. Detecting citrus fruits and occlusion recovery under natural illumination conditions[J]. Computers & Electronics in Agriculture, 2015, 110: 121-130.
[4] Ji W, Zhao D, Cheng F, et al. Automatic recognition vision system guided for apple harvesting robot[J]. Computers & Electrical Engineering, 2012, 38(5): 1186-1195.
[5] Rizon M, Yusri N A N, Kadir M F A, et al. Determination of mango fruit from binary image using randomized Hough transform[C]// Eighth International Conference on Machine Vision. International Society for Optics and Photonics, 2015, 9875(3): 1-5.
[6] Silwal A, Gongal A, Karkee M. Identification of red apples in field environment with over the row machine vision system[J]. Agricultural Engineering International: The CIGR Journal, 2014, 16(4): 66-75.
[7] Rakun J, Stajnko D, Zazula D. Detecting fruits in natural scenes by using spatial-frequency based texture analysis and multiview geometry[J]. Computers & Electronics in Agriculture, 2011, 76(1):80-88.
[8] Chaivivatrakul S, Dailey M N. Texture-based fruit detection[J]. Precision Agriculture, 2014, 15(6): 662-683.
[9] Feng J, Wang S, Liu G, et al. A separating method of adjacent apples based on machine vision and chain code information[C]// International Conference on Computer and Computing Technologies in Agriculture, 2012: 258-267.
[10] Arefi A, Motlagh A M, Mollazade K, et al. Recognition and localization of ripen tomato based on machine vision[J]. Australian Journal of Crop Science, 2011, 5(10):1144-1149.
[11] Zhou R, Damerow L, Sun Y, et al. Using colour features of cv. ‘Gala’ apple fruits in an orchard in image processing to predict yield[J]. Precision Agriculture, 2012, 13(5): 568-580.
[12] Wachs J P, Stern H I, Burks T, et al. Low and high-level visual feature-based apple detection from multi-modal images[J]. Precision Agriculture, 2010, 11(6): 717-735.
[13] Zhu A, Yang L. An improved FCM algorithm for ripe fruit image segmentation[C]//IEEE International Conference on Information and Automation. IEEE, 2014: 436-441.
[14] Linker R, Cohen O, Naor A. Determination of the number of green apples in RGB images recorded in orchards[J]. Computers & Electronics in Agriculture, 2012, 81(1): 45-57.
[15] Arefi A, Motlagh A M. Development of an expert system based on wavelet transform and artificial neural networks for the ripe tomato harvesting robot[J]. Australian Journal of Crop Science, 2013, 7(5): 699-705.
[16] Lv Q, Cai J R, Liu B, et al. Identification of fruit and branch in natural scenes for citrus harvesting robot using machine vision and support vector machine[J]. International Journal of Agricultural & Biological Engineering, 2014, 7(2): 115-121.
[17] Zhao C Y, Lee W S, He D J. Immature green citrus detection based on colour feature and sum of absolute transformed difference (SATD) using colour images in the citrus grove[J]. Computers & Electronics in Agriculture, 2016, 124: 243-253.
[18] Zhao Kaixuan, He Dongjian. Recognition of individual dairy cattle based on convolutional neural networks[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2015, 31(5): 181-187. (in Chinese with English abstract)
[19] Guo Y, Liu Y, Oerlemans A, et al. Deep learning for visual understanding: A review[J]. Neurocomputing, 2016, 187: 27-48.
[20] Zhou Yuncheng, Xu Tongyu, Zheng Wei, et al. Classification and recognition approaches of tomato main organs based on DCNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(15): 219-226. (in Chinese with English abstract)
[21] Fu Longsheng, Feng Yali, Elkamil Tola, et al. Image recognition method of multi-cluster kiwifruit in field based on convolutional neural networks[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(2): 205-211. (in Chinese with English abstract)
[22] Sa I, Ge Z, Dayoub F, et al. DeepFruits: A fruit detection system using deep neural networks[J]. Sensors, 2016, 16(8):1222.
[23] Bargoti S, Underwood J. Deep fruit detection in orchards[C]// IEEE International Conference on Robotics and Automation, 2017: 3626-3633.
[24] Chen S W, Skandan S S, Dcunha S, et al. Counting apples and oranges with deep learning: a data driven approach[J]. IEEE Robotics & Automation Letters, 2017, 2(2): 781-785.
[25] Bargoti S, Underwood J P. Image segmentation for fruit detection and yield estimation in apple orchards[J]. Journal of Field Robotics, 2017, 34(6): 1039-1060.
[26] Rahnemoonfar M, Sheppard C. Deep count: Fruit counting based on deep simulated learning[J]. Sensors, 2017, 17(4): 905.
[27] Liu X, Chen S W, Aditya S, et al. Robust fruit counting: Combining deep learning, tracking, and structure from motion[C]// International Conference on Intelligent Robots and Systems, 2018: 1045-1052.
[28] Stein M, Bargoti S, Underwood J. Image based mango fruit detection, localisation and yield estimation using multiple view geometry[J]. Sensors, 2016, 16(11): 1915.
[29] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]// International Conference on Neural Information Processing Systems, 2015: 91-99.
[30] Dai J, Li Y, He K, et al. R-FCN: Object detection via region-based fully convolutional networks[C]// Advances in Neural Information Processing Systems. Curran Associates Inc., 2016.
[31] Jiang Sheng, Huang Min, Zhu Qibing, et al. Pedestrian detection method based on R-FCN[J]. Computer Engineering and Applications, 2018, 54(18): 180-183. (in Chinese with English abstract)
[32] Sang Nong, Ni Zihan. Gesture recognition based on R-FCN in complex scenes[J]. Journal of Huazhong University of Science and Technology: Natural Science Edition, 2017(10): 54-58. (in Chinese with English abstract)
[33] Xu Yizhi, Yao Xiaojing, Li Xiang, et al. Object detection in high resolution remote sensing images based on fully convolution networks[J]. Bulletin of Surveying and Mapping, 2018, 490(1): 80-85. (in Chinese with English abstract)
[34] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]// International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012: 1097-1105.
[35] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks[C]// European Conference on Computer Vision, 2014: 818-833.
[36] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]// International Conference on Learning Representations, 2015: 1-14.
[37] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2015: 1-9.
[38] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
Recognition of apple targets before fruit thinning by robot based on R-FCN deep convolutional neural network
Wang Dandan, He Dongjian※
(1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China; 2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture and Rural Affairs, Yangling 712100, China; 3. Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling 712100, China)
Before fruit thinning, factors such as complex background, varying illumination conditions, foliage occlusion, and fruit clustering, and especially the extreme color similarity between apples and the background, make the recognition of small apple targets very difficult. To solve these problems, a recognition method based on the region-based fully convolutional network (R-FCN) was proposed. Firstly, deep convolutional neural networks, including the ResNet-50 based R-FCN and the ResNet-101 based R-FCN, were studied and analyzed. Comparing the frameworks of the 2 networks made it obvious that they differ in the 'conv4' block: the 'conv4' block of the ResNet-101 based R-FCN has 51 more layers than that of the ResNet-50 based R-FCN, but the recognition accuracy of the 2 networks is almost the same. By comparing the frameworks and recognition results of the ResNet-50 based R-FCN and the ResNet-101 based R-FCN, an R-FCN based on ResNet-44 was designed to improve the recognition accuracy and simplify the network. The main simplification was applied to the 'conv4' block, which in the ResNet-44 based R-FCN has 6 fewer layers than in the ResNet-50 based R-FCN. The ResNet-44 based R-FCN consists of a ResNet-44 fully convolutional network, a region proposal network (RPN) and a region of interest (RoI) sub-network. The ResNet-44 fully convolutional network, the backbone network of the R-FCN, extracts the features of the image, which the RPN then uses to generate RoIs. After that, the features extracted by the ResNet-44 fully convolutional network and the RoIs generated by the RPN are used by the RoI sub-network to recognize and locate small apple targets. A total of 3 165 images were captured in an experimental apple orchard of the College of Horticulture, Northwest A&F University, Yangling, China.
After image resizing and manual annotation, 332 images, including 85 captured under sunny direct sunlight, 88 under sunny backlight, 86 under cloudy direct sunlight, and 74 under cloudy backlight, were selected as the test set, and the other 2 833 images were used to train and optimize the network. To enrich the image training set, data augmentation, including brightness enhancement and reduction, chroma enhancement and reduction, contrast enhancement and reduction, sharpness enhancement and reduction, and the addition of Gaussian noise, was performed, giving a total of 28 330 images, of which 23 591 randomly selected images formed the training set and the other 4 739 the validation set. After training, the simplified ResNet-44 based R-FCN was tested on the test set, and the experimental results indicated that the method applies effectively to images captured under different illumination conditions. It could recognize clustered apples, occluded apples, blurred apples, and apples with shadow, strong illumination, or weak illumination on the surface. In addition, apples divided into parts by branches or petioles could also be recognized effectively. Overall, the recognition recall rate reached 85.7%, while the recognition accuracy and false recognition rate were 95.1% and 4.9%, respectively. The average recognition time was 0.187 s per image. To further test the performance of the proposed method, 3 other methods were compared: Faster R-CNN, the ResNet-50 based R-FCN and the ResNet-101 based R-FCN. The F1 of the proposed method was higher by 16.4, 0.7 and 0.7 percentage points, respectively. The average running time per image of the proposed method was 0.010 and 0.041 s shorter than that of the ResNet-50 based R-FCN and the ResNet-101 based R-FCN, respectively.
The proposed method achieves the recognition of small apple targets before fruit thinning, which traditional methods cannot realize. It can also be widely applied to the recognition of other small targets whose features are similar to the background.
image processing; algorithms; image recognition; small apple; target recognition; deep learning; R-FCN
Wang Dandan, He Dongjian. Recognition of apple targets before fruits thinning by robot based on R-FCN deep convolution neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(3): 156-163. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2019.03.020 http://www.tcsae.org
Received: 2018-10-24
Revised: 2019-01-21
Supported by the National High Technology Research and Development Program of China (863 Program) (2013AA100304)
Wang Dandan, Ph.D. candidate, research interest: intelligent detection in agriculture. Email: wdd_app@163.com
He Dongjian, professor, doctoral supervisor, research interests: intelligent detection and technology. Email: hdj168@nwsuaf.edu.cn
10.11975/j.issn.1002-6819.2019.03.020
TP391.41
A
1002-6819(2019)-03-0156-08