Image recognition method for multi-cluster kiwifruit in the field based on convolutional neural networks
Fu Longsheng1,2, Feng Yali1, Elkamil Tola3, Liu Zhihao1, Li Rui1, Cui Yongjie1,2
(1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China; 2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, Yangling 712100, China; 3. Precision Agriculture Research Chair, King Saud University, Riyadh 11451, Saudi Arabia)
To achieve fast and accurate recognition of multi-cluster kiwifruit under field conditions, this study exploited the pergola-style cultivation pattern of kiwifruit, captured fruit images vertically upward from below the canopy, and proposed a recognition method for multi-cluster kiwifruit images based on a deep learning model derived from the LeNet convolutional neural network. The network structure was optimized with batch normalization, ReLU as the activation function, max-pooling as the subsampling method, and a Softmax regression classifier. Tests on 100 field images of multi-cluster kiwifruit showed that the method recognized occluded, overlapped, adjacent, and separated fruits at rates of 78.97%, 83.11%, 91.01%, and 94.78%, respectively. In comparative experiments with 5 existing algorithms, the proposed algorithm improved the recognition rate by 5.73 percentage points over the method tested under the same conditions, and was the fastest at 0.27 s per fruit. These results demonstrate that the algorithm recognizes field kiwifruit images accurately and in real time, indicating that convolutional neural networks have good application prospects for fruit recognition in the field.
image processing; image recognition; algorithms; deep learning; convolutional neural network; kiwifruit
China has the largest kiwifruit cultivation area in the world [1], and the fruit is mostly harvested by hand [2]. Against the background of rural labor migrating to cities, developing automated kiwifruit harvesting, and in particular kiwifruit harvesting robots, is of great significance [3-7]. Fast and effective fruit recognition is one of the first and key technologies for a kiwifruit harvesting robot [7]. In natural scenes, the color of kiwifruit is close to that of withered leaves, branches, fruit stalks, and other complex background elements [8], and the fruits grow in clusters with extensive overlap and occlusion. Learning the features of kiwifruit in the field environment so as to recognize the fruit is therefore a key problem that kiwifruit harvesting robots urgently need to solve [9].
In recent years, researchers have studied the recognition of kiwifruit in natural environments in depth. The work falls into 2 groups: recognition from images taken obliquely from the side of the fruit, and recognition from images taken vertically upward from beneath the fruit. For obliquely captured images, Ding Yalan et al. [10] segmented kiwifruit images with a color factor and a fixed threshold, but could not effectively recognize fruits in strong reflection or deep shadow; Cui Yongjie et al. [11] segmented kiwifruit images on a single color-space channel and fitted individual fruit contours with an elliptical Hough transform, but segmentation of fruits against a field background was unsatisfactory; Cui Yongjie et al. [8] compared different color spaces and proposed the 0.9 color feature combined with an elliptical Hough transform for fruit recognition, but the method targets a specific fruit type, which limits its applicability in practice; Zhan Wentian et al. [5] built a kiwifruit classifier based on the Adaboost algorithm using the RGB, HIS, and other color models, but its recognition speed needs improvement; Mu Junying et al. [12] segmented images with the Otsu algorithm on a single channel and recognized fruits by applying an upright elliptical Hough transform to edge images obtained with the Canny operator, but could not recognize distant fruits well. For vertically upward images, Scarfe et al. [13] removed the background with a fixed threshold, extracted Sobel edges, and recognized kiwifruit by template matching, but did not use the shape information of the fruit; Fu et al. [14] proposed the 1.1 color feature for segmenting nighttime kiwifruit images and recognized each fruit by combining the minimum bounding rectangle with an elliptical Hough transform, but could only recognize a single cluster; Fu Longsheng et al. [9] exploited the fact that in vertically upward imaging every calyx is visible and distinguishable from the fruit, and recognized kiwifruit at night from the calyx, but did not address occluded and overlapped fruits and performed poorly on multiple clusters. Kiwifruit images in the field have diverse features, complex backgrounds, and large variations in shape. Existing recognition methods rely mainly on experience; influenced by the samples and by human subjectivity, they generalize poorly, lack robustness, cannot recognize all types of kiwifruit with a single method, and cannot recognize multiple clusters at once, so they fail to meet the application requirements of complex field environments.
Compared with conventional methods, the recently developed convolutional neural network (CNN) [15] learns features and their representation directly from the data itself and has a very strong capacity for representing images. CNNs have achieved good results in handwritten character recognition [16-18], face recognition [19-21], activity recognition [22-23], and crop recognition [24-25]. Researchers have also begun to apply CNNs to fruit recognition: Wang Qiancheng [26] applied a CNN to a processed dataset of 6 kinds of fruit images and demonstrated its effectiveness for fruit image recognition; Sa et al. [27] built a deep fruit-detection model based on a CNN and obtained good results on images of different fruits. These studies provide references and feasibility evidence for applying CNNs to fruit recognition, and show that CNNs can overcome the shortcomings of traditional image recognition methods.
On the basis of a large set of field sample images, this paper recognizes kiwifruit against complex backgrounds with a CNN, avoiding the influence of human subjectivity on the recognition results. According to the characteristics of kiwifruit images in the field, the structure and parameters of the LeNet convolutional neural network are optimized to build a CNN-based recognition model for field kiwifruit images, so as to recognize multi-cluster kiwifruit quickly and effectively in complex field environments.
The experimental images were collected from October to November 2016 at the kiwifruit experiment station in Mei County, Shaanxi Province (34°07'39''N, 107°59'50''E, altitude 648 m). A digital camera (Canon EOS 40D) was mounted on a tripod about 100 cm below the fruit to photograph the 'Hayward' cultivar. A total of 700 original images were collected, 350 each in the morning and afternoon of sunny days, in JPEG format at a resolution of 2 352×1 568 pixels, as shown in Fig. 1.
Fig.1 Kiwifruit images in the natural field environment
Because kiwifruit is grown on pergola-style trellises, the fruits hang naturally below the branches and leaves, and when imaged vertically upward from below, the calyx of every fruit is visible. In this study, 600 images (300 each from morning and afternoon) were randomly selected, single fruits with a visible calyx were cropped as target regions, invalid image regions were discarded, and the smallest cropped sample measured 74×76 pixels. The originally collected images were then screened manually to avoid wrongly selected or overly uniform samples. The final dataset consisted of positive samples (6 000 images) and negative samples (4 020 images), evenly distributed over the 2 time periods (5 010 images each from morning and afternoon). The dataset was used for network training and for validating the parameter optimization: 80% of the positive and negative samples were randomly selected to build the training set and the remaining 20% formed the validation set. Some positive and negative samples are shown in Fig. 2.
Fig.2 Examples of samples in the experimental dataset
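The random 80/20 split described above can be sketched as follows; the helper name and seed are illustrative, not from the paper:

```python
import random

def split_indices(n_samples, train_frac=0.8, seed=42):
    """Shuffle sample indices and cut them into train/validation parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = int(n_samples * train_frac)
    return idx[:cut], idx[cut:]

# 6 000 positive and 4 020 negative crops, each split 80/20 independently
train_pos, val_pos = split_indices(6000)
train_neg, val_neg = split_indices(4020)
```

Splitting the positive and negative pools separately keeps the class ratio of the training set equal to that of the full dataset.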
After the model was trained, the remaining 100 original kiwifruit images (50 each from morning and afternoon) were used as the test set for verifying model performance. To reduce computation and running time, the original images were scaled to 600×400 pixels for testing; the training dataset and the test images did not overlap. Finally, the method was compared with existing kiwifruit recognition methods. Because the two time periods are equally represented in the test set, the average accuracy on the test set can serve as the evaluation index of the recognition performance of the model [28].
The convolutional neural network was built with the MatConvNet toolbox [29] for Matlab. LeNet [30] is a classic convolutional neural network that was first successfully applied to handwritten digit recognition. Recognizing a kiwifruit likewise amounts to recognizing and matching an unknown fruit image, a process similar to LeNet's recognition of handwritten characters. LeNet was therefore taken as the base network architecture, and its key structural parameters and training strategy were optimized to obtain a model architecture suited to kiwifruit image recognition. The LeNet convolutional neural network is described as follows:
1) Convolutional layer
The size and number of the convolution kernels are critical to the performance of a CNN. The input image is convolved with a set of distinct convolution kernels, each generating its own feature map, as shown in Eq. (1).
2) Subsampling layer
The subsampling layer downsamples its input, as shown in Eq. (2).
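Equations (1) and (2) are not reproduced in this copy. In the usual LeNet/CNN notation they take the standard forms below (a reconstruction, not the paper's exact typesetting), where $f$ is the activation function, $M_j$ the set of input maps feeding output map $j$, $k$ the convolution kernels, $b$ the biases, $\beta$ a multiplicative pooling coefficient, and $\mathrm{down}(\cdot)$ the pooling operation (here 2×2 max):

```latex
x_j^{\ell} = f\Big( \sum_{i \in M_j} x_i^{\ell-1} * k_{ij}^{\ell} + b_j^{\ell} \Big) \tag{1}

x_j^{\ell} = f\Big( \beta_j^{\ell}\,\mathrm{down}\big(x_j^{\ell-1}\big) + b_j^{\ell} \Big) \tag{2}
```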
The processing platform was a laptop computer with an Intel(R) Core(TM) i3 processor at 2.40 GHz, 4 GB of RAM, and a 500 GB hard disk, running Windows 7 64-bit, Matlab R2016a, and Microsoft Visual Studio 12.0.
To apply the LeNet structure to kiwifruit feature extraction and classification, and considering the difference from the samples used by the original network (handwritten characters) as well as the imaging channels of kiwifruit images, the positive and negative sample images were scaled by interpolation into 3×32×32 matrices, and positive and negative samples were labeled "2" and "1", respectively, as the input for network training. Because kiwifruit images are little affected by twisting or deformation, the number of local receptive fields in each convolutional layer of the original LeNet can be reduced to speed up network training. CNNs with different structures were trained and compared on recognition accuracy and time cost in validation experiments; the final network uses 5×5 local receptive fields throughout, with 6, 16, and 120 of them in the three convolutional layers C1, C3, and C5, respectively.
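The preprocessing just described (interpolation to a 3×32×32 matrix with labels "2"/"1") might be sketched as follows; nearest-neighbour resizing stands in for whatever interpolation the Matlab pipeline actually used:

```python
import numpy as np

def resize_nearest(img, out_h=32, out_w=32):
    """Nearest-neighbour resize of an H x W x 3 image to out_h x out_w x 3
    (a stand-in for the interpolation used in the paper)."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h // out_h).astype(int)
    cols = (np.arange(out_w) * w // out_w).astype(int)
    return img[rows][:, cols]

# the smallest crop mentioned in the text is 74 x 76 pixels
crop = np.zeros((74, 76, 3), dtype=np.uint8)
x = resize_nearest(crop).transpose(2, 0, 1)  # -> 3 x 32 x 32, channels first
label = 2                                    # "2" = positive, "1" = negative
```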
To counter the uneven layer-by-layer distributions and vanishing precision, batch normalization (BN) was introduced to reduce these effects, speed up network convergence, and prevent overfitting. BN layers were added after the 1st, 3rd, and 5th convolutional layers of the original network to normalize the outputs of each mini-batch to a common distribution, as follows.
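The BN transform announced above ("as follows") is missing from this copy; the standard batch-normalization equations over a mini-batch $B=\{x_1,\dots,x_m\}$, which the description matches, are:

```latex
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \big(x_i - \mu_B\big)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y_i = \gamma\,\hat{x}_i + \beta
```

Here $\gamma$ and $\beta$ are learned per-channel scale and shift parameters, and $\epsilon$ is a small constant for numerical stability.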
The activation function is the rectified linear unit (ReLU). Because max-pooling is a nonlinear subsampling method that can, to some extent, reduce the feature-extraction error caused by the mean-estimate shift due to convolutional-layer parameter errors, max-pooling was chosen as the subsampling method. The network was trained with mini-batch stochastic gradient descent.
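ReLU and 2×2 max-pooling are easy to state concretely; a minimal NumPy sketch (illustrative, not the MatConvNet implementation):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative activations are clamped to zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(fm):
    """2x2 max pooling with stride 2 on an H x W feature map (H, W even)."""
    h, w = fm.shape
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1., -2., 3., 0.],
               [0., 5., -1., 2.],
               [-3., 1., 4., 4.],
               [2., 2., 0., -5.]])
pooled = max_pool2x2(relu(fm))
# ReLU zeroes the negatives; each 2x2 block keeps its maximum -> [[5., 3.], [2., 4.]]
```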
The Softmax loss function (corresponding to a Softmax regression classifier) was used for the comparative analysis of network performance. The final CNN structure can be written as 32×32-6C-2S-16C-2S-120C-2, as shown in Fig. 3.
Note: C1, S2, C3, S4, C5, and FC denote the 1st convolutional layer, 2nd subsampling layer, 3rd convolutional layer, 4th subsampling layer, 5th convolutional layer, and the fully connected layer, respectively.
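As a check on the 32×32-6C-2S-16C-2S-120C-2 notation, the spatial sizes implied by 5×5 valid convolutions and 2×2 non-overlapping pooling can be traced programmatically (a sketch; the layer names follow the note to Fig. 3):

```python
def conv_out(size, kernel=5):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, window=2):
    """Spatial size after non-overlapping pooling."""
    return size // window

size = 32                      # input: 3 x 32 x 32
trace = []
for layer, n_maps, pooled in [("C1", 6, True), ("C3", 16, True), ("C5", 120, False)]:
    size = conv_out(size)      # 5 x 5 kernels throughout
    if pooled:
        size = pool_out(size)  # S2 / S4
    trace.append((layer, n_maps, size))
# 120 feature maps of 1 x 1 remain after C5; flattening them feeds the
# fully connected layer, which produces the 2 outputs (fruit / background)
```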
The LeNet-based kiwifruit recognition procedure [31] is as follows:
1) Classify the cropped kiwifruit images and preprocess them so that they meet the requirements of network training;
2) Randomly sample the images from step 1) to obtain a dataset of suitable size, and initialize the LeNet structure to obtain the initial filter weights;
3) Convolve the filters from step 2) with the training images from step 1) to obtain the predetermined number of feature maps, and process the data with the BN method;
4) Apply max subsampling to the feature maps from step 3) according to Eq. (2) to obtain generalized images;
5) Apply the operations of steps 3) and 4) to the feature maps output in step 4) for a second convolution, second batch normalization, and second subsampling, obtaining the required feature maps;
6) In the same way, apply a third convolution and third batch normalization to the feature maps output in step 5);
7) Reshape all feature maps from step 6) into a single column vector as the input of the fully connected layer, compute the difference between the recognition result and the label, and update the network parameters from top to bottom by back-propagation;
8) Input the preprocessed test images, classify them with the trained network model through the Softmax classifier, and display the recognition results with a multi-scale sliding-window algorithm.
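The multi-scale sliding-window scan of step 8) can be sketched as below; the window sizes and strides are hypothetical choices for illustration, since the paper does not list them:

```python
def sliding_windows(h, w, win, stride):
    """Yield (top, left) positions of win x win windows over an h x w image."""
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            yield top, left

# scan a 600 x 400 test image at two assumed window scales with 50% overlap;
# each window would be resized to 3 x 32 x 32 and fed to the trained network
positions = [(s, t, l) for s in (64, 96)
             for t, l in sliding_windows(400, 600, s, s // 2)]
```

Windows the classifier labels as fruit would then be drawn on the test image as the recognition result.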
Because the fruits in field kiwifruit images are not all mutually separate, the fruits were divided into 4 types according to the completeness of their contours in the image: the 1st type, fruits whose contour is incomplete because part of the fruit region is occluded, called occluded fruits (Fig. 4a); the 2nd type, 2 or more fruits whose regions occlude one another and are hard to separate, called overlapped fruits (Fig. 4b, the fruits marked by rectangles); the 3rd type, 2 or more fruits whose contours touch, called adjacent fruits (Fig. 4c); and the 4th type, fruits with complete, independent contours separated from one another, called separated fruits (Fig. 4d).
Fig.4 Categories of kiwifruit images
The CNN with the structure described above was trained on the training set. The initial network weights were drawn from a Gaussian distribution with mean 0 and standard deviation 0.01. The number of epochs was set to 45, the batch size to 100, the initial learning rate of the weight parameters to 0.001, and the momentum factor to 0.9. The training curves over the 45 epochs are shown in Fig. 5.
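The training configuration above (Gaussian initialization with σ = 0.01, learning rate 0.001, momentum 0.9) corresponds to the standard momentum update of mini-batch SGD; a minimal NumPy sketch on a dummy quadratic loss (the loss and step count are illustrative, not from the paper):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.001, momentum=0.9):
    """One mini-batch update: the velocity accumulates past gradients,
    then the weights move along it."""
    v = momentum * v - lr * grad
    return w + v, v

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.01, size=(5,))   # Gaussian init: mean 0, std 0.01
w0 = w.copy()
v = np.zeros_like(w)
for _ in range(10):                     # dummy loss 0.5 * ||w||^2, so grad = w
    w, v = sgd_momentum_step(w, v, grad=w)
```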
The results show that the classification errors on the training and validation sets decrease steadily as the number of epochs increases; by the 28th epoch the misclassification rates on both sets drop to 0, after which the classification accuracy stabilizes, and from the 3rd epoch onward the gap between training and validation error stays small, indicating that the model is in good condition. After 28 epochs the training loss essentially converges to a stable value, showing that the convolutional neural network reached the expected training effect.
Fig.5 Training and validation error curves
With the network structure shown in Fig. 3, the trained model was used to recognize kiwifruit samples. Fig. 6 shows the feature maps output by each of the 3 convolutional layers for an input kiwifruit image; output values 1 and 2 of the output layer denote background and fruit, respectively. The layer outputs in Fig. 6 show that the convolution operations effectively extract kiwifruit features, indicating that the network structure, through local receptive fields and weight sharing, suppresses background interference and enhances the target features.
Fig.6 Example processing results of each convolutional layer of the CNN
To verify the reliability and stability of the model, the 100 field kiwifruit images of the test set (50 each from morning and afternoon, containing 5 918 target fruits in total) were recognized. The overlap coefficient [32], the ratio of the detected target region that coincides with the real target, was chosen as the evaluation index of the results. The end-effector designed by our group [7], following the growth habit of kiwifruit, rotates and rises from below the fruit into the gap between adjacent fruits and separates and grasps a fruit by gradually enveloping it; experiments showed a permissible error radius of 10 mm, so knowing the major part (80%) of a fruit's region suffices for picking, avoiding the difficulty of precisely locating the fruit's actual region. A detection is therefore counted as correct when the overlap coefficient is at least 80%. The fruit recognition success rate is the ratio of successfully recognized fruits to actual target fruits, and the recognition time per fruit is the running time for an image divided by the number of fruits successfully recognized in it. The results are listed in Table 1, and the recognition effect is illustrated in Fig. 7.
Table 1 Recognition results for kiwifruit
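The overlap-coefficient criterion can be sketched as follows, assuming axis-aligned boxes and interpreting the coefficient as intersection area over true-fruit area (the paper defines it only as the ratio of the detected target coinciding with the real target):

```python
def overlap_coefficient(pred, truth):
    """pred/truth are axis-aligned boxes (x1, y1, x2, y2);
    returns intersection area divided by the true-fruit area."""
    ix = max(0, min(pred[2], truth[2]) - max(pred[0], truth[0]))
    iy = max(0, min(pred[3], truth[3]) - max(pred[1], truth[1]))
    true_area = (truth[2] - truth[0]) * (truth[3] - truth[1])
    return (ix * iy) / true_area

def is_correct(pred, truth, threshold=0.80):
    """A detection counts as correct when it covers >= 80% of the true fruit."""
    return overlap_coefficient(pred, truth) >= threshold

# a detection covering 90% of the true box width counts as correct
print(is_correct((0, 0, 90, 100), (0, 0, 100, 100)))
```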
Table 1 shows that separated fruits have the highest recognition rate (94.78%), followed by adjacent fruits (91.01%) and overlapped fruits (83.11%), while occluded fruits are recognized worst (78.97%). Fruits far from the image center that are consequently deformed, or fruits largely occluded by branches and leaves, are easily misrecognized. When several fruits overlap in sequence, the rear fruit tends to be merged with the front one into a single detection, or a heavily overlapped rear fruit cannot be recognized at all, causing missed detections, as in Fig. 7a. When part or all of a fruit reflects direct sunlight strongly, that region is hard or impossible to recognize, reducing accuracy. Adjacent parts of the contours of several neighboring fruits are easily recognized as one fruit, causing false detections, which is the main cause of the lower recognition rate, as in Fig. 7b. A possible reason for the false detections in Fig. 7b is imperfect cropping of single fruits when building the training set (the edges were not handled well when cropping overlapped regions).
Fig.7 Kiwifruit recognition results and examples of false detection
In Fig. 7 the regions occupied by some kiwifruits are not located precisely: the detected regions (black boxes in Fig. 7) deviate slightly from the actual fruit regions. Overall, however, the main region of each fruit is detected, and the kiwifruit-picking end-effector developed by our group [7] can still pick the fruit.
Several studies have proposed recognition methods for kiwifruit images in the field. To verify the performance of the proposed algorithm, it was compared with 5 conventional methods: Scarfe [13], Zhan Wentian et al. [5], Cui Yongjie et al. [8], Fu et al. [14], and Fu Longsheng et al. [9]. The results are listed in Table 2.
Table 2 Performance comparison of kiwifruit recognition methods
Table 2 shows that the recognition rates of the algorithms of Cui Yongjie et al. [8] and Fu et al. [14] are close to that of this paper, while those of Zhan Wentian et al. [5] and Fu Longsheng et al. [9] are higher by 7.41 and 5.01 percentage points, respectively. However, Fu et al. [14] and Fu Longsheng et al. [9] recognized kiwifruit in images taken from below at close range, covering only single clusters of few, separate or adjacent fruits, and performed poorly on 5 or more fruits. Zhan Wentian et al. [5] and Cui Yongjie et al. [8] recognized images taken from the side at close range; those images were largely captured at deliberately chosen angles, and each test image contained only one fruit type, which limits practical application. Moreover, the methods of Scarfe [13], Zhan Wentian et al. [5], Cui Yongjie et al. [8], Fu et al. [14], and Fu Longsheng et al. [9] all extract manually selected low-level features and require extensive image preprocessing, making them complex to operate. These conventional algorithms also lack high-level representations and can hardly capture the spatial relations among the chosen low-level features, so recognizing many fruits is relatively difficult for them.
The proposed algorithm needs only simple image preprocessing; every test image contains all 4 fruit types and at least 30 fruits, and in per-fruit recognition speed the proposed algorithm clearly improves on the other 3 algorithms. Under the same recognition conditions, the recognition rate of 89.29% exceeds the 83.56% of Scarfe [13] by 5.73 percentage points. Overall, the proposed CNN-based recognition method is resistant to interference, can recognize multiple kiwifruit clusters simultaneously in complex field environments, runs quickly, and is relatively robust to lighting changes and occlusion by branches and leaves, better meeting the picking requirements of a kiwifruit harvesting robot in practical applications.
1) To meet the needs of kiwifruit picking, a CNN-based recognition method for field kiwifruit was proposed. The parameters of the LeNet model were optimized and its structure simplified, and experiments verified that the recognition model can automatically and effectively learn kiwifruit features from complex data, avoiding the subjective feature selection by researchers in conventional methods. The simplified model also largely satisfies practical application requirements and improves its suitability for computing platforms of ordinary performance.
2) After training, the 32×32-6C-2S-16C-2S-120C-2 convolutional neural network achieved a recognition rate of 89.29% on the 5 918 kiwifruits in 100 images, 5.73 percentage points higher than the other method that recognizes multi-cluster kiwifruit from long-range bottom-view images. In recognition speed, the algorithm averages 0.27 s per kiwifruit, basically meeting the working requirements of a kiwifruit harvesting robot.
3) The model can recognize many kiwifruits in field environments, overcoming the inability of most conventional algorithms to recognize multiple clusters simultaneously, and provides solid support for research on multi-arm operation of kiwifruit harvesting robots.
At present the model accurately detects whether a kiwifruit is present, but does not yet perform well on some occluded and overlapped fruits; in particular, the outer contour parts of 2 or more adjacent or overlapped fruits are easily recognized as one fruit, producing false detections, a phenomenon that calls for further research. To prepare the method for wider application, future work will deepen the network structure and increase the variety and number of training samples to improve the discriminative power of the classifier.
[1] Zhang Jiyu, Mo Zhenghai, Huang Shengnan, et al. Development of kiwifruit industry in the world and analysis of trade and international competitiveness in China entering 21st century[J]. China Agricultural Science Bulletin, 2014, 30(23): 48-55. (in Chinese with English abstract)
[2] Chen Jun, Wang Hu, Jiang Haoran, et al. Design of end-effector for kiwifruit harvesting robot[J]. Transactions of the Chinese Society for Agricultural Machinery, 2012, 43(10): 151-154. (in Chinese with English abstract)
[3] Zhang L, Wang Y, Yang Q, et al. Kinematics and trajectory planning of a cucumber harvesting robot manipulator[J]. International Journal of Agricultural & Biological Engineering, 2009, 2(1): 1-7.
[4] Rakun J, Stajnko D, Zazula D. Detecting fruits in natural scenes by using spatial-frequency based texture analysis and multiview geometry[J]. Computers & Electronics in Agriculture, 2011, 76(1): 80-88.
[5] Zhan Wentian, He Dongjian, Shi Shilian. Recognition of kiwifruit in field based on Adaboost algorithm[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(23): 140-146. (in Chinese with English abstract)
[6] Bechar A, Vigneault C. Agricultural robots for field operations: Concepts and components[J]. Biosystems Engineering, 2016, 149: 94-111.
[7] Fu Longsheng, Zhang Fanian, Gejima Yoshinori, et al. Development and experiment of end-effector for kiwifruit harvesting robot[J]. Transactions of the Chinese Society for Agricultural Machinery, 2015, 46(3): 1-8. (in Chinese with English abstract)
[8] Cui Yongjie, Su Shuai, Wang Xiaxia, et al. Recognition and feature extraction of kiwifruit in natural environment based on machine vision[J]. Transactions of the Chinese Society for Agricultural Machinery, 2013, 44(5): 247-252. (in Chinese with English abstract)
[9] Fu Longsheng, Sun Shipeng, Vázquez-Arellano Manuel, et al. Kiwifruit recognition method at night based on fruit calyx[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(2): 199-204. (in Chinese with English abstract)
[10] Ding Yalan, Geng Nan, Zhou Quancheng. Research on the object extraction of kiwifruit based on images[J]. Microcomputer Information, 2009, 18(4): 294-295. (in Chinese with English abstract)
[11] Cui Yongjie, Su Shuai, Lü Zhihai, et al. A method for separation of kiwifruit adjacent fruits based on Hough transformation[J]. Journal of Agricultural Mechanization Research, 2012, 34(12): 166-169. (in Chinese with English abstract)
[12] Mu Junying, Chen Jun, Sun Gaojie, et al. Characteristic parameters extraction of kiwifruit based on machine vision[J]. Journal of Agricultural Mechanization Research, 2014, 36(6): 138-142. (in Chinese with English abstract)
[13] Scarfe A J. Development of an Autonomous Kiwifruit Harvester[D]. Manawatu, New Zealand: Massey University, 2012.
[14] Fu L, Wang B, Cui Y, et al. Kiwifruit recognition at nighttime using artificial lighting based on machine vision[J]. International Journal of Agricultural and Biological Engineering, 2015, 8(4): 52-59.
[15] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]// Advances in Neural Information Processing Systems, 2012: 1097-1105.
[16] Alwzwazy H, Albehadili H, Alwan Y, et al. Handwritten digit recognition using convolutional neural networks[J]. International Journal of Innovative Research in Computer & Communication Engineering, 2016, 4(2): 1101-1106.
[17] Yang W, Jin L, Tao D, et al. Dropsample: A new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition[J]. Pattern Recognition, 2016, 58(4): 190-203.
[18] Albu R D. Human face recognition using convolutional neural networks[J]. Journal of Electrical & Electronics Engineering, 2009, 2(2): 110-113.
[19] Ramaiah N P, Ijjina E P, Mohan C K. Illumination invariant face recognition using convolutional neural networks[C]// IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems, 2015: 1-4.
[20] Singh R, Om H. Newborn face recognition using deep convolutional neural network[J]. Multimedia Tools & Applications, 2017, 76(18): 19005-19015.
[21] Dobhal T, Shitole V, Thomas G, et al. Human activity recognition using binary motion image and deep learning [J]. Procedia Computer Science, 2015, 58: 178-185.
[22] Ronao C A, Cho S B. Human activity recognition with smartphone sensors using deep learning neural networks[J]. Expert Systems with Applications, 2016, 59: 235-244.
[23] Wang Zhongmin, Cao Hongjiang, Fan Lin. Method on human activity recognition based on convolutional neural networks[J]. Computer Science, 2016, 43(s2): 56-58. (in Chinese with English abstract)
[24] Gao Zhenyu, Wang An, Liu Yong, et al. Intelligent fresh-tea-leaves sorting system research based on convolution neural network[J]. Transactions of the Chinese Society for Agricultural Machinery, 2017, 48(7): 53-58. (in Chinese with English abstract)
[25] Zhou Yuncheng, Xu Tongyu, Zheng Wei, et al. Classification and recognition approaches of tomato main organs based on DCNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(15): 219-226. (in Chinese with English abstract)
[26] Wang Qiancheng. The Algorithm Research of Fruit Image Recognition Based on Deep Learning[D]. Baoding: Hebei University, 2016. (in Chinese with English abstract)
[27] Sa I, Ge Z, Dayoub F, et al. Deepfruits: A fruit detection system using deep neural networks[J]. Sensors, 2016, 16(8): 1-23.
[28] Yang Guoguo, Bao Yidan, Liu Ziyi. Localization and recognition of pests in tea plantation based on image saliency analysis and convolutional neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(6): 156-162. (in Chinese with English abstract)
[29] Vedaldi A, Lenc K. MatConvNet: Convolutional neural networks for MATLAB[C]// 23rd ACM International Conference on Multimedia, Brisbane, Australia, 2015: 689-692.
[30] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[31] Li Hui, Shi Bo. Face recognition algorithm based on convolutional neural network[J]. Software Guide, 2017, 16(3): 26-29. (in Chinese)
[32] Song Huaibo, Zhang Weiyuan, Zhang Xinxin, et al. Shadow removal method of apples based on fuzzy set theory[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2014, 30(3): 135-141. (in Chinese with English abstract)
Image recognition method of multi-cluster kiwifruit in field based on convolutional neural networks
Fu Longsheng1,2, Feng Yali1, Elkamil Tola3, Liu Zhihao1, Li Rui1, Cui Yongjie1,2
(1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China; 2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, Yangling 712100, China; 3. Precision Agriculture Research Chair, King Saud University, Riyadh 11451, Saudi Arabia)
China is the largest country for cultivating kiwifruit, and Shaanxi Province provides the largest production, accounting for approximately 70% of the production in China and 33% of the global production. Harvesting kiwifruit in this region relies mainly on manual picking, which is labor-intensive. Therefore, the introduction of robotic harvesting is highly desirable and suitable. The fast and effective recognition of kiwifruit in the field under natural scenes is one of the key technologies for robotic harvesting. To date, studies on kiwifruit recognition have been limited to a single cluster, and multiple clusters in the field have seldom been considered. In this paper, according to the growth characteristics of kiwifruit grown on sturdy support structures, an RGB (red, green, blue) camera was placed around 100 cm underneath the canopy so that kiwifruit clusters could be included in the images. We proposed a kiwifruit image recognition system based on the convolutional neural network (CNN), which is robust and avoids the subjectivity and limitations of manually selected features. The CNN could be trained end to end, from raw pixels to ultimate categories, and we optimized the critical structure parameters and the training strategy. Ultimately, the network was made up of 1 input layer, 3 convolutional layers, 2 sub-sampling layers, 1 fully connected layer, and 1 output layer. The CNN architecture was optimized by using the batch normalization (BN) method, which normalized the data distribution of the middle layers and the output data, accelerating the training convergence and reducing the training time. The BN layers were added after the 1st, 3rd, and 5th convolutional layers (Conv1, Conv3, and Conv5) of the original LeNet network. The size of all convolutional kernels was 5×5, and that of all the sub-sampling windows was 2×2. The feature map numbers of Conv1, Conv3, and Conv5 were 6, 16, and 120, respectively.
After manual selection and normalization, each RGB image of kiwifruit was transformed into a matrix with the size of 32×32 as the input of the network; stochastic gradient descent was used to train our models with a mini-batch size of 100 examples, and momentum was set as 0.9. In addition, the CNN took advantage of local connections, weight sharing, and max-pooling techniques to lower complexity and improve the training performance of the model simultaneously. The network used rectified linear units (ReLU) as the activation function, which greatly accelerated network convergence. The proposed model for training kiwifruit was represented as 32×32-6C-2S-16C-2S-120C-2. Finally, 100 images of kiwifruit in the field (including 5918 fruits) were used to test the model, and the results showed that the recognition rates of occluded fruit, overlapped fruit, adjacent fruit, and separated fruit were 78.97%, 83.11%, 91.01%, and 94.78%, respectively. The overall recognition rate of the model reached 89.29%, and it took only 0.27 s on average to recognize a fruit. There was no overlap between the testing samples and the training samples, which indicated that the network had a high generalization performance, and the testing images were captured from 9 a.m. to 5 p.m., which indicated that the network was robust to lighting variations. However, some fruits were wrongly detected or undetected, including fruits occluded by branches or leaves, fruits overlapping each other, and fruits under extremely strong sunlight. In particular, 2 or more overlapped fruits were sometimes recognized as one fruit, which was the main reason why the success rate was not higher. This phenomenon demands further research.
Comparison with conventional methods suggests that the proposed method achieves a higher recognition rate and better speed, and in particular it can simultaneously identify multi-cluster kiwifruit in the field, which provides significant support for multi-arm operation of harvesting robots. This demonstrates that the CNN has great potential for the recognition of fruits in the field.
image processing; image recognition; algorithms; deep learning; convolutional neural network; kiwifruit
10.11975/j.issn.1002-6819.2018.02.028
TP391.41
A
1002-6819(2018)-02-0205-07
2017-08-28
2017-12-26
Supported by the General Project of the Key R&D Program of Shaanxi Province (2017NY-164), the Science and Technology Co-ordination and Innovation Project of Shaanxi Province (2015KTCQ02-12), the National Natural Science Foundation of China (61175099), and the International Cooperation Seed Fund of Northwest A&F University (A213021505)
Fu Longsheng, born in Ji'an, Jiangxi, is an associate professor with a Ph.D. whose research focuses on intelligent agricultural technology and equipment. Email: fulsh@nwafu.edu.cn
Member of the Chinese Society of Agricultural Engineering: Fu Longsheng (E042600025M)
Fu Longsheng, Feng Yali, Elkamil Tola, Liu Zhihao, Li Rui, Cui Yongjie. Image recognition method of multi-cluster kiwifruit in field based on convolutional neural networks[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(2): 205-211. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2018.02.028 http://www.tcsae.org