Classification of pig sounds based on deep neural network
Cang Yan, Luo Shunyuan, Qiao Yulong
(College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China)
Abstract: Pig sounds reflect the stress state and health status of pigs, and sound is also one of the biometric signals most easily collected in a non-contact way. Deep neural networks have shown great advantages in image classification. The spectrogram, a visual representation of the time-frequency characteristics of sound, combined with a deep neural network classification model, can improve the accuracy of sound signal classification. In this study, pig sounds in different states were collected on-farm, the optimal spectrogram generation method for the deep network structure was investigated, a dataset of pig sound spectrograms was constructed, and the MobileNetV2 network was used to classify pig sounds of three states. By comparing different spectrogram parameters as well as the network width factor and resolution factor, the optimal model for pig sound classification was obtained. In terms of recognition accuracy, the effectiveness of the algorithm was verified by comparison with four models, support vector machine, random forest, gradient boosting decision tree, and extra trees, with the classification accuracy for abnormal sounds reaching 97.3%. The study shows that abnormal pig vocalizations are related to abnormal behavior, so recognizing pig sounds helps monitor pig behavior, which is of great significance for building modern pig farms.
Keywords: signal processing; acoustic signal; recognition; deep learning; pig audio; Mel-frequency cepstral coefficients; classification
A pig's vocalization is one of its important vital signs, closely related to its growth and health. Under group housing, respiratory diseases in pigs are contagious to some degree and can easily trigger herd-wide outbreaks, and acoustic features directly reflect respiratory disease. Sound is also regarded as a basis for judging the stress state of pigs: during transport or slaughter, pigs exhibit stress responses, and in such situations, especially when other stress indicators are weak or hard to collect, vocalizations, being loud and distinctive, can serve as a criterion for the degree of stress. Moreover, sound is a relatively easy biological signal to acquire: it can be recorded at a distance without inducing any stress response in the animals, so it has gradually become an important means of analyzing behavior, health, and animal welfare [1]. In particular, with the rapid development of wireless sensor network technology [2-4], research on acoustic analysis of livestock [5-8], and especially of pig sounds [9-14], has steadily increased.
Early work on animal sound recognition typically used envelope template matching: the target sound, such as a cough, was used to generate an envelope template, and field recordings were matched one by one against the template to classify them [10]. This approach has a clear drawback: other sound types may also match the template's envelope; for example, during slaughter, short stress-induced squeals and disease-induced coughs have highly similar envelopes [15]. As sound signal processing advanced, pig sound analysis methods improved accordingly. In 2003, Van Hirtum and Berckmans [9] analyzed pig sounds with a fuzzy algorithm on a dataset of 5 319 recordings, achieving a correct recognition rate of 79%. Moshou et al. [13] processed pig sounds with Linear Predictive Coding (LPC) spectra, reaching a recognition rate of 87%. In 2008, Ferrari et al. [16] analyzed the root mean square and peak frequency of pig sound waveforms, identified differences between the sounds of healthy pigs and pigs with respiratory disease, and thereby recognized sick pigs, providing a theoretical basis for later analyses of sound differences across multiple behavioral states on farms. In the same year, Exadaktylos et al. [17] performed power spectral density analysis of pig cough sounds, measured similarity with the Euclidean distance, and, by setting an appropriate threshold, achieved monitoring of pig cough sounds. In 2013, Chung et al. [18] extracted Mel-scale Frequency Cepstral Coefficients (MFCC) from pig sounds and used a support vector machine to classify the sounds of pigs with different diseases, matching diseases to sound types and providing a useful reference for assessing disease status on farms. In 2016, Ma et al. [19] applied endpoint detection from speech recognition to pig cough detection, proposing a double-threshold method that effectively improved detection efficiency and benefited subsequent cough recognition.
This study aims to classify and recognize pig sounds with deep learning, in order to promote welfare-oriented farming and improve pig health. Field-recorded sounds of several categories were pre-emphasized, endpoint-detected, windowed, and framed; multiple feature parameters of the pig sound signals were extracted; the spectrogram characteristics of the signals were analyzed to find the optimal spectrogram generation method for a deep neural network; the MobileNetV2 network was finally chosen as the base model, its original optimization strategy was improved, and the extracted sound features were used to train the classifier, establishing a pig sound recognition system that effectively recognizes the sounds of pigs in different states.
1.1.1 Experimental site
The experiment was conducted at a pig farm in Chengde, Hebei Province, with three-way crossbred (Duroc × Landrace × Large White) sows. Data were collected from March 2017 to June 2017. The pigs were group-housed in 1.8 m × 5 m pens. During the experiment, the average barn temperature was 22 ℃, with a maximum of 25.4 ℃ and a minimum of 18.6 ℃. Natural light lasted from 7:00 to 19:00. The recording equipment was suspended over the middle of the pen (Fig. 1).
Fig. 1 Experiment site
1.1.2 Data collection
All sound data used in this study were collected in a working farm environment with a data collection box, a laptop, and related equipment. The core of the collection box is a ReSpeaker Core v2.0 development board (Fig. 2a); the transmission and storage scheme for the on-site sound data is shown in Fig. 2b.
The recordings are single-channel audio sampled at 16 kHz and stored as .wav files on the storage device. To ensure recording quality and obtain reliably labeled data, the recording process was monitored in real time, and the recorded audio was preliminarily annotated according to the pigs' on-site state to facilitate subsequent processing. Three basic sound types were collected: normal grunts, frightened screams, and pre-feeding howls. Normal sounds were collected while the pigs grunted normally without stress. Pre-feeding sounds, resembling howls, were emitted as a stress response when the pigs saw food while the keeper was distributing feed. Frightened sounds were emitted when pigs were injected, fighting, or being chased; collecting them requires strong artificial stimulation, so they were the most difficult and time-consuming of the three types to obtain.
1.1.3 Dataset construction
A recording from the collection box may contain sounds from several states as well as invalid segments, and the recordings vary in length, so manual labeling and batch segmentation were required to build the experimental dataset. Manual annotation was performed with the Audacity audio editor, whose interface is shown in Fig. 3.
Fig. 2 Data collection scheme
Fig. 3 Audacity interface
After the audio was labeled in Audacity, it was batch-segmented by category with a Python script. Based on the periodicity of normal pig calls (0.5-1.8 s), the segment length was set to 2 s, so that each segmented sample contains at least one complete call cycle. A minimal sketch of this segmentation step is given below.
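A minimal sketch of the batch segmentation step, assuming plain 16 kHz mono .wav recordings; the paths, file naming, and folder layout are illustrative, not the authors' actual script:

```python
# Split labeled .wav recordings into fixed 2 s samples, one folder per class.
import wave
from pathlib import Path

SEGMENT_SECONDS = 2  # each sample covers at least one full call cycle (0.5-1.8 s)

def split_wav(src: Path, dst_dir: Path) -> None:
    with wave.open(str(src), "rb") as w:
        rate = w.getframerate()                 # expected: 16 000 Hz
        frames_per_seg = rate * SEGMENT_SECONDS
        params = w.getparams()
        n_segments = w.getnframes() // frames_per_seg
        for i in range(n_segments):
            data = w.readframes(frames_per_seg)
            out = dst_dir / f"{src.stem}_{i:04d}.wav"
            with wave.open(str(out), "wb") as o:
                o.setparams(params)             # header frame count fixed on close
                o.writeframes(data)

for f in Path("labeled/scream").glob("*.wav"):  # hypothetical per-class folders
    split_wav(f, Path("dataset/scream"))
```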
The resulting database is listed in Table 1: all segmented audio was randomly split 8:2 into a training set and a test set, with each class evenly distributed. The database contains samples of normal grunts, frightened screams, and pre-feeding sounds; this study focuses on recognizing these three sound types.
Table 1 Audio sample counts per class in the database
Before features can be extracted, the sound signal must be preprocessed; this step strongly affects the quality of the subsequent feature extraction and recognition [20]. Preprocessing of pig sound signals resembles that of speech signals and consists of pre-emphasis, framing and windowing, and endpoint detection.
1.2.1 Pre-emphasis
Before the pig audio is processed, it is pre-emphasized to strengthen the high-frequency components of the signal and remove the effect of lip radiation during vocalization. Pre-emphasis passes the sound signal through a digital filter, compensating the high-frequency characteristics of the pig sound [21]. The filter is first-order, and its transfer function is given in Eq. (1).
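The body of Eq. (1) is not reproduced here. A first-order pre-emphasis filter conventionally has the transfer function below; the coefficient range is a textbook convention, assumed rather than reported by this study:

```latex
H(z) = 1 - \mu z^{-1}, \qquad 0.9 \le \mu \le 0.97
% equivalently, in the time domain: y(n) = x(n) - \mu\, x(n-1)
```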
1.2.2 Windowing and framing
1.2.3 Endpoint detection
The recorded pig audio contains invalid segments, namely noise and silence. Endpoint detection is therefore applied to determine the start and end of each sound, which improves data quality and reduces the computation required by subsequent feature extraction. Following speech signal processing practice, this study adopts the double-threshold method, which performs well there, analyzing the signal with the short-time zero-crossing rate and short-time energy [22]. The algorithm proceeds as follows:
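The detailed step list is not reproduced here; the following is a minimal sketch of the standard double-threshold procedure over short-time energy and zero-crossing rate, with the frame size and all thresholds as illustrative assumptions rather than the paper's settings:

```python
# Double-threshold endpoint detection: find voiced frames by a high energy
# threshold, then extend the segment while a low energy threshold or the
# zero-crossing rate still indicates sound. Requires numpy >= 1.20.
import numpy as np

def endpoint_detect(x, frame_len=256, hop=128,
                    high_energy=0.1, low_energy=0.02, zcr_thresh=0.15):
    """Return (start, end) sample indices of the detected sound, or None."""
    frames = np.lib.stride_tricks.sliding_window_view(x, frame_len)[::hop]
    energy = np.mean(frames.astype(np.float64) ** 2, axis=1)
    energy /= energy.max() + 1e-12                       # normalized energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

    voiced = np.where(energy > high_energy)[0]           # stage 1: high threshold
    if voiced.size == 0:
        return None
    start, end = voiced[0], voiced[-1]

    # Stage 2: grow both ends under the low energy / ZCR criteria.
    while start > 0 and (energy[start - 1] > low_energy or zcr[start - 1] > zcr_thresh):
        start -= 1
    while end < len(energy) - 1 and (energy[end + 1] > low_energy or zcr[end + 1] > zcr_thresh):
        end += 1
    return start * hop, end * hop + frame_len
```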
Fig. 4 Audio waveform
Fig. 5 Audio spectrogram
Once the spectrograms are generated, a deep convolutional neural network from image processing performs the classification. This study adopts the MobileNetV2 model [26], a lightweight deep neural network built on residual networks and MobileNetV1 [24]. While preserving accuracy, it sharply reduces the number of multiply-add operations, lowering the parameter count and memory footprint and increasing speed. Its basic building block is the residual bottleneck depthwise-separable convolution block; the network begins with a full convolution layer of 32 kernels, followed by 7 bottleneck layers, and uses ReLU6 as the nonlinear activation. MobileNetV2 uses 3×3 convolution kernels and applies dropout [27] and batch normalization during training to prevent overfitting. In this study the dropout rate is 0.5: at the start of training, half of the hidden units are randomly "dropped" while the input layer is left unchanged and the weights are updated; each subsequent iteration drops another random half of the hidden units, until training ends. The detailed structure of MobileNetV2 is shown in Table 2, and a build sketch in code follows the table.
Table 2 Structure of the MobileNetV2 network model [26]
Note: k denotes the number of sample classes (following the notation of Table 2 in [26]).
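As a companion to Table 2, a minimal sketch of assembling this classifier with tf.keras; tf.keras.applications.MobileNetV2 appears in TensorFlow releases newer than the 1.8.0 used in this study, and everything beyond the 224×224 input, the width factor, and the dropout rate of 0.5 described above is an assumption:

```python
# MobileNetV2-based spectrogram classifier: 224x224 input, configurable
# width factor (alpha), dropout 0.5 before the final softmax over 3 classes.
import tensorflow as tf

NUM_CLASSES = 3      # grunt / scream / pre-feeding howl
ALPHA = 1.0          # width factor; 0.5 is reported later as a good trade-off

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    alpha=ALPHA,
    include_top=False,
    weights=None,            # trained from scratch on the spectrogram dataset
    pooling="avg",
)
x = tf.keras.layers.Dropout(0.5)(base.output)   # dropout rate 0.5, per the text
out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(base.input, out)
```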
With the same data and with all settings identical except the optimizer's own parameters, comparative experiments were run with the RMSprop optimizer and the Adam optimizer; the results are shown in Fig. 6, and the swap is sketched after the figure.
Fig. 6 Loss curves of the model under the two optimization algorithms
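A sketch of the optimizer comparison, reusing the model from the previous sketch; train_ds and test_ds stand in for the spectrogram datasets, and the learning rates are library defaults rather than the paper's reported settings:

```python
# Train twice, changing only the optimizer, mirroring the comparison in Fig. 6.
# NOTE: rebuild or re-initialize the model between runs for a fair comparison.
for opt in (tf.keras.optimizers.RMSprop(), tf.keras.optimizers.Adam()):
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(train_ds, validation_data=test_ds, epochs=50)
    # history.history["loss"] gives the curve of the kind plotted in Fig. 6
```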
The hardware and software platform used for model training in this study was as follows:
CPU: Intel Core i7-8700K
Memory: 16 GB DDR4
GPU: NVIDIA GeForce GTX 1080Ti
OS: Ubuntu 16.04 LTS
Software: TensorFlow 1.8.0, CUDA 9.0, cuDNN 7.0, Anaconda 3
Because the data scale in this experiment is relatively small, overtraining was not expected to occur, so the dataset was divided directly into a training set and a test set. To determine the optimal spectrogram parameters, including window length and window shift, the training and test sets were randomly re-split 8:2 for each run, and spectrograms generated with different parameters were tested for their effect on recognition accuracy. Five independent runs were performed for each parameter setting, using the standard MobileNetV2 model with 224×224 input images. The results (Fig. 7) show that the spectrogram type affects model performance; across repeated runs, spectrograms with a 256-point FFT and a 1/2 window shift trained the best-recognizing model. The spectrogram types and their average accuracies are plotted in Fig. 8.
Note: FFT denotes the fast Fourier transform of the signal; the same below.
Fig. 8 Average recognition rate for each spectrogram type
As Fig. 8 shows, for the same window length, spectrograms with a 1/2 window shift yield models with better recognition rates, and for the same window shift, 256-point-FFT spectrograms do better; that is, spectrograms with higher frequency resolution perform best. In summary, the spectrogram optimization experiments raised classification accuracy by 1.8% over the standard MobileNetV2 result, for a final overall recognition rate of 97.3%. The per-class recognition rates of the optimal model on the test set are summarized in Table 3, where the numbers of test samples, correctly recognized samples, and recognition rates are all averages over the 5 runs. A code sketch of the winning spectrogram configuration is given below.
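A minimal sketch of producing spectrograms with the best-performing parameters (256-point FFT and a 1/2 window shift, i.e., a 128-sample hop) and resizing them to the network's 224×224 input; the use of scipy and Pillow is an assumption, as the paper does not name its tooling:

```python
# Generate a log-magnitude spectrogram from a 2 s, 16 kHz clip:
# 256-point FFT, 128-sample overlap (1/2 window shift), output 224x224.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
from PIL import Image

rate, x = wavfile.read("dataset/scream/sample_0000.wav")    # hypothetical path
f, t, sxx = spectrogram(x, fs=rate, nperseg=256, noverlap=128, nfft=256)

log_sxx = 10 * np.log10(sxx + 1e-10)                        # dB scale
img = (255 * (log_sxx - log_sxx.min()) / (log_sxx.ptp() + 1e-12)).astype(np.uint8)
Image.fromarray(img).resize((224, 224)).save("spectrogram.png")
```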
Because in the intended application the barn sound-monitoring module will be integrated into a small handheld device, with limits on memory and running speed, a faster and smaller model is desired while accuracy is preserved. Through experiments on the width factor and resolution factor, this study trained smaller and more efficient models on the basis of the standard MobileNetV2 model.
Table 3 Recognition results on the test set
The computational cost of the network after adjusting the resolution factor is given in Eq. (18). The ratio of the network's computational cost before and after adjusting the width factor is given in Eq. (19), and the ratio before and after adjusting both the width factor and the resolution factor is given in Eq. (21).
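The bodies of Eqs. (18), (19), and (21) are not reproduced here. They presumably follow the standard width-multiplier/resolution-multiplier cost analysis introduced for MobileNets [24]; a reconstruction in that notation ($\alpha$: width factor, $\rho$: resolution factor, $D_K$: kernel size, $D_F$: feature-map size, $M$, $N$: input/output channel counts) is:

```latex
% Cost of one depthwise-separable layer after applying the resolution
% factor \rho (together with the width factor \alpha), cf. Eq. (18):
C_{\alpha,\rho} = D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F
                + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F

% Cost ratio after adjusting the width factor alone, cf. Eq. (19),
% roughly quadratic in \alpha:
\frac{C_{\alpha}}{C} =
\frac{D_K^2\,\alpha M\,D_F^2 + \alpha^2 M N\,D_F^2}
     {D_K^2\,M\,D_F^2 + M N\,D_F^2} \approx \alpha^2

% Cost ratio after adjusting both factors, cf. Eq. (21):
\frac{C_{\alpha,\rho}}{C} \approx \alpha^2 \rho^2
```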
The results for different width and resolution factors are shown in Table 4. The smaller the network width and the lower the spectrogram resolution, the smaller and faster the model, with some loss in recognition rate. With a recognition-rate loss within 0.5%, the model runs 3-4 times faster; with a loss within 1.0%, 7-10 times faster. Adjusting the network structure through the width and resolution factors thus provides a trade-off between speed and accuracy that can be matched to practical requirements and chosen per deployment scenario. The experiments show that the compressed models, at a very small loss of accuracy, are much smaller and run significantly faster.
Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and Extra Trees (ET) classifiers were each trained and tested on the same pig sound dataset, and their results were compared with those of the classification network proposed in this study; a sketch of these baselines follows.
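A sketch of the four baselines on the same task; the per-clip features (e.g., MFCC vectors) are stubbed with placeholder data, and all estimator settings are scikit-learn defaults rather than the paper's configurations:

```python
# Train the four baseline classifiers and print their confusion matrices
# (rows: true class, columns: predicted class, matching the convention below).
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 13))      # placeholder for per-clip MFCC features
y = rng.integers(0, 3, size=1000)    # 0=grunt, 1=scream, 2=pre-feeding howl

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)

models = {"SVM": SVC(), "GBDT": GradientBoostingClassifier(),
          "RF": RandomForestClassifier(), "ET": ExtraTreesClassifier()}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(name, "\n", confusion_matrix(y_te, clf.predict(X_te)))
```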
For ease of analysis, the test results are presented as multi-class confusion matrices (Table 5). Each column of a confusion matrix represents the predicted class and each row the true class, so the matrix reflects a model's performance: diagonal entries are correctly recognized samples, and off-diagonal entries are misclassified ones. The confusion matrices obtained on the test set for each algorithm are listed in the tables below; all results are the best achieved by each model.
Table 4 Results under different width factors and resolution factors
Table 5 Confusion matrices of the four algorithm models
A first inspection of the four models' confusion matrices on the test set shows that every model recognizes normal-state samples well, while recognition of frightened and pre-feeding samples is weaker.
Further analysis of the above results yields the following conclusions:
1) The three ensemble learning algorithms achieve better recognition rates than SVM;
2) Among the three ensemble methods, the ET model recognizes pig sounds best, followed by RF, which outperforms GBDT;
3) The four models differ markedly across sound classes, with high recognition rates for normal-state samples but lower rates for frightened and pre-feeding samples.
On the basis of the optimal MobileNetV2 parameters and the optimal spectrogram parameters obtained above, and using the same dataset, the proposed network was compared in recognition accuracy with the four algorithms (Fig. 9).
Fig. 9 Per-class recognition rates of each model on the test set
To address the marked inter-class differences in recognition rate of the ensemble learning models and the Support Vector Machine (SVM) in pig audio recognition, in particular their low rates on frightened and pre-feeding sound samples, this study proposed a recognition method combining a deep neural network with spectrograms, and designed, implemented, and optimized the model on a hand-built dataset. The model is based on the MobileNetV2 network, whose original optimization strategy was improved, raising model performance. The model was further optimized in two respects: spectrogram generation and network structure adjustment. The model initially trained on the standard MobileNetV2 reached 95.5% accuracy; the spectrogram optimization experiments improved recognition performance by 1.8%, for a final recognition rate of 97.3%, with high recognition rates across all classes, overcoming the weakness of the ensemble classifiers. Furthermore, through the width-factor and resolution-factor experiments, smaller and more efficient models were defined on the basis of the standard MobileNetV2 model, trading a very small loss of accuracy for a significant speed-up and meeting the needs of practical applications.
[1] Li Xuan, Zhao Jian, Gao Yun, et al. Recognition of pig cough sound based on deep belief nets[J]. Transactions of the Chinese Society for Agricultural Machinery, 2018, 49(3): 179-186. (in Chinese with English abstract)
[2] Gutiérrez A, González C, Jiménez-Leube J, et al. A heterogeneous wireless identification network for the localization of animals based on stochastic movements[J]. Sensors, 2009, 9(5): 3942-3957.
[3] Handcock R N, Swain D L, Bishop-Hurley G J, et al. Monitoring animal behavior and environmental interactions using wireless sensor networks, GPS collars and satellite remote sensing[J]. Sensors, 2009, 9(5): 3586-3603.
[4] Hwang J, Yoe H. Study of the ubiquitous hog farm system using wireless sensor networks for environmental monitoring and facilities control[J]. Sensors, 2010, 10(12): 10752-10777.
[5] Yeon S C, Lee H C, Chang H H, et al. Sound signature for identification of tracheal collapse and laryngeal paralysis in dogs[J]. Journal of Veterinary Medical Science, 2005, 67(1): 91-95.
[6] Jahns G, Kowalczyk W, Walter K. Sound analysis to recognize animal conditions and individuals[C]//Annual Meeting National Mastitis Council, New York, USA, 1998: 228-235.
[7] Moi M, N??s I A, Caldara F R, et al. Vocalization as a welfare indicative for pigs subjected to stress situations[J]. Arquivo Brasileiro de Medicina Veterinária e Zootecnia, 2015, 67(3): 837-845.
[8] Mucherino A, Papajorghi P, Pardalos P. Data Mining in Agriculture[M]. New York: Springer, 2009.
[9] Van Hirtum A, Berckmans D. Fuzzy approach for improved recognition of citric acid induced piglet coughing from continuous registration[J]. Journal of Sound and Vibration, 2003, 266(3): 677-686.
[10] Moreaux B, Nemmar A, Beerens D, et al. Inhibiting effect of ammonia on citric acid-induced cough in pigs: A possible involvement of substance P[J]. Pharmacology & Toxicology, 2000, 87(6): 279-285.
[11] Chedad A, Moshou D, Aerts J M, et al. AP-animal production technology: Recognition system for pig cough based on probabilistic neural networks[J]. Journal of Agricultural Engineering Research, 2001, 79(4): 449-457.
[12] Marchant J N, Whittaker X, Broom D M. Vocalizations of the adult female domestic pig during a standard human approach test and their relationships with behavioral and heart rate measures[J]. Applied Animal Behavior Science, 2001, 72(1): 23-39.
[13] Moshou D, Chedad A, Van Hirtum A, et al. An intelligent alarm for early detection of swine epidemics based on neural networks[J]. Transactions of the American Society of Agricultural and Biological Engineers, 2001, 44(1): 167-174.
[14] Moura D J, Silva W T, Naas I A, et al. Real time computer stress monitoring of piglets using vocalization analysis[J]. Computers & Electronics in Agriculture, 2008, 64(1): 11-18.
[15] Van Compernolle D, Janssens S, Geers R, et al. Welfare monitoring of pigs by automatic speech processing[C]//Proceedings of the 12th Congress of the International Pig Veterinary Society. The Hague, Netherlands, 1992: 570-571.
[16] Ferrari S, Silva M, Guarino M, et al. Cough sound analysis to identify respiratory infection in pigs[J]. Computers and Electronics in Agriculture, 2008, 64(2): 318-325.
[17] Exadaktylos V, Silva M, Ferrari S, et al. Real-time recognition of sick pig cough sounds[J]. Computers and Electronics in Agriculture, 2008, 63(2): 207-214.
[18] Chung Yongwha, Oh S, Lee J, et al. Automatic detection and recognition of pig wasting diseases using sound data in audio surveillance systems[J]. Sensors, 2013, 13(10): 12929-12942.
[19] Ma Huidong, Liu Zhenyu. Application of end point detection in pig cough signal detection[J]. Journal of Shanxi Agricultural University: Natural Science Edition, 2016, 36(6): 445-449. (in Chinese with English abstract)
[20] Zhang Caixia, Wu Pei, Xuan Chuanzhong, et al. Design of acoustic signal processing and recognition system for the ewe[J]. Journal of Inner Mongolia Agricultural University: Natural Science Edition, 2013, 34(5): 145-149. (in Chinese with English abstract)
[21] Hu Minghui. Automatic Audio Stream Classification Based on Hidden Markov Model and Support Vector Machine[D]. Changchun: Changchun University of Technology, 2015. (in Chinese with English abstract)
[22] Xu Leling, Hu Shi. Adaptive double threshold modified edge detection algorithm for boot filtering[J]. Journal of Nanjing University of Science and Technology, 2018, 42(2): 177-182. (in Chinese with English abstract)
[23] Lipovskii A A, Shustova O V, Zhurikhina V V, et al. On the modeling of spectral map of glass-metal nanocomposite optical nonlinearity[J]. Optics Express, 2012, 20(11): 12040-12047.
[24] Howard A G, Zhu M, Chen B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. 2017, https://arxiv.org/abs/1704.04861.
[25] Khaliq A, Ehsan S, Milford M, et al. A holistic visual place recognition approach using lightweight CNNs for severe viewpoint and appearance changes[EB/OL]. 2018, https://arxiv.org/abs/1811.03032.
[26] Sandler M, Howard A G, Zhu M, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018: 4510-4520.
[27] Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors[EB/OL]. 2012, https://arxiv.org/abs/1207.0580.
[28] Kingma D P, Ba J. Adam: A method for stochastic optimization[EB/OL]. 2015, https://arxiv.org/abs/1412.6980.
Classification of pig sounds based on deep neural network
Cang Yan, Luo Shunyuan, Qiao Yulong
(College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China)
Abstract: Pig sounds reflect the stress and health status of pigs, and sound is one of the biometric signals most easily collected in a non-contact way. To improve the classification accuracy of pig sound signals, this study used spectrograms to visualize time-frequency characteristics and combined them with a deep neural network classification model. Four aspects are discussed. 1) A sound dataset was constructed. Pig behavior can be recognized from the sound signal by the classification network. Sounds of pigs in a normal state are labeled grunts; sounds of frightened pigs, such as pigs being injected or chased, are labeled screams; and the long, irritable sounds pigs make on seeing food before feeding are labeled hunger howls. All pig sounds were collected on-farm with a sound collection box. A laptop served as the host computer, displaying the working parameters of the collection box, and data transmission and storage used a client/server architecture. Workers labeled the sounds according to the observed behavior. 2) Spectrograms of the different sounds formed the training and test datasets of the image recognition network. Pig sound is stationary over short durations, so repeatedly computing the frequency spectrum of the signal in the neighborhood of successive instants yields a time-frequency spectrogram. The study investigated the spectrogram parameters best suited to the deep neural network structure. Experiments showed that with a segment length of 256 samples and an overlap of 128 samples, the classification accuracy of the deep neural network was highest, and the spectrogram optimization experiments improved recognition accuracy by 1.8%. 3) A deep neural network was designed. The study used the MobileNetV2 network, which is based on an inverted residual structure with shortcut connections between thin bottleneck layers. Targeting portable platforms in real applications, a width factor and a resolution factor were introduced to define a smaller and more efficient architecture. The Adam optimizer proved an adequate substitute for the original RMSprop optimizer and made the loss function converge faster: Adam computes adaptive per-parameter learning rates from the first moment of the gradients and makes full use of the gradients' second moment. The results indicated that accuracy was highest with the width factor set to 0.5. 4) Comparative experiments were conducted. Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and Extra Trees (ET) algorithms were compared with the proposed pig sound recognition network, all trained and tested on the same sound dataset. The proposed algorithm increased the recognition accuracy of screams from 84.5% to 97.1% and of howls from 86.1% to 97.5%, while the accuracy on grunts decreased from 100% to 97.3%, owing to the differing principles of the recognition algorithms.
Furthermore, through the width factor and resolution factor experiments, a smaller and more efficient model was defined on the basis of the standard MobileNetV2 model, and the running speed was significantly improved to meet the needs of practical applications while accuracy was essentially retained. This study shows that abnormal pig vocalizations are related to abnormal behavior, so sound recognition can help monitor behavior. In future work, detecting abnormal behaviors by combining sound recognition with video analysis will be investigated.
Keywords: signal processing; acoustic signal; recognition; deep learning; pig sounds; MFCC; classification
Cang Yan, Luo Shunyuan, Qiao Yulong. Classification of pig sounds based on deep neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(9): 195-204. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2020.09.022 http://www.tcsae.org
Received: 2019-12-16
Revised: 2020-03-16
國(guó)家自然科學(xué)基金(61871142)
Author: Cang Yan, Ph.D., lecturer, research interest: intelligent information processing. Email: cangyan@hrbeu.edu.cn
doi: 10.11975/j.issn.1002-6819.2020.09.022
CLC number: TP391.4
Document code: A
Article ID: 1002-6819(2020)-09-0195-10