Hu Xiaohua, Liu Wei, Liu Changhong, Qian Yunhui
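Rapid identification of producing area of coffee bean based on terahertz spectroscopy and support vector machine
Hu Xiaohua1, Liu Wei2,3※, Liu Changhong3, Qian Yunhui3
(1. School of Computer and Information, Hefei University of Technology, Hefei 230009; 2. Laboratory of Machine Vision and Intelligent Control, Hefei University, Hefei 230601; 3. School of Food Science and Engineering, Hefei University of Technology, Hefei 230009)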
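Terahertz time-domain spectroscopy was combined with a support vector machine to identify coffee beans from 3 typical producing areas. Coffee bean samples from Ethiopia, Costa Rica and Indonesia were pressed into pellets, and their time-domain and frequency-domain spectra were acquired in terahertz transmission mode. The frequency-domain spectra were analyzed by principal component analysis. A support vector machine (SVM) identification model with parameters optimized by particle swarm optimization (PSO) was constructed, and its overall identification accuracy for coffee beans from the different producing areas reached 95%. The results show that terahertz spectroscopy, as a new detection technique, can be combined with pattern recognition methods to identify the producing area of coffee beans. This work provides a rapid and accurate method for safety testing and origin tracing of agricultural products and foods that have no distinct characteristic absorption peaks in the terahertz band.
spectroscopy; models; support vector machine; coffee bean; terahertz; particle swarm optimization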
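Coffee is one of the 3 major beverage crops in the world, and it ranks first among them in production, sales and consumption. In recent years China's coffee imports have grown rapidly, with an average annual growth rate of more than 10%, and coffee has become an important bulk imported consumer good. Coffee beans are the main raw material for making coffee. At present they are grown mainly in tropical developing countries in Latin America, Africa and Asia, such as Indonesia, Ethiopia, Brazil, Colombia and Costa Rica. Coffee beans from different producing areas differ considerably in appearance, color, aroma and internal chemical composition, and the producing area is therefore an important factor affecting coffee quality[1-3]. At present the producing area of coffee beans is identified mainly by manual sensory evaluation or chemical analysis[4-6], which are cumbersome, subjective and inefficient. How to identify the producing area of coffee beans rapidly and accurately, so as to safeguard coffee quality and regulate the coffee market, is therefore one of the important problems that China's coffee industry urgently needs to solve.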
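Terahertz (THz) radiation refers to electromagnetic waves with frequencies in the range of 0.1 to 10 THz. Studies have shown that transitions between the vibrational and rotational energy levels of many large organic molecules (DNA, proteins, etc.) fall in the THz band, so THz spectra contain rich physical, chemical and conformational information about the object under test[7-11]. In recent years terahertz time-domain spectroscopy (THz-TDS), a rapidly developing nondestructive testing technique, has shown strong technical advantages and broad application prospects in food safety testing and quality control of agricultural products, owing to its strong penetration, good safety, high sensitivity and wide dynamic range[12-19]. However, most current THz studies in the agricultural product and food field target single chemical components that have characteristic absorption peaks. In complex biological systems without characteristic absorption peaks, the THz spectral features tend to be spread over certain band ranges, which leads to high dimensionality and uncertainty of the spectral features. The application of THz-TDS to agricultural products and foods, which are complex biological systems, is therefore still at the exploratory stage.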
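To address the rapid identification of coffee bean producing areas, this paper uses a THz time-domain spectroscopy system to acquire the time-domain and frequency-domain spectra of coffee bean samples from typical producing areas in the THz band. Principal component analysis (PCA) is used to reduce the dimensionality of the spectral features, particle swarm optimization (PSO) is used to optimize the model parameters, and a support vector machine (SVM) is used to build the identification model based on THz spectroscopy. The aim is to provide a new method for rapid identification of coffee bean producing areas and to explore the application of THz techniques to the testing of agricultural products and foods.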
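1.1 Experimental apparatus and principle
A TAS7500TS HF1 THz spectroscopy system (Advantest Co., Ltd, Japan) was used; a schematic of its optical path is shown in Fig. 1. The experiments were carried out in transmission mode. The emitted laser pulse is split by the beam splitter CBS into a pump beam and a probe beam. The pump beam is incident on a photoconductive antenna on a gallium arsenide (GaAs) substrate and excites THz radiation. The THz pulse is absorbed and dispersed by the sample, which changes its amplitude and phase, and the THz wave carrying the sample information is focused, together with the probe beam, on the zinc telluride (ZnTe) electro-optic detection crystal. The system scans the relative time delay between the THz pulse and the probe laser pulse and uses the photoelectric effect of the probe beam to sample the electric field strength of the THz pulse, thereby obtaining the THz time-domain waveform of the sample; the frequency-domain signal is obtained by fast Fourier transform (FFT). The TAS7500TS HF1 has a frequency range of 0.1 to 4 THz and a spectral resolution of 7.6 GHz; the laser has an average power of 20 mW, a pulse center wavelength of 1 550 nm, a pulse width of 50 fs and a repetition rate of 50 MHz±200 Hz. The experiments were carried out at room temperature (25 ℃). Throughout the experiments an air compressor pump was used to dry the measurement space, which reduced the influence of moisture in the air on the measurements and improved the signal-to-noise ratio.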
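1.2 Experimental materials
Coffee beans from 3 typical producing areas (Ethiopia, Costa Rica and Indonesia), provided by the Hefei branch of Starbucks, were used as test samples. All samples were stored in a desiccator (sealed and protected from light). The coffee beans from each producing area were first ground with a grinder, and the ground material was passed through a sieve with an aperture of 0.074 mm. The powder was then pressed with a tablet press at 10 MPa into pellets about 1 mm thick and 13 mm in diameter, with a uniform interior and parallel upper and lower surfaces; 60 pellet samples were prepared for each kind of coffee bean. Before the experiments, 40 samples from each of the 3 producing areas were randomly selected to form the calibration set (120 samples in total), and the remaining 20 samples from each area formed the prediction set (60 samples in total). Matlab 2011a was used as the programming software.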
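The random division into calibration and prediction sets can be sketched as follows. This is a minimal illustration in Python rather than the Matlab code used in the study; the pellet identifiers, the function name `split_by_origin` and the fixed random seed are assumptions introduced only for the example.

```python
import random

def split_by_origin(samples_per_origin, n_calibration=40, seed=0):
    """Randomly split each origin's samples into calibration and prediction sets."""
    rng = random.Random(seed)  # fixed seed (assumption) so the split is reproducible
    calibration, prediction = [], []
    for origin, samples in samples_per_origin.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        calibration += [(origin, s) for s in shuffled[:n_calibration]]
        prediction += [(origin, s) for s in shuffled[n_calibration:]]
    return calibration, prediction

# Hypothetical pellet identifiers: 60 pellets for each of the 3 origins.
pellets = {origin: [f"{origin}-{i:02d}" for i in range(60)]
           for origin in ("Ethiopia", "Costa Rica", "Indonesia")}
calibration_set, prediction_set = split_by_origin(pellets)
print(len(calibration_set), len(prediction_set))  # 120 60
```

Splitting within each origin, rather than over the pooled 180 samples, keeps the 40/20 proportion identical for every class.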
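1.3 Spectral acquisition and analysis
Before the experiments the TAS7500TS HF1 was warmed up for half an hour. A steel background plate was used as the calibration plate of the system, and the micrometer screw was adjusted to find the best focus. Each pressed pellet was placed on the polyethylene sample stage of the TAS7500TS HF1 system and scanned to obtain its transmission spectrum. To reduce measurement error, each sample was measured 3 times at different positions and the average was taken as the spectral signal of the sample. The time-domain spectra of the 3 kinds of coffee bean samples and the steel background are shown in Fig. 2.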
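As can be seen from Fig. 2a, the THz time-domain signals of the 3 kinds of samples differ clearly from that of the background plate in both amplitude and phase, whereas the differences among the samples are relatively small. In terms of transmitted THz amplitude, the Costa Rican coffee beans are higher overall than the Ethiopian and Indonesian coffee beans, while the Ethiopian and Indonesian coffee beans have comparable amplitudes. In terms of phase, relative to the reference background the Costa Rican coffee beans peak at about 16.3 ps, the smallest delay; the Indonesian coffee beans peak at about 16.9 ps; and the Ethiopian coffee beans peak at about 17.3 ps, the largest delay. The THz transmission frequency-domain spectra of the samples, obtained by applying the FFT to the time-domain signals, are shown in Fig. 2b. Fig. 2b shows that the effective spectral region of the THz signal lies within 0.2 to 1.5 THz. The spectral curves of the 3 kinds of coffee beans follow the same trend; the transmitted energy differs at different frequency points, but the curves cross at some frequencies. Like many complex biological materials, coffee beans have no distinct characteristic absorption peaks[20-21]. There are 171 frequency points in the range of 0.2 to 1.5 THz. While the high-dimensional THz spectral features carry rich information, the part of that information that is only weakly related or unrelated to sample quality degrades the modeling. Therefore, PCA is first applied to reduce the dimensionality of the THz spectral features of coffee beans from different producing areas and to analyze the identification performance qualitatively.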
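The count of 171 frequency points can be checked with a short calculation. The sketch below assumes that the frequency grid starts at 0 Hz and is uniformly spaced at the stated resolution of 7.6 GHz; the actual grid of the instrument is not given in the text, and the function name `band_indices` is hypothetical.

```python
def band_indices(resolution_tenth_ghz, low_tenth_ghz, high_tenth_ghz):
    """Return bin indices k whose frequency k * resolution lies in [low, high].

    Frequencies are integers in units of 0.1 GHz so that the band limits
    are compared exactly, without floating-point rounding.
    """
    first = -(-low_tenth_ghz // resolution_tenth_ghz)  # ceiling division
    last = high_tenth_ghz // resolution_tenth_ghz      # floor division
    return list(range(first, last + 1))

# 7.6 GHz resolution, band from 0.2 THz (200 GHz) to 1.5 THz (1500 GHz).
bins = band_indices(76, 2000, 15000)
print(len(bins))          # 171
print(bins[0], bins[-1])  # 27 197
```

Under this assumed grid the first and last points inside the band fall near 205.2 GHz and 1497.2 GHz.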
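1.4 Identification of coffee bean producing area based on principal components
PCA is a multivariate statistical method for analyzing the correlation among multiple variables. It uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables. The new variables obtained by PCA reduce the number of variables while retaining as much of the original feature information as possible. In this paper PCA was applied to the spectral data, in the frequency range of 0.2 to 1.5 THz, of the 180 coffee bean samples from the 3 producing areas; the three-dimensional score plot of the first 3 principal components is shown in Fig. 3.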
Fig. 3 3D principal component score plot of the 3 kinds of coffee bean samples from different producing areas
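How PCA turns correlated variables into scores along an orthogonal direction can be illustrated with a small standard-library sketch. It extracts only the first principal component by power iteration on the sample covariance matrix, which is one of several ways to compute it and is not necessarily the routine used in the study; the toy data and the function name `first_principal_component` are assumptions.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction, scores) of the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # start vector, assumed not orthogonal to PC1
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left along this vector
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return mean, v, scores

# Toy data on the line y = 2x: PC1 should point along (1, 2) / sqrt(5).
toy = [[float(x), 2.0 * x] for x in range(-3, 4)]
mean, direction, scores = first_principal_component(toy)
print(direction)
```

For the 171-point spectra the same idea applies with d = 171; further components would be obtained by removing the variance already explained and repeating the iteration.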
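As can be seen from Fig. 3, the 3 kinds of coffee beans cluster well. The Ethiopian and Indonesian coffee beans do not overlap each other, whereas the Costa Rican coffee beans overlap both. This is related to the fact that coffee beans have no characteristic absorption peaks in the THz band and their spectral features are widely distributed. Fig. 3 also shows that the cumulative contribution rate of the first 3 principal components is 68.75% (PC1, PC2 and PC3 contribute 31.22%, 30.31% and 7.22%, respectively), so they cannot fully represent the information in the effective THz band. To determine the optimal number of principal components for THz-based identification of coffee bean producing areas, the partial least squares discriminant analysis (PLS-DA) method was applied following reference [22], using the first 3, 4, …, 50 principal components in turn (the cumulative contribution rate of the first 50 principal components reaches 98.96%). The results show that the identification accuracy rises as the number of principal components increases from 3 to 20 and begins to fall beyond 20. The first 20 principal components (cumulative contribution rate 96.36%) were therefore selected as the input features for modeling.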
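The cumulative contribution rate is a running sum of the per-component contribution rates, as the following sketch shows for the three values reported above. The helper names and the 60% threshold are illustrative assumptions; note that the paper selects the number of components by identification accuracy, not by a contribution threshold.

```python
def cumulative_contribution(contributions):
    """Running sum of per-component contribution rates (in percent)."""
    total, cumulative = 0.0, []
    for c in contributions:
        total += c
        cumulative.append(total)
    return cumulative

def components_needed(contributions, threshold):
    """Smallest number of leading components reaching the threshold, else None."""
    for k, value in enumerate(cumulative_contribution(contributions), start=1):
        if value >= threshold:
            return k
    return None

first_three = [31.22, 30.31, 7.22]  # PC1 to PC3 contribution rates from the text
print(round(cumulative_contribution(first_three)[-1], 2))  # 68.75
print(components_needed(first_three, 60.0))                # 2
```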
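2.1 Support vector machine
The support vector machine is a supervised machine learning method based on statistical learning theory for finite samples. It maps the input variables into a high-dimensional feature space through a nonlinear mapping and constructs the optimal separating hyperplane in that space, and it handles small sample sizes, nonlinearity, high dimensionality and local minima well[23-25]. SVM regression uses a nonlinear mapping function to map the data into a high-dimensional feature space and performs linear regression there; following the structural risk minimization (SRM) principle, the learning process is converted into a convex optimization problem.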
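The radial basis function (RBF) kernel is
$$K(x_i, x_j) = \exp\left(-g\,\lVert x_i - x_j \rVert^{2}\right) \qquad (2)$$
where $g$ is the kernel width parameter, $g > 0$.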
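An SVM with the RBF kernel has 2 parameters to optimize, the boundary (penalty) parameter $c$ and the kernel parameter $g$, both of which strongly affect the classification performance of the SVM[25]. The boundary parameter $c$ is the trade-off that the SVM model makes between structural risk and training error and is related to the tolerable error. The kernel parameter $g$ reflects how complex the distribution of the samples is in the high-dimensional feature space and determines the complexity of the linear decision surface. With cross validation (CV), a grid search can find the highest prediction accuracy in the CV sense, that is, the global optimum on the grid, but the process is time-consuming. PSO is based on swarm intelligence theory and guides the search through the swarm intelligence that emerges from cooperation and competition among the particles. To search for the best parameters $c$ and $g$ over a wider range and to improve search efficiency, this paper adopts an SVM modeling method based on PSO.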
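The effect of the kernel parameter can be seen by evaluating the RBF kernel directly. The sketch below assumes the parameterization $K(x, y) = \exp(-g\lVert x - y\rVert^2)$ used by common SVM libraries; the exact form used in the study is not recoverable from the text, and the feature vectors are hypothetical.

```python
import math

def rbf_kernel(x, y, g):
    """RBF kernel K(x, y) = exp(-g * ||x - y||^2), with g > 0 (assumed form)."""
    if g <= 0:
        raise ValueError("g must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)

x1, x2 = [0.0, 0.0], [1.0, 1.0]  # hypothetical 2-D feature vectors
for g in (0.01, 1.0, 4.0):
    print(g, rbf_kernel(x1, x2, g))
```

A small $g$ makes distant samples look similar (kernel value close to 1), which gives a smoother decision surface; a large $g$ makes the kernel fall off quickly and the decision surface more complex.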
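2.2 Particle swarm optimization
Particle swarm optimization[26] is a swarm intelligence optimization algorithm with strong global search ability. In a $D$-dimensional search space, $n$ particles form a swarm $\{Z_1, Z_2, \ldots, Z_n\}$, where the position of each particle, $Z_i = \{Z_{i1}, Z_{i2}, \ldots, Z_{iD}\}$, represents a potential solution of the problem, and the fitness of each particle is computed from the objective function. Each particle then searches the solution space iteratively, continually adjusting its position to look for new solutions[27]. In each iteration the particle updates its velocity $V_i = \{V_{i1}, V_{i2}, \ldots, V_{iD}\}$ and position $Z_i$ according to Eqs. (4) and (5):
$$V_i^{k+1} = \omega V_i^{k} + c_1 r_1 \left(P_i^{k} - Z_i^{k}\right) + c_2 r_2 \left(P_g^{k} - Z_i^{k}\right) \qquad (4)$$
$$Z_i^{k+1} = Z_i^{k} + V_i^{k+1} \qquad (5)$$
where $k$ is the iteration number, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the learning factors, $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1], $P_i$ is the best position found so far by particle $i$, and $P_g$ is the best position found so far by the whole swarm.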
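2.3 SVM classification model based on PSO parameter optimization
The SVM classification model based on PSO parameter optimization is built as follows.
1) The first 20 principal components obtained by PCA of the THz spectra of the 3 classes of coffee beans are used as the feature vector for producing area identification. The search ranges of the SVM parameters are set and the PSO parameters are initialized, such as the population size, learning factors, inertia weight and maximum number of iterations.
2) Initialize the swarm. Random values of the boundary parameter $c$ and the kernel width parameter $g$ are generated as the initial position of each particle, and the initial velocity of each particle is also initialized randomly.
3) Compute the current fitness of each particle. The SVM is trained on the training samples to obtain the number of correctly classified samples for each particle, from which the particle's fitness value is computed. Here $T$ and $N$ denote the number of correctly classified samples and the total number of samples, respectively.
4) Compare the current fitness $f(Z_i)$ of each particle with the best fitness $f(P_i)$ that the particle has found so far. If $f(Z_i) < f(P_i)$, set $P_i = Z_i$, taking the current position as the particle's best position.
5) Compare the fitness $f(P_i)$ of each particle's own best position with the fitness $f(P_g)$ of the best position of the whole swarm. If $f(P_i) < f(P_g)$, set $P_g = P_i$, taking the updated position as the best position of the whole swarm.
6) Update the velocity and position of each particle with the PSO evolution equations (4) and (5), thereby obtaining new SVM parameters.
7) Check whether the given maximum number of iterations has been reached. If so, stop the search and return the current optimal SVM parameters $c$ and $g$; otherwise go to step 3).
8) Substitute the optimal parameters into the SVM model and classify the test sample set.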
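The steps above can be sketched as a short PSO loop in Python. Several points are assumptions made only for illustration: the inertia weight is decreased linearly, the swarm best is updated as soon as a particle improves rather than once per iteration, positions are clipped to the search range, and `placeholder_error` is a stand-in surface that plays the role of the SVM misclassification rate $1 - T/N$; it does not train an SVM and its minimum at (1.5, 0.5) has no connection to the reported results.

```python
import random

def pso_minimize(fitness, bounds, n_particles=50, n_iter=100,
                 c1=1.5, c2=1.5, w_start=0.9, w_end=0.4, seed=0):
    """Minimize fitness(position) over box bounds with inertia-weight PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 2: random initial positions and (small) random initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.1 * rng.uniform(-(hi - lo), hi - lo) for lo, hi in bounds]
           for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # P_i
    best_fit = [fitness(p) for p in pos]    # f(P_i), step 3
    g_idx = min(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g_idx][:], best_fit[g_idx]  # P_g, f(P_g)
    for k in range(n_iter):                 # step 7: stop after n_iter iterations
        # Linearly decreasing inertia weight (assumed schedule).
        w = w_start - (w_start - w_end) * k / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):            # step 6: velocity and position update
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            if f < best_fit[i]:             # step 4: update particle best
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f < g_fit:               # step 5: update swarm best
                    g_fit, g_pos = f, pos[i][:]
    return g_pos, g_fit

def placeholder_error(params):
    """Stand-in for the SVM misclassification rate (NOT a real SVM)."""
    c, g = params
    return (c - 1.5) ** 2 + (g - 0.5) ** 2

best_params, best_error = pso_minimize(placeholder_error,
                                       [(0.25, 4.0), (0.25, 4.0)])
print(best_params, best_error)
```

In a real application `placeholder_error` would be replaced by a function that trains an SVM with the given $c$ and $g$ and returns the cross-validated error on the calibration set.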
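The first 20 principal components of the THz frequency-domain spectra in the range of 0.2 to 1.5 THz were used as the input feature vector, and the model parameters were selected by PSO. The PSO parameters were initialized first. Following the results in references [28-30], the population size was set to 50 and the learning factors to $c_1 = c_2 = 1.5$; a varying inertia weight was used, with start value $\omega_{start} = 0.9$ and end value $\omega_{end} = 0.4$; the search ranges of $c$ and $g$ were $c \in [2^{-2}, 2^{2}]$ and $g \in [2^{-2}, 2^{2}]$ with a step of $2^{0.5}$; and the maximum number of iterations was 100. The optimal SVM parameters found by PSO were $c$ = 1.393 66 and $g$ = 0.01.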
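To verify the advantage of the PSO-SVM classification method, it was compared with the least squares support vector machine (LS-SVM)[31] and the back propagation neural network (BPNN)[32]; the results are shown in Table 1. The table shows that all 3 algorithms identify coffee beans from different producing areas with an accuracy above 80%, which indicates that coffee beans from different producing areas differ clearly in the THz band and that THz spectroscopy can be used to identify the producing area of coffee beans. Among the 3 models, the support vector machines perform clearly better than the BPNN, and the SVM with PSO-optimized parameters performs better than the LS-SVM. The best PSO-SVM model reaches an accuracy of 100% on the calibration set and 95% on the prediction set. The poorer performance of the BPNN may be because it requires a larger number of training samples and because high-dimensional input features affect the accuracy of neural network training. PSO optimizes the SVM parameters over a continuous range, and the SVM itself can learn from small samples and handle high-dimensional features, so better parameters and thus the best identification model are obtained.
Table 1 Comparison of the classification results of the 3 modeling methods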
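This paper used terahertz time-domain spectroscopy to study the rapid identification of coffee beans from different producing areas. Coffee beans from 3 typical producing areas, Ethiopia, Indonesia and Costa Rica, were used as samples. The THz time-domain and frequency-domain spectra of the pressed coffee bean pellets were acquired with a transmission THz spectroscopy system, PCA was used to reduce the dimensionality of the spectral features and extract them, and PSO was used to optimize the SVM parameters, giving an identification model of coffee bean producing area based on THz spectral features. The proposed method identified coffee beans from different producing areas with accuracies of 100% on the calibration set and 95% on the prediction set, better than the BPNN and LS-SVM algorithms. The study shows that THz spectroscopy can be used for rapid identification of coffee beans from different producing areas, and that a PSO-optimized SVM combined with THz spectroscopy yields a satisfactory identification model. This work provides a new method for identifying the producing area of coffee beans and offers ideas for applying THz spectroscopy to the testing of other complex agricultural products and foods.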
[1] Hu Shuangfang, Wei Yaxi, Xing Jingjing, et al. Correlation analysis between chemical components and sensory quality of coffee[J]. Science and Technology of Food Industry, 2013, 34(24): 125-129. (in Chinese with English abstract)
[2] Gu Wenjia, Li Zhaojie. A survey report on the quality of roasted coffee in China[J]. Quality and Standardization, 2014(12): 35-37. (in Chinese with English abstract)
[3] Semmelroch P, Laskawy G, Blank I, et al. Determination of potent odorants in roasted coffee by stable isotope dilution assay[J]. Flavour & Fragrance Journal, 1995, 10(1): 1-7.
[4] Piccino S, Boulanger R, Descroix F, et al. Aromatic composition and potent odorants of the "specialty coffee" brew "Bourbon Pointu" correlated to its three trade classifications[J]. Food Research International, 2014, 61: 264-271.
[5] He Yuqin, Hu Rongsuo, Zhang Haide, et al. Characteristic aroma detection of coffee at different roasting degree based on electronic nose[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2015, 31(18): 247-255. (in Chinese with English abstract)
[6] Cho J S, Bae H J, Cho B K, et al. Qualitative properties of roasting defect beans and development of its classification methods by hyperspectral imaging technology[J]. Food Chemistry, 2017, 220: 505-509.
[7] Chen T, Li Z, Yin X, et al. Discrimination of genetically modified sugar beets based on terahertz spectroscopy[J]. Spectrochimica Acta Part A Molecular & Biomolecular Spectroscopy, 2016, 153: 586-590.
[8] Lian F, Ge H, Xia S, et al. Identification of wheat quality using THz spectrum[J]. Optics Express, 2014, 22(10): 12533-12544.
[9] Gente R, Busch S F, Stübling E M, et al. Quality control of sugar beet seeds with THz time-domain spectroscopy[J]. IEEE Transactions on Terahertz Science & Technology, 2016, 6(5): 754-756.
[10] Yang Jingqi, Li Shaoxian, Zhao Hongwei, et al. Terahertz study of L-asparagine and its monohydrate[J]. Acta Physica Sinica, 2014, 63(13): 105-111. (in Chinese with English abstract)
[11] Liu J, Li Z. The terahertz spectrum detection of transgenic food[J]. Optik - International Journal for Light and Electron Optics, 2014, 125(23): 6867-6869.
[12] Gowen A A, O’Sullivan C, O’Donnell C P. Terahertz time domain spectroscopy and imaging: Emerging techniques for food process monitoring and quality control[J]. Trends in Food Science & Technology, 2012, 25(1): 40-46.
[13] Liu W, Liu C, Chen F, et al. Discrimination of transgenic soybean seeds by terahertz spectroscopy[J]. Scientific Reports, 2016, doi: 10.1038/srep35799.
[14] Liu W, Liu C, Hu X, et al. Application of terahertz spectroscopy imaging for discrimination of transgenic rice seeds with chemometrics[J]. Food Chemistry, 2016, 210: 415-421.
[15] Qin J Y, Ying Y B, Xie L J. The detection of agricultural products and food using terahertz spectroscopy: A Review[J]. Applied Spectroscopy Reviews, 2013, 48(6): 439-457.
[16] Redo-Sanchez A, Laman N, Schulkin B, et al. Review of terahertz technology readiness assessment and applications[J]. Journal of Infrared, Millimeter, and Terahertz Waves, 2013, 34(9): 500-518.
[17] Xie Lijuan, Xu Wendao, Ying Yibin, et al. Advancement and trend of terahertz spectroscopy technique for non-destructive detection[J]. Transactions of the Chinese Society for Agricultural Machinery, 2013, 44(7): 246-255. (in Chinese with English abstract)
[18] Su T F, Zhao G Z, Ren T B, et al. Characterizations of physico-chemical changes of corn biomass by steam explosion[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2015, 31(6): 253-256.
[19] Shen Xiaochen, Li Bin, Li Xia, et al. Identification of transgenic and non-transgenic cotton seed based on terahertz range spectroscopy[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(Supp.1): 288-292. (in Chinese with English abstract)
[20] Ge H, Jiang Y, Lian F, et al. Characterization of wheat varieties using terahertz time-domain spectroscopy[J]. Sensors, 2015, 15(6): 12560-12572.
[21] Liu J, Li Z, Hu F, et al. A THz spectroscopy nondestructive identification method for transgenic cotton seed based on GA-SVM[J]. Optical and Quantum Electronics, 2015, 47(2): 313-322.
[22] Hao Yong, Sun Xudong, Gao Rongjie, et al. Application of visible and near infrared spectroscopy to identification of navel orange varieties using SIMCA and PLS-DA methods[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2010, 26(12): 373-377. (in Chinese with English abstract)
[23] Vapnik V N. An overview of statistical learning theory[J]. IEEE Transactions on Neural Networks, 1999, 10(10): 988-999.
[24] Burges C J C. A Tutorial on Support Vector Machines for Pattern Recognition[J]. Data Mining and Knowledge Discovery, 1998, 2(2): 121-167.
[25] Sánchez A V D. Advanced support vector machines and kernel methods[J]. Neurocomputing, 2003, 55(1/2): 5-20.
[26] Jiao Youquan, Zhao Lixi, Deng Ou, et al. Calculation of live tree timber volume based on particle swarm optimization and support vector regression[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(20): 160-167. (in Chinese with English abstract)
[27] Venter G, Sobieszczanski-Sobieski J. Particle swarm optimization[J]. AIAA Journal, 2013, 41(8): 129-132.
[28] Liu Wei, Wang Jianping, Liu Changhong, et al. Lycopene content prediction based on support vector machine with particle swarm optimization[J]. Transactions of the Chinese Society for Agricultural Machinery, 2012, 43(4): 143-147. (in Chinese with English abstract)
[29] Liu Xiaofeng, Chen Tong. Study on convergence analysis and parameter choice of particle swarm optimization[J]. Computer Engineering and Applications, 2007, 43(9): 14-17. (in Chinese with English abstract)
[30] Shi Y, Eberhart R C. Parameter selection in particle swarm optimization[C]// Proceedings of the 7th International Conference on Evolutionary Programming VII (EP '98). 1998: 591-600.
[31] Borin A, Ferrão M F, Mello C, et al. Least-squares support vector machines and near infrared spectroscopy for quantification of common adulterants in powdered milk[J]. Analytica Chimica Acta, 2006, 579(1): 25-32.
[32] Dai H, MacBeth C. Effects of learning parameters on learning procedure and performance of a BPNN[J]. Neural Networks, 1997, 10(8): 1505-1521.
Rapid identification of producing area of coffee bean based on terahertz spectroscopy and support vector machine
Hu Xiaohua1, Liu Wei2,3※, Liu Changhong3, Qian Yunhui3
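(1. School of Computer and Information, Hefei University of Technology, Hefei 230009, China; 2. Laboratory of Machine Vision and Intelligent Control, Hefei University, Hefei 230601, China; 3. School of Food Science and Engineering, Hefei University of Technology, Hefei 230009, China)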
Coffee is a very popular beverage in many countries. Coffee beans from different producing areas differ in flavour and functional properties, so identifying the producing area is important for assuring coffee bean quality. The feasibility of a rapid and precise method for determining the producing area of coffee beans was examined using a terahertz (THz) time-domain spectroscopy system (TAS7500TS HF1, Advantest Co., Ltd, Japan). Coffee bean samples from 3 typical producing areas (Ethiopia, Costa Rica and Indonesia) were collected and pressed into pellets for THz measurement. A total of 180 pellet samples (3 classes with 60 pellets each) were randomly divided into a calibration set (40 pellets per class) and a prediction set (20 pellets per class). The THz time-domain spectroscopy system was operated in transmission mode. Before the experiment, dry air was injected until the relative humidity fell below 3% to reduce absorption of the THz waves by water vapour in the air. The parameters of the THz system were as follows: the frequency range was 0.1 to 4 THz, the resolution was 7.6 GHz, the pulse width was less than 50 fs and the average power was 20 mW. For each sample, the THz time-domain spectrum was measured 3 times at different positions and the measurements were averaged. The frequency-domain spectra were obtained by fast Fourier transform (FFT). Principal component analysis (PCA) of the frequency-domain spectral data was performed to examine the qualitative differences among the 3 classes of coffee beans using the first 3 score vectors. The 3 classes were largely separated in the space of the first 3 principal components (PCs), although the groups overlapped somewhat, possibly because the first 3 PCs accounted for only 68.75% of the total spectral variation.
Thus, to reduce the dimensionality of the model features while retaining more of the information in the THz spectra, the first 20 PCs were selected as the spectral features for determining the producing area of coffee beans. The support vector machine (SVM), a learning algorithm used for classification and regression, was used to build the identification model. Particle swarm optimization (PSO) was used to select the optimal parameters, which enlarged the search space and improved search efficiency. The identification results of the PSO-SVM were compared with those of the least squares support vector machine (LS-SVM) and the back propagation neural network (BPNN). The comparison showed that the discrimination accuracy of the PSO-SVM for the 3 classes of coffee beans was 95% in the prediction set and 100% in the calibration set, the best of the 3 methods. It can be concluded that THz frequency-domain spectra can serve as important features for identifying the producing area of coffee beans. The PSO-based SVM obtains better SVM parameters and therefore better identification ability than the traditional LS-SVM. The THz spectroscopy system combined with the proposed algorithm is a powerful and attractive tool for identifying the producing area of coffee beans.
spectroscopy; models; support vector machine; coffee bean; terahertz; particle swarm optimization
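doi: 10.11975/j.issn.1002-6819.2017.09.040
CLC number: TP274+.3; TP391.44
Document code: A
Article ID: 1002-6819(2017)-09-0302-06
Received: 2017-02-22
Revised: 2017-04-16
Foundation item: National Key Research and Development Program of China (2016YFD0401104)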
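Hu Xiaohua, male, born in Wuyuan, Jiangxi; mainly engaged in research on nondestructive testing with terahertz spectroscopy. School of Computer and Information, Hefei University of Technology, Hefei 230009. Email: xiaohuahu@mail.hfut.edu.cn
Liu Wei, male, born in Shouxian, Anhui; senior experimentalist, PhD; mainly engaged in research on detection technology and pattern recognition. Laboratory of Machine Vision and Intelligent Control, Hefei University, Hefei 230601. Email: lwei1524@163.com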
Hu Xiaohua, Liu Wei, Liu Changhong, Qian Yunhui. Rapid identification of producing area of coffee bean based on terahertz spectroscopy and support vector machine[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(9): 302-307. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2017.09.040 http://www.tcsae.org