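State Recognition of Solid Fermentation Process Based on Near Infrared Spectroscopy with Adaboost and Spectral Regression Discriminant Analysis
YU Shuang1, 2, LIU Guo-hai3*, XIA Rong-sheng3, JIANG Hui3
1. Department of Mechanical and Electrical Engineering, Suzhou Institute of Industrial Technology, Suzhou 215000, Jiangsu, China
2. College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China
3. School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, Jiangsu, China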
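To enable rapid monitoring of the state of solid-state fermentation (SSF), feed-protein SSF was taken as the experimental object, and qualitative identification of its process state was studied using near-infrared (NIR) spectroscopy. First, the NIR spectra of 140 solid-state fermentation samples were collected with an Antaris II Fourier transform NIR spectrometer, and the raw spectra were preprocessed by standard normal variate transformation (SNV). Next, spectral regression discriminant analysis (SRDA) was used to extract features from the preprocessed spectra. Finally, the nearest neighbor (NN) classification algorithm was used as the weak classifier to build a state recognition model, which was then applied to the test-set samples. Compared with recognition models built on features extracted by principal component analysis (PCA) and linear discriminant analysis (LDA), the SRDA-NN model gave the best result, with a correct recognition rate of 94.28% on the test set. To further improve accuracy, adaptive boosting (Adaboost) was combined with SRDA-NN, and the resulting Adaboost-SRDA-NN ensemble learning algorithm was used to build an online monitoring model for the process state of feed-protein SSF. Boosting further strengthened the predictive performance of the SRDA-NN model: the Adaboost-SRDA-NN model reached a correct recognition rate of 100% on the test set. The results show that, when calibrating qualitative NIR models, SRDA can effectively extract features from NIR spectral data and thereby reduce dimensionality, and that Adaboost can markedly improve the prediction accuracy of the final classification model.
Spectral analysis; Near infrared; Feature extraction; Spectral regression discriminant analysis; Adaboost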
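As a rapid, non-destructive modern analytical technique, near-infrared (NIR) spectroscopy has already produced a number of research results in the field of biological fermentation [1-6]. In this work, NIR spectroscopy is used to identify the process state of feed-protein solid-state fermentation. However, because NIR spectral data are high-dimensional and easily disturbed, a feature extraction method is usually applied to reduce dimensionality before a state recognition model is built, which lowers model complexity and improves recognition accuracy [7].
Principal component analysis (PCA) is a commonly used unsupervised algorithm, and useful class information between samples is easily lost when it is used for feature extraction. Linear discriminant analysis (LDA), a classical supervised algorithm, makes full use of the known class information so that the projected samples are optimally separable, but it faces a complicated generalized eigen-decomposition problem [8]. Spectral regression discriminant analysis (SRDA), a more recent feature extraction method, converts the generalized eigen-decomposition problem in LDA into a series of regularized least-squares problems, which greatly simplifies the computation [9]. In this work, SRDA is used to extract features from the NIR spectral data, and a recognition model is built on the reduced data with the nearest neighbor (NN) algorithm. To further improve recognition accuracy, adaptive boosting (Adaboost) is combined with SRDA-NN to give the Adaboost-SRDA-NN ensemble learning algorithm, which is successfully applied to state recognition of the solid-state fermentation process. Compared with the single LDA-NN and SRDA-NN methods, the improvement in recognition accuracy is significant.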
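1.1 Samples
Feed-protein solid-state fermentation experiments were carried out in a GTG-100 solid-state fermentation unit. Four samples were taken every 12 h, so one fermentation batch yielded 28 samples. Five batches were fermented with the same materials and conditions, giving 140 samples in total. As soon as each sample was obtained, its NIR spectrum was collected with the Antaris II spectrometer and its pH value was measured. Fig. 1 shows the relation between measured pH and fermentation time for one representative sample from each sampling time.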
Fig.1 Relation between incubation time and pH value of fermented substrate
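Based on the pH trend in Fig. 1, the whole solid-state fermentation process can be divided into three phases: the lag phase (sampling points 0 and 12 h), the exponential phase (sampling points 24, 36 and 48 h), and the stationary phase (sampling points 60 and 72 h). The numbers of samples collected in the three phases are 40, 60 and 40, respectively. Before modeling, samples were selected from the three phases at a ratio of 3:1, giving 105 training-set samples and 35 test-set samples. Samples from the stationary phase are defined as the fermentation-completed state, and samples from the lag and exponential phases as the fermentation-uncompleted state [5, 10].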
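The 3:1 split described above can be illustrated with a short sketch. The text does not say how individual samples were chosen within each phase, so the random per-phase selection, the fixed seed, the function name `stratified_split`, and the assumption that samples are indexed phase by phase are all hypothetical choices made only for illustration.

```python
import numpy as np

# Phase sizes reported in the text: lag, exponential, stationary.
PHASE_SIZES = {"lag": 40, "exponential": 60, "stationary": 40}


def stratified_split(phase_sizes, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx, labels) for a per-phase random split.

    labels[i] is 1 for the stationary phase (fermentation completed)
    and 0 for the lag and exponential phases (fermentation not completed).
    Samples are assumed to be indexed phase by phase (hypothetical layout).
    """
    rng = np.random.default_rng(seed)  # the seed is an arbitrary assumption
    train_idx, test_idx, labels = [], [], []
    start = 0
    for phase, n in phase_sizes.items():
        idx = rng.permutation(np.arange(start, start + n))
        n_train = int(round(n * train_fraction))  # 3:1 ratio -> 75% training
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
        labels.extend([1 if phase == "stationary" else 0] * n)
        start += n
    return np.array(train_idx), np.array(test_idx), np.array(labels)


train_idx, test_idx, labels = stratified_split(PHASE_SIZES)
print(len(train_idx), len(test_idx))  # 105 35
```

Splitting within each phase keeps the 40/60/40 proportions in both sets (30/45/30 for training and 10/15/10 for testing), which is consistent with the 105 and 35 samples reported.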
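1.2 NIR spectra acquisition
An Antaris II Fourier transform NIR spectrometer (Thermo Scientific) equipped with an InGaAs detector was used. With the built-in reference as background, the NIR spectra of the fermentation samples were collected with the diffuse-reflectance integrating sphere accessory. The laboratory temperature was kept at about 25 °C and the humidity was essentially constant. The scanning range was 10 000~4 000 cm⁻¹, with 16 scans and a resolution of 8 cm⁻¹. Each sample was loaded into the sample cup supplied with the instrument and fully compacted; spectra were collected at three different positions of each sample, and their mean was taken as the raw spectrum of that sample.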
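As a rough illustration of the replicate averaging described here, followed by the SNV preprocessing that the study applies before feature extraction, a minimal sketch is given below. The synthetic scans, the number of spectral points (200), and the function names are assumptions for demonstration only; they are not the instrument's actual output format.

```python
import numpy as np


def mean_spectrum(replicates):
    """Average replicate scans; `replicates` has shape (n_replicates, n_points)."""
    return np.asarray(replicates, dtype=float).mean(axis=0)


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)
    by its own mean and sample standard deviation."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


# Three synthetic scans standing in for the three positions of one sample.
rng = np.random.default_rng(0)
scans = rng.normal(loc=0.5, scale=0.1, size=(3, 200))

raw = mean_spectrum(scans)   # raw spectrum of the sample, shape (200,)
corrected = snv(raw)         # SNV-corrected spectrum, shape (1, 200)
print(raw.shape, corrected.shape)
```

Because SNV works row by row, each spectrum is corrected independently of the others, so the same function can be applied to training and test spectra without sharing any statistics between them.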
1.3 Principles of the methods
1.3.1 SRDA feature extraction algorithm
(1)
The optimal solution of Eq. (1) can be obtained by solving the following generalized eigenvector problem
$$S_b a = \lambda S_t a \qquad (2)$$
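For orientation, the step from a discriminant criterion to a generalized eigenproblem of this form can be written out for the textbook LDA objective. This is a sketch under the assumption that $S_b$ denotes the between-class scatter matrix and $S_t$ the total scatter matrix, as is usual in the SRDA literature; the symbols $m_k$, $\mu_k$, $\mu$ and $J$ are introduced here for illustration and are not taken from the source.

```latex
% Assumed definitions: c classes, class k has m_k samples with mean \mu_k;
% \mu is the mean of all m samples x_i.
S_b = \sum_{k=1}^{c} m_k (\mu_k - \mu)(\mu_k - \mu)^{T}, \qquad
S_t = \sum_{i=1}^{m} (x_i - \mu)(x_i - \mu)^{T}

% Textbook LDA criterion (a Rayleigh quotient in the projection vector a):
J(a) = \frac{a^{T} S_b a}{a^{T} S_t a}

% Setting the gradient of J to zero:
\nabla_a J = \frac{2 S_b a \,(a^{T} S_t a) - 2 S_t a \,(a^{T} S_b a)}{(a^{T} S_t a)^{2}} = 0
\;\Longrightarrow\; S_b a = J(a)\, S_t a

% Hence stationary points satisfy S_b a = \lambda S_t a with \lambda = J(a),
% and the maximizer is the eigenvector with the largest eigenvalue.
```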
(3)
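To solve the eigen-decomposition problem of Eq. (3) quickly, Ref. [9] proposed spectral regression discriminant analysis (SRDA), which handles this problem effectively.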
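The idea of replacing the eigen-decomposition with regularized least squares can be sketched as follows. This follows the commonly described spectral regression recipe (class-indicator responses orthogonalized against a constant vector, then one ridge regression per response) and is not claimed to be the exact implementation of Ref. [9]. The regularization value `alpha`, the function names, and the synthetic two-class data are assumptions; a nearest neighbor rule is then applied in the resulting feature space.

```python
import numpy as np


def srda_fit(X, y, alpha=1.0):
    """Fit an SRDA-style projection.

    X: (n_samples, n_features), y: class labels.
    Returns (mean, A) with A of shape (n_features, n_classes - 1).
    alpha is an assumed ridge parameter, not a value from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_features = X.shape

    # Responses: orthogonalize class indicators against the all-ones vector
    # (QR plays the role of Gram-Schmidt), then drop the constant direction.
    columns = [np.ones(n_samples)]
    columns += [(y == c).astype(float) for c in classes[:-1]]
    Q, _ = np.linalg.qr(np.column_stack(columns))
    responses = Q[:, 1:]  # shape (n_samples, n_classes - 1)

    # Regularized least squares on centered data, one problem per response.
    mean = X.mean(axis=0)
    Xc = X - mean
    gram = Xc.T @ Xc + alpha * np.eye(n_features)
    A = np.linalg.solve(gram, Xc.T @ responses)
    return mean, A


def srda_transform(X, mean, A):
    """Project samples into the SRDA feature space."""
    return (np.asarray(X, dtype=float) - mean) @ A


def nn_predict(Z_train, y_train, Z_test):
    """1-nearest-neighbor prediction with Euclidean distance."""
    dists = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=2)
    return np.asarray(y_train)[np.argmin(dists, axis=1)]


# Synthetic two-class data standing in for preprocessed spectra.
rng = np.random.default_rng(0)


def make_data(n_per_class, n_features=20):
    X0 = rng.normal(0.0, 0.3, size=(n_per_class, n_features))
    X1 = rng.normal(0.0, 0.3, size=(n_per_class, n_features))
    X1[:, :5] += 1.0  # class 1 differs in the first five variables
    return np.vstack([X0, X1]), np.repeat([0, 1], n_per_class)


X_train, y_train = make_data(30)
X_test, y_test = make_data(10)

mean, A = srda_fit(X_train, y_train)
Z_train = srda_transform(X_train, mean, A)
Z_test = srda_transform(X_test, mean, A)
accuracy = np.mean(nn_predict(Z_train, y_train, Z_test) == y_test)
print(Z_train.shape, accuracy)
```

With two classes the projection has a single column, which matches the one feature variable reported for LDA and SRDA in this study.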
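1.3.2 Adaboost-SRDA-NN algorithm
Adaboost is an iterative boosting algorithm whose basic idea is to integrate several different weak classifiers into one strong classifier. The Adaboost-SRDA-NN algorithm used here takes the nearest neighbor classifier in the SRDA feature subspace as the weak classifier and then uses Adaboost to obtain a strong classifier composed of several weak classifiers. The procedure is as follows [11]:
Step 1: Given the training set $\{(x_1, y_1), \ldots, (x_m, y_m)\}$, where $x_i$ is a sample point and $y_i$ is its class label, initialize the distribution weights of the training samples as $W_1(i) = 1/m$, $i = 1, 2, \ldots, m$. Assume that $T$ iterations are to be run and initialize the iteration counter $t = 1$.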
Step 2: According to the current weight distribution, select p (p […]
Step 6: Set the iteration counter $t = t + 1$.
Step 7: If $t \le T$, return to Step 2; if $t = T + 1$, go to Step 8.

2 Results and Analysis
2.1 Preprocessing of NIR spectral data
To remove irrelevant information contained in the raw spectra, standard normal variate transformation (SNV) was applied to the raw spectral data before feature extraction. The preprocessed spectra are shown in Fig. 2.
Fig.2 Spectrum preprocessed by SNV
2.2 Feature extraction
PCA, LDA and SRDA were each used to extract features from the preprocessed data. For PCA, a cumulative contribution rate of 99% was used as the criterion, which gave 9 feature variables; LDA and SRDA each gave 1 feature variable. The nearest neighbor (NN) classification algorithm was then used to build a state recognition model on each set of extracted features, and the models were validated with the test-set samples. The recognition results of the three methods are listed in Table 1. Compared with the conventional PCA and LDA feature extraction methods, SRDA extracted features from the NIR spectral data more effectively: the SRDA-NN model reached a recognition accuracy of 94.28% on the test samples, higher than the other two models.
Table 1 Recognition results of different models
2.3 Modeling with the Adaboost-SRDA-NN algorithm
To further improve recognition accuracy, the model built by SRDA combined with nearest neighbor classification was used as the weak classifier, a strong classifier model was constructed with the Adaboost boosting algorithm, and the strong classifier was finally validated with the test-set data. Fig. 3 shows the relation between the number of iterations and the correct recognition rate of the model. As Fig. 3 shows, the correct recognition rate increases gradually with the number of iterations. After 2 iterations the recognition accuracy for the state of the solid-state fermentation process has already reached at least 94.28% and gradually stabilizes; after 5 iterations it reaches 100%, a clear boosting effect. Therefore, for the object studied in this work, the number of iterations was finally set to 5, which gave the best model, with a correct recognition rate of 100% on the independent test-set samples.
Fig.3 Discrimination rates of Adaboost-SRDA-NN model according to different iterations

3 Conclusion
NIR spectroscopy combined with the Adaboost-SRDA-NN algorithm was used for qualitative identification of the process state of feed-protein solid-state fermentation. Compared with the conventional PCA and LDA algorithms, the SRDA feature extraction algorithm extracts features from NIR spectral data more effectively and reduces model complexity; the proposed Adaboost-SRDA-NN algorithm identifies the process state of feed-protein solid-state fermentation well, with a high correct recognition rate.

References
[1] HU Yao-hua, LIU Cong, HE Yong. Spectroscopy and Spectral Analysis, 2014, 34(4): 922.
[2] Reboucas M V, Santos J B, Pimentel M F, et al. Chemometrics and Intelligent Laboratory Systems, 2011, 107(1): 186.
[3] Jiang H, Liu G, Mei C, et al. Analytical Methods, 2013, 5(7): 1872.
[4] Jiang H, Liu G, Mei C, et al. Analytical and Bioanalytical Chemistry, 2012, 404(2): 603.
[5] Jiang H, Liu G, Xiao X, et al. Microchemical Journal, 2012, 102: 68.
[6] Jiang H, Liu G, Mei C, et al. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 2012, 97: 277.
[7] LEI Meng, LI Ming. CIESC Journal, 2012, 63(12): 3991.
[8] Seetohul L N, Scott S M, O'Hare W T, et al. Journal of the Science of Food and Agriculture, 2013, 93(9): 2308.
[9] Gui J, Sun Z N, Cheng J, et al. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(2): 211.
[10] JIANG Hui, LIU Guo-hai, MEI Cong-li, et al. Transactions of the Chinese Society for Agricultural Machinery, 2012, 43(10): 114.
[11] Wang J, Zhang Y. Technology for Education and Learning. Springer Berlin Heidelberg, 2012. 259.

State Recognition of Solid Fermentation Process Based on Near Infrared Spectroscopy with Adaboost and Spectral Regression Discriminant Analysis
YU Shuang1, 2, LIU Guo-hai3*, XIA Rong-sheng3, JIANG Hui3
1. Mechanical and Electrical Engineering, Suzhou Institute of Industrial Technology, Suzhou 215000, China
2. Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
3. School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
To achieve rapid monitoring of the process state of solid state fermentation (SSF), this study attempted qualitative identification of the process state of SSF of feed protein by use of the Fourier transform near infrared (FT-NIR) spectroscopy analysis technique. More specifically, FT-NIR spectroscopy combined with the Adaboost-SRDA-NN ensemble learning algorithm was used as an analysis tool to accurately and rapidly monitor chemical and physical changes in SSF of feed protein without the need for chemical analysis. Firstly, the raw spectra of all 140 fermentation samples were collected with a Fourier transform near infrared spectrometer (Antaris Ⅱ), and the raw spectra were preprocessed with the standard normal variate transformation (SNV) algorithm. Thereafter, the characteristic information of the preprocessed spectra was extracted by spectral regression discriminant analysis (SRDA). Finally, the nearest neighbors (NN) algorithm was selected as the basic classifier to build a state recognition model and identify the fermentation samples in the validation set. The SRDA-NN model showed superior performance compared with two other NN models, which were developed using the feature information from principal component analysis (PCA) and linear discriminant analysis (LDA), and the correct recognition rate of the SRDA-NN model reached 94.28% in the validation set. To further improve the recognition accuracy of the final model, the Adaboost-SRDA-NN ensemble learning algorithm was proposed by integrating the Adaboost and SRDA-NN methods, and it was used to construct the online monitoring model of the process state of SSF of feed protein. The prediction performance of the SRDA-NN model was further enhanced by the Adaboost boosting algorithm, and the correct recognition rate of the Adaboost-SRDA-NN model reached 100% in the validation set. The overall results demonstrate that the SRDA algorithm can effectively extract spectral feature information and reduce the spectral dimension during model calibration in qualitative NIR analysis. In addition, the Adaboost boosting algorithm can improve the classification accuracy of the final model. The results obtained in this work can provide a research foundation for developing online monitoring instruments for the SSF process.
Spectral analysis; Near infrared spectroscopy; Feature extraction; Adaboost
(Received Oct. 27, 2014; accepted Feb. 4, 2015)
Supported by the National Innovation Fund for Small and Medium-sized Enterprises (12C26213202207) and the General Program of the China Postdoctoral Science Foundation (2014M550273).
YU Shuang, female, born in 1981, lecturer in the Department of Mechanical and Electrical Engineering, Suzhou Institute of Industrial Technology. e-mail: szyushuang@126.com
*Corresponding author. e-mail: ghliu@ujs.edu.cn
CLC number: O657.33, Q815. Document code: A. DOI: 10.3964/j.issn.1000-0593(2016)01-0051-04