Chen Yitang et al.
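Abstract  The application of factor analysis to imaging mass spectrometry data was studied. The imaging data were acquired by air flow-assisted ionization imaging mass spectrometry, and the sample consisted of handwriting made with three different inks (red, blue, and black). Factor analysis divided the imaging data into a background factor, a black factor, a blue factor, and a red factor. The contributions of m/z 443.2, 478.4, and 322.2 (344.2) to the red, blue, and black factors, respectively, were far larger than those of the other mass-to-charge ratios, so these are the characteristic mass-to-charge ratios of the three inks. This result agrees with the actual composition of the sample and shows that factor analysis is a feasible way to analyze imaging mass spectrometry data and extract features from it. The results of factor analysis and principal component analysis on the same imaging data were also compared. The comparison shows that factor analysis allows characteristic mass-to-charge ratios to be selected more simply and quantitatively, and that it has considerable potential in biomarker extraction, disease diagnosis, and pharmacological analysis.
Keywords  Factor analysis; Imaging mass spectrometry; Air flow-assisted ionization; Multivariate statistics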
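1 Introduction
In recent years, imaging mass spectrometry (IMS) has developed rapidly as an active area of mass spectrometry research and plays an increasingly important role in clinical applications such as characterizing tissue pathology, diagnosing disease, assessing drug efficacy, and discovering biomarkers \[1~5\].
As imaging mass spectrometry has advanced \[6~8\], both its mass resolution and its spatial resolution have kept improving. The raw imaging data sets have therefore become very large, and screening them manually has become increasingly difficult. Researchers have recently begun to use multivariate statistical methods \[9~12\] to reduce the dimensionality of imaging mass spectrometry data and extract features from it. Multivariate statistics is a general term for a family of mathematical methods, and identifying a specific model within that family that suits imaging mass spectrometry data analysis has become one of the research topics in the field \[13,14\].
The multivariate statistical methods most commonly applied to imaging mass spectrometry data at present include principal component analysis (PCA) \[15,16\], hierarchical cluster analysis (HCA) \[17\], and partial least squares discriminant analysis (PLS-DA) \[18\]. These methods have successfully reduced the dimensionality of large mass spectrometry data sets and extracted features from them, advancing the application of imaging mass spectrometry in many fields. As statistical methods, however, they yield results that are largely mathematical in character, and it is often difficult to give those results an interpretation with practical meaning. In addition, compared with biomarkers established by other techniques, the markers (mass-to-charge ratios) extracted by these methods are usually few, so particular mass-to-charge ratios of real significance may be missed.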
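In this study, the application of factor analysis (FA) to imaging mass spectrometry data was investigated on the basis of air flow-assisted ionization imaging mass spectrometry (AFAI-IMS) \[19\]. A sample of mixed handwriting was imaged to obtain the raw imaging mass spectrometry data, and factor analysis was then used to analyze the data statistically, dividing it into a background factor, a black factor, a blue factor, and a red factor. The contributions of m/z 443.2, 478.4, and 322.2 (344.2) to the red, blue, and black factors, respectively, were far larger than those of the other mass-to-charge ratios, so these are the characteristic mass-to-charge ratios of the three inks. This result agrees with the actual composition of the sample and shows that factor analysis is a feasible way to analyze imaging mass spectrometry data and extract features from it.
The results of factor analysis and principal component analysis on the imaging data were also compared. The comparison shows that factor analysis allows mass-to-charge ratios to be selected correctly and comprehensively in a simpler and more quantitative way, so that several mass-to-charge ratios can be identified and extracted together as a combined marker of a target sample component. Compared with the multivariate statistical methods in common use, factor analysis can effectively extract and reflect specific factors, and it has considerable potential in biomarker extraction, disease diagnosis, and pharmacological analysis.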
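3 Results and Discussion
3.1 Factor analysis of the sample
AFAI-IMS imaging data were acquired from the sample and subjected to factor analysis. As described above, the number of factors into which the raw data are to be divided must be set in advance, so the analysis was first run with different numbers of factors. Dividing the raw data into 4 factors retained 99.6% of the information, and adding further factors increased the retained information only slightly. The imaging data were therefore divided into 4 factors.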
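The stopping rule described here (keep adding factors until the retained information barely increases) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "retained information" is taken to mean the cumulative share of the eigenvalues of the correlation matrix, the function name `choose_n_factors` and the 1% gain threshold are hypothetical, and the eigenvalues are made-up numbers rather than values from the measurements.

```python
def choose_n_factors(eigenvalues, min_gain=0.01):
    """Return (k, retained): the smallest factor count k after which adding
    one more factor would raise the retained fraction by less than min_gain.

    Assumption: 'retained information' is the cumulative share of the
    eigenvalues; the paper does not state which measure was used.
    """
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    retained = 0.0
    for k, value in enumerate(ordered, start=1):
        retained += value / total
        # Gain that the (k+1)-th factor would add; zero if none is left.
        next_gain = ordered[k] / total if k < len(ordered) else 0.0
        if next_gain < min_gain:
            return k, retained
    return len(ordered), retained


# Made-up eigenvalues for illustration only (not data from the study).
example_eigenvalues = [60.0, 25.0, 10.0, 4.6, 0.2, 0.1, 0.1]
k, retained = choose_n_factors(example_eigenvalues)
print(k, round(retained, 3))
```

With these illustrative eigenvalues the fifth factor would add only about 0.2% of the total, so the rule stops at four factors.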
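Processing the raw imaging mass spectrometry data by factor analysis yields 4 factors. To explore what each factor represents, and thus to use these 4 factors to explain the basic structure of the raw mass spectrometry data, the score of each factor was calculated at every sampling point of the sample. By the mathematical properties of factor analysis, the larger the score, the greater the influence of that factor on that sample point.
A mass spectrometric image uses the ion signal intensity of a given mass-to-charge ratio at each sample point as the color value of the image. In the same way, this study used the factor score at each sample point as the color value to draw a factor score map for each factor over the sample points, as shown in Fig. 1(E~H).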
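Turning per-point factor scores into a score map is essentially a reshape followed by a rescale. The sketch below assumes the sampling points were recorded in row-major order on a regular grid, which the text does not specify; the function names `score_map` and `rescale` are hypothetical.

```python
def score_map(scores, n_rows, n_cols):
    """Arrange per-point factor scores into a 2-D grid.

    Assumption: the scores are listed in row-major acquisition order.
    """
    if len(scores) != n_rows * n_cols:
        raise ValueError("number of scores must equal n_rows * n_cols")
    return [list(scores[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


def rescale(grid):
    """Linearly rescale all grid values to the range 0..1 for use as color values."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant map
    return [[(v - lo) / span for v in row] for row in grid]


# Made-up scores for a 2 x 2 grid of sampling points (illustration only).
colors = rescale(score_map([2.0, 4.0, 6.0, 8.0], n_rows=2, n_cols=2))
print(colors)
```

The rescaled grid can then be handed to any plotting routine that maps values in 0..1 to a color scale.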
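Comparing Fig. 1A with Fig. 1E shows that the sample points with high scores on factor 1 are distributed in exactly the opposite pattern to the points bearing handwriting; that is, they follow the distribution of the background. In terms of the mathematical meaning of the factor score, factor 1 has a large influence on the background sample points and a small influence on the sample points bearing handwriting. Factor 1 therefore mainly affects the background component and can be named the "background factor".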
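The visual comparison between a score map and the handwriting can also be expressed numerically, for example as the correlation between the factor scores and a binary mask of the inked points. This is an assumed illustration rather than the procedure used in the study: the mask, the scores, and the function name `pearson` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up example: 1 marks a point with ink, 0 a background point.
ink_mask = [0, 1, 1, 0, 0, 1]
factor1_scores = [0.9, 0.1, 0.2, 0.8, 1.0, 0.1]
r = pearson(factor1_scores, ink_mask)
print(round(r, 3))  # strongly negative: high scores fall on the background
```

A strongly negative correlation with the ink mask corresponds to the "opposite distribution" noted for factor 1, while a factor tied to one ink would be expected to correlate positively with the mask of that ink.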
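Each factor obtained by factor analysis is mathematically a 1×n matrix, where n equals the number of mass-to-charge ratios in the mass scan range. Each value in this factor matrix corresponds one-to-one to a mass-to-charge ratio and represents the magnitude of that mass-to-charge ratio's influence within the factor.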
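Because a factor is a 1×n vector aligned with the m/z axis, picking characteristic mass-to-charge ratios amounts to ranking the entries of that vector. The sketch below is a minimal illustration: ranking by absolute value and normalizing to a share of the total are assumptions, the function names are hypothetical, and the m/z values and contributions are made up.

```python
def top_mz(mz_values, contributions, n_top=3):
    """Return the n_top (m/z, contribution) pairs with the largest absolute contribution."""
    if len(mz_values) != len(contributions):
        raise ValueError("mz_values and contributions must have the same length")
    ranked = sorted(zip(mz_values, contributions),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:n_top]


def contribution_share(contributions):
    """Express each absolute contribution as a fraction of the factor's total."""
    total = sum(abs(v) for v in contributions)
    if total == 0:
        raise ValueError("all contributions are zero")
    return [abs(v) / total for v in contributions]


# Made-up factor with four m/z channels (illustration only).
mz_axis = [100.1, 200.2, 300.3, 400.4]
factor = [0.05, 0.80, 0.10, 0.05]
print(top_mz(mz_axis, factor, n_top=1))
print(contribution_share(factor))
```

Expressing the contributions as shares makes it possible to apply an explicit cut-off, so that an m/z with a small but non-negligible share is kept or discarded by a stated rule rather than by eye.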
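3.2 Comparison of factor analysis and principal component analysis
Principal component analysis is currently the most commonly used multivariate statistical method for imaging mass spectrometry data. In this study, principal component analysis was applied to the raw imaging mass spectrometry data of the sample and compared with the factor analysis results; the results are shown in Fig. 2. In principal component analysis, points with large scores on a principal component are selected as characteristic points, and the mass-to-charge ratio corresponding to such a point is taken as a characteristic mass-to-charge ratio.
4 Conclusions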
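The application of factor analysis to imaging mass spectrometry data was studied, and factor analysis was shown to be able to reduce the dimensionality of such data and extract features from it. The raw imaging data were acquired by AFAI-IMS, and after factor analysis the data could be classified with 4 factors. Each sample component, that is, each ink, depends on the influence of a single factor, so the effect of each factor over the whole sample can be observed clearly. Once the meaning of each factor had been established, the characteristic mass-to-charge ratios of the sample components were successfully extracted by examining the contribution of each mass-to-charge ratio within the factors.
Compared with principal component analysis and the other multivariate statistical methods in common use, factor analysis gives results that match the actual background and meaning of the sample. Factor analysis can quantify the weight of each mass-to-charge ratio within a factor array and, on that basis, select characteristic mass-to-charge ratios correctly and comprehensively, which helps to extract characteristic mass-to-charge ratios whose influence is low but not negligible. With factor analysis, several mass-to-charge ratios can be extracted together as a combined marker of a sample component, which gives the method considerable potential in fields where sample composition is complex, such as the extraction of cancer markers.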
References
2 Pevsner P H, Melamed J, Remsen T, Kogos A, Francois F, Kessler P, Stern A, Anand S. Biomarkers Med, 2009, 3(1): 55-69
3 Seeley E H, Caprioli R M. Trends Biotechnol, 2011, 29(3): 136-143
4 YANG ShuiPing, CHEN HuanWen, YANG YuLing, HU Bin, ZHANG Xie, ZHOU YuFang, ZHANG LiLi, GU HaiWei. Chinese J Anal Chem, 2009, 37(3): 315-318
楊水平, 陳煥文, 楊宇玲, 胡 斌, 張 燮, 周瑜芬, 張麗麗, 顧海威. 分析化學, 2009, 37(3): 315-318
5 WEI KaiHua, ZHANG XueMin, YANG SongCheng. Journal of Instrumental Analysis, 2007, 26(S1): 12-14
魏開華, 張學敏, 楊松成. 分析測試學報, 2007, 26(S1): 12-14
6 Ifa D R, Wiseman J M, Song Q, Cooks R G. Int J Mass Spectrom, 2007, 259(1): 8-15
7 Harris G A, Nyadong L, Fernandez F M. Analyst, 2008, 133(10): 1297-1301
8 YANG ShuiPing, HU Bin, LI JianQiang, HAN Jing, ZHANG Xie, CHEN HuanWen. Chinese J Anal Chem, 2009, 37(5): 691-694
楊水平, 胡 斌, 李建強, 韓 京, 張 燮, 陳煥文. 分析化學, 2009, 37(5): 691-694
9 Jones E A, Remoortere A, Zeijl R J M, Hogendoorn P C W, Bovée J V M G, Deelder A M, McDonnell L A. PLoS One, 2011, 6(1): 1-14
10 Bonnel D, Longuespee R, Franck J, Roudbaraki M, Gosset P, Day R, Salzet M, Fournier I. Anal Bioanal Chem, 2011, 401(1): 149-165
11 Reindl W, Bowen B P, Balamotis M A, Greenc J E, Northen T R. Integr Biol, 2011, 3(4): 460-467
12 Dill A L, Eberlin L S, Zheng C, Costa A B, Ifa D R, Cheng L, Masterson T A, Koch M O, Vitek O, Cooks R G. Anal Bioanal Chem, 2010, 398(7): 2969-2978
13 Fonville J M, Carter C, Cloarec O, Nicholson J K, Lindon J C, Bunch J, Holmes E. Anal Chem, 2012, 84(3): 1310-1319
14 Trede D, Kobarg J H, Oetjen J, Thiele H, Maass P, Alexandrov T. J Integrative Bioinformatics, 2012, 9(1): 189
15 Pan Z Z, Gu H W, Talaty N, Chen H W, Shanaiah N, Hainline B E, Cooks R G, Raftery D. Anal Bioanal Chem, 2007, 387(2): 539-549
16 Gu H W, Pan Z Z, Xi B W, Asiago V, Musselman B, Raftery D. Anal Chim Acta, 2011, 686(1): 57-63
17 Bonnel D, Longuespee R, Franck J, Roudbaraki M, Gosset P, Day R, Salzet M, Fournier I. Anal Bioanal Chem, 2011, 401(1): 149-165
18 Pirro V, Eberlin L S, Oliveri P, Cooks R G. Analyst, 2012, 137(10): 2374-2380
19 Luo Z, He J, Chen Y, He J, Gong T, Tang F, Wang X, Zhang R, Huang L, Zhang L, Lv H, Ma S, Fu Z, Chen X, Yu S, Abliz Z. Anal Chem, 2013, 85(5): 2977-2982
20 He J, Tang F, Luo Z, Chen Y, Xu J, Zhang R, Wang X, Abliz Z. Rapid Commun Mass Spectrom, 2011, 25(7): 843-850
Abstract  The application of factor analysis to imaging mass spectrometry data analysis was studied. The imaging mass spectrometric data were obtained by air flow-assisted ionization imaging mass spectrometry. The sample contained symbols drawn on slides with three different inks (red, blue, and black). Factor analysis divided the imaging data into background, black, blue, and red factors. The scores of m/z 443.2, 478.4, and 322.2 (344.2) in the red, blue, and black factors, respectively, were much larger than those of the other m/z values, so they were taken as markers of the three inks. The results agreed well with the actual sample and showed that applying factor analysis to imaging mass spectrometric data is feasible. The results of factor analysis and principal component analysis were compared, showing that markers of the target sample could be extracted by factor analysis simply and quantitatively. The method has great potential in biomarker extraction, disease diagnosis, and pharmacological analysis.
Keywords  Factor analysis; Imaging mass spectrometry; Air flow-assisted ionization; Multivariate statistical analysis