CLC number: TP391.41    Document code: A    Article ID: 2095-2457(2019)29-0112-002
DOI:10.19694/j.cnki.issn2095-2457.2019.29.051
Deep Learning and Metric Learning Based Person Re-identification
HOU Li, LIU Qi
(School of Information Engineering, Huangshan University, Huangshan Anhui 245041, China)
【Abstract】Pedestrians may vary greatly in appearance due to differences in illumination, viewpoint, and pose across cameras, which poses serious challenges for person re-identification. A deep learning and metric learning based algorithm is proposed for person re-identification in this paper. Features of pedestrian images are first extracted by a feature fusion network (FFN) that combines handcrafted features and deep features, and a kernel matrix is then applied to KISSME distance metric learning to obtain a better distance metric model. Experimental results show that the proposed algorithm effectively improves recognition rates on two challenging datasets (VIPeR, PRID450S).
【Key words】Person re-identification; Feature fusion net; Deep learning; Distance metric learning
0 Introduction
Person re-identification is an intelligent video analysis technique of great research significance for cross-camera pedestrian tracking and pedestrian behavior analysis. The task asks a computer to judge whether pedestrian images captured by different cameras belong to the same identity, matching images across cameras by pedestrian appearance. Because surveillance scenes vary widely and cross-camera appearance changes are complex, person re-identification remains a highly challenging research problem.
Current research on person re-identification concentrates on two aspects: extracting discriminative features to describe pedestrian appearance [1-11], and designing discriminative distance metric learning methods [12-18]. However, most handcrafted features (color, texture, shape, etc.) are either insufficiently discriminative or not robust to viewpoint changes when matching pedestrians across cameras. Deep features compensate for these shortcomings to some extent, but learning a good feature model requires supervised training on large numbers of samples. Distance metric learning, in turn, partially alleviates cross-camera appearance differences, yet with limited training data it may fail to yield an optimal cross-camera distance metric.
To better handle the significant cross-camera variation in pedestrian appearance, this paper combines deep learning and metric learning for person re-identification; the overall pipeline is shown in Figure 1, with a short code sketch of the flow given below it. Discriminative features are first extracted from the training samples by the feature fusion network FFN, which combines handcrafted and deep features; a kernel matrix K is then applied to KISSME distance metric learning to obtain a better distance metric model, improving both the accuracy and the robustness of re-identification.
Figure 1. Algorithm pipeline
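The overall flow can be summarized in a short runnable Python sketch. This is only a minimal illustration: the feature extractor is stubbed with random vectors, and an identity matrix stands in for the metric that Section 2 actually learns.

import numpy as np

rng = np.random.default_rng(0)

def extract_features(images):
    # Stand-in for the FFN of Section 1: one 4096-d descriptor per image.
    return rng.standard_normal((len(images), 4096))

def mahalanobis_sq(M, a, b):
    # Squared Mahalanobis distance between two descriptors, Eq. (1).
    d = a - b
    return d @ M @ d

probe = extract_features(range(1))[0]   # one probe image from camera A
gallery = extract_features(range(5))    # candidate images from camera B
M = np.eye(4096)                        # placeholder for the KISSME metric of Section 2
dists = [mahalanobis_sq(M, probe, g) for g in gallery]
print(np.argsort(dists))                # gallery indices, best match first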
1 Discriminative Feature Extraction
To describe pedestrian appearance more accurately, this paper extracts pedestrian image features with the feature fusion network FFN, which fuses handcrafted and deep features [3], as shown in Figure 2. FFN consists of two sub-networks. The first sub-network processes the input pedestrian image with a conventional CNN (convolution, pooling, activation functions); the second represents the same pedestrian image with additional handcrafted features (RGB, HSV, LAB, YCbCr, and YIQ color features and Gabor texture features; a sketch of this branch follows Figure 2). Together the two sub-networks form a fuller description of the pedestrian image, and during feature learning the second sub-network steers the learning direction of the first. The fusion layer finally produces a 4096-dimensional FFN feature vector.
Figure 2. Illustration of FFN feature extraction [3]
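To make the handcrafted branch concrete, the sketch below computes per-channel color histograms in the five color spaces plus simple Gabor statistics using scikit-image. The bin count, Gabor frequency, and orientations are illustrative assumptions, not the exact settings of [3].

import numpy as np
from skimage import color, filters

def handcrafted_descriptor(rgb, bins=16):
    # rgb: (H, W, 3) uint8 pedestrian image.
    rgb = rgb.astype(np.float64) / 255.0
    spaces = [rgb, color.rgb2hsv(rgb), color.rgb2lab(rgb),
              color.rgb2ycbcr(rgb), color.rgb2yiq(rgb)]
    feats = []
    for img in spaces:                       # per-channel color histograms
        for c in range(3):
            ch = img[..., c]
            h, _ = np.histogram(ch, bins=bins, range=(ch.min(), ch.max() + 1e-6))
            feats.append(h / (h.sum() + 1e-12))
    gray = color.rgb2gray(rgb)
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):   # 4 Gabor orientations
        real, _ = filters.gabor(gray, frequency=0.3, theta=theta)
        feats.append([real.mean(), real.std()])              # simple response statistics
    return np.concatenate([np.ravel(f) for f in feats])

desc = handcrafted_descriptor(np.random.randint(0, 256, (128, 48, 3), dtype=np.uint8))
print(desc.shape)   # (248,) with these illustrative settings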
2 Kernel Distance Metric Learning
To mitigate cross-camera variation in pedestrian appearance, the matching stage adopts the KISSME [12] distance metric learning method with the kernel trick to obtain an optimal Mahalanobis distance metric model.
Given a pair of samples $(x_i, x_j)$, their Mahalanobis distance is defined in Eq. (1):
$d_M^2(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)$    (1)
where $M = \Sigma_S^{-1} - \Sigma_D^{-1}$ is the positive semi-definite Mahalanobis distance matrix, which can easily be learned from training samples; $\Sigma_S = \frac{1}{|S|}\sum_{(x_i, x_j) \in S} (x_i - x_j)(x_i - x_j)^T$ and $\Sigma_D = \frac{1}{|D|}\sum_{(x_i, x_j) \in D} (x_i - x_j)(x_i - x_j)^T$ denote the covariance matrices of the similar pedestrian image pairs S and the dissimilar pairs D, respectively.
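The metric can be learned in a few lines of numpy. The sketch below follows the definitions above; subsampling the dissimilar pairs and clipping negative eigenvalues to keep M positive semi-definite are standard practical choices rather than part of Eq. (1).

import numpy as np

def kissme(X, y, seed=0):
    # X: (n, d) feature vectors, y: (n,) identity labels.
    rng = np.random.default_rng(seed)
    n = len(y)
    sim = [(i, j) for i in range(n) for j in range(i + 1, n) if y[i] == y[j]]
    dis = [(i, j) for i in range(n) for j in range(i + 1, n) if y[i] != y[j]]
    # Subsample dissimilar pairs so both sets are the same size.
    dis = [dis[k] for k in rng.choice(len(dis), size=len(sim), replace=False)]

    def cov(pairs):
        D = np.stack([X[i] - X[j] for i, j in pairs])
        return D.T @ D / len(pairs)         # covariance of difference vectors

    M = np.linalg.inv(cov(sim)) - np.linalg.inv(cov(dis))
    w, V = np.linalg.eigh(M)                # clip negative eigenvalues -> PSD
    return V @ np.diag(np.clip(w, 0, None)) @ V.T

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 32))           # toy features: 40 samples, 32-d
y = np.repeat(np.arange(8), 5)              # 8 identities, 5 images each
M = kissme(X, y)
d2 = (X[0] - X[1]) @ M @ (X[0] - X[1])      # squared Mahalanobis distance, Eq. (1)
print(d2)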
This paper uses the kernel trick to map the sample feature vectors from the input feature space into a high-dimensional kernel space, where the kernel matrix K is obtained through the kernel function, i.e., $K = \Phi^T(X)\Phi(X)$; here X denotes the sample features and $\Phi(X)$ the nonlinear mapping from the input feature space to the kernel space. Introducing a kernel function avoids the "curse of dimensionality", greatly reduces computation, and allows the algorithm's performance to be improved by freely choosing a suitable kernel function.
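One common way to realize this step is the empirical kernel map: compute K between all training samples and treat each row of K as that sample's new representation before running KISSME. The RBF kernel and its bandwidth below are illustrative assumptions; other kernels can be swapped in.

import numpy as np

def rbf_kernel_matrix(X, Z, gamma=None):
    # K[i, j] = exp(-gamma * ||x_i - z_j||^2), i.e. K = Phi(X)^T Phi(Z).
    if gamma is None:
        gamma = 1.0 / X.shape[1]            # a common default bandwidth
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

X = np.random.default_rng(2).standard_normal((40, 32))
K = rbf_kernel_matrix(X, X)                 # (40, 40) kernel matrix
X_kernel = K                                # each row: empirical kernel map of a sample
# X_kernel can now replace X in the kissme() sketch of Section 2.
print(K.shape, np.allclose(K, K.T))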
3 Experimental Results
The proposed algorithm is evaluated by its cumulative matching characteristic (CMC) on two challenging public datasets, VIPeR and PRID450S. Half of the identities are randomly selected as the training set and the other half as the test set; the training samples are used to learn the distance metric model, and the test samples are used to measure feature distances between cross-camera pedestrian images.
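The CMC score at rank k is the fraction of probe images whose correct gallery match appears among the k nearest candidates under the learned distance. A minimal numpy implementation for the single-shot setting (one true match per probe) might look as follows:

import numpy as np

def cmc(dist, probe_ids, gallery_ids, max_rank=20):
    # dist: (n_probe, n_gallery) learned distances; one true match per probe.
    hits = np.zeros(max_rank)
    for p, row in enumerate(dist):
        order = np.argsort(row)             # gallery sorted by distance, closest first
        rank = np.flatnonzero(gallery_ids[order] == probe_ids[p])[0]
        if rank < max_rank:
            hits[rank:] += 1                # a match at rank r counts for every k >= r
    return hits / len(dist)                 # CMC curve over ranks 1..max_rank

rng = np.random.default_rng(3)
ids = np.arange(10)
dist = rng.random((10, 10))
dist[ids, ids] *= 0.2                       # bias true matches toward smaller distances
curve = cmc(dist, ids, ids, max_rank=10)
print(curve[0], curve[4])                   # rank-1 and rank-5 matching rates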
Table 1 and Figure 3 give the experimental results on the VIPeR and PRID450S datasets. As Table 1 shows, with the same FFN features the recognition rate is higher on PRID450S: the rank-1 recognition rate is only 26.9% on VIPeR, but 49.33% on PRID450S.
Table 1. Top recognition rates (%) on the VIPeR and PRID450S datasets; cumulative matching scores at ranks 1, 5, 10, and 20 are listed.
Figure 3. Top recognition rates (%) on the VIPeR and PRID450S datasets
4 Conclusion
This paper has proposed a person re-identification algorithm based on deep learning and metric learning. Pedestrian image features are extracted by the feature fusion network FFN, which combines deep and handcrafted features, and a kernel matrix K is applied to KISSME distance metric learning to obtain a better distance metric model. Experimental results on the two challenging person re-identification datasets VIPeR and PRID450S demonstrate the effectiveness of the proposed algorithm.
【References】
[1]S. Liao, Y. Hu, X. Zhu, and S. Z. Li, “Person re-identification by local maximal occurrence representation and metric learning,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA, 2015.6.7-2015.6.12.
[2]T. Xiao, H. Li, W. Ouyang, and X. Wang, “Learning deep feature representations with domain guided dropout for person re-identification,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, USA, 2016.6.26-2016.7.1.
[3]S. Wu, Y. C. Chen, X. Li, A. C. Wu, J. J. You, W. S. Zheng, “An enhanced deep feature representation for person re-identification,” IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 2016.3.7-2016.3.9.
[4]D. Cheng, Y. Gong, S. Zhou, J. Wang, and N. Zheng, “Person re-identification by multi-channel parts-based CNN with improved triplet loss function,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, USA, 2016.6.26-2016.7.1.
[5]Y. Chen, X. Zhu, and S. Gong, “Person re-identification by deep learning multi-scale representations,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, 2017.7.21-2017.7.26.
[6]H. Zhao, et al., “Spindle net: Person re-identification with human body region guided feature decomposition and fusion,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, 2017.7.21-2017.7.26.
[7]X. Liu, et al., “Hydraplus-net: Attentive deep features for pedestrian analysis,” IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.10.22-2017.10.29.
[8]Y. Sun, et al., “Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),” European Conference on Computer Vision (ECCV), Munich, Germany, 2018.9.8-2018.9.14.
[9]L. Zhao, et al., “Deeply-learned part-aligned representations for person re-identification,” IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.10.22-2017.10.29.
[10]L. He, et al., “Deep spatial feature reconstruction for partial person re-identification: Alignment-free approach,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA, 2018.6.18-2018.6.22.
[11]X. Chang, et al., “Multi-level factorisation net for person re-identification,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA, 2018.6.18-2018.6.22.
[12]M. Koestinger, et al., “Large scale metric learning from equivalence constraints,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, USA, 2012.6.16-2012.6.21.
[13]S. Pedagadi, et al., “Local fisher discriminant analysis for pedestrian re-identification,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, Oregon, USA, 2013.6.23-2013.6.28.
[14]F. Xiong, M. Gou, O. Camps, M. Sznaier, “Person re-identification using kernel-based metric learning methods,” European Conference on Computer Vision (ECCV), Zurich, Switzerland, 2014.9.6-2014.9.12.
[15]S. Paisitkriangkrai, C. Shen, A. Hengel, “Learning to rank in person re-identification with metric ensembles,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA, 2015.6.7-2015.6.12.
[16]Y. Yang, S. Liao, Z. Lei, S. Z. Li, “Large scale similarity learning using similar pairs for person verification,” AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, USA, 2016.2.12-2016.2.17.
[17]L. Hou, K. Han, W. G. Wan, J.-N. Hwang, H. Y. Yao, “Normalized Distance Aggregation of Discriminative Features for Person Re-identification,” Journal of Electronic Imaging, 2018, 27(2): 023006.
[18]X. Yang, M. Wang, and D. Tao, “Person re-identification with metric learning using privileged information,” IEEE Transactions on Image Processing, 2018, 27(2): 791-805.