李南云 王旭光 吳華強 何青林
Abstract: Concerning the problem that, in complex and non-cooperative situations, the number of feature matching pairs and the feature matching accuracy in video stitching cannot simultaneously meet the requirements of subsequent image stabilization and stitching, a method was proposed that scores feature points with a grayscale tower and then builds a matching model for accurate feature matching. Firstly, the phenomenon that similar gray levels merge after gray-level compression was used to build a grayscale tower and score the feature points. Then, the feature points with high scores were selected to build a matching model based on position information. Finally, guided by the positioning of the matching model, regional block matching was performed to avoid interference from global feature points and from large-error noise matches, and the feature matching pair with the smallest error was selected as the final matching pair. In addition, in a moving video stream, a mask can be built from the information of the previous and next frames for regional feature extraction, and the matching model can be selectively inherited by subsequent frames to save computation time. Experimental results show that, with the matching model based on grayscale tower scoring, the feature matching accuracy is about 95%, and the number of feature matching pairs in the same frame is nearly 10 times that obtained with RANdom SAmple Consensus (RANSAC). The method balances the number of matches and the matching accuracy without large-error matches, and is robust to environment and illumination.
Key words: feature extraction; feature matching; video stitching; grayscale tower; matching model; block matching
CLC number: TP391.4
Document code: A
0 Introduction
With the rapid development of image processing in recent years, image and video processing have found increasingly wide application. Static image stitching alone can no longer satisfy current needs: a static image cannot convey how each object in a real scene evolves over time, yet such temporal information is extremely important in both civilian and defense applications. Consequently, research on video stitching has deepened considerably in recent years.
Feature extraction and matching, as an essential part of image stitching [1], are also widely used in image recognition [2], 3D model construction [3] and other fields. The quality of the extracted feature points directly affects the feature matching result, and the matching result in turn determines the final accuracy of subsequent stitching, recognition and model construction. Feature matching is thus a cornerstone of image processing; its main difficulties at present are the number of matching pairs, the matching accuracy and the matching time.
Classical feature extraction methods include Scale-Invariant Feature Transform (SIFT) [4], Speeded-Up Robust Features (SURF) [5], ORB (Oriented FAST and Rotated BRIEF) [6], BRISK (Binary Robust Invariant Scalable Keypoints) [7], BRIEF (Binary Robust Independent Elementary Features) [8] and FAST (Features from Accelerated Segment Test). The SIFT and SURF descriptors are scale- and rotation-invariant and achieve high matching accuracy, but they are computationally expensive and therefore slow. ORB, BRISK and FAST instead form binary descriptors by comparing the pixels around a feature point, and the correlation between two feature points is measured by the Hamming distance [9] between their descriptors.
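As a concrete illustration of binary descriptors compared by Hamming distance, the following minimal sketch (not part of the original paper; it relies on OpenCV's Python bindings, and the image file names are hypothetical) extracts ORB features from two frames and brute-force matches them under the Hamming norm:

    import cv2

    # Hypothetical input frames; any pair of overlapping grayscale frames will do.
    img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=2000)          # ORB produces binary descriptors
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Binary descriptors are compared by Hamming distance.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)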
Feature matches are usually screened by K-Nearest Neighbors (KNN) matching or brute-force matching, followed by RANdom SAmple Consensus (RANSAC) [10] to extract the inlier set. RANSAC based on a global homography considers the image as a whole: it keeps the matches on the dominant image plane and discards those on secondary and parallax planes. In other words, when parallax is present the captured image can be divided into several planes according to scene depth, and RANSAC keeps only the matches on the depth plane with the largest share, so by construction it loses the information of the other planes and registers poorly when the parallax is large. KNN matching can avoid the discrepancies between matching pairs caused by parallax and obtains matches directly, but because illumination and image content differ from frame to frame, a fixed KNN threshold leads to large matching errors, so it cannot be applied to video stitching directly. Neither scheme can avoid the irregular mismatches caused by noise.
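For reference, the two conventional screening schemes discussed above can be sketched as follows (a minimal illustration of the baselines, not the proposed method; it reuses kp1/des1 and kp2/des2 from the previous sketch, and the ratio 0.75 and reprojection threshold 3.0 are assumed values):

    import numpy as np
    import cv2

    # KNN matching with a fixed ratio test.
    knn = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des1, des2, k=2)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

    # RANSAC with a global homography keeps only the inliers of the dominant plane.
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    inliers = [m for m, keep in zip(good, mask.ravel()) if keep]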
This paper seeks a feature matching scheme for video stitching that yields a large number of evenly distributed matching pairs, so as to solve the problem that the feature points are too few and too unevenly distributed to account for the full range of scene depths.
3 Proposed Algorithm
As the foregoing introduction to video stitching shows, recovering the camera path and the mesh warping relation requires a reasonably large number of evenly distributed feature matching pairs as well as a high matching accuracy, which traditional feature matching methods can hardly provide. To this end, this paper proposes a feature matching scheme built on a Matching Model based on Grayscale Tower score (MMGT).
3.1 Feature Point Scoring
A good feature extractor produces feature points that are scale- and rotation-invariant, but the extracted points are numerous, contain noise, and lack a complete scoring mechanism that accounts for illumination, which degrades the reliability and accuracy of the final matching result. If the feature points can be scored, the matches of high-scoring points become more trustworthy, the matching model built from them becomes more reliable, and this benefit propagates to the final matching result. Introducing a scoring scheme based on gray-level reduction can alleviate the influence of illumination and of erroneous feature points to some extent.
A grayscale image normally has 256 gray levels. In conventional feature extraction, a Gaussian pyramid is first built; in the difference-of-Gaussian images each candidate is compared with its 8 neighbours in the same scale and the 18 neighbours in the adjacent scales, and the extrema are taken as the final feature points, which are therefore scale-invariant. A feature point is a distinctive point of the image: an ideal feature point does not disappear under changes of scale or illumination. That is, when the gray levels are compressed, for example when the number of gray levels of the whole image drops from 256 to 128, neighbouring gray levels merge, but a good feature point still survives.
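The effect of gray-level compression can be pictured with a short sketch (an illustration under assumptions rather than the paper's exact construction; the chosen layer sizes are arbitrary):

    import numpy as np
    import cv2

    img = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame

    def compress_gray_levels(image, levels):
        # Quantize an 8-bit grayscale image to `levels` gray levels,
        # so that neighbouring gray levels merge into the same bin.
        step = 256.0 / levels
        return (np.floor(image / step) * step).astype(np.uint8)

    # Successive layers of a grayscale tower, e.g. 256, 128, 64 and 32 levels.
    layers = [compress_gray_levels(img, k) for k in (256, 128, 64, 32)]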
First, the difference-of-Gaussian pyramid is built, and the initially extracted feature point sets P(n) and P′(n) are obtained by comparing pixel values across the scale spaces. Then a circle of radius R is taken around each feature point in these sets; the pixels inside the circle are called illumination-related pixels, and a scale-invariant illumination pyramid is built as follows: 1) obtain the histogram of the illumination-related region around the feature point; 2) simulate illumination changes by compressing the total number of gray levels of the illumination-related region by different factors; 3) compare the gray-level descent gradient of the related region on each layer of the grayscale tower, and consider the feature point to have disappeared once the gradient change falls below a threshold; 4) the more tower layers a feature point P survives on, the higher its score and the higher its quality. The process is illustrated in Fig. 5.
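A possible reading of this scoring procedure is sketched below (illustrative only: the neighbourhood radius R, the threshold T and the use of the mean gradient magnitude as the survival criterion are assumptions standing in for the paper's threshold on the gray-level descent gradient):

    import numpy as np

    def survives(layer, x, y, R, T):
        # A point survives a tower layer if the mean gradient magnitude of its
        # illumination-related neighbourhood stays above the threshold T.
        patch = layer[max(0, y - R):y + R + 1, max(0, x - R):x + R + 1].astype(np.float32)
        gy, gx = np.gradient(patch)
        return float(np.mean(np.hypot(gx, gy))) > T

    def score_point(tower_layers, x, y, R=8, T=2.0):
        # Score = number of grayscale-tower layers on which the point still exists.
        return sum(survives(layer, int(x), int(y), R, T) for layer in tower_layers)

    # Example: score the ORB keypoints of the first frame on the tower built above.
    scores = [score_point(layers, *kp.pt) for kp in kp1]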
3.2 Construction of the Matching Model
The scoring step above yields the scored feature point sets P(n) and P′(n); a higher score indicates a more reliable feature point. The next step is to use these scores to build a matching model that increases the number and accuracy of feature matching pairs and avoids the large-error matches caused by noise.
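As a rough illustration of how such a position-based model could drive regional block matching (an interpretation of the description above, not the paper's exact algorithm: the anchors are assumed to be matches of high-scoring points, and the block half-width of 64 pixels is an arbitrary choice):

    import numpy as np

    def block_match(anchors, pts1, des1, pts2, des2, win=64):
        # anchors: (x1, y1, x2, y2) positions of reliably matched high-score points
        # pts1/pts2: keypoint coordinates; des1/des2: binary descriptors (uint8 rows)
        pairs = []
        for ax1, ay1, ax2, ay2 in anchors:
            idx1 = [i for i, (x, y) in enumerate(pts1) if abs(x - ax1) < win and abs(y - ay1) < win]
            idx2 = [j for j, (x, y) in enumerate(pts2) if abs(x - ax2) < win and abs(y - ay2) < win]
            for i in idx1:
                if not idx2:
                    continue
                # Hamming distance between binary descriptors inside the block only.
                d = [int(np.count_nonzero(np.unpackbits(des1[i] ^ des2[j]))) for j in idx2]
                pairs.append((i, idx2[int(np.argmin(d))], min(d)))   # smallest-error candidate
        return pairs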
5 Conclusion
This paper studied feature matching for stitching the high-altitude videos captured by a multi-node UAV network into a panoramic video in real time, and proposed a feature matching method that scores feature points with a grayscale tower and then builds a matching model. The method effectively solves the problem that too few and too unevenly distributed feature points are extracted, and by passing the matching model and the region of interest on to later frames through the information inherited along the video stream it achieves good results. However, the method still falls short in real-time performance; future work should parallelize the mutually independent steps of the model matching process with multiple threads to speed up matching. The application of the grayscale tower can also be extended, although attention must be paid to handling false contours in image segmentation.
References
[1] IVAN S K, OLEG V P. Spherical video panorama stitching from multiple cameras with intersecting fields of view and inertial measurement unit[C]// Proceedings of the 2016 International Siberian Conference on Control and Communications. Piscataway, NJ: IEEE, 2016: 1-6.
[2] ZHANG L, HE Z, LIU Y. Deep object recognition across domains based on adaptive extreme learning machine[J]. Neurocomputing, 2017, 239: 194-203.
[3] YANG B, DONG Z, LIANG F, et al. Automatic registration of large-scale urban scene point clouds based on semantic feature points[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 113: 43-58.
[4] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
[5] HSU W Y, LEE Y C. Rat brain registration using improved speeded up robust features[J]. Journal of Medical and Biological Engineering, 2017, 37(1): 45-52.
[6] RUBLEE E, RABAUD V, KONOLIGE K, et al. ORB: an efficient alternative to SIFT or SURF[C]// Proceedings of the 2011 International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2011: 2564-2571.
[7] LEUTENEGGER S, CHLI M, SIEGWART R Y. BRISK: binary robust invariant scalable keypoints[C]// Proceedings of the 2011 International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2011: 2548-2555.
[8] CALONDER M, LEPETIT V, STRECHA C, et al. BRIEF: binary robust independent elementary features[C]// Proceedings of the 11th European Conference on Computer Vision. Berlin: Springer, 2010: 778-792.
[9] MOR M, FRAENKEL A S. A hash code method for detecting and correcting spelling errors[J]. Communications of the ACM, 1982, 25(12): 935-938.
[10] SANTHA T, MOHANA M B V. The significance of real-time, biomedical and satellite image processing in understanding the objects & application to computer vision[C]// Proceedings of the 2nd IEEE International Conference on Engineering & Technology. Piscataway, NJ: IEEE, 2016: 661-670.
[11] BROWN M, LOWE D G. Automatic panoramic image stitching using invariant features[J]. International Journal of Computer Vision, 2007, 74(1): 59-73.
[12] GUO H, LIU S, HE T, et al. Joint video stitching and stabilization from moving cameras[J]. IEEE Transactions on Image Processing, 2016, 25(11): 5491-5503.
[13] NI G Q, LIU Q. Analysis and prospect of multi-source image registration techniques[J]. Opto-Electronic Engineering, 2004, 31(9): 1-6. (in Chinese)
[14] ZARAGOZA J, CHIN T J, BROWN M S, et al. As-projective-as-possible image stitching with moving DLT[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1285-1298.
[15] ZHU Y F, YE X Q, GU W K. Mosaic panorama technique for videos[J]. Journal of Image and Graphics, 2006, 11(8): 1150-1155. (in Chinese)