Deep hashing retrieval algorithm based on meta-learning
HAN Yaru1, YAN Lianshan2*, YAO Tao1
(1. School of Information and Electrical Engineering, Ludong University, Yantai Shandong 264025, China; 2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu Sichuan 611756, China) (*Corresponding author, e-mail: lsyan@home.swjtu.edu.cn)
With the development of mobile Internet technology, the scale of image data keeps growing, and large-scale image retrieval has become a pressing problem. Thanks to their fast retrieval speed and low storage cost, hashing algorithms have attracted extensive attention from researchers. To achieve good retrieval performance, deep learning based hashing algorithms require a certain amount of high-quality training data. However, existing hashing methods usually ignore the class imbalance present in real datasets, which can degrade retrieval performance. To address this problem, a deep hashing retrieval algorithm based on a meta-learning network is proposed, which can automatically learn a weighting function directly from the data. The weighting function is a Multi-Layer Perceptron (MLP) with a single hidden layer; guided by a small amount of unbiased meta data, its parameters are optimized and updated simultaneously with the model parameters during training. The update equation of the meta-learning network parameters can be interpreted as follows: the weights of samples consistent with the meta data are increased, while the weights of samples inconsistent with it are decreased. The proposed algorithm effectively reduces the impact of imbalanced data on image retrieval and improves the robustness of the model. Extensive experiments on widely used benchmark datasets such as CIFAR-10 show that the proposed algorithm achieves the best mean Average Precision (mAP) under large imbalance ratios; with an imbalance ratio of 200, its mAP is 0.54, 30.93 and 48.43 percentage points higher than those of the Central Similarity Quantization (CSQ) algorithm, the Asymmetric Deep Supervised Hashing (ADSH) algorithm and the Fast Scalable Supervised Hashing (FSSH) algorithm, respectively.
deep learning; hashing algorithm; imbalanced data; meta-learning; image retrieval
Over the past two decades, the rapid development of the Internet has ushered in a new era. In particular, with the rapid growth of the mobile Internet, more and more users go online with phones or laptops, and applications such as WeChat, Alipay and location-based services are profoundly changing social life in the information age; in recent years networks have leapt from 3G through 4G to 5G. Data now come from many sources, such as weather sensors, social media sites, online banking and mobile phone signals, and the data accumulated in every industry, including text, images, audio and video, are growing explosively. According to statistics, Internet users upload more than one billion images every day; Taobao members upload more than 100 million images per day; Facebook has more than one billion registered users, who upload over one billion images per month. These examples confirm that massive data are being produced everywhere and that humanity has entered the era of "big data". In this era, how users can efficiently and accurately retrieve the information they need from massive, high-dimensional data has become a key research problem.
In recent years, deep learning has also been applied to large-scale image hashing. To achieve good retrieval performance, deep hashing algorithms require a large, high-quality dataset for training; however, most real-world datasets suffer from data bias. The three most typical biases are: 1) class imbalance, where a few classes are easy to collect while most classes are hard to collect; 2) data noise, where the data themselves are noisy; 3) label noise, where labels obtained by simple web search (because accurate labeling is too expensive) contain many errors, which also makes training difficult.
Among these, imbalanced data are ubiquitous in real life, for example in cancer diagnosis, bankruptcy prediction[4] and credit card fraud detection[5]. When the class distribution of a dataset is highly skewed, i.e., one or more classes contain far more samples than one or more other classes, the data are called class-imbalanced. For example, in medical image analysis, searching for similar images (in terms of similar anatomical structures) for diagnostic purposes can serve as a "virtual peer review"[6]. Retrieving similar images from archives of past cases is very helpful for diagnosis, but most existing real-world datasets are both large and class-imbalanced. Faced with an imbalanced dataset, a model tends to focus on majority-class samples during training, neglects minority-class samples, and easily misclassifies minority samples into majority classes, which degrades the final model. Yet the minority classes are usually the focus of the study, meaning that correctly predicting the labels of minority-class samples matters more than for majority-class ones. How to effectively handle the impact of imbalanced data on image retrieval is therefore a challenging topic for researchers.
Meta-learning, also known as learning to learn, means learning how to learn. A good machine learning model usually requires a large number of training samples, whereas humans can learn new concepts and skills faster and more efficiently. Meta-learning aims to learn new skills from only a small number of samples, pushing machine learning toward artificial intelligence in a way that is closer to human learning and more efficient.
Inspired by progress in meta-learning[7-8], several methods have recently been proposed that learn adaptive weighting from data, making learning more automatic and reliable[9-10]. To tackle class imbalance, this paper proposes a deep hashing retrieval algorithm based on meta-learning that automatically learns a weighting function directly from the data. The weighting function is a Multi-Layer Perceptron (MLP) with a single hidden layer; guided by a small amount of unbiased meta data, its parameters are optimized and updated simultaneously with the model parameters during training. The algorithm effectively improves the robustness of the model and reduces the impact of imbalanced data on retrieval.
The main contributions of this paper are as follows:
1) A meta-learning based hashing retrieval algorithm is proposed that automatically learns a weighted loss function from the data; thanks to the universal approximation ability of the weight network, it can fit a wide range of weighting functions.
2) The proposed algorithm consists of two parallel networks: a meta-learning network and an image retrieval network. Guided by a small amount of unbiased meta data, the parameters of the weighting function are optimized and updated simultaneously with the model parameters, which effectively reduces the impact of class imbalance on retrieval accuracy.
3) Experimental results show that the proposed algorithm outperforms most existing image retrieval algorithms on benchmark datasets, demonstrating its effectiveness.
In recent years, hashing algorithms have attracted wide attention for their advantages in storage space and computation time. Many hashing algorithms for image retrieval have been proposed; they fall into two broad categories: unsupervised hashing and supervised hashing.
Unsupervised hashing methods learn hash functions mainly by preserving the geometric structure of the original data, using no supervision during training. Spectral Hashing (SH), proposed by Weiss et al.[11] in 2008, is a classic compact-code method: it views the encoding of image feature vectors as a graph partitioning problem, first obtains a relaxed solution by analyzing the eigenvalues and eigenvectors of the Laplacian matrix of the similarity graph, and then thresholds the eigenvectors to produce binary hash codes. ITerative Quantization (ITQ) hashing, proposed by Gong et al.[12] in 2011, reduces dimensionality with Principal Component Analysis (PCA) and then learns a rotation matrix by minimizing the quantization error to obtain a better hash function. More recently, several deep learning based unsupervised hashing methods have been proposed. Shen et al.[13] proposed Similarity-Adaptive and Discrete optimization Hashing (SADH) in 2018, which alternately preserves data similarity and enforces compatibility between the hash codes and the deep hash function. Greedy Hash designs a hash coding layer that minimizes the cosine-distance gap when features are encoded from Euclidean space into Hamming space, and solves the optimization with a greedy strategy. By combining deep representation and hash learning, unsupervised deep hashing improves the representation ability of image hash codes; however, the binary codes learned by current unsupervised deep hashing still lack discriminative semantics.
Supervised hashing algorithms generally learn binary hash codes by exploiting supervision such as class labels, pairwise similarity, or the relative similarity of data points. To handle linearly non-separable data in traditional learning, Kulis et al.[14] proposed Binary Reconstructive Embeddings (BRE). Supervised Hashing with Kernels (KSH)[15] is designed so that the Hamming distance between the hash codes of similar pairs is as small as possible; during optimization, KSH adopts a bit-wise strategy that optimizes one bit of the hash code at a time, producing short yet effective codes. Supervised Discrete Hashing (SDH)[16] designs a new objective function and solves for the hash codes discretely with cyclic coordinate descent. In traditional methods, feature extraction relies on hand-crafted extractors, which require domain expertise and complex parameter tuning, and each method targets a specific application with poor generalization and robustness; researchers therefore proposed deep learning based hashing methods. Deep Semantic-preserving and Ranking-based Hashing (DSRH)[17] raises the semantic ranking problem in deep multi-label image retrieval and designs a deep hashing method trained with a triplet ranking loss. Deep Supervised Hashing (DSH)[18] designs a convolutional neural network architecture that takes image pairs as training input and regularizes the real-valued outputs to approximate the desired discrete values. Deep Supervised Discrete Hashing (DSDH)[19] designs a deep hashing algorithm that uses both classification information and similarity relations as supervision. In these methods, under the supervision of explicit semantic labels, the learned hash codes become discriminative. Although supervised hashing algorithms have achieved good retrieval results, they do not consider imbalanced data in image retrieval, which may degrade retrieval performance when imbalanced datasets appear in retrieval tasks. This paper studies the imbalance problem in image retrieval and uses a meta-learning algorithm to reduce its impact.
A common remedy for imbalanced datasets is sample re-weighting[7,20], which multiplies each sample's loss by a weight that amplifies or shrinks that sample's contribution. Another option is re-sampling[21]: over-sampling the minority classes easily overfits their few samples, fails to learn robust and generalizable features, and often performs worse on severely imbalanced datasets, while under-sampling the majority classes loses much of their information and leads to under-fitting. One-class learning trains only on majority-class samples to form a model of that class and then identifies majority-class samples in the test set. The One-class Support Vector Machine (One-class SVM)[22] finds an optimal hyperplane in a high-dimensional feature space that maximally separates the majority class from the origin; it needs only majority-class data for training and can reduce time cost to some extent, but it tends to overfit the minority samples in the training set, hurting generalization.
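The sample re-weighting idea above can be illustrated with the common inverse-class-frequency baseline, a simple hand-designed alternative to the learned weighting function described later; the helper name below is our own, not from the paper:

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-sample weights inversely proportional to class frequency,
    normalized so that the mean weight over the dataset is 1."""
    labels = np.asarray(labels)
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]          # rarer class -> larger raw weight
    return w * len(labels) / w.sum()

labels = [0] * 8 + [1] * 2             # toy imbalanced set: 8 majority, 2 minority
losses = np.ones(10)                   # dummy per-sample losses
w = inverse_frequency_weights(labels)
weighted_loss = float((w * losses).mean())  # minority samples count 4x more here
```

With this scheme the weight of each minority sample is four times that of each majority sample, so the two classes contribute equally to the total loss.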
Imbalanced data are common in real life. Many works in classification address this problem[23-24], but to the best of our knowledge it has received little attention in retrieval. The following sections detail our work on the imbalanced data problem.
To solve the imbalanced dataset problem in image retrieval, this paper proposes a deep hashing retrieval algorithm based on a meta-learning network. By designing a weighting function that maps the training loss to a sample weight, the method needs no manually pre-specified weighting function or extra hyperparameters, and learns an explicit weighting function adaptively and directly from the data.
The goal of the proposed algorithm is to learn these hyperparameters automatically during meta-learning. To this end, the weighting function is treated as a multi-layer perceptron with a single hidden layer of 100 nodes, as shown in Fig. 1.
We call this weight network the meta-learning weight network. Each hidden node uses the ReLU (Rectified Linear Unit) activation function, and the output uses a Sigmoid activation to guarantee that the output lies in [0, 1]. Despite its simplicity, such a network is known to be a universal approximator of almost any continuous function, so it can fit a wide range of weighting functions, including those used in previous studies.
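The forward pass of such a weight network (one hidden layer of 100 ReLU nodes, Sigmoid output) can be sketched as follows. The random initialization and parameter names here are illustrative assumptions only; in the actual algorithm these parameters are optimized on the unbiased meta data rather than fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for a 1 -> 100 -> 1 network.
W1 = rng.normal(scale=0.1, size=(100, 1)); b1 = np.zeros(100)
W2 = rng.normal(scale=0.1, size=(1, 100)); b2 = np.zeros(1)

def weight_net(loss_value):
    """Map one per-sample training loss to a weight in [0, 1]."""
    h = np.maximum((W1 @ np.array([loss_value])) + b1, 0.0)  # ReLU hidden layer
    z = (W2 @ h) + b2
    return float(1.0 / (1.0 + np.exp(-z[0])))                # Sigmoid output

w = weight_net(0.7)
```

Because the Sigmoid is applied last, the output is always a valid weight in (0, 1), whatever the magnitude of the input loss.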
where α is the step size.
Hash code generation for image retrieval. Studies in [25-26] show that the activations of the fully-connected layers 6-8 for an input image can serve as visual features, which perform well in small-scale image retrieval and classification. For large-scale data, however, these high-dimensional feature vectors severely hurt retrieval efficiency and performance. Researchers therefore proposed converting the feature vectors into binary codes, which lowers the computational cost and reduces the storage space; once binarized, the codes can be compared quickly using the Hamming distance or hashing.
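The binarize-and-compare step can be illustrated as follows; the sign-thresholding rule and helper names are a generic sketch, not the paper's exact coding layer:

```python
import numpy as np

def binarize(features):
    """Threshold real-valued features at 0 to obtain a {0,1} binary code."""
    return (np.asarray(features) > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two codes of equal length."""
    return int(np.count_nonzero(a != b))

q  = binarize([0.3, -1.2, 0.8, -0.1])   # query code:    1 0 1 0
db = binarize([-0.5, -0.7, 0.9, 0.4])   # database code: 0 0 1 1
d = hamming(q, db)
```

Computing the Hamming distance needs only bit comparisons, which is why ranking a large database by code distance is far cheaper than comparing high-dimensional real-valued features.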
denotes the retrieval dataset; each image corresponds to a binary code.
Fig. 3 Image retrieval process
This chapter first introduces three common image datasets, then presents experimental results of the proposed algorithm on the public datasets CIFAR-10, CIFAR-100 and STL-10, comparing it with several methods.
The CIFAR-100 dataset is similar to CIFAR-10, except that it has 100 image classes grouped into 20 superclasses; each image carries a "fine" label (its class) and a "coarse" label (its superclass).
Following related work in the field, this paper adopts the widely used evaluation metric mean Average Precision (mAP), which is standard in hashing retrieval research[27]. Precision considers only the number of correct samples among the returned results, ignoring their order: it is the proportion of correct (relevant) samples among all returned samples.
For a retrieval system, the returned samples are ordered, and the more similar a sample, the higher it should be ranked. Researchers therefore proposed Average Precision (AP), which averages the precision values computed at each rank where a relevant sample is returned; mAP is the mean of AP over all queries.
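Under these standard definitions, precision@k and AP over a ranked list of 0/1 relevance flags can be computed as in this sketch:

```python
def precision_at_k(relevance, k):
    """Fraction of relevant items among the top-k returned results."""
    return sum(relevance[:k]) / k

def average_precision(relevance):
    """AP over a ranked list of 0/1 relevance flags: the mean of the
    precision values taken at each rank where a relevant item appears."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)   # precision at this relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0

ranked = [1, 0, 1, 1, 0]           # relevance of returned list, best rank first
p3 = precision_at_k(ranked, 3)     # 2 relevant items in the top 3
ap = average_precision(ranked)     # (1/1 + 2/3 + 3/4) / 3
```

mAP is then simply the mean of `average_precision` over all queries; because AP rewards relevant items that appear early, it captures the ranking quality that plain precision ignores.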
Several state-of-the-art methods are selected for comparison, including the Fast Scalable Supervised Hashing (FSSH) algorithm[28], the Asymmetric Deep Supervised Hashing (ADSH) algorithm[29] and the Central Similarity Quantization (CSQ) algorithm[30]. For these methods we use the code released by the original authors, with all parameters set as suggested in their papers. For CIFAR-10 and CIFAR-100, both our method and the baselines use 50 000 training images and 10 000 test images; for STL-10, 5 000 training images and 8 000 test images are used. All experiments are run multiple times and the final results are the averages over these runs.
The experiments run on a server with an Intel Xeon CPU E5-2609 v4 @ 1.70 GHz and 32 GB of memory.
Since the proposed meta-learning based deep hashing algorithm mainly targets the imbalance problem in image retrieval, on the three common datasets (CIFAR-10, STL-10 and CIFAR-100) we first test mAP@all under a balanced data distribution for four code lengths (16 bit, 32 bit, 48 bit and 64 bit); we then fix the hash code length to 32 bit under imbalanced data distributions and test mAP@all for five settings with imbalance ratios of 200, 100, 50, 20 and 10. The imbalance ratio is the ratio between the sample count of the largest class and that of the smallest class.
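One common way to construct such an imbalanced split, given only the definition of the imbalance ratio above, is to let the per-class counts decay exponentially between the largest and smallest class; the exponential profile and the helper below are our own assumption, since the paper does not specify the intermediate class sizes:

```python
def class_sizes(n_classes, n_max, ratio):
    """Per-class sample counts decaying exponentially from n_max for the
    largest class down to n_max / ratio for the smallest class."""
    mu = (1.0 / ratio) ** (1.0 / (n_classes - 1))   # per-class decay factor
    return [int(round(n_max * mu ** i)) for i in range(n_classes)]

# e.g. CIFAR-10-like setting with imbalance ratio 200
sizes = class_sizes(n_classes=10, n_max=5000, ratio=200)
```

The first entry is the majority-class count and the last is the minority-class count, so the ratio of the two equals the requested imbalance ratio (up to rounding).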
To demonstrate the effectiveness of the proposed deep hashing algorithm, we test mAP@all for four code lengths (16 bit, 32 bit, 48 bit and 64 bit) on the balanced datasets. The results are shown in Table 1.
From the results in Table 1 we observe the following:
1) The proposed meta-learning based deep hashing algorithm achieves the best retrieval performance in most cases, demonstrating its effectiveness.
2) Among the baselines, FSSH is implemented in Matlab, while ADSH, CSQ and the proposed algorithm are implemented in Python. FSSH needs the least training time, but its retrieval performance on CIFAR-10 and STL-10 (10 classes each) is much worse than that of the other algorithms, whereas on CIFAR-100 (100 classes) it outperforms ADSH.
3) Although ADSH, FSSH and CSQ train faster than the proposed algorithm, their retrieval results are far worse than those of the meta-learning based deep hashing algorithm. At the cost of a little extra time, the proposed algorithm thus completes the retrieval tasks on the three benchmark datasets better.
Tab. 1 mAP@all of four hash code lengths on balanced datasets (unit: %)
We also compare the algorithms on the imbalanced datasets; the results are shown in Table 2.
From the results in Table 2 we observe the following:
1) On CIFAR-10 and STL-10, ADSH outperforms the other three algorithms at an imbalance ratio of 10, but as the imbalance ratio grows, the proposed algorithm clearly outperforms the three baselines.
2) On CIFAR-100, the retrieval results of ADSH, FSSH, CSQ and the proposed algorithm are all relatively low, probably because CIFAR-100 has many classes (100 in total). Even so, the proposed algorithm still outperforms the baselines, showing that the meta-learning based deep hashing algorithm effectively reduces the impact of imbalanced data on image retrieval.
The rapid development of the Internet has produced large-scale image data, and retrieving the images a user needs from massive data has become an urgent problem. In recent years, deep learning based hashing algorithms have been widely applied to image retrieval; to achieve good retrieval performance they require a certain amount of high-quality training data, yet most real datasets suffer from class imbalance, where classes with few samples are easily neglected. To reduce the impact of imbalanced data on image retrieval, this paper proposes a deep hashing algorithm based on a meta-learning network that automatically learns a weighting function directly from the data. The weighting function is a multi-layer perceptron with a single hidden layer; guided by a small amount of unbiased meta data, its parameters are optimized and updated simultaneously with the model parameters during training. On this theoretical basis, extensive experiments compare the algorithm against multiple baselines. The results show that the proposed algorithm effectively reduces the impact of long-tailed data on image retrieval and improves model robustness. In future work we will further refine the network model to reduce the impact of long-tailed data on image retrieval tasks more effectively.
[1] ZHANG C H, ZHANG J Q, FENG J L. AKNN-Qalsh: an approximate KNN search extension for high-dimensional data in PostgreSQL[J]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2019, 58(3): 79-85.
[2] CHEN C, ZOU H X, SHAO N Y, et al. Deep semantic hashing retrieval of remote sensing images[J]. Journal of Image and Graphics, 2019, 24(4): 655-663.
[3] DATAR M, IMMORLICA N, INDYK P, et al. Locality-sensitive hashing scheme based on p-stable distributions[C]// Proceedings of the 20th Annual Symposium on Computational Geometry. New York: ACM, 2004: 253-262.
[4] KANG S L, LIU C C, FAN X P, et al. Research on intrusion detection based on WOS-ELM algorithm[J]. Journal of Chinese Computer Systems, 2015, 36(8): 1779-1783.
[5] ZI?BA M, TOMCZAK S K, TOMCZAK J M. Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction[J]. Expert Systems with Applications, 2016, 58: 93-101.
[6] KHATAMI A, BABAIE M, KHOSRAVI A, et al. Parallel deep solutions for image retrieval from imbalanced medical imaging archives[J]. Applied Soft Computing, 2018, 63: 197-205.
[7] LAKE B M, SALAKHUTDINOV R, TENENBAUM J B. Human-level concept learning through probabilistic program induction[J]. Science, 2015, 350(6266): 1332-1338.
[8] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 1126-1135.
[9] JIANG L, ZHOU Z Y, LEUNG T, et al. MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels[C]// Proceedings of the 35th International Conference on Machine Learning. New York: JMLR.org, 2018: 2304-2313.
[10] WU L J, TIAN F, XIA Y C, et al. Learning to teach with dynamic loss functions[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2018: 6467-6478.
[11] WEISS Y, TORRALBA A, FERGUS R. Spectral hashing[C]// Proceedings of the 21st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2008: 1753-1760.
[12] GONG Y C, LAZEBNIK S, GORDO A, et al. Iterative quantization a procrustean approach to learning binary codes for large-scale image retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2916-2929.
[13] SHEN F M, XU Y, LIU L, et al. Unsupervised deep hashing with similarity-adaptive and discrete optimization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(12): 3034-3044.
[14] KULIS B, DARRELL T. Learning to hash with binary reconstructive embeddings[C]// Proceedings of the 22nd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2009: 1042-1050.
[15] LIU W, WANG J, JI R R, et al. Supervised hashing with kernels[C]// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2012: 2074-2081.
[16] SHEN F M, SHEN C H, LIU W, et al. Supervised discrete hashing[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 37-45.
[17] YAO T, LONG F C, MEI T, et al. Deep semantic-preserving and ranking-based hashing for image retrieval[C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2016: 3931-3937.
[18] LIU H M, WANG R P, SHAN S G, et al. Deep supervised hashing for fast image retrieval[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 2064-2072.
[19] LI Q, SUN Z N, HE R, et al. Deep supervised discrete hashing[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 2479-2488.
[20] DONG Q, GONG S G, ZHU X T. Class rectification hard mining for imbalanced deep learning[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1869-1878.
[21] LIU X Y, WU J X, ZHOU Z H. Exploratory undersampling for class-imbalance learning[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009, 39(2): 539-550.
[22] MALDONADO S, MONTECINOS C. Robust classification of imbalanced data using one-class and two-class SVM-based multiclassifiers[J]. Intelligent Data Analysis, 2014, 18(1): 95-112.
[23] ZHANG Z L, LUO X G, GARCÍA S, et al. Cost-sensitive back-propagation neural networks with binarization techniques in addressing multi-class problems and non-competent classifiers[J]. Applied Soft Computing, 2017, 56: 357-367.
[24] SUN Y, LI Z L, LI X W, et al. Classifier selection and ensemble model for multi-class imbalance learning in education grants prediction[J]. Applied Artificial Intelligence, 2021, 35(4): 290-303.
[25] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2012: 1097-1105.
[26] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587.
[27] ZHEN Y, YEUNG D Y. A probabilistic model for multimodal hash function learning[C]// Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2012: 940-948.
[28] LUO X, NIE L Q, HE X G, et al. Fast scalable supervised hashing[C]// Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2018: 735-744.
[29] JIANG Q Y, LI W J. Asymmetric deep supervised hashing[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 3342-3349.
[30] YUAN L, WANG T, ZHANG X P, et al. Central similarity quantization for efficient image and video retrieval [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3080-3089.
[31] WANG J, KUMAR S, CHANG S F. Semi-supervised hashing for large-scale search[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(12): 2393-2406.
[32] GUI J, LIU T L, SUN Z N, et al. Fast supervised discrete hashing[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(2): 490-496.
[33] HUANG C, LI Y N, LOY C C, et al. Learning deep representation for imbalanced classification[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 5375-5384.
[34] ZHAO F, HUANG Y Z, WANG L, et al. Deep semantic ranking based hashing for multi-label image retrieval[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1556-1564.
[35] YANG H F, LIN K, CHEN C S. Supervised learning of semantics-preserving hash via deep convolutional neural networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(2): 437-451.
[36] LI X, LIN G S, SHEN C H, et al. Learning hash functions using column generation[C]// Proceedings of the 30th International Conference on Machine Learning. New York: JMLR.org, 2013: 142-150.
[37] ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8689. Cham: Springer, 2014: 818-833.
[38] OQUAB M, BOTTOU L, LAPTEV I, et al. Learning and transferring mid-level image representations using convolutional neural networks[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 1717-1724.
[39] WANG J D, ZHANG T, SONG J K, et al. A survey on learning to hash[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 769-790.
[40] LAI H J, PAN Y, LIU Y, et al. Simultaneous feature learning and hash coding with deep neural networks[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3270-3278.
[41] SHU J, XIE Q, YI L X, et al. Meta-Weight-Net: learning an explicit mapping for sample weighting[C/OL]// Proceedings of the 2019 Conference and Workshop on Neural Information Processing Systems. [2021-02-21]. https://papers.nips.cc/paper/2019/file/e58cc5ca94270acaceed13bc82dfedf7-Paper.pdf.
[42] LIN K, YANG H F, HSIAO J H, et al. Deep learning of binary hash codes for fast image retrieval[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 27-35.
[43] LU X Q, ZHENG X T, LI X L. Latent semantic minimal hashing for image retrieval[J]. IEEE Transactions on Image Processing, 2017, 26(1): 355-368.
[44] LIN K, LU J W, CHEN C S, et al. Learning compact binary descriptors with unsupervised deep neural networks[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1183-1192.
[45] NI B B, YAN S C, KASSIM A. Learning a propagable graph for semisupervised learning: classification and regression[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(1): 114-126.
[46] KE S C, ZHAO Y W, LI B C, et al. Image retrieval based on convolutional neural network and kernel-based supervised hashing[J]. Acta Electronica Sinica, 2017, 45(1): 157-163.
[47] WANG S, WANG H J, QIN X P, et al. Architecting big data: challenges, studies and forecasts[J]. Chinese Journal of Computers, 2011, 34(10): 1741-1752.
[48] AI L F, YU J Q, GUAN T, et al. Adaptively filtering query results for large scale image feature retrieval[J]. Chinese Journal of Computers, 2015, 38(1): 122-132.
[49] CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
HAN Yaru, born in 1995, M. S. candidate. Her research interests include multimedia image retrieval, artificial intelligence, machine learning.
YAN Lianshan, born in 1971, Ph. D., professor. His research interests include information photonics and future communication network, internet of things and industrial internet, artificial intelligence.
YAO Tao, born in 1981, Ph. D., associate professor. His research interests include multimedia analysis and computing, computer vision, machine learning.
Deep hashing retrieval algorithm based on meta-learning
HAN Yaru1, YAN Lianshan2*, YAO Tao1
(1. School of Information and Electrical Engineering, Ludong University, Yantai Shandong 264025, China; 2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu Sichuan 611756, China)
With the development of mobile Internet technology, the scale of image data is getting larger and larger, and large-scale image retrieval has become an urgent problem. Due to their fast retrieval speed and low storage consumption, hashing algorithms have received extensive attention from researchers. Deep learning based hashing algorithms need a certain amount of high-quality training data to achieve good retrieval performance. However, existing hashing methods usually ignore the class imbalance of datasets, which may reduce retrieval performance. Aiming at this problem, a deep hashing retrieval algorithm based on a meta-learning network was proposed, which can automatically learn the weighting function directly from the data. The weighting function is a Multi-Layer Perceptron (MLP) with only one hidden layer. Under the guidance of a small amount of unbiased meta data, the parameters of the weighting function can be optimized and updated simultaneously with the model parameters during training. The update equations of the meta-learning network parameters can be interpreted as follows: the weights of samples consistent with the meta data are increased, while the weights of samples inconsistent with it are reduced. The proposed algorithm effectively reduces the impact of imbalanced data on image retrieval and improves the robustness of the model. Extensive experiments were conducted on widely used benchmark datasets such as CIFAR-10. The results show that the mean Average Precision (mAP) of the proposed algorithm is the highest under large imbalance ratios; in particular, with an imbalance ratio of 200, the mAP of the proposed algorithm is 0.54, 30.93 and 48.43 percentage points higher than those of the central similarity quantization algorithm, the Asymmetric Deep Supervised Hashing (ADSH) algorithm and the Fast Scalable Supervised Hashing (FSSH) algorithm, respectively.
deep learning; hashing algorithm; imbalanced data; meta-learning; image retrieval
This work is partially supported by National Natural Science Foundation of China (61872170).
Article ID: 1001-9081(2022)07-2015-07
DOI: 10.11772/j.issn.1001-9081.2021040660
Received 2021-04-25; revised 2021-09-01; accepted 2021-09-07.
CLC number: TP183; Document code: A
HAN Yaru, born in 1995 in Jinan, Shandong, M.S. candidate. Her research interests include multimedia image retrieval, artificial intelligence and machine learning. YAN Lianshan, born in 1971 in Yantai, Shandong, Ph.D., professor. His research interests include information photonics and future communication networks, the internet of things and industrial internet, and artificial intelligence. YAO Tao, born in 1981 in Yantai, Shandong, Ph.D., associate professor. His research interests include multimedia analysis and computing, computer vision and machine learning.