盛俊 顧沈勝 陳崚
摘 要:社會(huì)網(wǎng)絡(luò)頂點(diǎn)分類在解決實(shí)際問題中有廣泛的應(yīng)用,但絕大多數(shù)現(xiàn)有的網(wǎng)絡(luò)頂點(diǎn)分類算法都集中在無符號(hào)的網(wǎng)絡(luò),而在邊上具有符號(hào)的社交網(wǎng)絡(luò)上的頂點(diǎn)分類算法卻很少,且負(fù)鏈接對(duì)于符號(hào)網(wǎng)絡(luò)分析的作用大于正鏈接。研究了符號(hào)網(wǎng)絡(luò)中頂點(diǎn)的分類問題。首先將正、負(fù)網(wǎng)絡(luò)映射到相對(duì)應(yīng)的隱空間,提出基于隱空間的正負(fù)鏈接的數(shù)學(xué)模型;然后提出優(yōu)化該模型的迭代算法,通過對(duì)隱空間矩陣和映射矩陣的迭代優(yōu)化,來對(duì)網(wǎng)絡(luò)中的頂點(diǎn)進(jìn)行分類。由帶符號(hào)的社會(huì)網(wǎng)絡(luò)數(shù)據(jù)集的實(shí)驗(yàn)結(jié)果證明,該算法在數(shù)據(jù)集Epinions上得到結(jié)果的F1值在11以上,在數(shù)據(jù)集Slashdo上得到結(jié)果的F1值在23.8以上,與隨機(jī)算法相比具有較高的精確度。
關(guān)鍵詞:帶符號(hào)網(wǎng)絡(luò);隱空間;映射;頂點(diǎn)分類
中圖分類號(hào):TP393.02; TP393.094
文獻(xiàn)標(biāo)志碼:A
Abstract: Social network node classification is widely used in solving practical problems. Most of the existing network node classification algorithms focus on unsigned social networks,
while node classification algorithms on social networks with symbols on edges are rare. Based on the fact that the negative links contribute more on signed network analysis than the positive links. The classification of nodes on signed networks was studied. Firstly, positive and negative networks were projected to the corresponding latent spaces, and a mathematical model was proposed based on positive and negative links in the latent spaces. Then, an iterative algorithm was proposed to optimize the model, and the iterative optimization of latent space matrix and projection matrix was used to classify the nodes in the network. The experimental results on the dataset of the signed social network show that the F1 value of the classification results by the proposed algorithm is higher than 11 on Epinions dataset, and that is higher than 23.8 on Slashdo dataset,which indicate that the proposed algorithm has higher accuracy than random algorithm.
英文關(guān)鍵詞Key words: signed network; latent space; projection; node classification
0 引 言
隨著如Facebook、Twitter、LinkedIn、Epinions等一系列社會(huì)網(wǎng)絡(luò)網(wǎng)站的迅速崛起,研究者將大量的精力投入到了社會(huì)機(jī)制的研究當(dāng)中[1],來分析用戶的體驗(yàn)和行為模式。傳統(tǒng)的社會(huì)網(wǎng)絡(luò)分析主要只考慮了如Facebook和MySpace等無標(biāo)記的社會(huì)網(wǎng)絡(luò),這些網(wǎng)絡(luò)可以化作圖模型,其中頂點(diǎn)代表實(shí)體,帶權(quán)重的邊就代表實(shí)體之間是否存在關(guān)系以及這個(gè)關(guān)系的重要性[2]。最近,對(duì)帶符號(hào)的有向社會(huì)網(wǎng)絡(luò)的研究正逐步興起。在帶符號(hào)的社會(huì)網(wǎng)絡(luò)中,用戶之間的關(guān)系既可以是正的(表明用戶之間的信任),也可以是負(fù)的(表明用戶之間的關(guān)系是不信任),比如:在Epinions網(wǎng)絡(luò)[3]中用戶可以根據(jù)他們的評(píng)價(jià)來決定信任或者不信任其他用戶;在Slashdot網(wǎng)絡(luò)[4]這個(gè)主要關(guān)注與技術(shù)相關(guān)新聞的網(wǎng)站上,用戶們可以根據(jù)對(duì)文章的評(píng)論相互點(diǎn)擊成為朋友(喜歡)或者敵人(不喜歡)。這些帶符號(hào)的有向的社會(huì)網(wǎng)絡(luò)都可以用不對(duì)稱的鄰接矩陣來表示: 如果其中的元素是1,那么這兩個(gè)實(shí)體間的關(guān)系為正; 如果元素是-1,那么就說明這兩個(gè)實(shí)體間的關(guān)系為負(fù); 0則表示缺失。
在過去的十年中,在線社交網(wǎng)絡(luò)的出現(xiàn)產(chǎn)生了大量關(guān)于用戶的個(gè)人信息,如他們的活動(dòng)、個(gè)人或團(tuán)體之間的聯(lián)系,以及他們的意見和想法等。這些信息使得社交網(wǎng)站在社會(huì)認(rèn)同發(fā)展、社會(huì)交友活動(dòng)中起了很大的作用[5]。這些數(shù)據(jù)中的有些具有類別性質(zhì)的維度可以被建模為與個(gè)體相關(guān)的標(biāo)簽。這些標(biāo)簽在網(wǎng)絡(luò)結(jié)構(gòu)中表示為頂點(diǎn)上的類號(hào),這些類號(hào)將網(wǎng)絡(luò)的頂點(diǎn)劃分成若干個(gè)類。這些類號(hào)有多種形式:人口標(biāo)簽(如年齡、性別和出生地點(diǎn)等); 政治或信仰標(biāo)簽(如政治黨派、宗教信仰等); 其他還有興趣愛好的標(biāo)簽、工作單位標(biāo)簽等。標(biāo)簽通常在網(wǎng)絡(luò)上的用戶配置文件中顯示,或用于關(guān)聯(lián)到其他網(wǎng)絡(luò)中的相同類的對(duì)象(如照片、視頻等)。
然而,由于大多數(shù)社交媒體用戶都不愿意分享他們的信息,社交媒體網(wǎng)站只能收集非常有限的用戶信息[6],例如超過90%的Facebook用戶不愿意透露他們的政治觀點(diǎn)[6]。因此,社交網(wǎng)絡(luò)中有部分用戶所屬的類別是不知道的,即這些頂點(diǎn)的標(biāo)簽是未知的。網(wǎng)絡(luò)頂點(diǎn)分類就是要從已知用戶的信息以及網(wǎng)絡(luò)的鏈接結(jié)構(gòu)來推斷位置用戶的標(biāo)簽信息[7-8]。這個(gè)問題不同于傳統(tǒng)的分類問題,也不能直接用傳統(tǒng)的機(jī)器學(xué)習(xí)的分類方法來解決。傳統(tǒng)的基于機(jī)器學(xué)習(xí)的分類方法往往假設(shè)對(duì)象是獨(dú)立的,只是采用傳統(tǒng)的統(tǒng)計(jì)推斷過程來進(jìn)行分類,這樣會(huì)導(dǎo)致推斷的結(jié)果不精確;而在基于網(wǎng)絡(luò)的頂點(diǎn)分類中,必須充分利用頂點(diǎn)間鏈接所反映的潛在的相關(guān)性結(jié)果來提高分類的質(zhì)量。
許多現(xiàn)實(shí)世界的問題可以被建模為無符號(hào)社交網(wǎng)絡(luò)中的頂點(diǎn)分類問題,它的目標(biāo)是通過給網(wǎng)絡(luò)提供一些有標(biāo)簽的用戶[9]來預(yù)測(cè)未標(biāo)記的社交網(wǎng)絡(luò)中的未標(biāo)記頂點(diǎn)的標(biāo)簽。近年來,人們已經(jīng)提出一些關(guān)于頂點(diǎn)分類問題的方法[10],這些算法主要有基于局部分類器的方法和隨機(jī)游走方法[11]、基于局部分類器分類方法、使用局部社區(qū)信息來學(xué)習(xí)本地分類器、基于隨機(jī)游走的方法模擬粒子在網(wǎng)絡(luò)上進(jìn)行隨機(jī)游動(dòng)來進(jìn)行頂點(diǎn)分類等: 如Lu等[12]提出標(biāo)簽傳播方法進(jìn)行網(wǎng)絡(luò)頂點(diǎn)的分類; Azran等[13]利用隨機(jī)游走的半監(jiān)督學(xué)習(xí)的方法進(jìn)行網(wǎng)絡(luò)頂點(diǎn)的多分類; Zhu等[14]利用基于高斯場(chǎng)合調(diào)和函數(shù)的半監(jiān)督學(xué)習(xí)的方法進(jìn)行頂點(diǎn)分類; Zhou等[15-16] 應(yīng)用圖的正則化方法,提出了有向圖上的基于監(jiān)督學(xué)習(xí)的頂點(diǎn)分類方法; Baluja等[17]提出了吸附(adsorption)方法進(jìn)行頂點(diǎn)分類,并應(yīng)用于YouTube網(wǎng)站的視頻推薦之中; Nandanwar 等 [18]提出了一種基于鄰居的結(jié)構(gòu)的分類器,采用了一種基于懶惰隨機(jī)游走的方法,根據(jù)頂點(diǎn)的度等結(jié)構(gòu)特征,來決定鄰居的類號(hào); Taskar等[19]提出一種基于圖的概率模型方法,使用判別分析來進(jìn)行圖上關(guān)系數(shù)據(jù)的分類; 王小攀等[20]提出了一種基于屬類概率選擇圖近鄰結(jié)構(gòu)的方法, 利用高斯核函數(shù)計(jì)算近鄰點(diǎn)的邊權(quán)值,構(gòu)造屬類概率圖,并將這種屬類概率圖與K近鄰(KNearest Neighbors, KNN)圖線性組合并嵌入標(biāo)簽傳遞算法框架中,得到一種基于雙圖結(jié)構(gòu)的標(biāo)簽傳遞算法; 邵海軍[21]提出基于Web數(shù)據(jù)修補(bǔ)技術(shù)與新型蟻群算法相結(jié)合的弱關(guān)聯(lián)Web數(shù)據(jù)分類方法,對(duì)弱關(guān)聯(lián)Web數(shù)據(jù)的關(guān)聯(lián)性利用關(guān)聯(lián)查詢過程中的實(shí)時(shí)屬性進(jìn)行修補(bǔ),然后再利用求解旅行商問題(Traveler Salesman Problem, TSP)規(guī)則對(duì)蟻群信息素更新、融合,并將融合結(jié)果運(yùn)用于弱關(guān)聯(lián)數(shù)據(jù)分類中,以實(shí)現(xiàn)Web數(shù)據(jù)的準(zhǔn)確分類; Li等[22]提出基于信念網(wǎng)絡(luò)的網(wǎng)頁(yè)分類算法; Zhang等[23] 以及Kazienko等 [24]分別提出了標(biāo)簽相關(guān)和度量標(biāo)簽的網(wǎng)絡(luò)多類分類方法; Zhang等 [25]利用頂點(diǎn)之間基于的相對(duì)信息熵的相似度對(duì)網(wǎng)絡(luò)頂點(diǎn)進(jìn)行分類。
目前,絕大多數(shù)的頂點(diǎn)分類算法都集中在無符號(hào)的社交網(wǎng)絡(luò)或只有正鏈接的社交網(wǎng)絡(luò)[9,12,15,26]上,然而大部分真實(shí)的符號(hào)社交網(wǎng)絡(luò)中可以同時(shí)包含正面和負(fù)面的聯(lián)系,例如:Epinions中包括信任和不信任的聯(lián)系;Slashdot中有朋友和敵人的鏈接,但對(duì)于符號(hào)網(wǎng)絡(luò)的頂點(diǎn)分類的研究工作卻很少。傳統(tǒng)的基于關(guān)聯(lián)的網(wǎng)絡(luò)分類算法將結(jié)構(gòu)化的數(shù)據(jù)通過正相關(guān)相互聯(lián)系對(duì)數(shù)據(jù)對(duì)象進(jìn)行分類,而符號(hào)社會(huì)網(wǎng)絡(luò)數(shù)據(jù)中,含有兩類相反的相關(guān)關(guān)系,這將為頂點(diǎn)分類提供有效的信息,因此,如何在這種符號(hào)社會(huì)網(wǎng)絡(luò)中,充分利用這兩類相反的相關(guān)關(guān)系對(duì)用戶進(jìn)行分類是一個(gè)新的挑戰(zhàn)。
近來對(duì)于符號(hào)網(wǎng)絡(luò)的研究證明,負(fù)鏈接對(duì)于符號(hào)網(wǎng)絡(luò)分析的作用大于正鏈接,因此,負(fù)鏈接在許多對(duì)于符號(hào)網(wǎng)絡(luò)分析的任務(wù)中起著關(guān)鍵性的作用。本文研究符號(hào)網(wǎng)絡(luò)中頂點(diǎn)的分類問題時(shí),將正、負(fù)網(wǎng)絡(luò)映射到一個(gè)相同的隱空間,提出基于隱空間的正負(fù)鏈接的數(shù)學(xué)模型,提出優(yōu)化該模型的迭代算法,通過對(duì)隱空間矩陣和映射矩陣的迭代優(yōu)化,來對(duì)網(wǎng)絡(luò)中的頂點(diǎn)進(jìn)行分類。由帶符號(hào)社會(huì)網(wǎng)絡(luò)數(shù)據(jù)集的實(shí)驗(yàn)結(jié)果證明,本文算法的預(yù)測(cè)結(jié)果具有較高的精確度。
5 結(jié)語(yǔ)
本文將正負(fù)網(wǎng)絡(luò)映射到相對(duì)應(yīng)的隱空間,設(shè)計(jì)出基于隱空間的正、負(fù)鏈接的數(shù)學(xué)模型,并提出了優(yōu)化該模型的迭代算法,通過對(duì)隱空間矩陣和映射矩陣的迭代優(yōu)化得到分類矩陣,從而對(duì)網(wǎng)絡(luò)中的頂點(diǎn)進(jìn)行分類。由帶符號(hào)社會(huì)網(wǎng)絡(luò)數(shù)據(jù)集的實(shí)驗(yàn)結(jié)果證明,本文算法的預(yù)測(cè)結(jié)果具有較高的精確度。
參考文獻(xiàn) (References)
[1] GUHA R, KUMAR R, RAGHAVAN P, et al. Propagation of trust and distrust[C]// Proceedings of the 13th International Conference on World Wide Web. New York: ACM, 2004: 403-412.
[2] SONG D J, MEYER D. Link sign prediction and ranking in signed directed social networks[J]. Social Network Analysis & Mining, 2015, 5(1): 1-14.
[3] BURKE M, KRAUL R. Mopping up: modeling Wikipedia promotion decisions[C]// Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work. New York: ACM, 2008:27-36.
[4] KUNEGIS J, LOMMATZSCH A, BAUCKHAGE C. The Slashdot zoo: mining a social network with negative edges[C]// Proceedings of the 18th International Conference on World Wide Web. New York: ACM, 2009:541-550.
[5] 王新月,王興超,雷靂,等.社交網(wǎng)站在社會(huì)認(rèn)同發(fā)展中的作用[J].心理科學(xué)進(jìn)展, 2018, 26(11): 2024-2034.(WANG X Y, WANG X C, LEI L, et al. The role of social networking sites in the development of social identity[J]. Advances in Psychological Science, 2018, 26(11): 2024-2034.)
[6] ZHELEVA E, GETOOR L. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles[C]// Proceedings of the 18th International Conference on World Wide Web. New York: ACM, 2009: 531-540.
[7] ABBASI M A, TANG J,LIU H. Scalable learning of users preferences using networked data[C]// Proceedings of the 25th ACM Conference on Hypertext and Social Media. New York: ACM, 2014: 4-12.
[8] TANG L, LIU H. Relational learning via latent social dimensions[C]// Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York: ACM, 2009: 817-826.
[9] SEN P, NAMATA G, BILGIC M, et al. Collective classification in network data[J]. AI Magazine, 2008, 29(3): 93.
[10] GETOOR L, DIEHL C P. Link mining: a survey[J]. ACM SIGKDD Explorations Newsletter, 2005, 7(2): 3-12.
[11] BHAGAT S, CORMODE G, MUTHUKRISHNAN S. Node classification in social networks[C]// Social Network Data Analytics. Berlin: Springer, 2011: 115-148.
[12] LU Q, GETOOR L. Linkbased classification[C]// Proceedings of the 20th International Conference on International Conference on Machine Learning. Menlo Park, CA: AAAI Press, 2003: 496-503.
[13] AZRAN A. The rendezvous algorithm: multiclass semisupervised learning with Markov random walks[C]// Proceedings of the 24th International Conference on Machine Learning. New York: ACM, 2007:49-56.
[14] ZHU X, GHAHRAMANI Z, JOHN L, et al. Semisupervised learning using Gaussian fields and harmonic functions[C]// Proceedings of the 20th International Conference on Machine Learning. Menlo Park, CA: AAAI Press, 2003: 912-919.
[15] ZHOU D, BOUSQUET O, THOMAS N L, et al. Learning with local and global consistency[C]// Proceedings of the 16th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2003: 321-328.
[16] ZHOU D, HUANG J, SCHOLKOPF B. Learning from labeled and unlabeled data on a directed graph[C]// Proceedings of the 22nd International Conference on Machine Learning. New York: ACM, 2005: 1036-1043.
[17] BALUJA S, SETH R, SIVAKUMAR D, et al. Video suggestion and discovery for YouTube: taking random walks through the view graph[C]// Proceedings of the 17th International Conference on World Wide Web. New York: ACM, 2008: 895-904.
[18] NANDANWAR S, MURTY M N. Structural neighborhood based classification of nodes in a network[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 1085-1094.
[19] TASKAR B, ABBEEL P, KOLLER D. Discriminative probabilistic models for relational data[C]// Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence. New York: ACM, 2002:485-492.
[20] 王小攀,胡艷.基于雙圖結(jié)構(gòu)標(biāo)簽傳遞算法的高光譜數(shù)據(jù)分類[J].計(jì)算機(jī)與數(shù)字工程,2018, 46(10): 2117-2122.(WANG X P,HU Y. Hyperspectral data classification based on double graph structure label transfer algorithm[J]. Computer and Digital Engineering, 2018, 46(10): 2117-2122.)
[21] 邵海軍. 弱關(guān)聯(lián)Web數(shù)據(jù)分類方法優(yōu)化研究與仿真[J].計(jì)算機(jī)仿真,2015,32(12): 392-395.(SHAO H J. Research and simulation on the optimization of weak association Web data classification method[J]. Computer Simulation, 2015, 32(12):392-395.)
[22] LI Y C, NIE X Q, HUANG R. Web spam classification method based on deep belief networks[J]. Expert Systems with Applications, 2018, 96: 261-270.
[23] ZHANG Z,WANG H,LIU L, et al. Multilabel relational classification via node and label correlation[J]. Neurocomputing, 2018, 292: 72-81.
[24] KAZIENKO P, KAJDANOWICZ T. Labeldependent node classification in the network[J]. Neurocomputing, 2012, 75(1): 199-209.
[25] ZHANG Q, LI M Z, DENG Y. Measure the structure similarity of nodes in complex networks based on relative entropy[J]. Physica A: Statistical Mechanics and its Applications, 2018, 491:749-763.
[26] GIANVITO P, FRANCESCO S, DONATO M, et al. Multitype clustering and classification from heterogeneous networks[J].Information Sciences, 2018, 425:107-126.
[27] LESKOVEC J, HUTTENLOCHER D. Predicting positive and negative links in online social networks[C]// Proceedings of the 19th International Conference on World Wide Web. New York: ACM, 2010:641-650.