張書旋 康海燕 閆涵
摘 要:隨著社交軟件的流行,越來越多的人加入社交網(wǎng)絡(luò)產(chǎn)生了大量有價值的信息,其中也包含了許多敏感隱私信息。不同的用戶有不同的隱私需求,因此需要不同級別的隱私保護。社交網(wǎng)絡(luò)中用戶隱私泄露等級受社交網(wǎng)絡(luò)圖結(jié)構(gòu)和用戶自身威脅等級等諸多因素的影響。針對社交網(wǎng)絡(luò)數(shù)據(jù)的個性化隱私保護問題及用戶隱私泄露等級評價問題,提出基于Skyline計算的個性化差分隱私保護策略(PDPS)用以發(fā)布社交網(wǎng)絡(luò)關(guān)系數(shù)據(jù)。首先構(gòu)建用戶的屬性向量; 接著采用基于Skyline計算的方法評定用戶的隱私泄露等級,并根據(jù)該等級對用戶數(shù)據(jù)集進(jìn)行分割;然后應(yīng)用采樣機制來實現(xiàn)個性化差分隱私,并對整合后的數(shù)據(jù)添加噪聲;最后對處理后數(shù)據(jù)進(jìn)行安全性和實用性的分析并發(fā)布數(shù)據(jù)。在真實數(shù)據(jù)集上與傳統(tǒng)的個性化差分隱私方法(PDP)對比,驗證了PDPS算法的隱私保護質(zhì)量和數(shù)據(jù)的可用性都優(yōu)于PDP算法。
關(guān)鍵詞:社交網(wǎng)絡(luò);隱私保護;Skyline計算;個性化差分隱私;基于Skyline計算的個性化差分隱私保護算法
中圖分類號:TP309
文獻(xiàn)標(biāo)志碼:A
Abstract: With the popularity and development of social software, more and more people join the social network, which produces a lot of valuable information, including sensitive private information. Different users have different private requirements and therefore require different levels of privacy protection. The level of user privacy leak in social network is affected by many factors, such as the structure of social network graph and the threat level of the user himself. Aiming at the personalized differential privacy preserving problem and user privacy leak level problem, a Personalized Differential Privacy based on Skyline (PDPS) algorithm was proposed to publish social network relational data. Firstly, users attribute vector was built. Secondly, the user privacy leak level was calculated by Skyline computation method and the user dataset was segmented according to this level. Thirdly, with the sampling mechanism, the users with different privacy requirements were protected at different levels to realize personalized differential privacy and noise was added to the integreted data. Finally, the processed data were analyzed for security and availability and published. The experimental results demonstrate that compared with the traditional Personalized Differential Privacy (PDP) method on the real data set, PDPS algorithm has better privacy protection quality and data availability.
英文關(guān)鍵詞Key words: social network; privacy preserving; Skyline query; personalized differential privacy; Personalized Differential Privacy based on Skyline (PDPS) algorithm
0 引言
社交網(wǎng)絡(luò)隱私信息可以分為兩種:一種隱私是用戶敏感信息隱私,比如用戶的手機號碼、家庭住址、疾病、收入等;另一種隱私是社交網(wǎng)絡(luò)關(guān)系隱私,即社交網(wǎng)絡(luò)中人與人之間的連接關(guān)系信息,如親屬關(guān)系、同學(xué)關(guān)系。在社交網(wǎng)絡(luò)中無論是哪種類型隱私信息的披露都可能會使個人的隱私受到威脅, 因此,隱私信息的識別和分類是非常必要的,需結(jié)合具體的信息類別來采取相應(yīng)有效的保護策略。本文主要研究社交網(wǎng)絡(luò)關(guān)系型數(shù)據(jù)的隱私保護與發(fā)布。
差分隱私[1]被公認(rèn)是一個強大的隱私保護模型,能夠為數(shù)據(jù)提供強大的隱私保證,但是該模型局限于為所有個人提供相同級別的隱私保護; 然而并非所有用戶都需要相同的隱私級別,為避免對那些不需要太高隱私級別的用戶提供過多的隱私保護,需要實現(xiàn)個性化的隱私保護。本文采用采樣方法實現(xiàn)個性化差分隱私保護,引入非均勻的不確定性。
采樣機制以前為其他目的已經(jīng)與差分隱私結(jié)合過。Li等[2]提出了一種滿足差分隱私的擾動方法,利用采樣的隨機性降低隱私成本,證明了均勻隨機采樣提高了差分隱私保護效果。Kellaris等 [3]提出了GS預(yù)處理方法(preprocesses by Grouping and Smoothing, GS),利用抽樣機制對發(fā)布數(shù)據(jù)進(jìn)行分組,降低拉普拉斯噪聲注入并實現(xiàn)差分隱私。Spiessl等[4]設(shè)計了一個根據(jù)靈敏度采樣的采樣器,能自動實現(xiàn)(ε,δ,γ)隨機差分隱私。
Skyline計算的研究分兩方面:一是對Skyline計算算法的優(yōu)化,二是將Skyline計算算法應(yīng)用于相關(guān)研究領(lǐng)域。目前, 數(shù)據(jù)的海量性和高維性以及數(shù)據(jù)環(huán)境的多樣性和動態(tài)性都使 Skyline計算面臨著愈加嚴(yán)峻的挑戰(zhàn)。各國學(xué)者針對這個問題進(jìn)行了不少研究, 主要得出以下幾種Skyline計算算法:塊嵌套算法[5]、最近鄰算法[6]、分支界限算法[7]等。Skyline算法在多標(biāo)準(zhǔn)決策系統(tǒng)、城市導(dǎo)航系統(tǒng)、數(shù)據(jù)庫可視化、用戶偏好查詢等多個研究領(lǐng)域都有著廣泛應(yīng)用, 例如: 在傳感器網(wǎng)絡(luò)應(yīng)用中,信俊昌等[8]提出了基于過濾的Skyline節(jié)點連續(xù)查詢算法(FIlterbased Skyline node moniToring algorithm, FIST),F(xiàn)IST算法能有效減少Skyline節(jié)點連續(xù)查詢過程中傳感器節(jié)點的通信代價,進(jìn)而降低傳感器網(wǎng)絡(luò)的能量消耗; 多維向量查詢方面,雷婷等[9]在云環(huán)境下提出一種基于超球面投影分區(qū)的 Skyline算法,通過將空間坐標(biāo)投影到超球面上轉(zhuǎn)化為超球面投影坐標(biāo),然后使用超球面投影坐標(biāo)進(jìn)行分區(qū),有效提高分區(qū)內(nèi)數(shù)據(jù)點的平均減枝力度,降低 Skyline 的計算代價; 在用戶偏好查詢方面,Zhang等[10]提出了一種使用Skyline查詢的算法,通過用戶的搜索和查詢,以確定哪種云服務(wù)最能滿足用戶的需求。本文利用Skyline計算方法給用戶評定隱私泄露等級,然后根據(jù)用戶隱私泄露等級進(jìn)行采樣處理,最后添加噪聲實現(xiàn)個性化的差分隱私保護。
本文的主要貢獻(xiàn)包括:1)利用采樣方法實現(xiàn)個性化差分隱私,引入非均勻的不確定性; 2)構(gòu)建了用戶屬性向量,利用Skyline計算方法給用戶評定隱私泄露等級; 3)提出了基于基于Skyline計算的個性化差分隱私保護(Personalized Differential Privacy based on Skyline, PDPS)算法發(fā)布機制的總體流程, 對數(shù)據(jù)采集、數(shù)據(jù)處理、數(shù)據(jù)分析和數(shù)據(jù)發(fā)布各個流程進(jìn)行了闡述說明; 4)在真實數(shù)據(jù)集上與傳統(tǒng)個性化差分隱私(Personalized Differential Privacy, PDP)算法進(jìn)行對比,驗證了發(fā)布數(shù)據(jù)安全性和可用性的提升。
4 結(jié)語
本文針對社交網(wǎng)絡(luò)用戶隱私泄露等級評定和差分隱私保護的個性化這兩個問題,提出PDPS算法用來發(fā)布社交網(wǎng)絡(luò)關(guān)系數(shù)據(jù)。將Skyline計算用于用戶隱私泄露等級的評定,并用采樣機制來實現(xiàn)了個性化的差分隱私保護。在真實數(shù)據(jù)集上對該算法的隱私保護效果和數(shù)據(jù)的效用進(jìn)行了驗證,證明了PDPS算法能夠提升社交網(wǎng)絡(luò)數(shù)據(jù)發(fā)布的安全性和可用性。今后可對分割和采樣系數(shù)的選取規(guī)則、帶權(quán)重的社交網(wǎng)絡(luò)圖的發(fā)布算法等方向進(jìn)行下一步研究。
參考文獻(xiàn) (References)
[1] 王豪,徐正全.面向軌跡聚類的差分隱私保護方法[J]. 華中科技大學(xué)學(xué)報(自然科學(xué)版), 2018, 46(1):32-36. (WANG H, XU Z Q. Differential privacy preserving method for trajectory clustering[J]. Journal of Huazhong University of Science & Technology (Natural Science Edition), 2018, 46(1):32-36.)
[2] LI N, QARDAJI W, DONG S. On sampling, anonymization, and differential privacy or, kanonymization meets differential privacy[C]// Proceedings of the 2012 ACM Symposium on Information, Computer and Communications Security. New York: ACM, 2012:32-33.
[3] KELLARIS G, PAPADOPOULOS S. Practical differential privacy via grouping and smoothing[J]. Proceedings of the VLDB Endowment, 2013, 6(5):301-312.
[4] SPIESSL S M, BECKER D A. Sensitivity analysis of a final repository model with quasidiscrete behaviour using quasirandom sampling and a metamodel approach in comparison to other variancebased techniques[J]. Reliability Engineering & System Safety, 2015, 134:287-296.
[5] SARMA A D, LALL A, NANONGKAI D, et al. Randomized multipass streaming Skyline algorithms[J]. Proceedings of the VLDB Endowment, 2009, 2(1):85-96.
[6] KANJ S, ABDALLAH F, DENEUX T, et al. Editing training data for multilabel classification with the knearest neighbor rule[J]. Pattern Analysis & Applications, 2016, 19(1):145-161.
[7] ZHENG J, CHEN J, WANG H. Efficient geometric pruning strategies for continuous Skyline queries[J]. ISPRS International Journal of GeoInformation, 2017, 6(3):91.
[8] 信俊昌, 王國仁. 無線傳感器網(wǎng)絡(luò)中Skyline節(jié)點連續(xù)查詢算法[J]. 計算機學(xué)報, 2012, 35(11): 2415-2430.(XIN J C, WANG G R. Continuous Skyline nodes query processing over wireless sensor networks[J].Chinese Journal of Computers, 2012, 35(11):2415-2430.)
[9] 雷婷,王濤,曲武,等.云環(huán)境下基于超球面投影分區(qū)的Skyline計算[J].計算機科學(xué),2013,40(6):164-171. (LEI T, WANG T, QU W, et al. Distributed Skyline processing based on hypersphere projection partitioning on cloud environments[J]. Computer Science, 2013, 40(6): 164-171.)
[10] ZHANG B, ZHOU S, GUAN J. Adapting Skyline computation to the MapReduce framework: algorithms and experiments[C]// Proceedings of the 16th International Conference on Database Systems for Advanced Applications. Berlin: SpringerVerlag, 2011:403-414.
[11] GULZAR Y, ALWAN A A, SALLEH N, et al. A model for Skyline query processing in a partially complete database[J]. Advanced Science Letters, 2018, 24(2):400-407.
[12] 康海燕, 馬躍雷. 差分隱私保護在數(shù)據(jù)挖掘中應(yīng)用綜述[J]. 山東大學(xué)學(xué)報(理學(xué)版), 2017, 52(3):16-23.(KANG H Y, MA Y L. Survey on application of data mining via differential privacy[J]. Journal of Shandong University (Natural Science), 2017, 52(3):16-23.)
[13] JORGENSEN Z, YU T, CORMODE G. Conservative or liberal? Personalized differential privacy[C]// Proceedings of the 2015 IEEE 31st International Conference on Data Engineering. Piscataway, NJ: IEEE, 2015:1023-1034.
[14] WANG Y, ZHENG B. Preserving privacy in social networks against connection fingerprint attacks[C]// Proceedings of the 2015 IEEE 31st International Conference on Data Engineering. Piscataway, NJ: IEEE, 2015:54-65.