Privacy-preserving federated learning framework with irregular-majority users
CHEN Qianxin1,2, BI Renwan1,2, LIN Jie1, JIN Biao1, XIONG Jinbo1,2
(1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China; 2. Fujian Provincial Key Laboratory of Network Security and Cryptology, Fujian Normal University, Fuzhou 350007, China)
To address the problems that federated learning suffers reduced aggregation efficiency when handling a majority of irregular users and leaks parameter privacy by communicating in plaintext, a privacy-preserving federated learning framework robust against irregular users is built on a purpose-designed secure division protocol. The framework outsources model-related computation to two edge servers to reduce the high computational overhead incurred by homomorphic encryption; it not only allows the model and its related information to be aggregated in ciphertext on the edge servers, but also lets users compute model reliability locally, avoiding the extra communication overhead that conventional methods incur by invoking a secure multiplication protocol. On this basis, to evaluate the generalization performance of a model more accurately, each user, after updating the local model parameters, computes the model loss jointly on a validation set issued by the edge server and a locally held validation set, and dynamically updates the model reliability, used as the model weight, by combining the loss with its historical values. Further, guided by the prior knowledge of model reliability, the model weights are scaled, and the ciphertext model and ciphertext weight information are handed to the edge servers to aggregate and update the global model parameters, ensuring that changes of the global model are contributed mainly by users with high-quality data and improving the convergence speed. A security analysis under the hybrid argument model demonstrates that PPRFL (privacy-preserving robust federated learning) effectively protects the privacy of the model parameters and of intermediate interaction parameters, including user reliability. Experimental results show that when all participants in the federated aggregation task are irregular users, PPRFL still reaches an accuracy of 92%, with convergence efficiency 1.4 times that of PPFDL (privacy-preserving federated deep learning with irregular users); when the training data held by 80% of the users are noisy, PPRFL still reaches an accuracy of 89%, with convergence efficiency 2.3 times that of PPFDL.
federated learning; privacy preservation; secure aggregation; irregular-majority users; secure division protocol
In recent years, the performance of data-driven deep learning (DL) systems has improved substantially, and artificial intelligence has delivered both economic and public benefits [1]. However, the collection of user data by third-party service providers inevitably leads to the leakage of personal private data [2]. Federated learning (FL) emerged to resolve the data-silo problem and enable privacy-aware data sharing [3-4]. Although federated learning can ease public concern about privacy leakage, studies [5-6] have shown that an attacker can indirectly infer the membership and corresponding labels of part of a user's local dataset from the plaintext gradients the user uploads. Moreover, plaintext gradients are vulnerable to model inversion, model inference, and similar attacks [7], resulting in data privacy leakage.
To meet the new challenge of gradient inversion in federated learning, several studies [8-9] have focused on enhancing the privacy of federated learning and applied it to various scenarios [10-12]. While guaranteeing privacy, many federated learning schemes further investigate how to improve training efficiency. Kanagavelu et al. [13] proposed a two-phase federated learning scheme supporting secure multi-party computation to reduce communication cost and improve scalability. Dong et al. [14] proposed an efficient and secure federated learning framework based on secret sharing and a Top-K gradient selection algorithm. Aono et al. [15] applied homomorphic encryption to gradient data to preserve the privacy of multiple trainers' local data. The scheme in [16] uses Paillier homomorphic encryption to optimize the encrypted model aggregation process.
However, these privacy-preserving federated learning schemes assume that every user holds high-quality data and do not consider the presence of irregular users in the system. In practice, users with advanced expertise or advanced equipment usually generate high-quality data, while other users may produce low-quality data owing to noise interference, recording errors, and the like; such users are called irregular users [17], and the model parameters they upload may degrade the accuracy of the global model.
To tackle the serious problems that transmitting raw model gradients directly leaks model privacy and that a majority of irregular users reduces the efficiency of existing federated learning frameworks, this paper proposes a privacy-preserving robust federated learning (PPRFL) framework that supports a majority of irregular users. The main contributions are summarized as follows.
1) A dynamic model-reliability update strategy is designed to overcome the slow accuracy improvement when a majority of irregular users participate in training. Guided by the prior knowledge of reliability, the model weights are scaled and the global model is aggregated and updated, ensuring that changes of the global model are contributed mainly by users with high-quality data and further improving the convergence efficiency of the federated model.
2) An efficient secure division protocol is designed, on which an edge-server-assisted robust privacy-preserving federated learning framework, PPRFL, is built to eliminate the risk of model-parameter privacy leakage during uploading and processing.
3) A detailed security proof shows that the PPRFL framework effectively protects the privacy of model parameters. Experimental results show that even when all participants are irregular users, PPRFL still achieves 92% accuracy and 1.4 times the aggregation efficiency of PPFDL, with low computation and communication overhead between the servers.
Figure 1 System model of the PPRFL scheme
As in most secure outsourced computation settings, the two servers are assumed to be non-colluding, honest-but-curious servers [20-22]. The servers faithfully execute the protocol specification and the corresponding operations, but may exploit the information stored and the message flows collected during the protocol to infer users' private data [29]. Malicious users typically mount poisoning attacks by flipping image labels or implanting trigger patterns. Unlike malicious users, irregular users produce low-quality data during data generation and storage owing to recording errors, noisy data, stale data, and so on. Moreover, users honestly perform model training and the subsequent encrypted uploading, so users are considered trusted.
Based on the above threat model, and similarly to [19,21], this paper has the following privacy requirements. ① User model parameters: an adversary can mount membership-inference and related attacks using the local and global model parameters to recover a user's sensitive information; to protect user data privacy, model parameters should be submitted to the servers in ciphertext. ② Model reliability: model reliability can be regarded as an assessment of a user's data quality; to keep the learning process fair and non-discriminatory, this information should be kept secret from all parties other than the user. ③ Model aggregation result: the aggregated global model can be regarded as intellectual property generated from users' massive data resources and should be kept secret from all parties other than the participating users.
PPRFL consists of a user side and a server side. On the user side, each participant trains a local model, computes the loss on a combined validation set, and, incorporating historical information, obtains the model reliability through dynamic updating and weight scaling. On the server side, after receiving the encrypted parameters, the servers first aggregate them using the additive homomorphic property and then interactively execute the secure division protocol to obtain the global model. Participants in federated learning collect data in similar scenarios; as in the privacy-preserving federated learning schemes [11,15,30], this paper assumes that the high-quality data held by each user are independent and identically distributed (IID).
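To make the additive ciphertext aggregation concrete, below is a minimal single-machine sketch using the python-paillier (`phe`) library. The library choice and variable names are assumptions of this illustration, and the sketch deliberately ignores how the private key would be placed between the two edge servers.

```python
# pip install phe  (python-paillier, an additively homomorphic cryptosystem)
from phe import paillier

# Key generation. In a real two-server deployment the aggregating server
# would hold only the public key; this sketch keeps both keys in one place.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Hypothetical scaled parameter values uploaded by three users.
local_params = [0.52, -0.13, 0.97]
ciphertexts = [public_key.encrypt(p) for p in local_params]

# Additive homomorphism: Enc(a) + Enc(b) = Enc(a + b), so the server can
# aggregate without ever seeing the plaintext parameters.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = encrypted_sum + c

assert abs(private_key.decrypt(encrypted_sum) - sum(local_params)) < 1e-9
```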
This section analyzes the validation-set loss to motivate the dynamic update strategy for handling a majority of irregular users. In a federated learning scenario with low-quality datasets, directly using the validation loss as the model weight leads to imbalanced weight discrimination: models trained on data with different noise ratios may differ greatly in accuracy at the same loss value, or differ greatly in loss value at the same accuracy. Based on this observation, this paper designs a dynamic update method that computes each user's model loss on a combined validation set and incorporates the historical loss information to assess the current model reliability.
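As one plausible instantiation of this idea (not the paper's own update formula), the sketch below smooths the combined-validation loss with its history and maps lower loss to higher reliability; the smoothing factor, the exponential mapping, and all names are assumptions of this sketch.

```python
import math

def update_reliability(loss_now, loss_hist, beta=0.7):
    """Update model reliability from the current combined-validation loss
    and its history; beta and the inverse-loss mapping are illustrative."""
    # Smooth the current loss with its historical value to damp noise spikes.
    smoothed = beta * loss_hist + (1 - beta) * loss_now if loss_hist is not None else loss_now
    # Lower loss -> higher reliability; exp(-loss) keeps the weight positive.
    reliability = math.exp(-smoothed)
    return reliability, smoothed

# Example: a user whose combined-validation loss shrinks across rounds.
hist = None
for loss in [2.31, 1.42, 0.95, 0.61]:
    r, hist = update_reliability(loss, hist)
    print(f"loss={loss:.2f}  reliability={r:.3f}")
```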
Algorithm 1 Client-side procedure of the PPRFL framework
9) Proceed to the next iteration
This section presents the server-side procedure of the PPRFL framework for aggregating and updating the global model; the steps are as follows.
7) Return to 1) and proceed to the next iteration
The construction of the secure division protocol is detailed in the appendix.
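Putting the server-side steps together, each round reduces to two homomorphic sums followed by one division per parameter: the servers aggregate Enc(r_i·w_i) and Enc(r_i), then compute their ratio. The single-machine sketch below (again with `phe`) uses hypothetical reliabilities and a plain decrypt-and-divide stand-in for the appendix's secure division protocol; it is an illustration of the data flow, not the protocol itself.

```python
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# Hypothetical uploads from 3 users: reliability-scaled parameters r_i * w_i
# and reliabilities r_i, both encrypted on the user side (one parameter each).
weights = [0.9, 0.8, 0.1]          # reliabilities r_i (secret from the servers)
params  = [0.50, 0.55, 3.00]       # local parameters w_i (3rd user is irregular)
enc_weighted = [public_key.encrypt(r * w) for r, w in zip(weights, params)]
enc_weights  = [public_key.encrypt(r) for r in weights]

# Server-side aggregation via the additive homomorphism.
enc_num = sum(enc_weighted[1:], enc_weighted[0])   # Enc(sum r_i * w_i)
enc_den = sum(enc_weights[1:], enc_weights[0])     # Enc(sum r_i)

# Stand-in for the secure division protocol: in PPRFL the two servers
# jointly compute num/den without either seeing num or den in the clear.
global_param = private_key.decrypt(enc_num) / private_key.decrypt(enc_den)
print(f"global parameter = {global_param:.4f}")    # dominated by reliable users
```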
This completes the proof.
Table 1 compares the functionality of SecProbe [17], SAHPP [19], PPFDL [21], and the proposed PPRFL framework; all four schemes protect the model parameters uploaded by users. SecProbe, which runs with semi-trusted servers, must evaluate model utility online and perform model aggregation, which leaks both reliability privacy and aggregated-model-parameter privacy, and its differential-privacy mechanism does not support user dropout. SAHPP, PPFDL, and the proposed scheme adopt homomorphic encryption and protect both the intermediate parameters and the aggregation result, so the servers learn neither the user reliabilities nor the aggregated parameters. Moreover, SecProbe, SAHPP, and PPFDL were designed under the assumption that low-quality data account for no more than half of the total data, so their aggregation efficiency degrades when noisy data dominate. In contrast, the PPRFL framework relies on the loss over the combined validation set and computes model reliability through dynamic updating and scaling, which keeps the convergence efficiency high even when irregular users form the majority.
Table 1 Functionality comparison of different schemes
Note: “√” indicates that the corresponding function is satisfied; “×” indicates that it is not.
As shown in the functionality analysis, SecProbe [17] leaks model-reliability and aggregated-parameter privacy and cannot handle user dropout. Therefore, the following analyses compare only against the stronger schemes SAHPP [19] and PPFDL [21].
Since the computing power of the user side is limited, the client-side computation overhead must be considered alongside model accuracy and communication overhead. Figure 2 shows the client-side computation overhead of the different schemes when the same number of model parameters is uploaded. SAHPP [19] requires interactive key agreement and additionally computes the spatial distance between the local and global models, so its client-side computation overhead is high. PPFDL [21] performs only model training and parameter encryption, so its overhead is comparatively low. PPRFL performs one extra forward pass and the parameter-scaling computation compared with PPFDL, so its overhead is slightly higher, but it greatly strengthens robustness to the proportion of irregular users.
Figure 2 Comparison of client's computation costs among different schemes
Figure 3 Comparison of server's computation costs among different schemes
Table 3 Comparison of computational complexity among different schemes
Figure 4 Curves showing the variations of model accuracy with number of communication rounds under different client ratios
Figure 5 Curves showing the variations of model accuracy with number of communication rounds under different noise ratios
Figure 6 Curves showing the variations of convergence counts with number of clients under the same noise ratio
To address the reduced aggregation efficiency of existing federated learning schemes under a majority of irregular users and the privacy leakage caused by plaintext communication, this paper proposed PPRFL, a robust privacy-preserving federated learning framework that supports a majority of irregular users. On the user side, PPRFL dynamically computes and scales model reliability using the loss over a combined validation set; on the server side, it performs secure parameter aggregation with the designed division protocol. Theoretical analysis and experiments show that PPRFL preserves model accuracy, improves the convergence efficiency of federated aggregation, and outperforms PPFDL in both computation and communication overhead. Future work will study how to handle irregular users under the vertical federated learning architecture and seek ways to further reduce computation and communication overhead.
The security proof of the secure division protocol is stated in Proposition 2.
This completes the proof.
Table 4 Comparison of computation overhead of different division protocols
In terms of communication overhead, SecDiv requires two transmissions of homomorphic ciphertexts and must additionally execute garbled-circuit and oblivious-transfer operations, which are communication-intensive, whereas the proposed secure division protocol needs only the two transmissions of homomorphic ciphertexts, so its communication overhead is far lower than SecDiv's.
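One construction consistent with this two-transmission pattern is multiplicative blinding: S1 masks both operands before delegating the division to the key-holding server S2, then removes the residual factor homomorphically. The sketch below is only a plausible illustration of that pattern under stated assumptions (python-paillier, toy real-valued blinding factors); it is not the protocol given in the appendix.

```python
import random
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

def s2_divide(enc_a, enc_b):
    """Key-holding server S2: decrypt the blinded operands, divide, re-encrypt.
    S2 only sees r1*num and r2*den, not num, den, or the true quotient."""
    a, b = private_key.decrypt(enc_a), private_key.decrypt(enc_b)
    return public_key.encrypt(a / b)

def s1_secure_div(enc_num, enc_den):
    """Server S1: blind, delegate the division, unblind homomorphically.
    Exactly two ciphertext transmissions: S1 -> S2 and S2 -> S1."""
    # Toy blinding; a real protocol would draw large randoms over the
    # plaintext space of the cryptosystem.
    r1, r2 = random.uniform(1, 1000), random.uniform(1, 1000)
    enc_q = s2_divide(enc_num * r1, enc_den * r2)   # transmissions 1 and 2
    return enc_q * (r2 / r1)   # Enc((r1/r2)*(num/den)) -> Enc(num/den)

enc_result = s1_secure_div(public_key.encrypt(7.5), public_key.encrypt(2.5))
print(private_key.decrypt(enc_result))   # ~3.0
```

Because the blinding factors are drawn independently, the quotient S2 computes is offset by r1/r2, so even the decrypting server learns neither the operands nor the true result.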
[1] LI T, SAHU A K, TALWALKAR A, et al. Federated learning: challenges, methods, and future directions[J]. IEEE Signal Processing Magazine, 2020, 37(3): 50-60.
[2] YANG Q, LIU Y, CHEN T J, et al. Federated machine learning[J]. ACM Transactions on Intelligent Systems and Technology, 2019, 10(2): 1-19.
[3] LIANG Y C, TAN J J, NIYATO D. Overview on intelligent wireless communication technology[J]. Journal on Communications, 2020, 41(7): 1-17. (in Chinese)
[4] YANG Q. AI and data privacy protection: the way to federated learning[J]. Journal of Information Security Research, 2019, 5(11): 961-965. (in Chinese)
[5] TAN Q Y, ZENG Y M, HAN Y, et al. Survey on backdoor attacks targeted on neural network[J]. Chinese Journal of Network and Information Security, 2021, 7(3): 46-58. (in Chinese)
[6] MELIS L, SONG C Z, DE CRISTOFARO E, et al. Exploiting unintended feature leakage in collaborative learning[C]//Proceedings of 2019 IEEE Symposium on Security and Privacy. Piscataway: IEEE Press, 2019: 691-706.
[7] ZHOU C X, SUN Y, WANG D G, et al. Survey of federated learning research[J]. Chinese Journal of Network and Information Security, 2021, 7(5): 77-92. (in Chinese)
[8] HITAJ B, ATENIESE G, PEREZ-CRUZ F. Deep models under the GAN: information leakage from collaborative deep learning[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017: 603-618.
[9] MOTHUKURI V, PARIZI R M, POURIYEH S, et al. A survey on security and privacy of federated learning[J]. Future Generation Computer Systems, 2021, 115: 619-640.
[10] WAGH S, GUPTA D, CHANDRAN N. SecureNN: 3-party secure computation for neural network training[J]. Proceedings on Privacy Enhancing Technologies, 2019, 2019(3): 26-49.
[11] XU R H, BARACALDO N, ZHOU Y, et al. HybridAlpha: an efficient approach for privacy-preserving federated learning[C]//Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security. 2019: 13-23.
[12] FANG C, GUO Y B, WANG Y F, et al. Edge computing privacy protection method based on blockchain and federated learning[J]. Journal on Communications, 2021, 42(11): 28-40. (in Chinese)
[13] KANAGAVELU R, LI Z X, SAMSUDIN J, et al. Two-phase multi-party computation enabled privacy-preserving federated learning[C]//Proceedings of 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). 2020: 410-419.
[14] DONG Y, HOU W, CHEN X J, et al. Efficient and secure federated learning based on secret sharing and gradients selection[J]. Journal of Computer Research and Development, 2020, 57(10): 2241-2250. (in Chinese)
[15] PHONG L T, AONO Y, HAYASHI T, et al. Privacy-preserving deep learning via additively homomorphic encryption[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(5): 1333-1345.
[16] ZHANG Z H, FU Y, GAO T G. Research on federated deep neural network model for data privacy protection[J]. Acta Automatica Sinica, 2020. (in Chinese)
[17] ZHAO L C, WANG Q, ZOU Q, et al. Privacy-preserving collaborative deep learning with unreliable participants[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 1486-1500.
[18] JAYARAMAN B, EVANS D. Evaluating differentially private machine learning in practice[C]//Proceedings of the 28th USENIX Security Symposium. 2019: 1895-1912.
[19] CHENG Y. Research on data aggregation technology based on privacy-preserving in federated learning[D]. Chengdu: University of Electronic Science and Technology of China, 2020: 17-45. (in Chinese)
[20] SHAMIR A. How to share a secret[J]. Communications of the ACM, 1979, 22(11): 612-613.
[21] XU G W, LI H W, ZHANG Y, et al. Privacy-preserving federated deep learning with irregular users[J]. IEEE Transactions on Dependable and Secure Computing, 2020, (99): 1.
[22] ZHENG Y F, DUAN H Y, WANG C. Learning the truth privately and confidently: encrypted confidence-aware truth discovery in mobile crowdsensing[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(10): 2475-2489.
[23] TIAN Y L, LI T, XIONG J B, et al. A blockchain-based machine learning framework for edge services in IoT[J]. IEEE Transactions on Industrial Informatics, 2022, 18(3): 1918-1929.
[24] XIONG J B, BI R W, ZHAO M F, et al. Edge-assisted privacy-preserving raw data sharing framework for connected autonomous vehicles[J]. IEEE Wireless Communications, 2020, 27(3): 24-30.
[25] BONAWITZ K, EICHNER H, GRIESKAMP W, et al. Towards federated learning at scale: system design[J]. arXiv preprint arXiv:1902.01046, 2019.
[26] MCMAHAN H B, MOORE E, RAMAGE D, et al. Communication-efficient learning of deep networks from decentralized data[C]//Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. 2017: 1273-1282.
[27] PAILLIER P. Public-key cryptosystems based on composite degree residuosity classes[C]// Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques. 1999: 223-238.
[28] ACAR A, AKSU H, ULUAGAC A S, et al. A survey on homomorphic encryption schemes[J]. ACM Computing Surveys, 2019, 51(4): 1-35.
[29] CANETTI R, FEIGE U, GOLDREICH O, et al. Adaptively secure multi-party computation[C]//Proceedings of the twenty-eighth annual ACM Symposium on Theory of Computing. 1996: 639-648.
[30] HENDERSON M, THOMSON B, WILLIAMS J D. The third dialog state tracking challenge[C]//Proceedings of 2014 IEEE Spoken Language Technology Workshop. 2014: 324-329.
[31] BUDZIANOWSKI P, WEN T H, TSENG B H, et al. MultiWOZ - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 5016-5026.
[32] SHOKRI R, SHMATIKOV V. Privacy-preserving deep learning[C]//Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 2015: 1310-1321.
[33] KANG J W, XIONG Z H, NIYATO D, et al. Toward secure blockchain-enabled Internet of vehicles: optimizing consensus management using reputation and contract theory[J]. IEEE Transactions on Vehicular Technology, 2019, 68(3): 2906-2920.
[34] KANG J W, XIONG Z H, NIYATO D, et al. Incentive mechanism for reliable federated learning: a joint optimization approach to combining reputation and contract theory[J]. IEEE Internet of Things Journal, 2019, 6(6): 10700-10714.
[35] KANG J W, XIONG Z H, NIYATO D, et al. Reliable federated learning for mobile networks[J]. IEEE Wireless Communications, 2020, 27(2): 72-80.
[36] CATALANO D, FIORE D. Using linearly-homomorphic encryption to evaluate degree-2 functions on encrypted data[C]//Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. 2015: 1518-1529.
[37] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
Supported by: The National Natural Science Foundation of China (61872088, 61872090, U1905211), The Natural Science Foundation of Fujian Province (2019J01276)
CLC classification number: TP309
Document code: A
DOI: 10.11959/j.issn.2096-109x.2022011
Received: 2021-10-07; Revised: 2022-01-05
Corresponding author: LIN Jie, linjie891@163.com
Citation: CHEN Q X, BI R W, LIN J, et al. Privacy-preserving federated learning framework with irregular-majority users[J]. Chinese Journal of Network and Information Security, 2022, 8(1): 139-150.
CHEN Qianxin (1996- ), born in Quanzhou, Fujian, is a master's student at Fujian Normal University. His main research interests include secure deep learning and privacy protection technology.
BI Renwan (1996- ), born in Changde, Hunan, is a Ph.D. student at Fujian Normal University. His main research interests include secure deep learning and secure multi-party computation.
LIN Jie (1972- ), born in Sanming, Fujian, Ph.D., is a professor at Fujian Normal University. His main research interests include machine learning and bioinformatics.
JIN Biao (1985- ), born in Lu'an, Anhui, Ph.D., is a senior experimentalist at Fujian Normal University. His main research interests include information security and data mining.
XIONG Jinbo (1981- ), born in Yiyang, Hunan, Ph.D., is a professor at Fujian Normal University. His main research interests include secure deep learning, mobile crowdsensing, and privacy protection technology.