李 琪 鐘 將 李 雪
1(重慶大學(xué)計(jì)算機(jī)學(xué)院 重慶 400044) 2(昆士蘭大學(xué)信息技術(shù)與電子工程學(xué)院 澳大利亞布里斯班 4072)
基于啟發(fā)策略的動(dòng)態(tài)平衡圖劃分算法
李 琪1鐘 將1李 雪2
1(重慶大學(xué)計(jì)算機(jī)學(xué)院 重慶 400044)2(昆士蘭大學(xué)信息技術(shù)與電子工程學(xué)院 澳大利亞布里斯班 4072)
(liqi0713@foxmail.com)
隨著計(jì)算技術(shù)的發(fā)展以及大數(shù)據(jù)時(shí)代的來(lái)臨,分布式計(jì)算已成為研究的熱點(diǎn),其中大圖迭代計(jì)算作為其研究的重點(diǎn),降低劃分后子圖之間的通信邊規(guī)模是改善計(jì)算性能的關(guān)鍵.傳統(tǒng)算法很難在切割率最小化與負(fù)載均衡上同時(shí)滿足.由于圖劃分屬于NP組合優(yōu)化問(wèn)題,提出了一種動(dòng)態(tài)平衡算法來(lái)解決圖的平衡劃分,確保在子圖邊界點(diǎn)劃分最優(yōu)的基礎(chǔ)上引入擾動(dòng)策略使其跳出局部最優(yōu)擴(kuò)大搜索空間,最后在真實(shí)世界圖上驗(yàn)證算法的可行性,分別從平衡系數(shù)、切割邊規(guī)模與傳統(tǒng)算法進(jìn)行了比較.在指定的擾動(dòng)次數(shù)下,此算法比常見(jiàn)的算法hash,Chunk,Metis在割邊率上分別降低了近40%,30%,5%.與Metis相比,平衡系數(shù)也更加地優(yōu)化,實(shí)驗(yàn)結(jié)果證明了該算法的有效性.
平衡圖劃分;啟發(fā)策略;負(fù)載均衡;分布式計(jì)算;局部?jī)?yōu)化
給定一個(gè)無(wú)向圖G=(V,E),V和E分別表示頂點(diǎn)和邊的集合,平衡K劃分是把點(diǎn)集V按照映射關(guān)系π→(S1,S2,…,SK)映射到K(K≥2)個(gè)不相交的子域中,每個(gè)子域的規(guī)模幾乎相等,要求使割邊數(shù)最少(邊的2個(gè)端點(diǎn)不在同一個(gè)域).當(dāng)K=2時(shí)是2劃分,針對(duì)2劃分經(jīng)典的是KL(Kernighan-Lin)[1]算法,其基本思想是將圖隨機(jī)劃分為2等份,將2分結(jié)果作為輸入,通過(guò)交換2個(gè)子域中的點(diǎn)來(lái)改進(jìn)2分結(jié)果.此算法已經(jīng)成為大多數(shù)圖劃分算法迭代改進(jìn)的基礎(chǔ),但是由于其較高的時(shí)間復(fù)雜度O(|V|3),不適合大圖的直接處理.Fiduccia和Mattheyses[2]對(duì)其進(jìn)行了改進(jìn),用單點(diǎn)移動(dòng)來(lái)代替KL的雙點(diǎn)交換以及加入了更有效的數(shù)據(jù)結(jié)構(gòu).圖劃分本身是NP完全問(wèn)題[3],可以通過(guò)元啟發(fā)式算法[4]解決此類(lèi)問(wèn)題,主要有模擬退火算法[5]、禁忌搜索算法[6]、遺傳算法[7]等.另外,Kumar等人提出了多層次的圖劃分模型Metis[8]和它的并行版本ParMetis[9].Metis算法設(shè)計(jì)主要基于多層次圖劃分范式.此類(lèi)方法還包括Chaco[10]和Scotch[11].圖劃分有廣泛的應(yīng)用,例如并行計(jì)算[12]、VLSI設(shè)計(jì)[13]、圖像分割[14]等.
圖可以表達(dá)復(fù)雜的結(jié)構(gòu)和豐富的語(yǔ)意,其迭代分析算法在社交網(wǎng)絡(luò)、Web和科學(xué)計(jì)算等諸多領(lǐng)域獲得了廣泛的應(yīng)用[15],然而,隨著數(shù)據(jù)規(guī)模的不斷增長(zhǎng)對(duì)計(jì)算要求提出了嚴(yán)峻的挑戰(zhàn).在2012年,Google月活躍用戶數(shù)為10億,Twitter月活躍用戶數(shù)為2億,平均每天發(fā)送的消息量達(dá)到了1.75億,與之相對(duì)應(yīng)的是數(shù)十億的邊與頂點(diǎn),但是我們?nèi)砸ㄟ^(guò)這些龐大的圖數(shù)據(jù)來(lái)進(jìn)行一些相關(guān)的計(jì)算,例如PageRank、尋找連通分量、計(jì)算三角形等.將如此海量的圖數(shù)據(jù)存儲(chǔ)在單機(jī)環(huán)境中計(jì)算效率會(huì)非常的低,進(jìn)而人們開(kāi)發(fā)了分布式迭代處理系統(tǒng),如Pregel[16],GraphLab[17],Spark[18],Giraph[19].圖劃分是Spark等系統(tǒng)進(jìn)行分布式計(jì)算的提前,每次迭代處理均會(huì)引入巨大的通信開(kāi)銷(xiāo),這將成為制約分布式處理性能的關(guān)鍵因素.一個(gè)良好的劃分算法應(yīng)保證劃分后的子圖在負(fù)載均衡的前提下,最小化割邊數(shù)規(guī)模.因此,設(shè)計(jì)劃分效果優(yōu)越的圖分割算法已經(jīng)成為現(xiàn)有大圖處理系統(tǒng)急需解決的問(wèn)題,已有的圖劃分算法[20-21]在割邊數(shù)規(guī)模與子域負(fù)載平衡上難以同時(shí)滿足.針對(duì)此問(wèn)題,本文提出了動(dòng)態(tài)平衡圖劃分算法——DyBGP,利用多種策略確保各子域負(fù)載均衡的基礎(chǔ)上最小化割邊率.
本文的貢獻(xiàn)主要有2個(gè)方面:
1) 設(shè)計(jì)了基于啟發(fā)策略的動(dòng)態(tài)平衡圖劃分算法,貪心頂點(diǎn)轉(zhuǎn)移操作能夠有效地減少割邊數(shù)達(dá)到局部最優(yōu),分區(qū)容量限制策略用來(lái)平衡各子域的負(fù)載,又定義了擾動(dòng)策略,是跳出局部最優(yōu)的關(guān)鍵,并利用全局記憶結(jié)構(gòu)存儲(chǔ)最優(yōu)的結(jié)果,同時(shí)也對(duì)該算法的復(fù)雜性進(jìn)行了理論分析;
2) 在真實(shí)的圖數(shù)據(jù)上進(jìn)行實(shí)驗(yàn)分析,分別在切割邊數(shù)量與平衡度2方面分別與Hash,Chunk,Metis進(jìn)行比較,實(shí)驗(yàn)結(jié)果證明了本文所提出算法在平衡圖劃分問(wèn)題的有效性.
1) 圖劃分.給定一個(gè)無(wú)向圖G=(V,E),V和E分別表示圖的點(diǎn)集和邊集,K路平衡劃分是將頂點(diǎn)V按照某種策略分配到K個(gè)子域中S1,S2,…,Sk,要求在各子域負(fù)載平衡的基礎(chǔ)上最小化割邊率,Vi代表第i子區(qū)中的頂點(diǎn)集,V1∪V2∪…∪Vk=V,Vi∩Vj=?,i≠j,ρ為平衡系數(shù)(ρ≥1),理想值為1.0.圖劃分問(wèn)題可以定義為
(1)
(2)
(3)
式(2)中的ECutij為子域Si到Sj(或者Sj到Si)所有邊的集合(Si≠Sj).
2)g(v,n).點(diǎn)v從所在子域Slocal移向另一個(gè)子域Sj(Slocal≠Sj),割邊減少的數(shù)量我們稱之為收益值,|EVi|(i∈[1,K])表示子域Si中點(diǎn)與點(diǎn)v相連的邊數(shù),圖1中有4個(gè)子域(S1,S2,S3,S4),點(diǎn)v在子域S3(Slocal)中,|EV1|=3,|EV2|=1,|EV3|=2,|EV4|=2.
n代表點(diǎn)v所移動(dòng)的目標(biāo)子域,取獲得收益最大的子域,g(v,n)不僅有正值也有負(fù)值(圖1中g(shù)(v,1)=1),用數(shù)學(xué)形式表示g(v,n)為
(4)
Fig. 1 An example of 4-partitioning圖1 4個(gè)子域的圖劃分
當(dāng)初始劃分完成后,首先選取邊界點(diǎn)作為候選點(diǎn),然后定義多種策略確保圖的候選點(diǎn)劃分(割邊數(shù)規(guī)模、子域負(fù)載)達(dá)到最優(yōu),為了擴(kuò)大搜索范圍我們加入了擾動(dòng)策略,在此基礎(chǔ)上引入了懲罰措施,懲罰負(fù)載過(guò)大和過(guò)小的子域,算法1用偽代碼詳細(xì)描述了此過(guò)程,詳細(xì)的子過(guò)程將分別在2.1~2.3節(jié)、2.5節(jié)介紹.
算法1. 動(dòng)態(tài)平衡劃分算法(DyBGP).
輸入:初始劃分Pk={V1,V2,…,VK}(見(jiàn)2.1節(jié));
輸出:劃分結(jié)果.
步驟1. 初始化參數(shù),擾動(dòng)次數(shù)(pertur_times),禁忌列表(tabu list),全局記憶結(jié)構(gòu)(global memory structure);
步驟2. For每一個(gè)候選點(diǎn)
計(jì)算g(v,n);
將點(diǎn)v插入增益結(jié)構(gòu)(見(jiàn)2.2節(jié));
End For
步驟3. Whilepertur_times
計(jì)算此時(shí)劃分圖狀態(tài);
① If 收斂
執(zhí)行擾動(dòng)策略跳出局部最優(yōu);
pertur_times=pertur_times-1;
pertur(v,n)(見(jiàn)2.5節(jié));
更新增益結(jié)構(gòu)和全局記憶結(jié)構(gòu);
② Else If 沒(méi)有收斂
Repeat
iter_number=iter_number+1;
greedy_move(v,Sdst)(見(jiàn)2.3節(jié));
更新增益結(jié)構(gòu)和禁忌列表;
Balance_move(v,Sdst)(見(jiàn)2.3節(jié));
更新增益結(jié)構(gòu)和禁忌列表;
Until候選點(diǎn)的收益值都小于等于零
③ 執(zhí)行懲罰策略(見(jiàn)2.5節(jié));
End If
End While
首先將圖分為K個(gè)小圖,為了證明本算法是否與初始劃分有關(guān),本文列出了3種初始的圖劃分.
1) Hash.Pregel,GraphLab采用此方法,根據(jù)index=Hash(ID) modK將頂點(diǎn)映射到第index個(gè)分區(qū),K為分區(qū)數(shù).此方法時(shí)間復(fù)雜度很低O(V).
3) Metis.Metis屬于多級(jí)劃分,分為3個(gè)階段——粗化、劃分、細(xì)化.粗化階段是壓縮圖的規(guī)模,時(shí)間復(fù)雜度大于O(|E|);粗化后的圖用KL等算法進(jìn)行劃分,時(shí)間復(fù)雜度為O(N3),N為粗化后的頂點(diǎn)數(shù),細(xì)化是將圖恢復(fù)成原圖并且在恢復(fù)過(guò)程中不斷調(diào)整優(yōu)化,時(shí)間復(fù)雜度大于O(|E|);Metis劃分整個(gè)過(guò)程時(shí)間復(fù)雜度大于O(2×|E|+N3).
桶結(jié)構(gòu)首次被Fiduccia和Mattheyses提出[2],是為了改進(jìn)2劃分的KL算法,把所有相同收益值的點(diǎn)放在木桶結(jié)構(gòu)中的相同位置,根據(jù)收益的大小進(jìn)行移動(dòng)操作,時(shí)間復(fù)雜度明顯降低.Benlic等人[22]提出了針對(duì)K-劃分的木桶結(jié)構(gòu).但是其隨著子域數(shù)量的增加,所消耗的內(nèi)存也是急速地增加,本文也提出了針對(duì)本算法的結(jié)構(gòu).
首先計(jì)算候選點(diǎn)的收益值,將點(diǎn)插入到對(duì)應(yīng)的收益值列表中,對(duì)應(yīng)相應(yīng)的目標(biāo)子域,每次將最大收益值對(duì)應(yīng)的點(diǎn)移向目標(biāo)子域.另外,還增加了鄰居列表和鄰居所在列表位置的列表,當(dāng)點(diǎn)v發(fā)生移動(dòng)時(shí),我們只需要根據(jù)索引更新點(diǎn)v和點(diǎn)v周?chē)従狱c(diǎn)的值,每次更新所需要的時(shí)間復(fù)雜度與點(diǎn)v的鄰居數(shù)有直接的關(guān)系,同樣也大大減少了計(jì)算量.圖2舉例說(shuō)明了將例圖劃分為3個(gè)子圖的增益結(jié)構(gòu).
Fig. 2 An example of gain struct for 3-partitioing圖2 例圖劃分為3個(gè)子域的增益結(jié)構(gòu)
為了在候選點(diǎn)上執(zhí)行局部?jī)?yōu)化操作,采用的操作策略:
如果移動(dòng)之前|Vsrc|<|Vdst|,那么移動(dòng)之后在子域Sdst選擇某一點(diǎn)v,滿足g(v,Ssrc)≥0,移向目標(biāo)子域Ssrc.但是如果對(duì)于Sdst中任意的點(diǎn)收益值g(v,Ssrc)<0,則不移動(dòng).
2) 子域負(fù)載限制操作{balance_move(v,Sdst)}.對(duì)于同一個(gè)子域來(lái)說(shuō),每次迭代可能會(huì)有很多點(diǎn)從不同子域轉(zhuǎn)移過(guò)來(lái),造成子域負(fù)載不平衡,因此設(shè)計(jì)了一種平衡操作,這種操作規(guī)定任意選擇2個(gè)子域Si和Sj,如果|Vi|>|Vj|,在子域Si中,選擇某一點(diǎn)v且g(v,Sj)≥0,將點(diǎn)v從Si移向Sj(如果|Vi|<|Vj|,執(zhí)行相反的操作),此操作也可以進(jìn)一步降低割邊率.
1) 禁忌列表(tabu list).本文所采用的轉(zhuǎn)移決策具有獨(dú)立性,局部的對(duì)稱性會(huì)導(dǎo)致無(wú)效的轉(zhuǎn)移,如1對(duì)互為鄰居的頂點(diǎn),在迭代中2頂點(diǎn)可能相互轉(zhuǎn)移到對(duì)方所在的子域中不斷地互相多次轉(zhuǎn)移,影響局部的收斂,為了防止此類(lèi)無(wú)效的轉(zhuǎn)移,規(guī)定:當(dāng)某個(gè)頂點(diǎn)從Si轉(zhuǎn)移到另一個(gè)子域Sj,在某個(gè)常數(shù)時(shí)間內(nèi)禁止返回原子域.該算法增加了禁忌表tabu list,禁忌長(zhǎng)度定義為t(v,Si)=border(|Vi|)×α,border(|Vi|)表示子域Si的邊界點(diǎn)個(gè)數(shù),α是一個(gè)因子,在本文中設(shè)α=0.05,每次擾動(dòng)之前,tabu list將清空重新計(jì)算.
2) 全局記憶結(jié)構(gòu)(global memory structure).由于擾動(dòng)具有隨機(jī)性,因此,在設(shè)定的擾動(dòng)次數(shù)下,用全局記憶結(jié)構(gòu)存儲(chǔ)劃分效果最好的一次擾動(dòng),但是也會(huì)相應(yīng)的增加內(nèi)存消耗.
為了跳出局部最優(yōu),本文設(shè)計(jì)了一種擾動(dòng)策略,選擇一個(gè)子域Si,在Si中任意選擇其中的γ個(gè)內(nèi)點(diǎn)(邊界點(diǎn)之外的點(diǎn)),每個(gè)點(diǎn)任意地移向其他子域Sj(Si≠Sj),γ=0.03×inside(|Vi|).點(diǎn)在不斷的移動(dòng)過(guò)程中,有些子域負(fù)載規(guī)??赡苓^(guò)大或過(guò)小,因此,引出2種懲罰措施.
以上2種策略,都是在收益值大于或等于零的情況下進(jìn)行移動(dòng),因此不會(huì)增加圖的割邊率.
本節(jié)對(duì)所提出算法的復(fù)雜度進(jìn)行分析,本算法的復(fù)雜度主要體現(xiàn)在初始劃分、擾動(dòng)以及擾動(dòng)之后的迭代時(shí)間,由于初始劃分的隨機(jī)性,因此設(shè)初始劃分的復(fù)雜度為O(t).擾動(dòng)次數(shù)為pertur_times,每輪擾動(dòng)之后的迭代次數(shù)為iter_number,擾動(dòng)之后總的迭代時(shí)間為pertur_times×iter_number.本文中每次擾動(dòng)的頂點(diǎn)數(shù)為0.03×inside(|Vi|),因此擾動(dòng)需要的時(shí)間復(fù)雜度為pertur_times×0.03×inside(|Vi|).整個(gè)算法時(shí)間復(fù)雜度O(t+pertur_times×iter_number+pertur_times×0.03×inside(|Vi|).
本節(jié)我們?cè)谡鎸?shí)圖上來(lái)測(cè)試本算法的可行性,介紹實(shí)驗(yàn)的具體步驟及平臺(tái)環(huán)境,展示實(shí)驗(yàn)的結(jié)果并對(duì)這些結(jié)果進(jìn)行分析.
實(shí)驗(yàn)中使用的真實(shí)圖數(shù)據(jù)來(lái)源于斯坦福大學(xué)網(wǎng)絡(luò)分析項(xiàng)目,詳細(xì)圖信息在表1中.算法用python語(yǔ)言編寫(xiě),在AMD phenom Ⅱ X4 955 4 GB上編譯測(cè)試.
Table 1 Experimental Data Sets表1 實(shí)驗(yàn)數(shù)據(jù)集
如圖3所示,我們用Hash,Chunk,Metis方法分別對(duì)圖loc-Gowalla進(jìn)行了初始的K-劃分(K=2,6,8,16,32,64),由圖3可以看出Hash的劃分結(jié)果最差,當(dāng)子域數(shù)量為64時(shí)割邊率幾乎達(dá)到了94%;Metis的劃分效果明顯優(yōu)于Hash和Chunk,隨著子域的增多,割邊比也會(huì)增加,但增幅明顯小于Hash與Chunk.
Fig. 3 Results of initial partitioning on loc-Gowalla圖3 基于Hash,Chunk,Metis的K-劃分
擾動(dòng)策略是跳出局部最優(yōu)的關(guān)鍵,因此也對(duì)擾動(dòng)策略進(jìn)行了實(shí)驗(yàn)分析,在圖4中,在沒(méi)有擾動(dòng)策略的情況下(即算法1中沒(méi)有步驟①)割邊率與迭代次數(shù)(iter_number)的關(guān)系,橫坐標(biāo)為迭代次數(shù),縱坐標(biāo)為割邊率.圖5展示了加入擾動(dòng)策略之后擾動(dòng)輪數(shù)與割邊率的關(guān)系,橫坐標(biāo)為擾動(dòng)次數(shù)(dister_number),縱坐標(biāo)為割邊率.
Fig. 4 Results of 16-partitioning on p2p-Gnutella8 without perturbation strategy圖4 p2p-Gnutella8上沒(méi)有擾動(dòng)策略的16-劃分結(jié)果
Fig. 5 Results of 16-partitioning on p2p-Gnutella8 with perturbation strategy圖5 p2p-Gnutella8上加入擾動(dòng)策略的16-劃分結(jié)果
如圖4所示,由于Hash的初始劃分的割邊率明顯高于Chunk和Metis,在沒(méi)有擾動(dòng)的情況下,Hash迭代收斂的次數(shù)最高,Metis收斂的迭代次數(shù)最少.加入擾動(dòng)之后,如圖5所示,割邊率都會(huì)有進(jìn)一步降低,隨著擾動(dòng)次數(shù)的增加,全局記性結(jié)構(gòu)里都會(huì)存儲(chǔ)最好的劃分結(jié)果,由結(jié)果可以看出,劃分結(jié)果質(zhì)量的優(yōu)劣與初始劃分沒(méi)有關(guān)系.
最后在表2中,分別取Hash,Chunk,Metis為本算法的初始劃分,結(jié)果取其平均值作為提出算法的劃分結(jié)果,括號(hào)中的數(shù)值為平衡因子.從表2中,可以看出DyBGP算法在割邊率上明顯提高,而且在平衡度上與Metis相比也有所提升,證明了所提出算法的有效性.
Table 2 Comparion of Our Approach (DyBGP) with Hash, Chunk and Metis表2 本文提出的方法(DyBGP)與Hash,Chunk,Metis結(jié)果比較
本文利用初始劃分的局部信息(邊界點(diǎn))通過(guò)啟發(fā)式策略調(diào)整點(diǎn)位置達(dá)到局部最優(yōu),為了擴(kuò)大搜索范圍,我們又定義了擾動(dòng)策略,用多種策略來(lái)確保圖的平衡劃分且最小化割邊率,實(shí)驗(yàn)數(shù)據(jù)也證明了此算法的有效性.平衡圖劃分有著廣泛的應(yīng)用,隨著大數(shù)據(jù)發(fā)展與應(yīng)用,在圖并行框架中起著重要的作用,未來(lái),我們將圖劃分運(yùn)用到具體的大圖迭代系統(tǒng)中,與具體的計(jì)算相結(jié)合,對(duì)于后續(xù)大圖算法的研究有很重要的意義.
[1]Dutt S. New faster kernighan-lin-type graph-partitioning algorithms[C] //Pro of ICCAD-93. Piscataway, NJ: IEEE, 1993: 370-377
[2] Fiduccia C M, Mattheyses R M. A linear-time heuristic for improving network partitions[C] //Proc of the 19th IEEE Conf on Electronic Design Automation. New York: ACM, 1988: 241-247
[3] Garey M R, Johnson D S, Stockmeyer L. Some simplified NP-complete graph problems[J]. Theoretical Computer Science, 1976, 1(3): 237-267
[4] Xu Jinfeng, Dong Yihong, Wang Shiyi. Summary of large-scale graph partitioning algorithms[J]. Telecommunications Science, 2014, 30(7): 100-106 (in Chinese)(許金鳳, 董一鴻, 王詩(shī)懿. 大規(guī)模圖數(shù)據(jù)劃分算法綜述[J]. 電信科學(xué), 2014, 30(7): 100-106)
[5] Johnson D S, Aragon C R, McGeoch L A. Optimization by simulated annealing: An experimental evaluation; part I, graph partitioning[J]. Operations Research, 1989, 37(6): 865-892
[6] Rolland E, Pirkul H, Glover F. Tabu search for graph partitioning[J]. Annals of Operations Research, 1996, 63(2): 209-232
[7] Rahimian F, Payberah A H, Girdzijauskas S, et al. JA-BE-JA: A distributed algorithm for balanced graph partitioning[C] //Proc of the 7th IEEE Int Conf on Self-Adaptive and Self-Organizing Systems. Piscataway, NJ: IEEE, 2013: 51-60
[8] Karypis G, Kumar V. A fast and high quality multilevel scheme for partitioning irregular graphs[J]. SIAM Journal on Scientific Computing, 1998, 20(1): 359-392
[9] Karypis G, Schloegel K, Kumar V. Parmetis: Parallel graph partitioning and sparse matrix ordering library[OL]. [2016-08-16]. https://www.research-gate.net/publication/238705993_Parmetis_Parallel_graph_partitioning_and_sparse_matrix_ordering_library
[10] Hendrickson B, Leland R. A multi-level algorithm for partitioning graphs[C] //Proc of ACM/IEEE Conf on Supercomputing. New York: ACM, 1995: 28-28
[11] Pellegrini F, Roman J. Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs[C] //Proc of HPCN-Europe 1996. Berlin: Springer, 1996: 493-498
[12] Simon H D. Partitioning of unstructured problems for parallel processing[J]. Computing Systems in Engineering, 1991, 2(2/3): 135-148
[13] Karypis G, Kumar V. Multilevelk-way partitioning scheme for irregular graphs[J]. Journal of Parallel and Distributed Computing, 1998, 48(1): 96-129
[14] Grady L, Schwartz E L. Isoperimetric graph partitioning for image segmentation[J]. IEEE Trans on Pattern Analysis & Machine Intelligence, 2006, 28(3): 469-475
[15] Chen Ling, Li Xue, et al. Mining health examination records—A graph-based approach[J]. IEEE Trans on Knowledge and Data Engineering, 2016, 28(9): 2423-2437
[16] Malewicz G, Austern M H, Bik A J C, et al. Pregel: A system for large-scale graph processing[C] //Proc of the 2010 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2010: 135-146
[17] Low Y, Bickson D, Gonzalez J, et al. Distributed graphLab: A framework for machine learning and data mining in the cloud[J]. Proceedings of the VLDB Endowment, 2012, 5(8): 716-727
[18] Zaharia M, Chowdhury N M, Franklin M J, et al. Spark: Cluster computing with working sets[C] //Proc of the 2nd USENIX Conf on Hot Topics in Cloud Computing. Berkeley, CA: USENIX Association, 2010: 10
[19] Avery C. Giraph: Large-scale graph processing infrastructure on hadoop[OL]. [2016-08-16]. http://giraph.apache.org/
[20] Stanton I, Kliot G. Streaming graph partitioning for large distributed graphs[C] //Proc of the 18th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2012: 1222-1230
[21] Mehrdoost Z, Bahrainian S S. A multilevel tabu search algorithm for balanced partitioning of unstructured grids[J]. International Journal for Numerical Methods in Engineering, 2016, 105(9): 678-692
[22] Benlic U, Hao J K. An effective multilevel tabu search approach for balanced graph partitioning[J]. Computers & Operations Research, 2011, 38(7): 1066-1075
[23] Leskovec J, Kleinberg J, Faloutsos C. Graph evolution: Densification and shrinking diameters[J]. ACM Trans on Knowledge Discovery from Data, 2007, 1(1): 2
[24] Cho E, Myers S A, Leskovec J. Friendship and mobility: User movement in location-based social networks[C] //Proc of the 17th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2011: 1082-1090
[25] Leskovec J, Adamic L A, Huberman B A. The dynamics of viral marketing[J]. ACM Trans on the Web, 2007, 1(1): 228-237
DyBGP:ADynamic-BalancedAlgorithmforGraphPartitioningBasedonHeuristicStrategies
Li Qi1, Zhong Jiang1, and Li Xue2
1(CollegeofComputerScience,ChongqingUniversity,Chongqing400044)2(SchoolofInformationTechnologyandElectricalEngineering,UniversityofQueensland,Brisbane,Australia4072)
With the development of computing technology and the advent of the era of big data, the distributed computing has became a research hotspot. Iterative computation of big graph becomes the focus of the research. Reducing the communication data quantity between subgraph after effective partitioning, it is the key to improve the computational performance, because the existing algorithms are difficult to meet the requirements on both minimizing fraction of egdes cut and load balancing at the same time. In this paper, a dynamic-balanced algorithm for graph partitioning named DyBGP is proposed, and it is used to solve the problem of balanced partition. Based on ensuring the partitioning of subgraph boundary vertices optimal, the perturbation strategy to jump out of local optimum to expand the search space is used. Finally, our algorithm is verified the feasibility in the real-world graph, respectively from the balance coefficient and the scale of edges cut compared with the traditional algorithms, such as Hash, Chunk and Metis. In the number of edges cut, it is decreased about 40%, 30%, 5% with our algorithm under specifying perturbation times. In the balance coefficient, our algorithm is more optimized than Metis. The experimental results show that the algorithm is effective.
balanced graph partitioning; heuristic strategies; load balancing; distributed computing; local optimization
his PhD from Queensland University of Technology in 1997. His main research interests include opinion analysis from social media, big data analytics, knowledge discovery from sequences, mining distributed, high-speed, time-variant data streams, etc.
2016-09-09;
2017-02-21
國(guó)家“八六三”高技術(shù)研究發(fā)展計(jì)劃基金項(xiàng)目(2015AA015308);重慶市社會(huì)事業(yè)與民生保障科技創(chuàng)新專(zhuān)項(xiàng)(cstc2017shmsA0641)
This work was supported by the National High Technology Research and Development Program of China (863 Program) (2015AA015308) and the Social Undertakings and Livelihood Security Science and Technology Innovation Funds of CQ CSTC (cstc2017shmsA0641).
鐘將(zhongjiang@cqu.edu.cn)
TP301.6
LiQi, born in 1987. PhD candidate at the College of Computer Science, Chongqing University. His main research interests include data mining and graph computing, etc.
ZhongJiang, born in 1974. Recevied his PhD degree in computer science from Chongqing University in 2005. Professor and PhD supervisor. His main research interests include data mining, management information system, trusted computer system, service computing, etc.