PSO-based big data clustering algorithm in cloud environment
HU Yi, ZHU Zijiang
(South China Business College, Guangdong University of Foreign Studies, Guangzhou 410545, China)
Abstract: Quantum evolution methods for big data clustering in traditional cloud environments have relatively low clustering accuracy. To reduce storage overhead and improve data management and scheduling, a big data clustering algorithm for the cloud environment based on optimized particle swarm optimization (PSO) is proposed. The principle of big data clustering in the cloud environment is analyzed. Taking traditional fuzzy C-means clustering as the basis, the big data clustering algorithm is improved with a particle swarm clustering algorithm to achieve spatial segmentation and obtain a fuzzy clustering of the mass data in the cloud storage system. The particle swarm clustering method is used to distribute the discrete cost of the clustering data and obtain the information concentration of the data clusters; combining this with the clustering constraints of particle swarm optimization gives the optimal solution for the big data clustering centers in the cloud environment. Simulation results show that the algorithm has relatively high clustering accuracy and good convergence performance.
Keywords: big data clustering; cloud environment; particle swarm optimization; space division; fuzzy clustering; simulation testing
CLC number: TN919-34    Document code: A    Article ID: 1004-373X(2020)14-0072-04
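0  Introduction
The concept of cloud computing was proposed by IBM in 2007. Cloud computing is the latest computing paradigm to emerge after parallel processing, distributed computing, and grid computing. It integrates interconnected computing, data, storage, and application resources to achieve multi-level virtualization and abstraction, so that users need only a network connection to draw on its computing and storage capacity. Against a cloud computing background, big data information processing can perform data clustering, and the characteristic parameters of the big data can be used to analyze the data. Data clustering supports the construction of big data sets, and pattern recognition and diagnosis are then used to carry out service analysis.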
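1  Design of big data storage in the cloud environment
Cloud computing dynamically expands structural models and storage space over the modern Internet. To perform classification mining and big data storage against a cloud computing background, a big data storage architecture must first be established. In the cloud environment, big data storage is deployed on computer clusters through virtualized storage and consists of a USB disk layer, a structure layer, computers, and other components; enterprises access it through terminals, and computation is carried out by distributed computers.
The big data storage structure in the cloud environment is shown in Figure 1.
Using the structure shown in Figure 1, the allocation is applied to the cloud computing virtual machines. The optimized clustering algorithm is implemented through Eq. (1) and Eq. (2), and the optimal solution is used to physically allocate the big data feature clusters in the cloud computing setting: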
\[x=\frac{1}{2\mu}\left(1+\mu+\sqrt{(\mu+1)(\mu-3)}\right) \tag{1}\]
\[x=\frac{1}{2\mu}\left(1+\mu+\sqrt{(\mu+1)(\mu-3)}\right) \tag{2}\]
To keep particles from becoming trapped in local optima, the big data information feature vector X_i is archived according to:
\[l_i(k)=(1-\rho)\,l_i(k-1)+\gamma f(x_i(k)) \tag{3}\]
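The recurrence for l_i(k) can be illustrated with a minimal Python sketch. The paper does not define ρ, γ, or f at this point, so the sketch assumes ρ is a decay factor in (0, 1), γ is a gain applied to the fitness, and f is a hypothetical fitness function (the negative squared distance to the origin); the function names and the values ρ = 0.4 and γ = 0.6 are illustrative choices, not settings from the paper.

```python
from typing import Callable, List, Sequence

Position = Sequence[float]


def update_archive(
    levels: List[float],
    positions: List[Position],
    fitness: Callable[[Position], float],
    rho: float,
    gamma: float,
) -> List[float]:
    """Apply l_i(k) = (1 - rho) * l_i(k-1) + gamma * f(x_i(k)) to every particle."""
    return [
        (1.0 - rho) * level + gamma * fitness(x)
        for level, x in zip(levels, positions)
    ]


def neg_sq_dist(x: Position) -> float:
    """Hypothetical fitness: higher is better, best at the origin."""
    return -sum(v * v for v in x)


# Two particles held at fixed positions (assumed values for illustration).
positions = [(0.0, 0.0), (1.0, 2.0)]
levels = [5.0, 5.0]

for _ in range(200):
    levels = update_archive(levels, positions, neg_sq_dist, rho=0.4, gamma=0.6)
```

With positions held fixed, each l_i settles at the fixed point γ f(x_i)/ρ, so the old value is forgotten geometrically at rate (1 − ρ) while the current fitness keeps feeding in. In this sketch the two particles approach 0 and −7.5 respectively.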
The clustering threshold is set to Nth; when Neff