Design of a mining algorithm based on an improved neural network
HUANG Wenfeng
(Henan Provincial Institute of Scientific & Technical Information, Zhengzhou 450003, China)
Abstract: Traditional data mining algorithms suffer from complex structure, long running time, errors during data analysis, and difficulty in expressing computed results accurately. To address these shortcomings, a method for applying neural networks to data mining is proposed. Because neural networks tolerate noisy data well and have a low error rate, redesigning the data mining algorithm around a neural network can substantially improve accuracy; the resulting method is simple in structure, clear in expression, and precise. The feasibility of the neural-network-based data mining algorithm is examined experimentally on the basis of the recorded experimental data. The results show that, compared with the traditional data mining algorithm, the neural-network-based mining algorithm is clearly more accurate and takes less time, which supports its greater practicality.
Keywords: data mining; neural network; rough set; data mining algorithm; data calculation; feasibility analysis
CLC number: TN711-34; TP183 Document code: A Article ID: 1004-373X(2018)14-0143-04
0 Introduction
Data mining is the process of extracting implicit, useful information from the data in a database. The data mining algorithms and theories in common use today mainly combine rough set theory with genetic algorithms and similar techniques [1]. To address the complex structure, long running time, and high error rate of traditional methods, a data mining algorithm based on an improved neural network is proposed. The neural network is first optimized and improved to reduce error, and the data mining algorithm is then designed on top of the improved network. The algorithm is robust, precise, tolerant of noisy data, and has a low error rate, and it can be applied widely in fields such as machine learning [2]. Simulation experiments on the neural network mining algorithm show that it has strong search capability, is good at global search, and achieves relatively high accuracy and real-time performance in data mining. It resolves many shortcomings of traditional methods and has high practical value.
1 Optimization design of the neural network algorithm
The neural network is optimized by combining its own characteristics with the strengths of the genetic algorithm [3]. This paper treats the network structure as a topology, mainly so that the number of hidden layers and the number of nodes are easy to compute. Manual intervention often disturbs the choice of hidden-layer and node counts and makes them hard to measure and compute accurately, so a genetic algorithm is used to optimize them and reduce the amount of manual intervention [4].
First, the initial network structure is verified and fixed; the genetic algorithm is then iterated to produce optimized weights [5]. Let P denote the initial weights. The number of hidden layers is computed with the genetic algorithm and the result is then inspected further. If a network node approximately satisfies:
\[ P_{li} \to 0, \quad i = 1, 2, \dots, n; \; l = 1, 2, \dots, m \tag{1} \]
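then the node can be considered for removal. The network is retrained with the weights produced by the genetic algorithm; if the result is satisfactory, the algorithm is shown to be effective and the network structure has been optimized [6]. To keep errors that arise during optimization from distorting the optimized structure, a root-mean-square error function is used to validate the system, computed as follows: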
\[ G(P) = \sum_{i=1}^{y} (x_y - P_0) f \tag{2} \]
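where $G$ is the root-mean-square error; $x_y$ indicates that hidden layer $y$ has $x$ nodes; and $f$ is the fitness function value [7]. This procedure narrows the error range to a minimum, and the computation is repeated until the error reaches zero. Introducing a validation set makes the validation and training sets differ considerably when visualized, so the neural network combined with the genetic algorithm must first set optimized initial weights and then train the network, so that the visualized result approaches the optimal solution [8]. Because the genetic algorithm searches globally, in training based on gradient descent the error changes quickly in the initial stage but deviates substantially from the direction of the optimal solution, so the error falls very slowly during the early part of training (the circled region). To escape this situation, a very small momentum term is introduced and the search direction is adjusted step by step so that it moves ever closer to the direction of the optimal solution [9].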
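The node-removal test of Eq. (1) and the momentum update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `prune_hidden_nodes` and `momentum_step`, the threshold `eps`, the learning rate, and the momentum coefficient are all hypothetical, and the sketch assumes a hidden node is removed only when both its incoming and its outgoing weights are close to zero.

```python
def prune_hidden_nodes(w_in, w_out, eps=1e-3):
    """Drop hidden nodes whose incoming and outgoing weights are all near zero.

    w_in[i][j]  is the weight from input i to hidden node j.
    w_out[j][k] is the weight from hidden node j to output k.
    eps is an assumed threshold standing in for "P -> 0" in Eq. (1).
    """
    keep = []
    for j in range(len(w_out)):
        in_small = all(abs(row[j]) < eps for row in w_in)
        out_small = all(abs(w) < eps for w in w_out[j])
        if not (in_small and out_small):
            keep.append(j)
    pruned_in = [[row[j] for j in keep] for row in w_in]
    pruned_out = [w_out[j] for j in keep]
    return pruned_in, pruned_out, keep


def momentum_step(weights, grads, velocity, lr=0.05, momentum=0.1):
    """One gradient-descent update with a small momentum term."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Usage: the middle hidden node has negligible weights on both sides.
w_in = [[0.8, 1e-5, -0.4],
        [0.3, -2e-5, 0.9]]
w_out = [[0.5], [3e-5], [-0.7]]
pruned_in, pruned_out, kept = prune_hidden_nodes(w_in, w_out)
print("kept hidden nodes:", kept)
```

After pruning, the network would be retrained as the text describes; the momentum coefficient is kept small here to match the "very small momentum term" mentioned above.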
2 Design of the data mining platform based on the improved neural network method
The data mining platform built on the improved neural network emphasizes the use of the neural network throughout the data mining process [10]. Beyond the basic function of mining important information from data, the platform uses the improved neural network to classify and cluster data during mining, and it can repeat information mining and detection on data that has already been processed [11]. Based on the mining results, the user decides whether the information is sent to the database to be classified and stored and then used to analyze new data in the future [12]. Figure 1 shows the system design of the neural-network-based data mining platform.
As Figure 1 shows, the platform lets users either request neural network data mining services or upload data. Mining is convenient and efficient: the user searches and analyzes data sources stored in data files or in a database, and operates the mining process accurately and intuitively through the platform's visual interaction functions [13]. Mining parameters are set for whichever neural network algorithm is chosen, the mining results are evaluated, and the outcome is finally saved as a neural network model for reuse. Figure 2 shows the sequence diagram for building and saving a neural network model.
To make the original information easy to distinguish, Figure 2 shows how the neural network model saves and rebuilds information. With this model, the system can be given clear commands to mine the information the user needs and to save it for later use [14]. Figure 3 shows how the information kept in the model is applied.
The mining model shown in Figure 3 further improves mining performance. After the mining results have been filtered, the data can be uploaded to a database or to disk, so that in later use the server can quickly locate the stored model by the name the user selects and load it. To keep the mining process accurate and fast, a neural-network-based mining module system was built; its workflow is shown in Figure 4.
The neural-network-based mining module system helps improve the accuracy of the mining algorithm. It is simple in structure, clear in expression, and precise, and it makes it convenient to store and transmit the classified and clustered data mined by the neural network.
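The save-and-recall step described in this section, in which a model is stored on disk and later located by the name the user selects, might look roughly like the following. The class `ModelStore`, its file layout, and the JSON format are hypothetical choices made for illustration; the paper does not specify how the platform stores models.

```python
import json
import tempfile
from pathlib import Path


class ModelStore:
    """Hypothetical name-indexed store for trained network parameters."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.index_path = self.root / "index.json"

    def _read_index(self):
        if self.index_path.exists():
            return json.loads(self.index_path.read_text(encoding="utf-8"))
        return {}

    def save(self, name, params):
        """Write the parameters to disk and record their location under `name`."""
        index = self._read_index()
        # Reuse the existing file when a model with this name is overwritten.
        filename = index.get(name, f"model_{len(index):04d}.json")
        (self.root / filename).write_text(json.dumps(params), encoding="utf-8")
        index[name] = filename
        self.index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
        return self.root / filename

    def load(self, name):
        """Find the stored location by name and return the saved parameters."""
        index = self._read_index()
        if name not in index:
            raise KeyError(f"no saved model named {name!r}")
        return json.loads((self.root / index[name]).read_text(encoding="utf-8"))


# Usage: save a model under a user-chosen name, then recall it by that name.
with tempfile.TemporaryDirectory() as tmp:
    store = ModelStore(tmp)
    store.save("demo_net", {"hidden_nodes": 6, "w_out": [0.5, -0.7]})
    print(store.load("demo_net"))
```

Keeping a separate index file means the lookup by name does not depend on how the model files themselves are named, which is one simple way to get the fast name-based retrieval the text mentions.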
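3 Analysis of experimental results
The neural-network-based algorithm proposed above is applied to a nonlinear function approximation problem to verify its effectiveness. Using the system default parameters, the iteration curves of the neural network before and after the improvement are compared with the expected reference curve of the nonlinear function; the results are shown in Figure 5.
Comparing the experimental data in Figure 5 shows that the iteration curve of the traditional neural network deviates substantially from the expected reference curve, whereas the curve obtained with the proposed mining method based on the improved neural network essentially coincides with it, indicating that the error of the proposed method is close to zero.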
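A function-approximation experiment of this kind can be sketched as follows. The sketch is not the paper's experimental setup: the target function sin(x), the 1-6-1 tanh network, the genetic operators (truncation selection, blend crossover, Gaussian mutation), and every hyperparameter are assumptions chosen only to illustrate the comparison between a randomly initialized network and one whose initial weights come from a genetic algorithm, both trained by gradient descent with a small momentum term.

```python
import math
import random

N_HIDDEN = 6            # assumed hidden-layer width
DIM = 3 * N_HIDDEN + 1  # w1, b1, w2 (N_HIDDEN each) plus the output bias


def forward(theta, x):
    """Return hidden activations and the output of a 1-N_HIDDEN-1 tanh network."""
    h = N_HIDDEN
    hidden = [math.tanh(theta[j] * x + theta[h + j]) for j in range(h)]
    out = sum(theta[2 * h + j] * hidden[j] for j in range(h)) + theta[3 * h]
    return hidden, out


def rmse(theta, xs, ys):
    sq = [(forward(theta, x)[1] - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(xs))


def mse_gradient(theta, xs, ys):
    """Analytic gradient of the mean squared error with respect to theta."""
    h, n = N_HIDDEN, len(xs)
    grad = [0.0] * DIM
    for x, y in zip(xs, ys):
        hidden, out = forward(theta, x)
        d_out = 2.0 * (out - y) / n
        for j in range(h):
            d_pre = d_out * theta[2 * h + j] * (1.0 - hidden[j] ** 2)
            grad[j] += d_pre * x                  # input-to-hidden weight
            grad[h + j] += d_pre                  # hidden bias
            grad[2 * h + j] += d_out * hidden[j]  # hidden-to-output weight
        grad[3 * h] += d_out                      # output bias
    return grad


def ga_initial_weights(pop, xs, ys, rng, generations=30, sigma=0.3):
    """Evolve a population of weight vectors and return the lowest-error one."""
    pop = [list(ind) for ind in pop]
    half = len(pop) // 2
    for _ in range(generations):
        pop.sort(key=lambda ind: rmse(ind, xs, ys))
        parents = pop[:half]                      # truncation selection, elitist
        children = []
        for parent in parents:
            mate = rng.choice(parents)
            child = []
            for a, b in zip(parent, mate):
                mix = rng.random()                # blend crossover
                child.append(mix * a + (1.0 - mix) * b + rng.gauss(0.0, sigma))
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda ind: rmse(ind, xs, ys))


def train(theta, xs, ys, epochs=300, lr=0.01, momentum=0.1):
    """Gradient descent with a small momentum term; returns weights and RMSE history."""
    theta = list(theta)
    velocity = [0.0] * DIM
    history = [rmse(theta, xs, ys)]
    for _ in range(epochs):
        grad = mse_gradient(theta, xs, ys)
        velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
        theta = [t + v for t, v in zip(theta, velocity)]
        history.append(rmse(theta, xs, ys))
    return theta, history


def run_experiment(seed=0, pop_size=20):
    rng = random.Random(seed)
    xs = [-math.pi + 2.0 * math.pi * k / 31 for k in range(32)]
    ys = [math.sin(x) for x in xs]                # assumed target function
    pop = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(pop_size)]
    _, plain = train(pop[0], xs, ys)              # random start, no GA
    _, seeded = train(ga_initial_weights(pop, xs, ys, rng), xs, ys)
    return plain, seeded


if __name__ == "__main__":
    plain, seeded = run_experiment()
    print(f"random start: RMSE {plain[0]:.3f} -> {plain[-1]:.3f}")
    print(f"GA start:     RMSE {seeded[0]:.3f} -> {seeded[-1]:.3f}")
```

Because the random starting vector is also a member of the initial population and the best individuals are carried over unchanged, the GA-selected start can never have a higher error than the random start. How much this helps the final fit depends on the assumed hyperparameters, so the numbers this sketch prints say nothing about the results reported in Figures 5 and 6.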
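To verify the accuracy of the mining algorithm based on the improved neural network, two experiments were carried out, measuring the precision of the traditional mining algorithm, the neural network mining algorithm before improvement, and the mining algorithm after improvement. The comparison is shown in Figure 6.
Judging from the experimental results, the mining algorithm based on the improved neural network outperforms both the traditional algorithm and the unimproved neural network mining algorithm in iteration count and in mining precision, and it takes less time. This shows that the algorithm avoids problems such as long running time and errors during data analysis, and that it is highly feasible.
4 Conclusion
Improving the neural network and combining it with the data mining algorithm effectively raises mining precision and remedies the shortcomings of traditional mining algorithms, with the advantages of short running time and high precision. The method was verified by simulation experiments. Comparison with the data mining results of the unimproved neural network and of the traditional approach confirms that the improved neural network mining algorithm performs well, is easy to operate, and is widely applicable.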
References
[1] WANG Chunmei. Research on data mining algorithm based on neural network [J]. Modern electronics technique, 2017, 40(11): 111-114.
[2] ZHENG Bin. Incomplete data filling mining algorithm based on the improved genetic algorithm [J]. Microelectronics & computer, 2016, 33(2): 96-99.
[3] LI Binxu, YAO Jianghong. Document classification based on improved QPSO and RBF neural networks [J]. Computer systems & applications, 2016, 25(7): 264-267.
[4] ZHANG Hua, WANG Jinlan. Design of information monitoring system based on Internet of Things and SOM algorithm [J]. Computer measurement & control, 2017, 25(4): 84-86.
[5] TIAN Ye, ZHANG Cheng, MAO Xinru, et al. Research on abnormal behavior of power consumption based on BP neural network with PCA [J]. Journal of Chongqing Institute of Technology, 2017, 31(8): 125-133.
[6] LU Anjiang, JIN Li, YANG Jiahong, et al. Application research based on improved BP neural network algorithm in license plate recognition [J]. Journal of Guizhou University (Natural science), 2015, 32(6): 71-74.
[7] ZHANG Lianbin, GE Zhedong, JU Mingyuan, et al. Application of improved neural network in woodpiece polishing robot [J]. Forestry machinery & woodworking equipment, 2017, 45(4): 19-22.
[8] ZHAO Jianhua. Semi-supervised classification algorithm based on SOM neural network [J]. Journal of Xihua University (Natural science edition), 2015, 34(1): 36-40.
[9] WU Huali, REN Xinyi. Specific data mining algorithm based on fuzzy constraint database [J]. Computer simulation, 2016, 33(10): 240-243.
[10] BAI Liyang, ZHAO Jinhai, LIU Zhanxin, et al. Depth prediction of floor damage based on data mining algorithm [J]. Coal engineering, 2017, 49(6): 92-95.
[11] XING Kaiyan, LI Mei. Application of data mining classification algorithm in classification and recognition of signals [J]. Software, 2016, 37(6): 1-6.
[12] REN Guanyou, WANG Xin, LI Yingna, et al. Research on user's anti-stealing power based on particle swarm optimization neural network algorithm [J]. Software, 2017, 38(8): 215-219.
[13] LI Huchao, SHAO A, HE Dengxin, et al. Application of back-propagation neural network in predicting non-systematic error in numerical prediction model [J]. Plateau meteorology, 2015, 42(6): 1198-1201.
[14] THEANDER E, JONSSON R, SJÖSTRÖM B, et al. Prediction of Sjögren's syndrome years before diagnosis and identification of patients with early onset and severe disease course by autoantibody profiling [J]. Arthritis & rheumatology, 2015, 67(9): 2427-2436.