Tang Meili, Hu Qiong, Ma Tinghuai
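Abstract: Speech recognition, an indispensable part of artificial intelligence research, has gradually permeated people's daily lives. To address the problem that traditional speech recognition methods cannot handle and recognize complex, variable, speaker-independent speech well, this paper proposes building a speech recognition model with a recurrent neural network (RNN), which captures strong correlations across a time series. Considering the rich time-frequency information carried by speech signals, the feature extraction stage is improved: the wavelet transform (WT), which offers good time-frequency resolution, replaces the fast Fourier transform (FFT) in producing the model's input. The backpropagation through time (BPTT) algorithm is then used for feature learning and training. In the experiments, the effect of wavelet-based feature extraction on recognition performance is first analyzed by comparison; then, by comparing recognition rates against a traditional HMM model and a BP neural network, the RNN is verified to improve the accuracy and stability of speech recognition.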
Keywords: speech recognition; recurrent neural network; backpropagation algorithm; feature extraction; wavelet transform; HMM model; BP neural network
CLC number: TN912-34; TP391.1    Document code: A    Article ID: 1004-373X(2019)14-0152-05
Research on speech recognition based on recurrent neural network
TANG Meili, HU Qiong, MA Tinghuai
(Nanjing University of Information Science & Technology, Nanjing 210044, China)
Abstract: Speech recognition, as an indispensable part of artificial intelligence research, has gradually entered people's daily lives. Because traditional speech recognition methods cannot properly recognize complex, variable, and speaker-independent speech, a speech recognition model based on the recurrent neural network (RNN), which has strong correlation along the time series, is proposed in this paper. In consideration of the abundant time-frequency information of speech signals, the feature extraction process is improved: the wavelet transform (WT), with its better time-frequency resolution, replaces the fast Fourier transform (FFT) in generating the input of the model. The backpropagation through time (BPTT) algorithm is then adopted for feature learning and training. In the experiments, the influence of wavelet-based feature extraction on the recognition effect was analyzed by comparison, and the recognition rate of the proposed model was compared with that of the traditional HMM model and the BP neural network. The results show that the RNN improves the accuracy and stability of speech recognition to a certain extent.
Keywords: speech recognition; recurrent neural network; back propagation algorithm; feature extraction; wavelet transform; HMM model; BP network
0 Introduction
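With the rapid development of artificial intelligence, speech recognition has become a favored pivotal tool for human-computer interaction and has seen initial application in mobile phones, in-vehicle systems, search engines, robots, e-commerce, and other fields. This growth in applications keeps driving research on speech recognition forward. Traditional template matching and statistical learning methods for speech recognition have matured and even reached a bottleneck [1], whereas speech recognition with artificial neural networks is on the rise thanks to its strong results. One advantage of using artificial neural networks to learn and process speech is that their working principle imitates the activity of neurons in the human brain: nodes are connected into a network structure, and an adaptive algorithm then completes the recognition process. Another is that neural networks can map the nonlinear relationships within complex speech signals and have a strong capacity for learning speech sequences [2-3]. Speech signals have two important characteristics: they unfold along a time series, and they contain rich time-frequency information. Traditional acoustic models analyze the internal states of each phone but ignore the mutual influence between phones; commonly used artificial neural networks emphasize the connections between phones, but their internal states are not fully connected and are instead linked only layer by layer. In view of these shortcomings, this paper adopts the recurrent neural network, which can compensate for them, to study speech recognition.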
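To make the contrast between layer-by-layer connections and recurrent connections concrete, the following is a minimal sketch of a simple (Elman-style) recurrent layer in plain Python. It is an illustration only, not the model used in this paper: the function name `rnn_forward`, the parameter names `w_in`, `w_rec`, and `bias`, and all numeric values are assumptions chosen for the example.

```python
import math


def rnn_forward(frames, w_in, w_rec, bias):
    """Run a simple recurrent layer over a sequence of feature frames.

    frames: list of feature vectors, one per time step
    w_in:   hidden x input weight matrix (list of rows)
    w_rec:  hidden x hidden recurrent weight matrix (list of rows)
    bias:   bias vector, one entry per hidden unit
    Returns the list of hidden states, one per frame.
    """
    hidden = [0.0] * len(bias)  # no context exists before the first frame
    states = []
    for x in frames:
        new_hidden = []
        for j in range(len(bias)):
            total = bias[j]
            # contribution of the current frame
            total += sum(w * xi for w, xi in zip(w_in[j], x))
            # contribution of the previous hidden state (the recurrent link)
            total += sum(w * hi for w, hi in zip(w_rec[j], hidden))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
        states.append(hidden)
    return states


# Toy input (made-up values): only the first frame carries any signal.
frames = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
w_in = [[0.5, 0.0], [0.0, 0.5]]
w_rec = [[0.0, 0.0], [1.0, 0.0]]  # hidden unit 1 listens to unit 0 of the previous step
bias = [0.0, 0.0]
states = rnn_forward(frames, w_in, w_rec, bias)
```

In this toy run the second frame is all zeros, yet the second hidden unit is non-zero at that step because it receives the previous step's state through `w_rec`. A purely feedforward layer with the same input weights would output zero for that frame, which is the sense in which the recurrent connection carries context from one frame to the next.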