
        Speech Enhancement with Nonnegative Dictionary Training and RPCA


(1. First Military Representative Office of Air Force Equipment Department in Changsha, Changsha 410100, China; 2. College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050000, China; 3. College of Information Science and Engineering, Yanshan University, Qinhuangdao 066000, China; 4. Shanghai Nanhui Senior High School, Shanghai 201300, China; 5. Institute of Command and Control Engineering, Army Engineering University, Nanjing 210007, China)

Abstract: An unsupervised single-channel speech enhancement algorithm is proposed. It combines nonnegative dictionary training and Robust Principal Component Analysis (RPCA), and is therefore named NRPCA for short. The combination is accomplished by incorporating a nonnegative speech dictionary, learned via Nonnegative Matrix Factorization (NMF), into the RPCA model. The resulting NRPCA model is optimized with the Alternating Direction Method of Multipliers (ADMM). Objective evaluations using the Perceptual Evaluation of Speech Quality (PESQ) on TIMIT with 20 noise types at various Signal-to-Noise Ratio (SNR) levels demonstrate that the proposed NRPCA model yields superior results over the conventional NMF and RPCA methods.

        Keywords: speech enhancement; robust principal component analysis; nonnegative dictionary training

Serving as a pre-processor for speech recognition, speech coding and other applications, speech enhancement has been a challenging research topic for decades [1]. It attempts to improve the quality of noisy speech and suppress speech distortion caused by interfering noise. Over the years, a large number of speech enhancement approaches have been proposed, generally divided into two broad classes: unsupervised and supervised [2].

Unsupervised approaches include a wide range of algorithms, such as Spectral Subtraction (SS) [3], Wiener [4] and Kalman filtering [5], Short-Time Spectral Amplitude (STSA) estimation [6], and methods based on statistical mathematical models of speech and noise signals [7]. Among these algorithms, one well-known type is the Robust Principal Component Analysis (RPCA) [8] model-based methods and their improved versions [9-10]. Under the hypothesis that the spectrogram of speech is sparse and that of noise is low-rank, RPCA-based approaches decompose the noisy spectrogram into sparse and low-rank components, which correspond to the speech and noise parts of the noisy mixture, respectively. Under specific constraints, the separation of the sparse and low-rank components can be accomplished by solving the optimization problem of the RPCA model with a proper optimization algorithm. A significant advantage of these approaches is that they require neither an explicit model of, nor any prior knowledge about, the noisy speech to be enhanced. This advantage makes the unsupervised approaches easy to deploy in real-world scenarios. However, their performance degrades severely in non-stationary noisy environments.

Supervised speech enhancement algorithms [11] have been proposed to overcome this limitation and gain a competitive advantage by making proper use of the available prior knowledge of the speech or noise signal. In particular, the Nonnegative Matrix Factorization (NMF) algorithm, one of the most powerful machine learning algorithms successfully applied in the signal processing area, has proved to be a popular tool for supervised speech enhancement [12-13]. NMF was proposed by Lee and Seung [14]; it aims to approximate the magnitude/power spectrogram of speech by a linear combination of a set of basis vectors weighted by activation coefficients.

Given the variety of speech enhancement algorithms, combining or fusing different methods in a proper way can be effective for achieving better performance [15]. In this paper, an unsupervised speech enhancement algorithm is proposed by combining the techniques of nonnegative dictionary training and RPCA. The proposed speech enhancement algorithm is unsupervised since nothing is assumed to be known about the noisy mixture to be enhanced. Under these circumstances, we assume that performance can be improved if prior information that is available beforehand is exploited properly. The nonnegative speech dictionary traditionally used in NMF-based speech enhancement can serve as such a prior; it is learned from a sufficient number of training samples chosen randomly from public datasets. It is necessary to mention that the speech training samples are unrelated to the noisy speech to be enhanced. The combination in the proposed model (NRPCA) is accomplished by incorporating the nonnegative speech dictionary into the RPCA model. To decompose the noisy spectrogram into the speech and noise parts, a mathematical model for NRPCA is established under the sparse and low-rank constraints. Finally, the enhancement process is completed by solving the optimization problem of the NRPCA model via the Alternating Direction Method of Multipliers (ADMM) [16]. Different from the multiplicative updates in NMF, ADMM updates the variables separately and alternately by solving the corresponding sub-problems within its framework. Detailed analysis and comparison of the proposed NRPCA model with some state-of-the-art approaches are carried out in this paper.

        1 Framework of the proposed algorithm

The proposed NRPCA-based speech enhancement scheme combines the techniques of RPCA and NMF. The overall diagram is illustrated in Fig.1, which mainly consists of a speech dictionary training process and an enhancement stage. At the beginning, the input noisy mixture is chopped into frames and the Fast Fourier Transform (FFT) is applied to compute the magnitude spectrogram Y and its phase ∠Y. In the training process, the nonnegative speech dictionary Ws is learned by conducting NMF on the magnitude spectrograms of the speech samples used for training. In the enhancement stage, the enhanced spectrograms of speech and noise are separated via the RPCA model combined with the trained nonnegative speech dictionary Ws. Finally, the time-domain speech s and noise n are reconstructed by the inverse FFT and the overlap-add method.

Fig.1 The processing flowchart of the proposed speech enhancement algorithm
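For illustration only, the following minimal Python sketch (assuming NumPy and SciPy, which the paper itself does not reference) shows the analysis-synthesis pipeline of Fig.1: framing and FFT to obtain the magnitude spectrogram Y and phase ∠Y, a placeholder enhancement step standing in for the NRPCA separation of Section 3, and reconstruction by inverse FFT with overlap-add. The function and variable names are hypothetical.

```python
import numpy as np
from scipy.signal import stft, istft

FS = 8000          # sampling rate used in the paper (8 kHz)
N_FFT = 512        # 64 ms analysis window
HOP = 128          # 16 ms frame shift

def analyze(y):
    """Compute the magnitude spectrogram Y and phase angle(Y) of a noisy signal."""
    _, _, Z = stft(y, fs=FS, window='hann', nperseg=N_FFT, noverlap=N_FFT - HOP)
    return np.abs(Z), np.angle(Z)

def synthesize(mag, phase):
    """Reconstruct a time-domain signal from magnitude and phase (inverse FFT + overlap-add)."""
    Z = mag * np.exp(1j * phase)
    _, x = istft(Z, fs=FS, window='hann', nperseg=N_FFT, noverlap=N_FFT - HOP)
    return x

def enhance_magnitude(Y):
    """Placeholder for the NRPCA separation: returns (speech magnitude S, noise magnitude L)."""
    # Identity split shown only to keep the sketch runnable; the real separation is Section 3.
    return Y, np.zeros_like(Y)

if __name__ == "__main__":
    noisy = np.random.randn(FS * 2)       # 2 s of noise as a stand-in for a noisy mixture
    Y, phase = analyze(noisy)
    S, L = enhance_magnitude(Y)
    s_hat = synthesize(S, phase)          # enhanced speech reuses the noisy phase
    n_hat = synthesize(L, phase)          # residual noise estimate
```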

        2 Nonnegative dictionary training via NMF

Let Vs denote the magnitude spectrogram of the clean training speech. NMF approximates it as the product of a nonnegative speech dictionary Ws and a nonnegative activation matrix Hs:

Vs(f, m) ≈ Σ_{r=1}^{R} Ws(f, r) Hs(r, m),  i.e.,  Vs ≈ WsHs.    (1)

The variables f (1 ≤ f ≤ F), r (1 ≤ r ≤ R) and m (1 ≤ m ≤ M) above represent the indices of frequency bins, speech bases and frames, respectively. F, R and M correspond to the total numbers of frequency bins, speech bases and frames, respectively.

The dictionary and activations are estimated by minimizing the reconstruction error under nonnegativity constraints:

(Ws, Hs) = arg min_{W ≥ 0, H ≥ 0} D(Vs ‖ WH),    (2)

where D(·‖·) denotes the chosen NMF cost function. After training, only the dictionary Ws is retained as the speech prior.
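As a concrete but hypothetical illustration of the dictionary training, the sketch below implements plain multiplicative-update NMF with the Euclidean cost; the paper does not specify the exact update rule or cost function, so this is only one standard choice.

```python
import numpy as np

def train_speech_dictionary(V, R=40, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative-update NMF (Euclidean cost): V (F x M) ~= Ws (F x R) @ Hs (R x M).

    Returns the nonnegative speech dictionary Ws; the activations Hs are discarded,
    since only the basis vectors are reused in the NRPCA model of Section 3.
    """
    rng = np.random.default_rng(seed)
    F, M = V.shape
    Ws = rng.random((F, R)) + eps
    Hs = rng.random((R, M)) + eps
    for _ in range(n_iter):
        Hs *= (Ws.T @ V) / (Ws.T @ Ws @ Hs + eps)   # activation update
        Ws *= (V @ Hs.T) / (Ws @ Hs @ Hs.T + eps)   # basis (dictionary) update
    # Normalize each basis vector to unit l1 norm and rescale Hs to keep Ws @ Hs unchanged.
    norms = np.maximum(Ws.sum(axis=0, keepdims=True), eps)
    Ws /= norms
    Hs *= norms.T
    return Ws

# Usage (hypothetical): V_train is the concatenated magnitude spectrogram
# of the clean training utterances described in Section 4.1.
# Ws = train_speech_dictionary(V_train, R=40)
```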

        3 Proposed RPCA model and its optimization problem solved via ADMM

For speech enhancement, it is common practice to assume that the clean speech is contaminated by additive uncorrelated noise. Let s(t) and l(t) represent the clean speech and the noise signal, respectively. The noisy speech can then be expressed as

y(t) = s(t) + l(t).    (3)

The Short-Time Fourier Transform (STFT) is applied to transform the signal in Equation (3) into the time-frequency domain:

Y = S + L.    (4)

Then, according to the sparsity and low-rank hypotheses for speech and noise, RPCA can be employed to decompose the spectrogram of the noisy speech Y into the sparse speech term S and the low-rank noise term L [18]:

min_{S,L}  rank(L) + λ‖S‖_0   subject to  Y = S + L,    (5)

which is commonly relaxed to the convex surrogate

min_{S,L}  ‖L‖_* + λ‖S‖_1   subject to  Y = S + L,    (6)

where ‖·‖_* denotes the nuclear norm, ‖·‖_1 the l1 norm, and λ > 0 balances the sparsity of S against the rank of L.
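For reference, a minimal NumPy sketch of the sparse plus low-rank decomposition of Y via a standard ADMM/inexact-ALM iteration is given below. The default λ and penalty values are common heuristics, not the settings of Refs. [8]-[9], and the function names are hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(sig - tau, 0.0)) @ Vt

def rpca(Y, lam=None, rho=None, n_iter=200, tol=1e-6):
    """Decompose Y into a sparse part S and a low-rank part L with scaled ADMM."""
    F, M = Y.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(F, M))                 # common default choice for lambda
    if rho is None:
        rho = 0.25 * F * M / (np.abs(Y).sum() + 1e-12)  # common penalty heuristic
    S = np.zeros_like(Y)
    L = np.zeros_like(Y)
    Omega = np.zeros_like(Y)                            # scaled dual variable
    for _ in range(n_iter):
        L = svt(Y - S + Omega, 1.0 / rho)               # low-rank (noise) update
        S = soft_threshold(Y - L + Omega, lam / rho)    # sparse (speech) update
        resid = Y - S - L
        Omega = Omega + resid                           # dual ascent step
        if np.linalg.norm(resid) <= tol * np.linalg.norm(Y):
            break
    return S, L
```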

With the assumption that better performance can be achieved if prior knowledge is utilized properly, the nonnegative speech dictionary Ws trained via NMF, i.e., the same dictionary trained in Section 2, is incorporated into the RPCA model in the form below:

S = WsHs,  Hs ≥ 0,    (7)

so that the NRPCA optimization problem becomes

min_{Hs,L}  ‖L‖_* + λ‖WsHs‖_1   subject to  Y = WsHs + L,  Hs ≥ 0.    (8)

Equation (8) can be rewritten as an augmented Lagrangian function using the Euclidean distance, as shown below:

        (9)

where Ω_Y, Ω_S, Ω_Hs and Ω_L are the scaled dual variables, and ρ is the penalty (scaling) parameter that controls the convergence rate. As shown in Fig.2, the objective function in Equation (9) can be efficiently minimized by the ADMM algorithm [8]. The value of λ is set in the same way as in previous research [9]. S(·) denotes the soft-thresholding operator, and S+(·) denotes the nonnegative counterpart of S(·), i.e., soft-thresholding followed by projection onto the nonnegative orthant.

        Fig.2 The program code of the algorithm
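Fig.2 itself is not reproduced in this version of the text. Purely as an illustrative, hypothetical sketch (not the authors' exact procedure), the snippet below shows the two thresholding operators S(·) and S+(·) mentioned above and one plausible way the ADMM iterations could alternate between the low-rank noise term and the nonnegative activations of the speech dictionary; the true update order, stopping rule and parameter settings follow Refs. [8]-[9] and may differ.

```python
import numpy as np

def S(X, tau):
    """Soft-thresholding operator S(.)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def S_plus(X, tau):
    """Nonnegative soft-thresholding S+(.): soft-threshold, then project onto X >= 0."""
    return np.maximum(X - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(sig - tau, 0.0)) @ Vt

def nrpca(Y, Ws, lam=None, rho=1.0, n_iter=200):
    """Hypothetical NRPCA iteration: Y ~= Ws @ Hs + L, with Hs >= 0 sparse and L low rank."""
    F, M = Y.shape
    R = Ws.shape[1]
    if lam is None:
        lam = 1.0 / np.sqrt(max(F, M))
    Hs = np.zeros((R, M))
    L = np.zeros_like(Y)
    U = np.zeros_like(Y)                                    # scaled dual variable for Y = Ws Hs + L
    eta = 1.0 / (np.linalg.norm(Ws, 2) ** 2 + 1e-12)        # step size for the Hs sub-problem
    for _ in range(n_iter):
        L = svt(Y - Ws @ Hs + U, 1.0 / rho)                 # low-rank (noise) update
        # One proximal-gradient step on the Hs sub-problem, kept nonnegative via S+(.)
        grad = Ws.T @ (Ws @ Hs - (Y - L + U))
        Hs = S_plus(Hs - eta * grad, lam * eta / rho)
        U = U + (Y - Ws @ Hs - L)                           # dual update
    return Ws @ Hs, L                                       # enhanced speech and noise magnitudes
```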

        4 Experiments and results

4.1 Experimental preparations

The test clean speech material consists of samples lasting 25 seconds, spoken by 2 males and 2 females and chosen randomly from the TIMIT dataset [19]. 20 types of noise from the Noizeus-92 dataset [20], namely babble, birds, buccaneer1, buccaneer2, casino, eatingchips, f16, factory1, factory2, hfchannel, jungle, leopard, m109, ocean, pink, rain, stream, thunder and white, are included. The signals are mixed at five different Signal-to-Noise Ratio (SNR) levels from -10 dB to 10 dB, spaced by 5 dB. The nonnegative speech dictionary is learned using 1000 clean utterances produced by 20 different speakers. The numbers of speech and noise bases are 40 each. All files are sampled at 8 kHz. To compute the spectrograms, a window length of 64 ms (512 points) and a frame shift of 16 ms (128 points) are used.
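The paper does not provide its mixing code; a typical way to construct the noisy mixtures at the stated SNRs is sketched below (the function name and the resampling-to-8-kHz preprocessing are assumptions).

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that it is added to `speech` at the requested SNR (in dB)."""
    noise = np.resize(noise, speech.shape)                 # loop/trim the noise to the speech length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

# Example: build the -10 dB to 10 dB test conditions of Section 4.1
# for snr in (-10, -5, 0, 5, 10):
#     noisy = mix_at_snr(clean_8khz, noise_8khz, snr)
```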

        4.2 The chosen baselines and evaluation metrics

In order to compare the performance of the proposed approach, four unsupervised speech enhancement algorithms are chosen as baselines: the Semi-supervised NMF (SNMF), an improved version of the traditional RPCA method in the magnitude spectrogram domain [9], the well-known SS [3], and a state-of-the-art Noise Estimation (NE) algorithm [21]. The number of iterations for SNMF and RPCA is set to 200, at which convergence is observed in all experiments. To evaluate the performance of the speech enhancement algorithms, the widely used Perceptual Evaluation of Speech Quality (PESQ) score is adopted to measure speech quality. A higher PESQ score indicates better speech enhancement performance.
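As a sketch of how the PESQ scores could be computed, the snippet below assumes the third-party Python `pesq` package (an ITU-T P.862 implementation); the authors' actual evaluation tooling is not specified. Since the material is sampled at 8 kHz, the narrowband mode applies.

```python
import numpy as np
from pesq import pesq   # third-party ITU-T P.862 implementation (assumed available)

FS = 8000   # 8 kHz material, so narrowband PESQ is used

def pesq_score(reference, enhanced):
    """Narrowband PESQ between a clean reference and an enhanced signal (higher is better)."""
    length = min(len(reference), len(enhanced))
    return pesq(FS, reference[:length], enhanced[:length], 'nb')

# Averaging over utterances, noise types and SNRs, as reported in Fig.3 and Tab.1:
# mean_pesq = np.mean([pesq_score(ref, enh) for ref, enh in test_pairs])
```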

        4.3 Enhancement performance of the algorithms

From the scores achieved by all the algorithms in Fig.3, we can see that the combination of the nonnegative speech dictionary and RPCA brings a significant improvement in PESQ, scoring higher values than the traditional semi-supervised NMF-based speech enhancement and the RPCA baseline. Compared with the well-known unsupervised algorithms SS and NE, the proposed algorithm shows better performance for most of the 20 considered noise types. As for the noise estimation algorithm of Ref. [21], it outperforms SS, SNMF and RPCA, and achieves results competitive with the proposed algorithm.

Fig.3 The PESQ values of all the algorithms (Remarks: the numbers are the mean values over the five input SNR conditions)

The numbers in Tab.1 are the average values of the five methods over the 20 noise types at different SNR levels. We can see that the proposed algorithm achieves substantially higher values than the SS, SNMF and RPCA methods at all SNR levels. Additionally, the proposed algorithm outperforms the NE algorithm at low SNR conditions (-10 dB, -5 dB, 0 dB), while its advantage diminishes as the SNR increases (10 dB). Moreover, when computing the improvements of the proposed method and RPCA over SS at all SNR levels, the improvement is more obvious at low SNR conditions (-10 dB, -5 dB, 0 dB) than at high SNR conditions (5 dB, 10 dB). This may be explained by the fact that the speech part of the noisy mixture is sparser at low SNR conditions, which makes the RPCA model more effective. Additionally, for some noise types, such as factory2, leopard and m109, the proposed algorithm does not improve greatly. The robustness differs across noise environments, and the properties of these noises may not fit the sparse and low-rank assumptions of the proposed model well.

        Tab.1 Average PESQ values of all methods for 20 noise types

Remarks: 1) denotes the input SNR

        5 Conclusion

This paper proposes an unsupervised speech enhancement algorithm by combining the techniques of robust principal component analysis and nonnegative dictionary training. The prior knowledge of the nonnegative speech dictionary trained via nonnegative matrix factorization is incorporated into the robust principal component analysis model. The optimization problem of the mathematical model describing the proposed algorithm is efficiently solved by the alternating direction method of multipliers. Experimental results under 20 noise types at different SNR levels demonstrate that incorporating the nonnegative speech dictionary into the RPCA model can be an effective way to obtain better performance. However, as the comparison with state-of-the-art algorithms shows, future research will be devoted to improving the performance of the proposed algorithm at high SNR levels.
