

        Data Augmentation based Convolutional Neural Network for Auscultation

        2019-07-30

        (1. School of Computer Science and Technology, Fudan University, Shanghai 201203, China; 2. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China)

        Abstract: Acoustic analysis has great potential for clinical application because of its objective, non-invasive and low-cost nature. Auscultation is an important part of Traditional Chinese Medicine (TCM). By analyzing a voice signal, we attempt to diagnose the syndrome of the subject, labelling it as normal or deficient. In this paper, we explore a Data Augmentation based Convolutional Neural Network (DACNN) for auscultation. The idea behind this method is to apply a Convolutional Neural Network (CNN) to imbalanced data, using data augmentation, for automatic feature extraction and classification. We conduct experiments on our auscultation dataset containing voice segments from 959 speakers (346 males and 613 females), labeled by two experienced TCM physicians. We demonstrate the effectiveness of data augmentation in overcoming the imbalanced-dataset problem, and compare the method's performance with traditional machine learning methods. Using DACNN, we achieve 97.25% diagnosis accuracy for females and 95.12% for males, a 1% to 10% improvement in accuracy and slight improvements in the other indicators over traditional machine learning methods. The experimental results demonstrate that the proposed approach is helpful for objective auscultation diagnosis.

        Keywords: acoustic analysis; traditional Chinese medicine; auscultation; convolutional neural network; machine learning method; data augmentation

        Auscultation uses the auditory sense to differentiate a patient's syndromes or to classify diseases[1]. In the theory of TCM, pathology or disease occurs when the Yin-Yang balance of the human body is disturbed. The resulting irregular vibrations of the body system become apparent in outputs such as the speech sound. In traditional practice, the accuracy of auscultation largely depends on the doctor's professional level. Hence it is often considered qualitative, subjective and, to some extent, unreliable due to the lack of objective quantification, especially when compared to western medicine.

        In recent years, the development of objective TCM diagnosis studies has alleviated the issues mentioned above. Specifically, objective auscultation is often achieved by acoustic analysis. In general, most existing studies focus on extracting different acoustic features with signal processing methods. However, there is no specific feature that fully corresponds to the process of acoustic diagnosis. For example, according to TCM theory, the voice largely depends on Zang Qi: a normal person's voice, with sufficient Zang Qi, usually sounds sonorous and steady, while a deficient person's voice, lacking Zang Qi, is usually timid and weak. These characteristics may correspond to acoustic features such as energy, shimmer, the Linear Predictive Cepstral Coefficient (LPCC), etc. We can instead utilize deep learning to perform automatic feature extraction, which helps avoid omitting relevant features.

        Deep learning has been successfully used for classification tasks in many domains, such as computer vision and speech recognition. However, few studies have used deep learning for auscultation diagnosis. In this paper, we propose DACNN: we utilize data augmentation (noise addition) to overcome the imbalanced-dataset problem and then use a CNN for automatic feature extraction and classification. We expect the convolutional layers to automatically learn high-level features and the fully connected layers to differentiate a patient's syndromes into normal and deficient.

        1 Related work

        Most related works use traditional machine learning methods with extracted signal features. Chiu et al.[2] extract four acoustic parameters (temporal parameters: zero-crossing rates and variations on peaks and valleys; spectral parameters: variations on peaks and valleys and spectral energy ratios) and classify syndromes into non-vacuity, moderate qi-vacuity and severe qi-vacuity through logistic regression. In their later work[3], they utilize a non-linear method (fractal dimension parameters) for auscultation, which proved slightly better than their previous approach. Yan et al.[4] utilize a Support Vector Machine (SVM) to differentiate syndromes into health, Qi-vacuity, and Yin-vacuity using wavelet packet transform and approximate entropy. Later, Yan et al.[5] focused on the non-stationarity of the vocal signal, using non-linear cross-prediction to extract features. Furthermore, they proved that auscultation features based on the fractal dimension combined with the wavelet packet transform were conducive to differentiating healthy, lung Qi-deficiency and lung Yin-deficiency subjects[6].

        In general, feature extraction by signal processing techniques is the first step for traditional machine learning methods. On one hand, features corresponding to TCM diagnosis principles, such as zero-crossing rate, energy, jitter and shimmer[2-3], are commonly selected. On the other hand, features frequently used in related areas (e.g. speech recognition, singer identification) are also considered, such as Mel-Frequency Cepstral Coefficients (MFCC)[7], LPCC[8] and Line Spectral Pairs (LSP)[9], etc.

        Among machine learning methods, the most commonly used is the SVM[10], which finds the optimal hyperplane separating the two classes while maximizing the margin between them[4-6]. The Gaussian Mixture Model (GMM)[11], boosting[12], random forest[13] and Auto-Associative Neural Networks (AANN)[14] are also commonly used in related tasks.

        2 Methodology

        In this paper, we propose DACNN for auscultation, differentiating a patient's syndrome into normal and deficient. An overview of the proposed DACNN method and the traditional models is shown in Fig.1.

        Fig.1 The overview of proposed DACNN method and traditional methods

        2.1 Data balancing

        The numbers of samples of the two syndromes in our dataset are imbalanced. This issue is described in detail in Section 3.1. There are two solutions:

        (1) Weighting imbalanced data

        This method gives normal instances more weight than deficient instances. For example, the number of deficient male instances is 6 times that of normal male instances; therefore, we give more weight to the normal instances in the classification loss.
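        The weighting scheme above can be sketched with an inverse-frequency heuristic (a minimal sketch; the exact weights used in the paper are not specified):

```python
def inverse_frequency_weights(labels):
    """Assign each class a weight inversely proportional to its frequency,
    so the minority class contributes more to the classification loss."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    total = len(labels)
    n_classes = len(counts)
    # weight(c) = total / (n_classes * count(c)) -- the "balanced" heuristic
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Example: 6 deficient samples for every normal one, as in the male dataset;
# the minority (normal) class ends up with 6x the weight of the majority.
labels = ["deficient"] * 6 + ["normal"]
weights = inverse_frequency_weights(labels)
```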

        (2) Data augmentation

        This method uses data augmentation techniques to generate new ‘data’ with small changes. Commonly used techniques in the audio field are time shifting, pitch shifting, time stretching, noise adding, and so on. Because we will utilize pitch-related features (ruling out pitch shifting), and time shifting may distort some important patterns, we chose to add random Gaussian noise. Note that we constrain the amplitude of the noise so that it mimics environmental noise without significantly affecting the original audio.
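        A minimal sketch of the noise-adding step, assuming the amplitude constraint is expressed as a target signal-to-noise ratio (the snr_db value is an assumption, not taken from the paper):

```python
import numpy as np

def add_gaussian_noise(signal, snr_db=30.0, rng=None):
    """Add random Gaussian noise at a target signal-to-noise ratio.

    The noise amplitude is constrained (here via snr_db, an assumed
    parameter) so that it mimics environmental noise without a
    significant impact on the original audio.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

Each augmented copy differs from the original only by an independent noise draw, so many new training instances can be produced from one recording.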

        2.2 The architecture of DACNN

        We use the Short-Time Fourier Transform (STFT) to transform voice signals from the time domain into the spectral domain. In this pre-processing step, each recording is split into multiple 10 ms long segments (Hamming windowed) with 50% overlap. The spectrogram is then reshaped to 513×250 points, removing the area containing almost no information. An example of the input feature maps can be seen in Fig.2. We utilize a CNN to automatically extract, from these input feature maps, high-level features that can differentiate the normal and deficient syndromes.

        Fig.2 The input feature map of a voice segment
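        The pre-processing step can be sketched as follows; n_fft=1024 (giving 513 frequency bins) is an assumption chosen to match the 513-row feature maps, since the paper does not state the FFT size:

```python
import numpy as np

def spectrogram(signal, sample_rate=50_000, win_ms=10.0, n_fft=1024):
    """Magnitude spectrogram: 10 ms Hamming-windowed frames, 50% overlap.

    n_fft=1024 (513 frequency bins from a real FFT) is an assumed value,
    not taken from the paper.
    """
    win_len = int(sample_rate * win_ms / 1000.0)   # 500 samples at 50 kHz
    hop = win_len // 2                              # 50% overlap
    window = np.hamming(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len] * window
        # Zero-pad each frame to n_fft before the real FFT
        frames.append(np.abs(np.fft.rfft(frame, n=n_fft)))
    return np.array(frames).T   # shape: (n_fft // 2 + 1, n_frames)
```

The resulting (513, n_frames) matrix would then be cropped or padded along the time axis to the 513×250 input size described above.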

        For the DACNN architecture, we use three stacks of convolutional layers to transform the input feature maps into a high-level feature representation. In the first stack, a set of 16 kernels (5×5) convolves the input feature maps with stride one; max-pooling with 4×4 filters then down-samples the feature maps to reduce their dimensionality. We use the Rectified Linear Unit (ReLU) as the activation function to introduce non-linearity. The second and third convolutional stacks are almost the same as the first, except that they use 32 kernels. Three fully connected layers follow, ending with a Softmax layer for classification. We also combine cross entropy with L2 regularization to prevent over-fitting. The architecture of DACNN can be seen in Fig.3.

        Fig.3 The architecture of proposed DACNN model
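        Assuming valid (unpadded) convolutions and non-overlapping pooling, which the paper does not state explicitly, the feature-map sizes implied by the architecture can be traced with a few lines of arithmetic:

```python
def conv_pool_output(h, w, kernel=5, pool=4, stride=1):
    """Output size of one valid 5x5 convolution (stride 1) followed by
    non-overlapping 4x4 max-pooling (floor division)."""
    h = (h - kernel) // stride + 1
    w = (w - kernel) // stride + 1
    return h // pool, w // pool

# Trace the three convolutional stacks over the 513x250 input spectrogram
size = (513, 250)
for _ in range(3):
    size = conv_pool_output(*size)
# size now holds the spatial dimensions fed to the fully connected layers
```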

        3 Experiments

        3.1 Datasets

        All data in our dataset was collected and labeled by a TCM institution in China. Each recording segment contains a normal-pitch vocalization of the vowel /a/ lasting about 1 to 3 s. The recordings are sampled at 50 kHz with 16-bit resolution. Each voice recording was labeled normal or deficient by two experienced and professional TCM doctors. We removed all recordings with inconsistent labels.

        Tab.1 Detailed information of our dataset

        Considering the different acoustic characteristics of the two genders, we split the dataset into a female dataset and a male dataset. Finally, we obtained a collection of 959 voice recordings, from 346 males and 613 females. More detailed information about our dataset is listed in Tab.1.

        Most modern people are in sub-optimal health. The number of deficient people in our dataset is accordingly much higher than that of normal people, which matches our expectations. However, the imbalance between the normal and deficient groups is an important issue we need to solve.

        3.2 Evaluation metrics

        Considering different acoustic characteristics between genders, we perform experiments with female and male samples separately. We use a 10-fold cross-validation method and utilize the indicators of accuracy, precision, recall, and F1 value to measure the performance of our results. The indicators are calculated as follows:

        Accuracy = (TP + TN) / (TP + TN + FP + FN) (1)

        Precision = TP / (TP + FP) (2)

        Recall = TP / (TP + FN) (3)

        F1 = 2 × Precision × Recall / (Precision + Recall) (4)

        where TP represents the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.

        The dataset is divided into a training set and a testing set containing 70% and 30% samples respectively.
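        The four indicators can be computed directly from the confusion-matrix counts, for example:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```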

        3.3 Hyper-parameter setting

        We utilize the PyTorch framework to build and train our DACNN on an NVIDIA GTX 1070 GPU. The dataset is split into mini-batches of size 50. The Adam optimizer is chosen with a learning rate of 0.0005. The weight decay of the L2 regularization is set to 0.0001 and the maximum number of training epochs is set to 100.
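        The training objective (cross entropy plus the L2 penalty with the 0.0001 weight decay) can be written as a minimal numpy sketch; the probs/weights layout here is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def regularized_loss(probs, labels, weights, decay=1e-4):
    """Cross entropy over softmax outputs plus an L2 penalty on the weights.

    decay matches the 0.0001 weight-decay setting; probs holds the
    predicted class probabilities, one row per sample.
    """
    n = len(labels)
    # Mean negative log-likelihood of the true class for each sample
    cross_entropy = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))
    # L2 penalty summed over all weight matrices
    l2_penalty = decay * sum(np.sum(w ** 2) for w in weights)
    return cross_entropy + l2_penalty
```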

        3.4 Experimental results

        3.4.1 Comparison of two data balancing methods

        Consistent with previous studies, commonly used features such as zero-crossing rate, energy, jitter, shimmer, MFCC, LPCC and LSP are extracted and combined into an 89-dimensional feature vector for each voice recording. Principal Components Analysis (PCA) is then used to remove irrelevant features and avoid over-fitting. Since most previous studies utilize an SVM as the classifier, here we also choose an SVM (with no parameter tuning) as the baseline, and compare the two data balancing methods (weighting imbalanced data and data augmentation) against this naïve baseline model.
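        The PCA step can be sketched with a plain SVD-based projection (a minimal sketch; the number of retained components is an assumption, as the paper does not specify it):

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components.

    features: (n_samples, n_features) matrix, e.g. the 89-dimensional
    acoustic feature vectors; n_components is an assumed value.
    """
    centered = features - features.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes,
    # ordered by decreasing explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```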

        Tab.2 Detailed information of augmented dataset

        For the data augmentation method, we “create” additional training data as described in Section 2.1. Note that the testing set contains no augmented data, in order to evaluate the proposed method more accurately. Details of the augmented dataset can be seen in Tab.2.

        The results of the baseline model and the two proposed models can be seen in Tab.3.

        Tab.3 The comparison of data balancing methods

        From the results we find that, with data balancing, almost all indicators improve, which shows that both data balancing methods work. The data augmentation method performs better, especially in accuracy; hence we use it in our later experiments.

        3.4.2 Comparison of proposed DACNN with traditional machine learning methods

        Our baseline models are commonly used machine learning methods with extracted features. First, we extract the same features as in Section 3.4.1. For comparison, we choose various commonly used traditional machine learning algorithms, namely SVM, GMM, Adaboost, random forest and AANN, with optimal parameter settings. The comparison of the different classifiers (with optimal parameters) is shown in Tab.4.

        Tab.4 The comparison of different machine learning methods

        From the table we find that, by using DACNN, we achieve 97.25% diagnosis accuracy for females and 95.12% for males, compared to 95.15% for females and 94.29% for males with the best-performing traditional methods. Among the traditional machine learning methods, the SVM did not achieve the highest accuracy (92.79% for females and 92.31% for males), but compared to the other methods it achieved good F1 values (0.9616 for females and 0.9403 for males), indicating strong generalization ability. Adaboost and random forest had high accuracy (95.15% and 93.60% for females, 94.29% and 93.57% for males, respectively); both handle samples with high-dimensional features well. Owing to its symmetric network topology, AANN also performed well (95.13% for females and 90.01% for males).

        The proposed DACNN method achieved better performance than the traditional machine learning methods on both the male and female datasets. Its F1 values are high (0.9700 for females and 0.9504 for males), indicating strong generalization ability. DACNN learned to recognize high-level features through its convolutional layers and thus became adept at differentiating syndromes. Taken together, our results demonstrate the effectiveness of the proposed DACNN method.

        4 Conclusion

        In this paper, we proposed a DACNN method for differentiating a patient's syndromes into normal and deficient. We performed experiments with female and male samples separately on our newly constructed dataset. We first compared two data balancing methods (data augmentation and weighting imbalanced data) and demonstrated that data augmentation performs better for the same classifier. We then compared our proposed method with several traditional machine learning methods (SVM, GMM, Adaboost, random forest and AANN). The results show that DACNN achieves 97.25% diagnosis accuracy for females and 95.12% for males, a 1% to 10% accuracy improvement with slight improvements in the other indicators. We demonstrated that, with its high-level feature representation ability, the proposed DACNN method is helpful for objective auscultation diagnosis.

        In the future, we first plan to expand the dataset with high-quality labels. Secondly, since recordings with inconsistent labels were removed, the remaining data is relatively easy to distinguish; it would be challenging and meaningful to explore audio with controversial labels. We also plan to model both local features and temporal dependencies for auscultation. Furthermore, we will try to differentiate syndromes into normal, Qi-deficient, and Yin-deficient.
