

Fault Diagnosis for Rolling Bearings with Stacked Denoising Auto-encoder of Information Aggregation


        Li Zhang, Xin Gao and Xiao Xu

(School of Information, Liaoning University, Shenyang 110036, China)

Abstract: Rolling bearings are important central components in rotating machines, and their fault diagnosis is crucial in condition-based maintenance to reduce the complexity of different kinds of faults. To classify various rolling bearing faults, a prognostic algorithm consisting of four phases is proposed. Since the stacked denoising auto-encoder can filter the noise in large numbers of mechanical vibration signals, it was used as the deep learning structure to extract characteristics from noisy data. An unsupervised pre-training method, which can greatly simplify the traditional manual extraction approach, was utilized to process the data automatically in depth. Furthermore, an aggregation layer for the stacked denoising auto-encoder (SDA) is proposed to get rid of gradient disappearance in the deeper layers of the network, mix the representations of superficial nodes with deeper layers, and avoid insufficient expressive ability in deeper layers. Principal component analysis (PCA) was adopted to extract different features for classification. According to the experimental data and the comparison results, the proposed method reached a correct classification rate of 97.02% for rolling bearing faults, a better performance than other algorithms.

Keywords: deep learning; stacked denoising auto-encoder; fault diagnosis; PCA; classification

        1 Introduction

Rolling bearings play an important role in maintaining the normal operation of an entire machine[1]. The transition from a normal state to a fault state is a cumulative process, during which timely prediction is crucial for industrial production and smooth economic progress. Different methods have been proposed to make such a diagnosis, including signal processing, knowledge-based models, and deep learning methods[2].

Experienced maintenance engineers can often judge whether a machine runs normally by its sound features. For fault diagnosis, high identification accuracy depends on effective feature representations. However, it is difficult to extract valid characteristics because of the noise and complex structures in the observed signal. For this reason, a large amount of work on feature extraction and selection in fault diagnosis has been performed using different types of signals and algorithms. In this regard, studies based on machine learning and statistical inference techniques have approached the problem from multiple aspects to improve the effectiveness of fault state classification, resulting in a number of classic and typical classification methods, such as support vector machines[3-4] and random forest (RF)[5]. The important tasks in such studies are to effectively learn elemental feature information from complex and heterogeneous signals as indicators, and to accurately identify different bearing fault states based on those indexes. However, diagnosis algorithms with simple architectures, such as a one-hidden-layer neural network, have limited capacity when faced with the complex non-linear relationships of fault diagnosis problems[6].

As a real-time online system, deep learning can improve the accuracy of sample detection, classification, and prediction[7]; it can use historical training data and classify different rolling bearing faults[8] without establishing a precise model. A representative architecture, the auto-encoder, is a multi-layered feed-forward neural network[9] first proposed by Bengio in 2007. Aiming at improving the robustness and anti-interference ability of the model, the denoising auto-encoder was introduced by Wang et al.[10] A marginalized denoising auto-encoder for non-linear representation models was proposed by Chen et al.[11] It improved the DAE by marginalizing the auto-encoder's noise, which reduces reconstruction errors and the computational cost of the model. Similarly, Gehring et al.[12] proposed a method using a stacked auto-encoder (SAE) to extract deep bearing features, whose main idea is to inject noise before the learning process; during learning, the model reconstructs clean features from the corrupted ones. Although deep learning methods for fault diagnosis are widely used in different domains, further enhancing the accuracy of rolling bearing diagnosis remains a research task[13].

In this paper, an improved stacked denoising auto-encoder is proposed to enhance classification accuracy on complex sensory signals, in which an unsupervised learning algorithm and data destruction processes are used to achieve better feature representations. The main stages consist of the following steps. First, a denoising auto-encoder (DA) network with a single hidden layer is trained. Then, the output of the first DA layer is used as the input of a second single-hidden-layer DA network, with the previously trained weights and bias values kept fixed. After that, the network is trained further, and the trained DAs are connected and divided into an encoder and a decoder. The back propagation (BP) algorithm is then applied to compute the objective function and optimize the new network, and the outputs of the aggregation layer's hidden nodes are used as inputs to train and tune a support vector machine (SVM). Finally, classification tags are generated by the D-S evidence method, yielding the bearing fault data categories.

        2 Improved Stacked Denoising Auto-encoder

        2.1 Stacked Denoising Auto-encoder

The stacked denoising auto-encoder (SDA)[14-16] uses a common denoising auto-encoder as the basic unit of a deep network; it forms a deep neural network by stacking layers and adding a single-layer Softmax classifier as the last layer. It utilizes a greedy algorithm, using the output code of the previous layer as the input of the current layer. Each unsupervised learning process learns only one hidden layer, and each layer is trained as a DA by minimizing reconstruction error. When the k-th layer has been trained, training of the (k+1)-th layer starts. After all layers are trained, fine-tuning begins as the second training phase, which is the supervised phase of the entire deep network. The measurement error of the supervision tasks should therefore be minimized in the same way as when training multi-layer perceptrons. The structure of SDA is shown in Fig.1, in which arrows stand for input features, each circle represents a neural node, and each output node represents a different result.

        Fig.1 Structure of SDA

Assume that the input sample is $X=(x_1,x_2,\dots,x_n)$ and that the corrupting noise is drawn from $N(0,\sigma^{2}I)$; then the output of the SDA is represented as

$$y=f_{\theta}(X)=s(WX+b)\tag{1}$$

where $W$ is the matrix of weights. After reconstruction, the decoding can be expressed as

$$z=g_{\theta'}(Y)=s(W^{T}Y+b')\tag{2}$$
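As a concrete illustration, Eqs.(1)-(2) can be sketched in a few lines of NumPy. The sigmoid activation, the tied-weight decoder, and all sizes here are illustrative assumptions rather than the paper's exact configuration:

```python
# Minimal NumPy sketch of Eqs.(1)-(2): sigmoid encoder and tied-weight
# decoder. The random input and the layer sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n_in, n_hidden = 550, 150                    # sizes echoing the later experiments
W = rng.normal(0.0, 0.01, (n_in, n_hidden))  # weight matrix W
b = np.zeros(n_hidden)                       # encoder bias b
c = np.zeros(n_in)                           # decoder bias b'

X = rng.normal(size=(4, n_in))               # a small batch of input samples
Y = sigmoid(X @ W + b)                       # Eq.(1): y = s(WX + b)
Z = sigmoid(Y @ W.T + c)                     # Eq.(2): z = s(W^T Y + b')
```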

Linear and non-linear decoders differ from the traditional denoising auto-encoder in their loss functions. The loss function of the linear decoder can be described by

$$L(X,Z)=L_{2}(X,Z)=\sum_{i=1}^{n}(x_i-z_i)^2\tag{3}$$

where $x_i\in X$ and $z_i\in Z$. In general, $Z$ is not a precise reconstruction of the input variables (where $X$ is the input vector); it only closely approximates the original input vector under the conditional distribution $p(X\mid Z)$. Therefore, the key problem is to optimize the reconstruction error function

$$L(X,Z)\propto-\log_{2}p(X\mid Z)\tag{4}$$

        For Eq. (4), there are two cases of selection as follows:

Case 1: If $X\in\mathbb{R}^{d}$ and $X\mid Z\sim N(Z,\sigma^{2}I)$, then

$$L(X,Z)=L_{2}(X,Z)=C(\sigma^{2})\,\lVert X-Z\rVert^{2}\tag{5}$$

In Eq.(5), $C(\sigma^{2})$ is a constant determined by $\sigma^{2}$.

Case 2: If $X\in\{0,1\}^{d}$ and $X\mid Z\sim\mathcal{B}(Z)$, i.e., the decoder is non-linear and $X$ is a binary feature vector, then the reconstruction function of the non-linear decoder is the cross-entropy

$$L(X,Z)=-\sum_{i=1}^{d}\left[x_i\log_{2}z_i+(1-x_i)\log_{2}(1-z_i)\right]\tag{6}$$
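A minimal sketch of the two loss cases, assuming NumPy arrays for $X$ and $Z$ (the base-2 logarithm follows the notation above):

```python
# Sketch of the two reconstruction losses: squared error for real-valued
# inputs (Case 1, Eq.(5)) and cross-entropy for binary inputs (Case 2, Eq.(6)).
import numpy as np

def l2_loss(X, Z):
    # Case 1: X real-valued, X|Z ~ N(Z, sigma^2 I)  ->  ||X - Z||^2
    return np.sum((X - Z) ** 2)

def cross_entropy_loss(X, Z, eps=1e-12):
    # Case 2: X binary, X|Z ~ B(Z)  ->  -sum[x log z + (1-x) log(1-z)]
    Z = np.clip(Z, eps, 1.0 - eps)   # guard against log(0)
    return -np.sum(X * np.log2(Z) + (1.0 - X) * np.log2(1.0 - Z))
```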

According to the definition of the auto-encoder, the tied-weight constraint can be derived as

$$W'=W^{T}\tag{7}$$

Therefore, under the condition of Eq.(4), the main goal is to minimize the loss function of the AE:

$$\underset{\theta,\theta'}{\arg\min}\;\mathbb{E}_{q(X)}\left[-\log_{2}p\left(X\mid Y=f_{\theta}(X);\theta'\right)\right]\tag{8}$$

An extra penalty term for the optimization problem can be formulated as follows:

$$\sum_{j=1}^{n}\left[\rho\log_{2}\frac{\rho}{\hat{\rho}_j}+(1-\rho)\log_{2}\frac{1-\rho}{1-\hat{\rho}_j}\right]\tag{9}$$

where $\rho$ and $\hat{\rho}_j$ stand for the sparsity coefficient and the average activation, respectively[17].
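A minimal sketch of this penalty, assuming the common KL-divergence form of Eq.(9) with the target coefficient $\rho$ as a free parameter:

```python
# Sketch of the KL-divergence sparsity penalty of Eq.(9): rho is the
# target sparsity coefficient, rho_hat the average hidden activation.
import numpy as np

def sparsity_penalty(Y, rho=0.05, eps=1e-12):
    # mean activation of each hidden unit across the batch
    rho_hat = np.clip(Y.mean(axis=0), eps, 1.0 - eps)
    return np.sum(rho * np.log2(rho / rho_hat)
                  + (1.0 - rho) * np.log2((1.0 - rho) / (1.0 - rho_hat)))

Y = np.random.default_rng(0).random((32, 150))  # example hidden activations
print(sparsity_penalty(Y))
```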

When the network structure deepens as the number of layers increases, the classification performance is sometimes worse than that of shallower networks. The reason is that during back propagation in a deep network, the rates of change of the weights and biases of the lower layers are far smaller than those of the higher layers, so the accumulated errors lack deep expression. Furthermore, the weights and biases must be updated at every step, and convergence is often poor. Therefore, an improvement of SDA is discussed next.

        2.2 Improvement of Hidden Layers

The improved structure of SDA is shown in Fig.2, in which the highest layer is an aggregation layer that not only uses the previous layer as input but also adds the inputs of multiple earlier layers. Suppose the net has $n$ hidden layers; from each layer $i$ $(i=1,2,\dots,n-1)$, part of the original information is preserved while the rest is passed to the next layer. Thus, the aggregation layer contains relatively complete information in terms of expressive ability.

Technically speaking, there is no single best way to divide the weights in the aggregation layer, since the weights depend on the proportion between the original information and the reconstructed information. Empirically, the proportion of original information is smaller than that of reconstructed information; a proportion of 0.561:0.439 was confirmed through experiments.
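As a sketch, the mixing can be written as a weighted sum; assigning the smaller weight 0.439 to the original information is an assumption consistent with the statement above that the original proportion is the smaller one:

```python
# Sketch of the aggregation layer's weighted mix of preserved original
# (shallow) information and reconstructed (deep) information. The 0.439
# weight for the original part is an assumption; the arrays are stand-ins.
import numpy as np

P_ORIG, P_RECON = 0.439, 0.561   # empirical proportion from the text

def aggregate(original, deep):
    """Weighted combination forming the aggregation layer's input."""
    return P_ORIG * original + P_RECON * deep

rng = np.random.default_rng(0)
shallow = rng.random((4, 150))   # information preserved from an earlier layer
deepest = rng.random((4, 150))   # output of the deepest hidden layer
mixed = aggregate(shallow, deepest)
```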

        Fig.2 Hidden layers with aggregation layer

        2.3 Improvement of Structure

Softmax is the most popular algorithm for industrial-scale problems, but its advantages in operation and efficiency are not decisive compared with other algorithms. When the number of features is large, its logistic regression clearly degrades in performance. Although SVM avoids many of the shortcomings of logistic regression, it is structurally shallow and its ability to express complex functions is limited. Hence, the structure uses SDA to train the deep network and extract deep features of the input through fine-tuning, and classifies the deep rolling bearing features with the SVM method. The deep features were then fused by D-S evidence theory[18] to obtain classification tags. The improved structure is shown in Fig.3.
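A minimal sketch of this classification stage, assuming scikit-learn's SVC for the SVM and a singleton-hypothesis simplification of Dempster's combination rule (when all masses sit on single classes, combination reduces to a normalized elementwise product). The feature arrays and the use of two SVMs on split feature halves as evidence sources are illustrative assumptions:

```python
# Sketch: SVMs trained on aggregation-layer features, fused with a
# singleton-hypothesis Dempster-Shafer combination of their outputs.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 150))      # stand-in aggregation-layer outputs
labels = rng.integers(0, 8, size=200)    # 8 bearing operating states

# Two evidence sources (assumed): SVMs on the two halves of the features.
svm_a = SVC(kernel="rbf", probability=True).fit(feats[:, :75], labels)
svm_b = SVC(kernel="rbf", probability=True).fit(feats[:, 75:], labels)

def ds_combine(m1, m2):
    """Dempster's rule for masses on singleton classes: normalized product."""
    m = m1 * m2
    return m / m.sum(axis=-1, keepdims=True)

p_a = svm_a.predict_proba(feats[:5, :75])   # evidence from source A
p_b = svm_b.predict_proba(feats[:5, 75:])   # evidence from source B
fused = ds_combine(p_a, p_b)
tags = fused.argmax(axis=-1)                # final classification tags
```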

        Fig.3 Structure with SVM classifiers

        2.4 Training Process of Improved SDA

Assume that the input layer dimension is $m$, the number of hidden layers is $l$, and the sizes of the layers are denoted $n_1,n_2,\dots,n_l$. As the depth increases, the training duration also increases. The procedure is described in Table 1.

Table 1 Backpropagation learning procedure for the ISDA model

Step 1: Calculate the outputs of the layers in the forward direction, starting from the first hidden layer.

Step 2: Use the output(s) of the previous layer to train the next layer. Sequentially train the output values, weights, and biases on the sample data up to the (l-1)-th layer.

Step 3: Use the outputs of the (l-1)-th layer's nodes as the input of the l-th layer.

Step 4: Connect the trained network into two parts: encoder and decoder. The encoder runs from the data input to the output of the final DA network (forward propagation); the decoder runs from the final auto-encoding network back to the original input data (reverse propagation).

Step 5: Set different proportions and aggregate the information for the training and testing data.

Step 6: Select the hidden nodes of the aggregation layer as the input of the SVM. Train the SVM and adjust its parameters. The whole ISDA process is then complete.
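Steps 1-3 of this procedure can be sketched as a greedy layer-wise loop. The following NumPy sketch assumes sigmoid activations, tied weights, squared-error reconstruction, and illustrative hyper-parameters and layer sizes:

```python
# Minimal sketch of Steps 1-3 of Table 1: train denoising auto-encoders
# layer by layer, freezing each trained layer and feeding its output to
# the next. Hyper-parameters and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_da(X, n_hidden, noise_std=0.1, lr=0.01, epochs=25):
    """Train one DA on X by gradient descent; return encoder weights/bias."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.01, (n_in, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        Xn = X + rng.normal(0.0, noise_std, X.shape)   # corrupt the input
        Y = sigmoid(Xn @ W + b)                        # encode
        Z = sigmoid(Y @ W.T + c)                       # decode (tied weights)
        dZ = (Z - X) * Z * (1.0 - Z)                   # grad of 0.5*||X - Z||^2
        dY = (dZ @ W) * Y * (1.0 - Y)
        W -= lr * (Xn.T @ dY + dZ.T @ Y) / len(X)      # encoder + decoder parts
        b -= lr * dY.mean(axis=0)
        c -= lr * dZ.mean(axis=0)
    return W, b

X = rng.normal(size=(256, 550))      # stand-in for normalized vibration data
encoders, H = [], X
for n_hidden in (300, 150):          # illustrative layer sizes n1, n2
    W, b = train_da(H, n_hidden)
    encoders.append((W, b))          # previously trained weights stay fixed
    H = sigmoid(H @ W + b)           # output feeds the next layer (Steps 2-3)
```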

        3 Fault Classification Applications

        3.1 Experiment Description

The datasets, composed of multivariate vibration series, were generated by the bearing test rig shown in Fig.4. The experiment used an acceleration sensor to collect vibration signals. The sensor was mounted on the motor housing with a magnetic base, and data were obtained with the bearing in the normal, outer race fault, inner race fault, and rolling element fault states under loads of 0 HP and 2 HP. The fault diameter of all fault states was 0.007″, and the bearing revolution speeds were 1 797 r/min and 1 750 r/min, respectively. The sampling point was at the driving end, and the sampling frequency was 12 kHz.

        Fig.4 Bearing test rig used for experiment

In the experiment, a fault diameter of 0.007″ was used, and loads of 0 HP and 2 HP were chosen from the dataset. Detailed information is listed in Table 2.

        Table 2 Bearing information

        3.2 Data Processing and Health State Definition

        The Drive End (DE) accelerometer data of the normal (ID: 97,99), inner race fault (ID: 105,107), rolling element fault (ID: 118,120), and outer race fault (ID: 130,132) conditions were acquired for classification.

Since the vibration signals are input to the neural network and its activation function requires features distributed on the [0,1] interval, the following normalization is required:

$$x_i^{o}=\frac{x_i-x_{\min}}{x_{\max}-x_{\min}}\tag{10}$$

where $x_i^{o}$ indicates the $i$-th feature parameter after normalization, $x_i$ is the $i$-th feature parameter after preprocessing, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the current feature parameter, respectively. Finally, to improve the noise-handling capability of the algorithm in practical applications, random noise with zero mean was added to the data; 70% of each sample set was taken as the training set, 15% as the verification set, and 15% as the test set.
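A minimal sketch of this preprocessing, assuming synthetic data and an illustrative noise level (the paper specifies only a zero mean):

```python
# Sketch of the preprocessing: Eq.(10) min-max normalization to [0,1],
# zero-mean noise injection, and a 70/15/15 split. Data is synthetic and
# the noise standard deviation is an assumption.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 550))                 # stand-in vibration samples

x_min, x_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - x_min) / (x_max - x_min)           # Eq.(10)

X_noisy = X_norm + rng.normal(0.0, 0.05, X_norm.shape)  # zero-mean noise

idx = rng.permutation(len(X_noisy))
n_train, n_val = int(0.70 * len(idx)), int(0.15 * len(idx))
train = X_noisy[idx[:n_train]]                   # 70% training set
val = X_noisy[idx[n_train:n_train + n_val]]      # 15% verification set
test = X_noisy[idx[n_train + n_val:]]            # 15% test set
```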

        3.3 Established Model

Establishing an optimal ISDA model has an impact on the diagnosis performance. Considering the stacked nature of the deep learning process, the study in Ref.[21] identifies several parameters that strongly influence the unsupervised learning process. In this section, the receptive input size and the numbers of hidden layers and units were tested to determine their optimal values in the ISDA model. The reconstruction errors of the auto-encoders were used as the indicator for judging the chosen parameters, with the training set providing the data for the reconstruction algorithm.


The reconstruction error of sample $i$ and its average over $N$ samples are

$$e_i=\lVert x_i-z_i\rVert^{2}\tag{11}$$

$$E=\frac{1}{N}\sum_{i=1}^{N}e_i\tag{12}$$

3.3.1 Input size of receptive vector

In general, the larger the input data, the better the representations. However, the receptive input size must be set within computational constraints. To observe its influence, an experiment was first conducted on the first auto-encoder, as shown in Fig.5. The reconstruction error clearly decreased as the receptive input size grew from 100 to 600. However, the values became stable and fell below 0.12 once the receptive input size exceeded 550, meaning the employed input features covered enough information. Therefore, the receptive input size was set to 550.

        Fig.5 Receptive input size

3.3.2 Number of hidden layers and nodes

Hidden nodes are important for achieving high performance in the ISDA model, as is the depth of the deep architecture. Consider the two algorithms (SDA and the improved algorithm) with the input size set to 550 based on the experiment above, as illustrated in Fig.6. Both algorithms performed satisfactorily when the number of nodes exceeded 130. The SDA reached its lowest reconstruction error at 160 nodes, and ISDA at 150 nodes (taking the last hidden layer into account). This is because, with the aggregation layer, more information can be conveyed directly to the last layer without passing through all previous layers.

        Fig.6 Number of hidden nodes of two algorithms

        By taking all the experimental results into account, further experiments of multiple hidden layers of two algorithms were conducted to demonstrate the effectiveness of ISDA. Conditions of each experiment are listed in Table 3.

Table 3 Experimental result comparison of two methods with different hidden layers

Methods          Receptive input size   Hidden layers   Hidden nodes   Reconstruction error
SDA algorithm    550                    2               160            0.2503
                 550                    3               160            0.2397
                 550                    4               160            0.1775
                 550                    5               160            0.1551
                 550                    6               160            0.0749
                 550                    7               160            0.0738
ISDA algorithm   550                    2               150            0.2119
                 550                    3               150            0.1786
                 550                    4               150            0.1488
                 550                    5               150            0.0941
                 550                    6               150            0.0695
                 550                    7               150            0.0683
                 550                    8               150            0.0682

In Table 3, the reconstruction error changes little once the number of hidden layers exceeds 6. However, compared with SDA, the improved SDA algorithm generally achieved lower error levels with 7 or 8 hidden layers. Both algorithms reached errors of approximately 0.070 (7%), meaning they are capable of mining salient bearing feature representations. Thus, 7 or 8 layers with 150 hidden units were employed. Since there are 8 different bearing operating states, the output layer has 8 nodes.

        4 Bearing Diagnosis with Improved SDA

As stated above, the method in this section is used to extract the hidden layer features of the test data: the outputs of the hidden layer of the trained model were extracted. In general, this self-learned fault characterization can be regarded as a type of dimension reduction process[22]. To verify the expressive ability of the extracted features, the principal component analysis (PCA)[23] method was used to extract the first three principal components for visual analysis. Here, the BP, RNN, SDA, and ISDA algorithms were used to process the data set with a fault diameter of 0.007″. The visual analysis results were compared, as presented in Figs.7-10 (Note: 7 layers are shown in Figs.8(a) and 9(a), and 8 layers in Figs.8(b) and 9(b); dots of the same color represent one category).

Fig.7 PCA graph of BP

Fig.7 illustrates the output of the single-layer BP neural network's hidden layer nodes, processed by PCA. Figs.8(a) and (b) show the outputs of the 7th and 8th hidden layer nodes of RNN, Figs.9(a) and (b) those of SDA, and Fig.10 those of ISDA. The fault types can be differentiated in Fig.7, but it is difficult to distinguish the different working conditions of each fault, indicating that the original information was not well classified. In Figs.8-9, faults were clearly classified even as the number of layers increased; however, the distinction between working conditions 0, 1, 4, and 5 was still insufficient. Fig.10 shows that the faults of working conditions 0, 1, 4, and 5 were accurately classified and each feature was highlighted, demonstrating the improvement in classification accuracy. Another experiment utilized the data in Table 2 with different types of added noise, with the learning rate of gradient descent set to 0.01, pre-training run for 25 epochs, and fine-tuning for 50 cycles, as shown in Table 4. The classification rate is computed as follows:

$$R=\frac{\operatorname{count}\left(U_{predict}=U_{actual}\right)}{U_{total}}\times 100\%\tag{13}$$

where $U_{predict}$ are the classifications from the test set, $U_{actual}$ are the correct labels, and $U_{total}$ is the total number of test samples.
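A one-line check of Eq.(13), assuming integer label arrays:

```python
# Sketch of the classification rate in Eq.(13): the fraction of test
# predictions matching the correct labels, as a percentage.
import numpy as np

def classification_rate(u_predict, u_actual):
    u_predict, u_actual = np.asarray(u_predict), np.asarray(u_actual)
    return np.mean(u_predict == u_actual) * 100.0

print(classification_rate([0, 1, 2, 2], [0, 1, 2, 3]))  # -> 75.0
```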

Fig.8 PCA graphs of RNN: (a) 7 layers; (b) 8 layers

Fig.9 PCA graphs of SDA: (a) 7 layers; (b) 8 layers

Fig.10 PCA graphs of ISDA: (a) and (b)

From the diagnosis table, the RNN, SDA, and ISDA methods all exhibited correct classification rates above 85%. However, SDA and ISDA produced relatively higher accuracy and stability in most cases across different SNRs, owing to their ability to handle complex non-linear data. Moreover, compared with SDA, the data destruction process in ISDA was superior for bearing diagnosis under strong spatial noise, with an accuracy improvement of 0.64% at 17 dB, for example. Also note that the RNN algorithm had good anti-noise ability when the SNR changed from 15 dB to 16 dB, as shown in Fig.11.

        Table 4 Classification results

        Fig.11 Methods with different SNRs

Fig.12 shows the training time for each algorithm, which became shorter as the SNR increased from 14 dB to 18 dB. The training times were 371 s, 368 s, 376 s, and 64 s for ISDA, SDA, RNN, and BP at 18 dB; at 14 dB, they increased to 396 s, 398 s, 403 s, and 67 s, respectively. As mentioned above, the proposed method achieves better diagnosis performance but is more time-consuming than the simpler methods, which is explained by its learning and back propagation mechanisms with the aggregation layer. It is worth mentioning that, compared with SDA at 14 dB, the improved algorithm required slightly less training time, because the aggregation layer retains earlier information and the data destruction reduces back propagation time while improving robustness.

        Fig.12 Training time for different methods

        5 Conclusions

A classification algorithm using SVM with ISDA is proposed in this paper to reduce classification error and improve the accuracy of rolling bearing fault diagnosis. Classification accuracy can be degraded by noisy samples. The ISDA algorithm extracts features automatically, whereas the BP and RNN algorithms need vibration signals and a wide set of hand-picked features as inputs to the neural network models. Aggregation layers combine features from different layers to obtain a better expression of the information. This method effectively separated different rolling bearing features under different SNRs. The experimental results demonstrated the effectiveness of the proposed approach by comparison with the traditional BP, RNN, and SDA. The diagnostic results showed that ISDA has better generalization performance, higher diagnostic accuracy, and stronger robustness than the other algorithms.

At the same time, determining optimal parameters such as the learning rate and choosing the classifier parameters remain challenges for further improvement of this work.
