Jiaxin Zhang, Wenjia Luo*, Yiyang Dai, Yuman Yao
1 School of Chemistry and Chemical Engineering, Southwest Petroleum University, Chengdu 610500, China
2 School of Chemical Engineering, Sichuan University, Chengdu 610065, China
Keywords: Cycle temporal algorithm; Fault diagnosis; Dynamic kernel principal component analysis; Multiway dynamic kernel principal component analysis; Reconstruction-based contribution
ABSTRACT Multivariate statistical process monitoring methods are often used for fault diagnosis in chemical processes. In this article, (I) the cycle temporal algorithm (CTA) is combined with the dynamic kernel principal component analysis (DKPCA) and multiway dynamic kernel principal component analysis (MDKPCA) fault detection algorithms, which are used for continuous and batch process fault detection, respectively. In addition, (II) a fault variable identification model based on the reconstruction-based contribution (RBC) is proposed, which paves the way for determining the cause of a fault. The proposed fault diagnosis model was applied to the Tennessee Eastman (TE) process and the penicillin fermentation process and compared with other fault diagnosis methods. The results show that the proposed method achieves better detection performance than the compared methods. Finally, the RBC model is used to locate the root cause of each fault and determine the fault propagation path.
The scale of equipment and plants has grown alongside the increasing data intelligence of chemical industry processes. Because typical chemical processes are flammable, explosive, toxic and corrosive, any accident can cause a major disaster, which in turn could cause huge environmental, social and economic losses. Effective and timely detection and localization of faults is therefore an important part of ensuring the safety of chemical processes and the quality of their products [1]. Process monitoring (PM) is a widely adopted tool for improving process safety and quality [2]. Generally, process monitoring methods can be divided into three categories: model-based, knowledge-based, and data-based methods. Data-based methods have been widely used in chemical processes in recent years [3-5].
According to the type of product, chemical processes can be divided into continuous processes and batch processes. Because both are multivariable control processes, multivariate statistical process monitoring (MSPM) methods are among the most commonly used methods in modern industrial processes. Among them, principal component analysis (PCA) is the most widely used in chemical processes [6-9]. By projecting the data onto a low-dimensional subspace that retains enough of the variance of the normal operating data, PCA can effectively handle high-dimensional, noisy and highly correlated data. However, PCA has limitations for fault detection and fault identification in chemical processes with strong nonlinearity and dynamics. For continuous process fault detection, Ku et al. proposed dynamic principal component analysis (DPCA) [10], Lee et al. [11] proposed kernel principal component analysis (KPCA), and Yang et al. [12] proposed dynamic kernel principal component analysis (DKPCA). Compared with continuous processes, batch processes have more operating stages and their data are usually high-dimensional, so the data must be unfolded in a multiway manner before entering the monitoring model [13]. Therefore, multiway principal component analysis (MPCA) [14], multiway kernel principal component analysis (MKPCA) [15] and adjacent principal component analysis (ADPCA) [16] have been successively proposed. When these PCA-based models are used for fault detection, the monitoring space is divided into two subspaces, the principal component subspace and the residual subspace. The T² and SPE statistics are constructed to capture the mean and variance information of the process in the two subspaces, respectively [17]; when either statistic exceeds its control limit, the system reports a fault.

For fault identification, the most widely used PCA-based method is the variable contribution method [18]. It quantifies the contribution of each process variable to an individual principal component score; summing the contributions of each process variable to the out-of-control principal components gives the variable contribution [19,20]. Process variables with large contributions are regarded as likely root causes of the fault. Many variable identification methods have been reported in recent years [21-23], but identifying faults remains difficult, especially when the number of process variables is large and the process is complex, and a fault may be masked by control loops. It is therefore important to detect faults in time and to capture the most useful information for identifying the fault variables.

Fault detection and identification based on these traditional PCA extensions have two problems. For fault detection, when the amount of data is large and the matrices computed by the model are high-dimensional, the computation becomes redundant and the fault detection delay is high. Qualitative trend analysis (QTA) is one approach to this problem, and several QTA-based fault diagnosis methods have been proposed, such as dynamic locus analysis (DLA) [24], dynamic time warping (DTW) [25], artificial immune system (AIS) [26], metric temporal logic (MTL) [27], metric interval temporal logic (MITL) [28] and signal temporal logic (STL) [29]. This article improves traditional temporal logic, proposes a cycle temporal algorithm (CTA), and combines it with the DKPCA and MDKPCA fault detection methods to address both problems. CTA retains the strong qualitative capability of temporal logic and, through its two parts, cycle segmentation and cycle merge, reduces the computational redundancy and complexity of large chemical process datasets, thereby improving fault detection accuracy and reducing detection delay. For fault identification, the main problem is that existing methods do not effectively extract fault information and therefore cannot reliably identify the fault variables. In this paper, the reconstruction-based contribution (RBC) model is used to concentrate the effective fault information in the principal component and residual subspaces, retain as much fault information as possible, and determine the fault variables and the cause of the fault.
Building on the above discussion, this article proposes the CTA and combines it with the DKPCA and MDKPCA models to form the CTA-DKPCA and CTA-MDKPCA fault diagnosis models, which monitor continuous and batch processes, respectively. The proposed model uses temporal logic to set the threshold and the start and end points of segmentation, and performs piecewise linear fitting on the process data, thereby reducing the dimension of the computed matrices and improving calculation accuracy. The CTA-DKPCA and CTA-MDKPCA models can effectively collect useful information in the subspaces, limit information loss, and improve the detection rate. For fault variable identification, the RBC method is used to reconstruct the variable contributions, display the contribution rate of each variable as a percentage, highlight the strong correlations between variables, and extract fault information. For the continuous process, three different types of faults in the Tennessee Eastman (TE) process are selected, and comparing the CTA-DKPCA model with similar derived algorithms shows its advantages. A further comparison with the 2-class SVM optimization algorithm proposed by Onel et al. [30] demonstrates the fault detection accuracy of the CTA-DKPCA model. For the batch process, a typical fault is selected, and comparison with the MDKPCA method shows the advantages of the CTA-MDKPCA method. Finally, the RBC method is used to identify the key variables that cause faults in the continuous and batch processes, locate the root cause of each fault, and determine the fault propagation path.
The rest of the paper is structured as follows. In Section 2, the cycle temporal algorithm is proposed and combined with the DKPCA and MDKPCA models, the fault variables are diagnosed through the RBC method, and the resulting continuous and batch process fault diagnosis models are presented. In Section 3, the TE process and the penicillin fermentation process are used to demonstrate the effectiveness of the CTA-DKPCA and CTA-MDKPCA models, and the RBC method is used to determine the root cause of each fault and the fault propagation path. Section 4 summarizes the work.
The traditional PCA method has limitations for fault diagnosis of continuous and batch chemical processes with nonlinearity and dynamics. Therefore, algorithms suited to both process types are needed: the DKPCA method for continuous process fault diagnosis and the MDKPCA method for batch process fault diagnosis.
PCA is a popular dimension reduction technique that projects the data onto a lower-dimensional subspace containing most of the variance in the data. However, PCA captures only linear correlations between variables, so it has difficulty with the strongly nonlinear and dynamic data typical of chemical processes. The KPCA method was therefore proposed by Kumar et al. [31] as a nonlinear extension; KPCA is a kernel-based learning method. The data in the original space are mapped to a feature space by a nonlinear mapping and analyzed there with a linear method, so that the nonlinear problem in the original space becomes a linear problem in the feature space. Capturing process dynamics additionally requires a dynamic (time-lagged) data representation; combining it with KPCA yields the DKPCA model, which captures both the nonlinear and the dynamic characteristics of the data.
Let the normal data set X contain m variables, each with n observations. Let x(t) denote the observation vector at time t, and construct the augmented matrix X_s, which also contains the observations at the previous s time instants, to reflect the dynamic relationships among the variables.
Applying PCA to the dynamic matrix X_s yields the DPCA model, through which the dynamic characteristics can be analyzed:
$X_s = TP^{\mathrm{T}} + E$
where T is the score matrix, P is the loading matrix, and E is the residual matrix, i.e., the projection of the samples onto the residual space.
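The construction of the augmented matrix X_s and the decomposition X_s = TP^T + E can be sketched in Python with NumPy. This is a minimal illustration, assuming the usual DPCA layout in which each row stacks the current observation with its s lagged predecessors, [x(t), x(t-1), ..., x(t-s)], and that the columns are autoscaled before an SVD-based PCA; the function names `augment_lagged` and `dpca` are hypothetical.

```python
import numpy as np


def augment_lagged(X: np.ndarray, s: int) -> np.ndarray:
    """Time-lagged matrix: the row for time t is [x(t), x(t-1), ..., x(t-s)].

    X has shape (n, m), oldest observation first; the result is (n - s, m * (s + 1)).
    """
    n = X.shape[0]
    if not 0 <= s < n:
        raise ValueError("lag s must satisfy 0 <= s < n")
    # block j holds the observations lagged by j steps, aligned on times t = s..n-1
    return np.hstack([X[s - j : n - j] for j in range(s + 1)])


def dpca(Xs: np.ndarray, p: int):
    """PCA on the augmented matrix: autoscaled Xs = T @ P.T + E."""
    Z = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0, ddof=1)  # autoscale columns
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:p].T          # loading matrix, one column per retained PC
    T = Z @ P             # score matrix
    E = Z - T @ P.T       # residual matrix (projection onto the residual space)
    return T, P, E
```

With m = 2 variables and s = 2, each row of X_s has 6 entries, and the residual E is orthogonal to the retained loadings (E @ P is zero up to round-off).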
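Consider a nonlinear mapping $\Phi: \mathbb{R}^{m} \rightarrow F$ that maps the (augmented) samples $\mathbf{x}_i(t{:}t-s)$ in the input space to a high-dimensional feature space F. The covariance matrix of the n (centered) samples in the feature space is calculated as follows:
$C^{F} = \frac{1}{n}\sum_{i=1}^{n} \Phi(\mathbf{x}_i)\Phi(\mathbf{x}_i)^{\mathrm{T}}$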
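Let the first p eigenvalues satisfying Eq.(6) be $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. The corresponding eigenvectors form the loading matrix $V = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p]$ of the principal components, and the coefficient matrix of V is $\alpha = [\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \ldots, \boldsymbol{\alpha}_p]$. The number of principal components p can be chosen by the principle of minimum mean square reconstruction error, determined through cross-validation.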
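Given a new sample $\mathbf{x}_{s,k}$, which is mapped to $\Phi(\mathbf{x}_{s,k})$ in the feature space, its principal component scores in the feature space are obtained by projecting $\Phi(\mathbf{x}_{s,k})$ onto each loading vector $\mathbf{v}_i$, $i = 1, \ldots, p$, where $t_{k,i}$ denotes the projection of $\Phi(\mathbf{x}_{s,k})$ onto the direction $\mathbf{v}_i$, namely:
$t_{k,i} = \langle \mathbf{v}_i, \Phi(\mathbf{x}_{s,k}) \rangle = \sum_{j=1}^{n} \alpha_{i,j}\, \tilde{k}(\mathbf{x}_{s,j}, \mathbf{x}_{s,k})$
where $\tilde{k}$ denotes the centered kernel.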
The Gaussian kernel is selected as the kernel function of the KPCA method:
$k(\mathbf{x}, \mathbf{y}) = \exp\left(-\lVert \mathbf{x} - \mathbf{y} \rVert^{2} / c\right)$
where c is the kernel width.
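The kernel computations behind Eqs.(6)-(9) can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes the Gaussian kernel in the form exp(-||x-y||²/c), the standard centering of the training and test kernel matrices, and a scaling of each coefficient vector α_i so that the corresponding feature-space loading vector v_i has unit norm. The class name `KPCA` and its methods are hypothetical.

```python
import numpy as np


def gaussian_kernel(A: np.ndarray, B: np.ndarray, c: float) -> np.ndarray:
    """k(x, y) = exp(-||x - y||^2 / c) for every pair of rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)  # clip tiny negative round-off


class KPCA:
    def __init__(self, n_components: int, c: float):
        self.p, self.c = n_components, c

    def fit(self, X: np.ndarray) -> "KPCA":
        n = X.shape[0]
        self.X = X
        self.K = gaussian_kernel(X, X, self.c)
        one = np.full((n, n), 1.0 / n)
        # centered kernel: K - 1K - K1 + 1K1
        Kc = self.K - one @ self.K - self.K @ one + one @ self.K @ one
        lam, A = np.linalg.eigh(Kc)                 # eigenvalues in ascending order
        order = np.argsort(lam)[::-1][: self.p]     # keep the p largest
        self.lam_kc = lam[order]
        # alpha_i = a_i / sqrt(lambda_i) gives unit-norm loadings v_i in feature space
        self.alpha = A[:, order] / np.sqrt(self.lam_kc)
        return self

    def transform(self, Xnew: np.ndarray) -> np.ndarray:
        """Scores t_{k,i} = sum_j alpha_{i,j} * centered k(x_j, x_k)."""
        n, nt = self.X.shape[0], Xnew.shape[0]
        Kt = gaussian_kernel(Xnew, self.X, self.c)
        one = np.full((n, n), 1.0 / n)
        one_t = np.full((nt, n), 1.0 / n)
        # center the test kernel with the training statistics
        Ktc = Kt - one_t @ self.K - Kt @ one + one_t @ self.K @ one
        return Ktc @ self.alpha
```

Fitting this model on a time-lagged (augmented) matrix rather than on the raw observations gives a DKPCA model; the width c would be set to the value used in the experiments (the text later reports a kernel width of 800 for the TE process).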
A batch process is a repetitive production process, and its data have an additional "batch" dimension compared with continuous process data. A three-dimensional data matrix X (I×J×K) can therefore represent the process data, where I is the number of batches, J is the number of variables, and K is the number of sampling points. MDKPCA cuts the three-dimensional data matrix along the time axis into batch-variable data blocks X (I×J) and arranges the blocks side by side from left to right to form a new two-dimensional data matrix X (I×JK), as shown in Fig.1. After this unfolding, the data processing and analysis in MDKPCA are equivalent to those in the DKPCA method.
The multiway method combined with the above DKPCA method can diagnose batch process faults.
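The unfolding of X (I×J×K) into X (I×JK) shown in Fig.1 reduces to a transpose and a reshape in NumPy. A minimal sketch follows, assuming that the time slices are placed in chronological order and that each column of the unfolded matrix is then autoscaled, as is common in multiway PCA; the function names are hypothetical.

```python
import numpy as np


def batchwise_unfold(X3: np.ndarray) -> np.ndarray:
    """Unfold X (I x J x K) into X (I x JK): time slices X(I x J) placed side by side.

    Column k*J + j of the result holds variable j at sampling time k.
    """
    I, J, K = X3.shape
    return X3.transpose(0, 2, 1).reshape(I, K * J)


def autoscale_columns(X2: np.ndarray) -> np.ndarray:
    """Zero mean and unit variance per column (each variable at each time point)."""
    sd = X2.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0            # constant columns are centered but left unscaled
    return (X2 - X2.mean(axis=0)) / sd
```

After these two steps each batch is a single row vector, so the DKPCA machinery can be applied to the unfolded matrix unchanged.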
Fault diagnosis is particularly important for system safety, and qualitative trend analysis (QTA) [32] is an important fault diagnosis technique. The main idea of QTA is to represent a measured signal as a sequence of trend primitives: constant, increasing and decreasing. However, traditional QTA-based temporal logic fault diagnosis methods have limitations: they are typically applied to small systems with few variables and sensors.
In this section, QTA is extended with time constraints, and a new fault diagnosis method, the cycle temporal algorithm (CTA), is proposed, which combines the aforementioned DKPCA with temporal logic. DKPCA is used to obtain the principal components (PCs). The selected PCs retain most of the process data variance, and each PC is a linear combination of all process variables. Compared with traditional QTA, the proposed method uses temporal logic to describe how the process evolves over time. The temporal logic is learned statistically from the collected process data, covering both the temporal characteristics of individual process variables and the temporal characteristics of the linear correlations between them. This helps improve the accuracy of fault detection and identification. Incorporating the segmentation of the cycle temporal algorithm into the model calculation also improves the calculation speed.
2.3.1. Temporal logic
The semantics of the temporal logic are defined as follows:
Fig.1.MDKPCA data matrix decomposition.
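Given a data sequence of length n, $x = \langle x_1, x_2, \cdots, x_n \rangle$, where $x[\tau]$ denotes the data point at time τ, the temporal logic satisfies the following:
$x[\tau] \models \mu$ if and only if $f(x[\tau]) > 0$.
$x[\tau] \models \top$ always holds ("always true").
$x[\tau] \models \Diamond\varphi$ if and only if $\exists \tau' \in [1, \tau]$ such that $x[\tau'] \models \varphi$.
$x[\tau] \models \Box\varphi$ if and only if $\forall \tau' \in [1, \tau]$, $x[\tau'] \models \varphi$.
$x[\tau] \models \neg\varphi$ if and only if $\neg(x[\tau] \models \varphi)$.
$x[\tau] \models \varphi_1 \wedge \varphi_2$ if and only if $x[\tau] \models \varphi_1$ and $x[\tau] \models \varphi_2$.
$x[\tau] \models \varphi_1 \,\mathcal{U}_{[b,e]}\, \varphi_2$ if and only if one of the following conditions holds:
(1) If $\tau < b$, then $x[\tau] \models \top$.
(2) If $\tau > e$, then $\exists \tau' \in [b, e]$ such that $x[\tau'] \models \varphi_2$, and $\forall \tau'' \in [1, \tau]$, $x[\tau''] \models \varphi_1$.
(3) If $\tau \in [b, e]$, then $\forall \tau' \in [1, \tau]$, $x[\tau'] \models \varphi_1$; or $\exists \tau' \leq \tau$ such that $x[\tau'] \models \varphi_2$ and $\forall \tau'' \in [1, \tau']$, $x[\tau''] \models \varphi_1$.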
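These semantics can be checked on a finite trace. The sketch below assumes that each formula has already been evaluated pointwise into a list of booleans (index τ-1 holds the value at time τ) and interprets the operators defined above as "eventually" (◇), "always" (□) and a bounded "until" with window [b, e], following conditions (1)-(3) literally; the helper names are hypothetical.

```python
from typing import Sequence

# Each formula is a list of booleans: phi[t - 1] is its truth value at time t
# (times are 1-indexed, as in the text).


def eventually(phi: Sequence[bool], tau: int) -> bool:
    """x[tau] |= <>phi: phi holds at some tau' in [1, tau]."""
    return any(phi[:tau])


def always(phi: Sequence[bool], tau: int) -> bool:
    """x[tau] |= []phi: phi holds at every tau' in [1, tau]."""
    return all(phi[:tau])


def until(phi1: Sequence[bool], phi2: Sequence[bool], b: int, e: int, tau: int) -> bool:
    """x[tau] |= phi1 U_[b,e] phi2, following conditions (1)-(3)."""
    if tau < b:                                   # (1) trivially true
        return True
    if tau > e:                                   # (2) phi2 somewhere in [b, e], phi1 on [1, tau]
        return any(phi2[t - 1] for t in range(b, e + 1)) and all(phi1[:tau])
    # (3) tau in [b, e]: phi1 on [1, tau], or phi2 at some tau' <= tau with phi1 on [1, tau']
    return all(phi1[:tau]) or any(phi2[t - 1] and all(phi1[:t]) for t in range(1, tau + 1))
```

An atomic predicate μ with f(x) = x, for example, becomes the trace `[v > 0 for v in x]`, which can then be passed to any of the operators.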
2.3.2. Construction of cycle temporal algorithm
To simplify the construction of the temporal logic, reduce the calculation time and extract the optimal number of principal components, the CTA is developed on the basis of temporal logic. The algorithm consists of a cycle segmentation process and a cycle merge process, as shown in Fig.2.
First, determine the start point b_L and end point e_L of the data segment to be divided, and perform segmentation and linear fitting over the entire data sequence X. The end point of the first half and the start point of the second half created by the division are both set to e_L/2. The linear fitting equation is $y = kt + c$, where k and c represent the slope and the y-axis intercept, respectively. Then calculate the error of this linear fit. If the linear fitting error err is greater than the threshold δe calculated from the data segment equation above, each of the two subsequences X1 and X2 obtained in this cycle is divided into two again. Linear fitting and error calculation are then performed on each new segment in the same way, until the linear fitting error err is not greater than the predetermined error threshold δe. The linear fitting error err is calculated according to Eq.(13).
If err is below the threshold but the data segment from the previous cycle also satisfied the condition, the segment lengths are inconsistent, so the longer data segment is segmented again until all data segments have equal length. Once a satisfactory linear fit is obtained, the above steps are repeated on the remaining data until the entire data sequence is divided into a sequence of data segments, each corresponding to one linear fit. The segments are then fed into the model for calculation, and after the segment outputs are obtained, the merging process is executed. Two adjacent data segments are combined and a linear fit is performed on the combined segment. If the fitting error is not greater than the predetermined threshold δe, the two segments are merged into one; otherwise, they are not merged. Merging continues cycle by cycle until err exceeds δe, after which it is checked whether the end point of the merged segment is the sequence end point e_L. If not all data points have been merged, the cycle continues on the remaining data until the merged segments form one complete data segment, i.e., until the start point b_L and the end point e_L lie in the same data segment.
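The cycle segmentation and cycle merge steps can be sketched as a recursive halving followed by a greedy merge of neighbouring segments. This is a simplified sketch under stated assumptions: Eq.(13) is not reproduced here, so the fitting error is assumed to be the root-mean-square residual of a least-squares line, the threshold δe is passed in as `delta_e`, and the paper's extra step that equalizes segment lengths is omitted. All function names are hypothetical.

```python
import numpy as np


def line_fit(y: np.ndarray, b: int, e: int):
    """Least-squares line y = k*t + c on y[b..e]; err is the RMS residual (assumed form)."""
    t = np.arange(b, e + 1, dtype=float)
    seg = y[b : e + 1]
    if seg.size < 2:                       # a single point is fitted exactly
        return 0.0, float(seg[0]), 0.0
    k, c = np.polyfit(t, seg, 1)
    err = float(np.sqrt(np.mean((seg - (k * t + c)) ** 2)))
    return float(k), float(c), err


def cycle_segment(y: np.ndarray, delta_e: float, min_len: int = 2):
    """Halve [b, e] repeatedly until each piece fits a line within delta_e."""
    segments, stack = [], [(0, len(y) - 1)]
    while stack:
        b, e = stack.pop()
        if line_fit(y, b, e)[2] <= delta_e or e - b + 1 < 2 * min_len:
            segments.append((b, e))
        else:
            mid = (b + e) // 2
            stack.append((mid + 1, e))     # pushed first, so the left half is processed first
            stack.append((b, mid))
    return segments


def cycle_merge(y: np.ndarray, segments, delta_e: float):
    """Merge neighbouring segments while the merged line still fits within delta_e."""
    merged = [segments[0]]
    for b, e in segments[1:]:
        b0 = merged[-1][0]
        if line_fit(y, b0, e)[2] <= delta_e:
            merged[-1] = (b0, e)           # extend the previous segment
        else:
            merged.append((b, e))
    return merged
```

For a signal that rises linearly for 32 samples and then stays flat for 32 samples, one halving already yields two exactly linear segments, and the merge step leaves them apart because a single line cannot fit the kink.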
Based on the above piecewise fitting method, cycle segmentation and cycle merging are performed on a given data sequence, yielding a sequence of L quadruples:
$\{(k_i, c_i, b_i, e_i)\}_{i=1}^{L}$
In each quadruple $(k_i, c_i, b_i, e_i)$, $i \in [1, L]$, $k_i$ and $c_i$ are the slope and y-axis intercept of the i-th linear fitting segment, respectively, and the integers $b_i$ and $e_i$ are the start and end times of the data segment. The corresponding atomic predicate is given in Eq.(15).
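Given the segment boundaries, the quadruples (k_i, c_i, b_i, e_i) follow from a least-squares line on each segment. The mapping from a slope to a QTA trend primitive (increasing, decreasing, constant) shown here is a hypothetical stand-in with an assumed tolerance `tol`, since the atomic predicate of Eq.(15) is not reproduced in this text.

```python
import numpy as np


def to_quadruples(y: np.ndarray, segments):
    """[(k_i, c_i, b_i, e_i)]: least-squares slope and intercept on each segment [b_i, e_i]."""
    quads = []
    for b, e in segments:
        t = np.arange(b, e + 1, dtype=float)
        k, c = np.polyfit(t, y[b : e + 1], 1)   # needs at least two points per segment
        quads.append((float(k), float(c), b, e))
    return quads


def trend_primitive(k: float, tol: float = 1e-6) -> str:
    """Hypothetical slope-to-primitive mapping (not the paper's Eq.(15))."""
    if k > tol:
        return "increasing"
    if k < -tol:
        return "decreasing"
    return "constant"
```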
Based on the above,the temporal logic is reconstructed as:
Finally, the CTA uses temporal logic to define the principle of cycle segmentation, determine the cycle size and the error threshold, and fix the final segment start and end points. The cycle segmentation part divides the model data so as to reduce the dimension of the matrix calculations. The cycle merge part is the inverse of cycle segmentation and merges the effective information obtained by the model. Effective merging and counting of the data improve the computational efficiency of the model and the accuracy of its results, and reduce the delay caused by computation.
2.4.1. Continuous process fault detection
As in the previous section, for continuous process fault detection, each single block divided by the CTA has its own loading matrix. In addition, let the column vector z (J×1) represent the current observation, and divide z into the corresponding L blocks according to the L variable blocks of the previous section, i.e., $\mathbf{z} = [\mathbf{z}_1^{\mathrm{T}}, \mathbf{z}_2^{\mathrm{T}}, \ldots, \mathbf{z}_L^{\mathrm{T}}]^{\mathrm{T}}$. Based on the T² and SPE statistics of the global DKPCA model constructed above, the T² and SPE statistics of the segmented DKPCA model are obtained:
Fig.2.The cycle temporal algorithm calculation process.
If any of the above 2+2L statistics exceeds its control limit, the system is considered to have a fault.
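The block-wise alarm rule can be illustrated with a small sketch. To keep it short, it uses linear PCA on each variable block as a stand-in for the kernel models of the paper (the DKPCA versions would replace the scores and residuals with their feature-space counterparts); the class `PCAMonitor`, the helper functions and the assignment of variables to blocks are hypothetical. It computes T² and SPE for the global model and for each of the L blocks, giving 2+2L statistics, and flags a fault if any of them exceeds its limit.

```python
import numpy as np


class PCAMonitor:
    """Linear PCA model of one variable block (stand-in for a DKPCA block model)."""

    def __init__(self, X: np.ndarray, p: int):
        self.mu, self.sd = X.mean(axis=0), X.std(axis=0, ddof=1)
        Z = (X - self.mu) / self.sd
        _, s, Vt = np.linalg.svd(Z, full_matrices=False)
        self.P = Vt[:p].T                          # block loading matrix
        self.lam = s[:p] ** 2 / (X.shape[0] - 1)   # variances of the retained scores

    def stats(self, x: np.ndarray):
        z = (x - self.mu) / self.sd
        t = self.P.T @ z                           # scores
        r = z - self.P @ t                         # residual
        return float(t @ (t / self.lam)), float(r @ r)   # (T^2, SPE)


def block_statistics(X_train, x_new, blocks, p_global, p_block):
    """[T2_global, SPE_global, T2_1, SPE_1, ..., T2_L, SPE_L]: 2 + 2L values."""
    out = list(PCAMonitor(X_train, p_global).stats(x_new))
    for idx in blocks:
        out.extend(PCAMonitor(X_train[:, idx], p_block).stats(x_new[idx]))
    return out


def any_alarm(values, limits) -> bool:
    """Fault if any statistic exceeds its own control limit."""
    return any(v > lim for v, lim in zip(values, limits))
```

In practice the limits would come from the reference (normal) data rather than a fixed constant; a fixed value is used in the check below only to exercise the logic.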
2.4.2. Batch process fault detection
The fault detection procedure for the batch process is similar to that for the continuous process, with T² and SPE as the monitoring statistics. The $T^2_{s,b,i}$ statistic and the $\mathrm{SPE}_{s,b,i}$ statistic are calculated according to Eqs.(21) and (22):
where $\mathbf{t}_i$ is the score vector, $S_i$ is the covariance matrix, $\mathbf{e}_i$ is the residual vector, and the subscript b denotes the batch process. The control limits (control thresholds) of the two statistics are calculated according to Eqs.(23) and (24):
In Eq.(24), g is the weight and h is the number of degrees of freedom; both are calculated from the SPE values of the reference model at the k-th sampling time: g = v/(2m) and h = 2m²/v, where m and v are the mean and variance of the SPE statistic at the k-th sampling time in the reference model, respectively. The T² statistic is considered to approximately follow an F distribution.
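The control limits can be computed with SciPy's chi-squared and F distributions. This sketch assumes that the SPE limit takes the common weighted chi-squared form g·χ²_h(α) with the g and h given above, and that the T² limit uses the standard F-based expression for a new observation, p(n-1)(n+1)/(n(n-p))·F_{p,n-p}(α); since Eqs.(23) and (24) are not reproduced here, these forms are assumptions. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def spe_limit(spe_ref: np.ndarray, alpha: float = 0.95):
    """SPE_lim = g * chi2_h(alpha) with g = v / (2m) and h = 2m^2 / v."""
    m = float(np.mean(spe_ref))
    v = float(np.var(spe_ref, ddof=1))     # sample variance of the reference SPE values
    g, h = v / (2.0 * m), 2.0 * m**2 / v
    return g * stats.chi2.ppf(alpha, h), g, h


def t2_limit(p: int, n: int, alpha: float = 0.95) -> float:
    """T2_lim = p(n-1)(n+1) / (n(n-p)) * F_{p, n-p}(alpha)."""
    return p * (n - 1) * (n + 1) / (n * (n - p)) * stats.f.ppf(alpha, p, n - p)
```

For the batch model, `spe_ref` would hold the reference SPE values at one sampling time k across the normal batches, and n would be the number of batches I.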
2.4.3. Fault identification
Once the system detects a fault, the RBC model is used to compute the contribution rate of each variable numerically.
The fault direction of the k-th variable is the unit vector ξ (J×1) with amplitude f_s, whose k-th component is 1 and all other components are 0; in the i-th block model, the corresponding unit vector and amplitude are defined in the same way. The reconstructed value of data point z_i is:
The statistical reconstruction index is defined as:
For the continuous process,based on Eqs.(17) and (18):
For the batch process,based on Eqs.(21) and (22):
The contribution rate of thekth variable is defined as:
It can be seen from the above two equations that the larger the contribution rate ψ(z_k), the smaller the reconstructed index.
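The reconstruction-based contribution can be computed in closed form for any quadratic index of the form zᵀMz, for example M = I - PPᵀ for SPE or M = PΛ⁻¹Pᵀ for T² in a linear PCA model. The sketch below assumes this standard linear RBC formulation and normalizes the contributions to percentages; the function name is hypothetical, and the kernel indices of Eqs.(17), (18), (21) and (22) would require their own reconstruction, which this sketch does not cover. It also shows the relation stated above: the reconstructed index equals the original index minus RBC_k, so the variable with the largest contribution has the smallest reconstructed index.

```python
import numpy as np


def rbc_contributions(z: np.ndarray, M: np.ndarray):
    """RBC for an index z^T M z with M symmetric positive semi-definite.

    RBC_k = (xi_k^T M z)^2 / (xi_k^T M xi_k), where xi_k is the k-th unit vector.
    Returns the contribution rates in percent and the reconstructed index per variable.
    """
    Mz = M @ z
    d = np.diag(M)                                   # xi_k^T M xi_k
    safe_d = np.where(d > 1e-12, d, 1.0)             # avoid division by zero
    rbc = np.where(d > 1e-12, Mz**2 / safe_d, 0.0)
    index = float(z @ Mz)
    psi = 100.0 * rbc / rbc.sum()                    # contribution rate, %
    reconstructed = index - rbc                      # index after reconstructing variable k
    return psi, reconstructed
```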
Integrating the above methods, the CTA-DKPCA and CTA-MDKPCA fault diagnosis models are obtained. The specific process is shown in Fig.3.
The CTA-DKPCA and CTA-MDKPCA fault diagnosis models for continuous and batch processes consist of four parts. In the first part, the optimal data segmentation and the fitting threshold of the segmented data are obtained through temporal logic calculation. The second part is the model calculation: the offline and online stages determine whether the detection result of each data segment exceeds the limit, i.e., whether there is a fault. The third part merges the segmented data linearly after calculation to determine whether faulty data appear anywhere in the whole process. The fourth part is variable identification: the contribution rate of each variable is calculated by the RBC method, and the cause of the fault is analyzed comprehensively in light of the detection result. The specific implementation steps are as follows:
(1) Cycle segmentation process:
②Perform a linear fit to the initial process dataset Xc/b.The fitting parameters δe,b1,ande1are set by Eq.(12).
③The piecewise linear fitting error err is calculated by Eq.(13). If err is less than δe, the current segment is taken as the final segment number L; otherwise, set L = L+1, divide the current segment into two halves, update the end value of the new segment to e_L = e_L/2, and return to step ② until err is less than δe. Then go to step ④.
④After err falls below δe, check whether the end time point e_L has reached the end of the data sequence. If not, not all data have been processed; return to step ③ and process the remaining data in the same cycle until all data have been processed, giving the new L data segments.
⑤Input all the segmented data segments into the DKPCA/MDKPCA model for calculation.
(2) Model calculation
Off-line model:
①Collect normal segmented data $X_{c,i}$ (N×J) of the continuous process or $X_{b,i}$ (I×J×K) of the batch process. For continuous process data, N and J are the number of sampling points and the number of variables, respectively. For batch process data, I is the number of batches, J is the number of variables, and K is the number of sampling points.
Fig.3.The fault diagnosis model of continuous and batch processes based on CTA-DKPCA and CTA-MDKPCA.
②Apply the dynamic time-lag augmentation to the collected data to obtain the matrices $X_{s,c,i}$ (N×J) and $X_{s,b,i}$ (I×J×K) that capture the data dynamics.
③The continuous process segmented data $X_{s,c,i}$ (N×J) go directly to step ④, and the batch process segmented data $X_{s,b,i}$ (I×J×K) are unfolded into the two-dimensional matrix $X_{s,b,i}$ (I×JK).
④Standardize the segmented matrix $X_{s,c,i}$ (N×J) or $X_{s,b,i}$ (I×JK) to zero mean and unit variance, and determine $\alpha_i$ by Eq.(6).
⑤Calculate the kernel matrices $K_c$ and $K_b$ of each block matrix.
⑥Determine the dimensions of the principal component subspace and the residual subspace.
⑦After calculating the fault-free control limits, $T^2_{s,c,i,\lim}$ and $\mathrm{SPE}_{s,c,i,\lim}$ and their batch counterparts, by Eqs.(17), (18), (21) and (22), proceed to step ⑧ of the online detection stage.
On-line model:
①Collect new segmented process data Xc,i,new(N×J)/Xb,i,new(I×J×K).
②Apply the same time-lag augmentation to the new data to obtain the matrices $X_{s,c,i,new}$ (N×J) and $X_{s,b,i,new}$ (I×J×K) that capture the data dynamics.
③The continuous process segmented data $X_{s,c,i,new}$ (N×J) go directly to step ④, and the batch process data $X_{s,b,i,new}$ (I×J×K) are unfolded into the two-dimensional matrix $X_{s,b,i,new}$ (I×JK).
④Perform the standardization.
⑤Calculate the new kernel matrices $K_{c,new}$ and $K_{b,new}$.
⑥Center the kernel matrices by Eq.(9).
⑦Calculate the nonlinear principal components by Eq.(8).
⑧Using the reference values of the fault-free condition, calculate the statistics of the new data and check whether the $T^2_{s,c,i}$/$T^2_{s,b,i}$ or $\mathrm{SPE}_{s,c,i}$/$\mathrm{SPE}_{s,b,i}$ statistics exceed their control limits ($T^2_{s,c,i,\lim}$/$T^2_{s,b,i,\lim}$ or $\mathrm{SPE}_{s,c,i,\lim}$/$\mathrm{SPE}_{s,b,i,\lim}$). If any control limit is exceeded, the segment contains a fault; otherwise, it is identified as normal, and the procedure returns to step ① to detect a new set of process data.
(3) Cycle merge part
②Starting from the number of merged sequencesi=2,merge adjacent sequences and calculate the merged fitting error err.
③Check whether err exceeds the threshold δe. If it does, set i = i+1 and go to step ④; if not, set L = L-1 and merge into the next group, continuing until err is greater than δe.
④When err is greater than δe, further check whether the current merging end point is the end point e_L of the whole data sequence; if not, return to step ③ and continue the cycle merge.
⑤When the sequence has been processed and all groups have been merged, obtain the merged statistics $T^2_{s,c}$/$T^2_{s,b}$ and $\mathrm{SPE}_{s,c}$/$\mathrm{SPE}_{s,b}$ together with their control limits; at this point i should be one greater than L, and the merge is complete. If the overall statistics exceed their control limits, perform variable identification and analysis; otherwise, the process is normal, and the next set of process sequences is examined.
(4) Variable identification and cause analysis
(1) If, after merging, the $T^2_{s,c}$/$T^2_{s,b}$ or $\mathrm{SPE}_{s,c}$/$\mathrm{SPE}_{s,b}$ statistics exceed their limits, variable identification is performed.
(2) According to the principle of variable reconstruction,statistical reconstruction indicators are determined,and the contribution rate ψ(zk)is calculated.
(3) When the detection result indicates a fault, the reconstruction-based contribution rate and contribution index of each variable are calculated from the variable reconstruction and the detection result to determine the cause and the path of the fault.
In this paper, the TE process and the penicillin fermentation process are selected to represent continuous and batch chemical processes, respectively, and the proposed fault diagnosis model is applied to both to assess its validity.
The TE process was created by Downs and Vogel in 1993 [33] and is widely cited as a benchmark for control and fault diagnosis studies. The flow chart of the TE benchmark process is presented in Fig.4 [34].
The TE benchmark process consists of five operating units, the reactor, product condenser, vapor-liquid separator, recycle compressor and product stripper, and includes 11 manipulated variables and 41 measured variables. In addition, the platform defines 21 types of faults, and its data are nonlinear, strongly coupled and time-varying. The TE dataset is divided into training data and test data. Both the training set and the test set contain 52 observed variables and 960 observations; in the test dataset, the first 160 samples are under normal operating conditions and samples 161 to 960 are fault data. The TE process is taken as the experimental object to evaluate the proposed algorithm, which is intended for chemical industry processes. The TE benchmark process faults are listed in Table 1.
Table 1 Faults for the Tennessee Eastman process
To verify the superiority of the CTA-DKPCA model for fault detection,the CTA-DKPCA model is used to analyze the causes of faults under different fault types.
3.1.1. Fault detection results
In this paper, the number of PCs is selected by the cumulative percent variance (CPV), calculated as shown in Eq.(31):
$\mathrm{CPV} = \frac{\sum_{i=1}^{N} \lambda_i}{\sum_{i=1}^{j} \lambda_i} \times 100\%$
where $\lambda_i$ is the i-th eigenvalue of the covariance matrix, j is the number of variables, and N is the number of principal components. The final numbers of selected PCs are shown in Fig.5.
The numbers of PCs for PCA and its derived algorithms are determined by requiring the cumulative percent variance to exceed 85%. The resulting numbers of PCs for PCA, KPCA, DKPCA and CTA-DKPCA are 9, 17, 22 and 12, respectively. The relationship between the number of cycles and the error in the CTA calculation is shown in Fig.6.
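Selecting the number of PCs with the CPV criterion amounts to a cumulative sum over the sorted eigenvalues. A short sketch, assuming the eigenvalues are already available and that the criterion is "strictly greater than 85%" as stated above; the function name is hypothetical.

```python
import numpy as np


def n_components_cpv(eigvals, threshold: float = 0.85):
    """Smallest N whose cumulative percent variance strictly exceeds the threshold."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest eigenvalue first
    cpv = np.cumsum(lam) / lam.sum()
    # first index with cpv > threshold, converted to a component count
    n = int(np.searchsorted(cpv, threshold, side="right")) + 1
    return min(n, lam.size), cpv
```

For eigenvalues [5, 3, 1, 1] the CPV sequence is 50%, 80%, 90%, 100%, so three components are retained at the 85% threshold.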
Fig.4.TE benchmark process.
Fig.5.The number of PCs calculated by different method.
Fig.6.The relationship between the number of cycles and the error.
As Fig.6 shows, for the TE process the error reaches the threshold after 5 cycles, at which point the data are divided into 32 (2^5) segments. The monitoring performance of the CTA-DKPCA algorithm is then compared with that of the PCA, KPCA and DKPCA algorithms. The normal samples are used to train the monitoring models, the kernel width of the radial basis function is set to 800, and the confidence level is set to 95%. The fault detection rate (FDR) is used to evaluate all 18 faults tested in the TE process.
Table 2 lists the FDRs of these 18 faults. CTA-DKPCA shows better fault detection performance than the other methods for most of them.
The CTA-DKPCA method achieves higher detection accuracy than the other PCA methods for most faults. For the T² statistic, the FDR of faults 4, 5, 8, 10 and 21 reaches 100%; for the SPE statistic, the FDR of faults 4, 5, 7, 18, 19 and 20 reaches 100%. For the other faults, the proposed CTA-DKPCA algorithm also performs better than the other algorithms, indicating that it is highly sensitive to faults.
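To further quantify the monitoring performance, two additional indicators are introduced: the false alarm rate (FAR) and the time delay (TD). They are calculated as follows:
$\mathrm{FAR} = \frac{\text{number of alarms during normal operation}}{\text{number of normal samples}}, \qquad \mathrm{TD} = t_d - t_0$
where $t_d$ is the fault detection time and $t_0$ is the fault occurrence time.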
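FDR, FAR and TD can be computed from a per-sample alarm sequence. The sketch below assumes the TE test-set layout described earlier (the first 160 samples are normal, so the fault starts at 0-based index 160), treats every alarm during normal operation as a false alarm, and takes the first alarm after fault onset as the detection time; other studies use stricter rules (for example, several consecutive alarms), so this is only one possible convention. The function name is hypothetical.

```python
import numpy as np


def detection_metrics(alarms, fault_start: int):
    """FDR, FAR and TD (in samples) from per-sample alarms; fault_start is 0-based."""
    alarms = np.asarray(alarms, dtype=bool)
    normal, faulty = alarms[:fault_start], alarms[fault_start:]
    fdr = float(faulty.mean())                           # flagged faulty samples / faulty samples
    far = float(normal.mean()) if normal.size else 0.0   # flagged normal samples / normal samples
    hits = np.flatnonzero(faulty)
    td = int(hits[0]) if hits.size else None             # t_d - t_0: first alarm after onset
    return fdr, far, td
```

For example, one spurious alarm among the 160 normal samples gives FAR = 1/160, and a first alarm three samples after onset gives TD = 3 samples.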
Under normal operation of the TE process, the FARs of T² and SPE for the CTA-DKPCA method are 0.0012 and 0, respectively, indicating that the proposed method has a very low FAR.
The CTA-DKPCA method is compared with other fault detection methods,and the results are shown in Table 3.
The comparison with the 2-class SVM method shows that the two methods achieve very similar FDRs. In terms of FAR and TD, the CTA-DKPCA method outperforms the 2-class SVM under most fault conditions, with the largest advantage for faults 10, 13 and 18. The CTA-DKPCA algorithm reduces the computational redundancy of the fault detection calculation, which speeds up the algorithm and improves detection accuracy.
Three different types of faults, 7, 8 and 14, are selected to illustrate the detection results. Fault 7 is a step-type loss of the C header pressure; fault 8 is a random variation in the A, B and C feed composition; and fault 14 is a sticking fault of the reactor cooling water valve. These faults are representative to a certain extent. Fig.7 shows the results of the four algorithms for these three faults; the dashed lines of the corresponding colors indicate the control limits.
Based on these experimental results, the following conclusions can be drawn. For all three faults, the CTA-DKPCA algorithm performs better than the other three methods. PCA has difficulty detecting these faults correctly. KPCA introduces a nonlinear kernel mapping to capture the nonlinear structure of the data, which gives good separation ability and monitoring performance, but it does not capture process dynamics. DKPCA combines the advantages of KPCA and DPCA: the augmented matrix expresses the dynamics of the data and the kernel function handles the nonlinearity, but the computational redundancy becomes larger in the later stage of the calculation, which slightly weakens its performance. CTA improves DKPCA by using temporal logic to perform cycle segmentation of the data, which eliminates the computational redundancy caused by the large amount of data and improves calculation accuracy and speed.
3.1.2. Fault diagnosis results
Once a fault is detected, the RBC method described above is used to diagnose it further, identifying its root cause and thus the cause and propagation path of the overall fault. For the TE process, 18 faults were diagnosed with the RBC method, and the root variables obtained for each fault are listed in Table 4.
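For reference, the reconstruction-based contribution can be sketched in the simpler linear PCA setting. For each variable i, the sample is reconstructed along the unit direction xi_i, and the contribution is the amount by which the monitoring index drops: RBC_i = (xi_i^T M z)^2 / (xi_i^T M xi_i), with M = I - P P^T for SPE and M = P Lambda^-1 P^T for T2. The kernel form used with DKPCA in this paper is more involved; the function names `fit_pca` and `rbc`, the synthetic two-factor data and the bias of 10 standard deviations are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_pca(X, n_pc):
    """Standardize X and keep the n_pc leading eigenvectors of its correlation matrix."""
    mu, sd = X.mean(0), X.std(0, ddof=1)
    S = np.cov((X - mu) / sd, rowvar=False)
    lam, P = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1][:n_pc]
    return mu, sd, P[:, order], lam[order]


def rbc(x, mu, sd, P, lam):
    """Reconstruction-based contributions of each variable to SPE and T2 (linear PCA)."""
    z = (x - mu) / sd
    C = np.eye(z.size) - P @ P.T          # projector onto the residual subspace (SPE)
    D = P @ np.diag(1.0 / lam) @ P.T      # T2 metric in the principal subspace
    # With xi_i the i-th unit vector, xi_i^T M z = (M z)_i and xi_i^T M xi_i = M_ii
    return (C @ z) ** 2 / np.diag(C), (D @ z) ** 2 / np.diag(D)


rng = np.random.default_rng(1)
T = rng.normal(size=(500, 2))                       # two latent factors
W = rng.normal(size=(2, 6))
X = T @ W + 0.3 * rng.normal(size=(500, 6))         # six correlated variables
mu, sd, P, lam = fit_pca(X, n_pc=2)

x_fault = mu.copy()
x_fault[2] += 10 * sd[2]                            # synthetic bias on variable index 2
rbc_spe, rbc_t2 = rbc(x_fault, mu, sd, P, lam)
print(int(np.argmax(rbc_spe)), int(np.argmax(rbc_t2)))
```

For a pure single-variable bias z = f xi_j, the Cauchy-Schwarz inequality gives RBC_i <= RBC_j for every i, so the biased variable ranks first in both indices. This is the reason reconstruction is preferred over plain contribution plots, which can spread a fault across correlated variables.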
Table 2 Comparison of the FDR (%) of different methods in the TE process
Table 3 Comparison with the Onel et al. (2-class SVM) method
Table 4 The root fault variables of 18 faults in the TE process
Fig. 7. Detection results of the four methods for three types of faults (a, fault 7; b, fault 8; c, fault 14).
For diagnosis, we use the same three types of TE faults that were selected in the detection stage above. The diagnosis results for these three faults are shown in Fig. 8, where the blue parts mark contribution rates that exceed the average fault contribution and the black line represents the normal range of the fault contribution rate.
As Fig. 8 shows, the main variables responsible for fault 7 are variable 45 (total feed flow, stream 4), variable 7 (reactor pressure) and variable 13 (product separator pressure). The cause of fault 7 can therefore be explained as follows: to compensate for the reduced C header pressure, the total feed flow rate is increased by adjusting the flow valve on stream 4, and this change in turn affects the measured reactor pressure and product separator pressure. The changes in each variable are shown in Fig. 9.
For fault 8, the diagnosis shows that the main variable responsible for the fault is process variable 47 (purge valve opening, %). Fault 8 randomly disturbs the A, B and C feed composition, and the purge valve discharge is disrupted as a result. Measured variables 7 (reactor pressure) and 13 (product separator pressure) are then affected, followed by variable 16 (stripper pressure), and finally variable 20 (compressor work) becomes abnormal. The main variable changes are shown in Fig. 10.
Table 4 shows that the key fault variables of faults 4, 11 and 14 overlap, so these three faults must be distinguished from one another.
For fault 4, the variable responsible for the fault is process variable 51, the reactor cooling water flow, while the other variables remain normal. Therefore, if the diagnosis shows that only manipulated variable 51 is faulty, fault 4 can be attributed to a step increase in the reactor cooling water inlet temperature. The other reactor variables do not respond because the controller acts first: the reactor cooling water flow rises rapidly to cool the reactor and rebalance the system, and the fault is diagnosed on that basis. Fault 11 is a random variation in the reactor cooling water inlet temperature, and it is detected more slowly than fault 4; while the cooling water flow increases, the reactor temperature also rises slightly. Fault 14 develops as follows. If the reactor cooling water temperature increases, the cooling of the reactor is lost and the reactor temperature (measured variable 9) rises directly; the controller then tries to bring the reactor temperature down by increasing the reactor cooling water flow (process variable 51). Conversely, if the reactor cooling water temperature decreases because the valve is stuck, the cooling effect on the reactor increases and the reactor temperature drops; in this case the controller reduces the cooling water flow to restore equilibrium. The third key process variable is measured variable 21, the reactor cooling water outlet temperature, which is directly affected by the sticking reactor cooling water valve. The three faults therefore differ in how long the fault takes to propagate, and the diagnosed fault variables indicate the cause of each fault directly. The influence of the fault variables of faults 4, 11 and 14 is shown in Fig. 11(a)-(c), respectively.
Fig. 8. Diagnosis results for three types of faults (a, fault 7; b, fault 8; c, fault 14).
Fig. 9. Plot of the root cause variables (fault 7).
Fig. 10. Plot of the root cause variables (fault 8).
Fig. 11. Root causes of faults 4, 11 and 14 (a, fault 4; b, fault 11; c, fault 14).
The above diagnosis results show that, for the 18 TE process faults, the RBC method can accurately locate the (manipulated) process variables that cause each fault together with the related measured variables, thereby determining the cause of the fault and, through variable reconstruction, clarifying its propagation path. The method reduces the autocorrelation between variables, strengthens the cross-correlation of related variables, and locates the fault in a timely manner.
The effectiveness of the proposed CTA-MDKPCA method is evaluated under different fault conditions of the penicillin fermentation process. In the process simulation, the microorganisms are grown in batch mode during the first 40 h to reach a high cell density. The fermenter is then switched to fed-batch mode, with a continuous feed of glucose as the substrate to maintain a high level. Biomass growth promotes penicillin synthesis. The target product, penicillin, is a secondary metabolite that is secreted mainly during the fed-batch stage. The process is characterized by nonlinear dynamics and multiple operating stages. The process flow diagram of the fed-batch penicillin fermentation process is shown in Fig. 12 [35].
Fig.12.The process flow diagram of fed-batch penicillin fermentation.
In this study, observation data of the 16 measured variables listed in Table 5 were collected for batch process fault diagnosis.
Table 5 Monitoring variables of the fed-batch penicillin fermentation process
The sampling interval is 0.1 h, and each batch lasts 400 h in total, covering both the batch and fed-batch stages. The normal profiles of the measured variables during fermentation are shown in Fig. 13.
Fig.13.Normal profiles of the monitored variables in the fed-batch penicillin fermentation process.
The proposed method requires only one complete batch of reference (benchmark) data, against which the monitored data are checked for online fault detection. A fault with an amplitude of 30 is introduced when the penicillin fermentation process has run for 80 h.
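The sampling settings above fix the size of the reference data. Assuming 400 h / 0.1 h = 4000 samples per batch (no separate sample at t = 0), the fault at 80 h corresponds to sample 800. Multiway methods then unfold the three-way batch array (batches x variables x time) into a two-way matrix before applying (kernel) PCA; the sketch shows the two common unfoldings. Which unfolding MDKPCA uses is not stated in this section, so both helper functions and the random placeholder batch are assumptions.

```python
import numpy as np

dt_h, duration_h = 0.1, 400.0
n_samples = int(round(duration_h / dt_h))    # samples per batch (assumes no t = 0 sample)
onset = int(round(80.0 / dt_h))              # sample index at which the fault is introduced


def unfold_batch_wise(X3):
    """(I batches, J variables, K times) -> (I, K*J); row i is x_i(1), ..., x_i(K)."""
    I, J, K = X3.shape
    return X3.transpose(0, 2, 1).reshape(I * 1, K * J)


def unfold_variable_wise(X3):
    """(I, J, K) -> (I*K, J); each row is one time sample of one batch."""
    I, J, K = X3.shape
    return X3.transpose(0, 2, 1).reshape(I * K, J)


# Placeholder for the single reference batch of 16 monitored variables
ref = np.random.default_rng(3).normal(size=(1, 16, n_samples))
print(n_samples, onset, unfold_batch_wise(ref).shape, unfold_variable_wise(ref).shape)
# 4000 800 (1, 64000) (4000, 16)
```

With only one reference batch, batch-wise unfolding leaves a single row, which is why a variable-wise (time-sample) layout is the more natural input for a dynamic kernel model in this setting.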
The batch process is first divided into cycles; the relationship between the number of cycles and the error is shown in Fig. 14.
Fig.14.The relationship between the number of cycles and the error.
In the cycle segmentation step, this batch process is split 7 times, giving 128 data segments, and the error falls below the error threshold in the 7th cycle. The segmented data are then input into the MDKPCA model. The number of selected principal components is 6, fewer than the 8 principal components required by MDKPCA alone; as a result, more fault information can be obtained and the subsequent fault detection results are more accurate.
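The segmentation loop described above can be sketched as repeated halving: each cycle splits every segment in two, so c cycles give 2^c segments (7 cycles give 128), and the loop stops once an error measure drops below a threshold. The error definition used by the CTA is not given in this section; the mean within-segment variance below, the threshold, the minimum segment length and the function names are hypothetical stand-ins.

```python
import numpy as np


def segment_error(segments):
    """Hypothetical error measure: mean within-segment variance over all segments."""
    return float(np.mean([seg.var(axis=0).mean() for seg in segments]))


def cycle_segment(X, threshold, max_cycles=10):
    """Halve every segment once per cycle until the error drops below threshold."""
    segments, history = [X], []
    for _ in range(max_cycles):
        if min(len(s) for s in segments) < 4:   # stop before segments become too short
            break
        # Split each segment along the time axis into two nearly equal halves
        segments = [half for s in segments for half in np.array_split(s, 2)]
        history.append(segment_error(segments))
        if history[-1] < threshold:
            break
    return segments, history


# Smooth synthetic trajectories with a little measurement noise
t = np.linspace(0.0, 1.0, 4000)
rng = np.random.default_rng(2)
X = np.column_stack([np.sin(2 * np.pi * t), t ** 2]) + 0.001 * rng.normal(size=(4000, 2))
segments, history = cycle_segment(X, threshold=1e-3)
print(len(history), len(segments))
```

The number of cycles reached here depends on the synthetic data and the assumed error measure, so it is not expected to match the 7 cycles reported for the penicillin process.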
We then compare the proposed method with the MKPCA and MDKPCA methods in terms of FDR, FAR and TD. The comparison results are shown in Table 6.
Table 6 The comparison results of CTA-MDKPCA,MDKPCA and MKPCA
Compared with the MDKPCA and MKPCA methods, the batch process fault diagnosis model proposed in this paper performs better on all three statistics: FDR, FAR and TD. The CTA-MDKPCA method reaches an FDR of 99.2% with a TD of 2 s, a marked improvement in both the accuracy and the speed of fault detection. Its FAR is also lower than those of MDKPCA and MKPCA, showing that the proposed algorithm improves performance across all three criteria.
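The three criteria in Table 6 can be computed from a monitoring statistic and its control limit as in the sketch below. It assumes that FDR is the percentage of post-fault samples above the limit, FAR the percentage of pre-fault samples above the limit, and TD the time from fault onset to the first alarm; some studies require several consecutive alarms before counting a detection, and that variant is not shown. The function name and the toy statistic values are hypothetical.

```python
import numpy as np


def detection_metrics(stat, limit, onset, dt=1.0):
    """Return FDR (%), FAR (%) and detection delay for one monitoring statistic.

    stat  : statistic values (T2 or SPE) for every sample, in time order
    limit : control limit of that statistic
    onset : index of the first faulty sample
    dt    : sampling interval, so the delay is reported in time units
    """
    alarm = np.asarray(stat) > limit
    far = 100.0 * alarm[:onset].mean()     # alarms before the fault are false alarms
    fdr = 100.0 * alarm[onset:].mean()     # alarms after the fault are detections
    hits = np.flatnonzero(alarm[onset:])
    delay = hits[0] * dt if hits.size else float("inf")
    return float(fdr), float(far), float(delay)


stat = np.array([0.1, 0.5, 2.0, 0.3, 0.2, 1.5, 3.0, 2.5])
print(detection_metrics(stat, limit=1.0, onset=4))   # (75.0, 25.0, 1.0)
```

For the penicillin case described above, onset = 800 and dt = 0.1 (h) would correspond to the fault introduced at 80 h.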
The detection results are shown in Fig. 15.
Fig. 15. Detection results of MKPCA, MDKPCA and CTA-MDKPCA (a, MKPCA; b, MDKPCA; c, CTA-MDKPCA).
The fault detection results show that the MKPCA method is insensitive to the fault. The fault is introduced at 80 h, but the MKPCA method responds only after delays of 60 h in the principal component subspace and 14 h in the residual subspace, which greatly delays any subsequent remedial action. The MDKPCA results show that, although this method responds at the moment the fault occurs, it can be affected by fluctuations in the early data and may therefore misjudge early faults. With the CTA-MDKPCA method, by contrast, the principal component subspace and the residual subspace respond simultaneously when the fault occurs, so the fault is detected quickly and accurately.
After the batch process fault is detected, the contribution of each variable is calculated. The diagnosis result is shown in Fig. 16.
Fig.16.The percentage of fault contribution of each variable in the batch process.
The contribution results for the current fault show that variable 2 (stirring power) is increased by 80% of the fault range. In the SPE-based variable identification, the main error occurs in variable 2, whereas in the T2-based identification the average fault contribution rate is 6.25% (100%/16 for the 16 monitored variables). The substrate feed flow rate increases, raising the cell concentration, product concentration and reactor volume and thereby the fermentation reaction rate; as the carbon dioxide concentration increases, heat accumulates in the fermenter because of the overreaction, and finally the cooling water flow rate increases to keep the system balanced.
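The 6.25% average is the uniform share 100%/16 for the 16 monitored variables. A minimal sketch of converting raw per-variable contributions to percentages and flagging the variables that exceed that average is shown below; the function name and the contribution values are illustrative only and are not the results in Fig. 16.

```python
import numpy as np


def contribution_percent(contrib):
    """Normalize per-variable contributions to percentages and flag above-average ones."""
    c = np.asarray(contrib, dtype=float)
    pct = 100.0 * c / c.sum()
    average = 100.0 / c.size                      # 6.25 % when 16 variables are monitored
    flagged = np.flatnonzero(pct > average) + 1   # 1-based variable numbers
    return pct, average, flagged


contrib = np.ones(16)
contrib[1] = 30.0            # illustrative values only: variable 2 dominates
pct, average, flagged = contribution_percent(contrib)
print(round(pct[1], 2), average, flagged)   # 66.67 6.25 [2]
```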
For the penicillin fermentation process, CTA-MDKPCA accurately detects faults in the batch process, and RBC identification based on the two statistics reveals the root cause of the fault as well as its occurrence and propagation path.
This paper proposes a CTA to improve the performance of chemical process fault diagnosis. Through cycle segmentation and cycle merging, the CTA retains as much useful information as possible while reducing the computational dimension of the matrices. The DKPCA and MDKPCA algorithms are embedded in the CTA to build fault detection models for continuous and batch chemical processes, respectively, which improves the fault detection rate while reducing the detection time delay. When a fault is detected, the data are input into the RBC model, which computes each variable's percentage contribution to the fault based on the reconstruction-based contribution principle, so that the cause of the fault can be judged comprehensively, the fault variables located and the fault path determined.
For the continuous process, the FDRs of 18 faults in the TE process were compared with those of PCA, KPCA and DKPCA to assess the advantages of the CTA-DKPCA model. The results show that the average FDRs of the CTA-DKPCA method for the T2 and SPE statistics in the TE process are 98.63% and 98.74%, respectively. Compared with the 2-class SVM algorithm, CTA-DKPCA achieves a similar FDR and better FAR and TD under most fault conditions, which shows that CTA-DKPCA resolves the computational redundancy of large-scale data and greatly improves calculation speed and accuracy. Three different types of faults, 7, 8 and 14, were then selected for comparison, and CTA-DKPCA gave smoother statistic curves and better detection than PCA and its derived algorithms. Next, the TE process faults were fed into the RBC model for diagnosis, and the root-cause variables of each fault were obtained. The same three fault types used in the detection stage were selected for diagnosis, and faults whose key fault variables overlap were further distinguished by their causes. The results show that the CTA-DKPCA model and the RBC model achieve high detection and identification performance for actual continuous chemical process faults and can detect a fault and determine its cause in time.
The CTA-MDKPCA model was applied to the fed-batch penicillin fermentation process, with 16 variables selected as the main monitored variables and one fault condition set for diagnosis. The results show that the CTA-MDKPCA model detects the fault accurately and that the RBC model determines its cause and propagation path. Under the fault condition, the CTA-MDKPCA model was compared with the MKPCA and MDKPCA methods in terms of FDR, FAR and TD; it reached an FDR of 99.2%, an FAR of 4.2% and a TD of 2 s, demonstrating the advantages of the proposed model in fault detection. Unlike the continuous process, and because of the multistage operation of the batch process, no single variable exceeds 50% of the contribution in the principal component space identification; this will be addressed in the next stage of the work.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The authors gratefully acknowledge financial support from the National Natural Science Foundation of China (21706220).
Chinese Journal of Chemical Engineering, 2022, Issue 7