
Data-driven optimal operation of the industrial methanol to olefin process based on relevance vector machine

2021-09-02 12:45:04 Zhiquan Wang, Liang Wang, Zhihong Yuan, Bingzhen Chen

Chinese Journal of Chemical Engineering, 2021, Issue 6

Zhiquan Wang, Liang Wang, Zhihong Yuan, Bingzhen Chen*

1 Department of Chemical Engineering, Tsinghua University, Beijing 100084, China

2 State Key Laboratory of Chemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China

Keywords: Methanol to olefins; Relevance vector machine; Genetic algorithm; Operation optimization; Systems engineering; Process systems

ABSTRACT Methanol to olefin (MTO) technology provides the opportunity to produce olefins from nonpetroleum sources such as coal, biomass and natural gas. More than 20 commercial MTO plants have been put into operation. To date, contributions on the optimal operation of industrial MTO plants from a process systems engineering perspective remain rare. Based on the relevance vector machine (RVM), a data-driven framework for optimal operation of the industrial MTO process is established to fully utilize the plentiful industrial data sets. The RVM correlates the predicted yield distribution of the main products with the operating conditions. These correlations then serve as the constraints of a multi-objective optimization model that seeks the optimal operation of the plant. Nondominated sorting genetic algorithm II (NSGA-II) is used to solve the optimization problem. Comprehensive tests demonstrate that the ethylene yield is effectively improved by the proposed framework. Because the RVM provides a distribution prediction instead of a point estimate, the established model is expected to provide guidance for actual production operations under uncertainty.

1.Introduction

Ethylene and propylene are very important basic organic chemicals, and they are commonly regarded as the cornerstone of the modern chemical industry. Traditionally, ethylene is produced by thermal cracking of light naphtha, while propylene is produced by catalytic cracking of heavy oil. Triggered by Mobil's pioneering work on the conversion of methanol to olefins (MTO) in the 1970s [1], extensive contributions have shown that MTO conversion is an important alternative to the aforementioned traditional light olefin production technology [2]. Currently, UOP/Hydro and the Dalian Institute of Chemical Physics are the world's leading commercial MTO technology suppliers/licensors. Since 2010, more than 20 commercial MTO plants have been put into operation.

Promoted by the mainstream commercial catalyst SAPO-34, the highly exothermic MTO reactions occur at reaction temperatures of around 770 K [2]. To guarantee that the reaction temperature stays within the designed region, an external cooling system is used. However, coke generated by the reaction is deposited on the catalyst surface and subsequently lowers the catalyst activity. Similar to the fluid catalytic cracking process, the MTO process uses a circulating fluidized bed reactor–regenerator configuration. In this configuration, the spent catalyst is moved to the regenerator, in which the coke deposited on it is continuously removed. The regenerated catalyst then goes back into the reactor.

A very large number of studies on the reaction principles, kinetic models, catalyst synthesis, and process research and development of MTO conversion have been published [2–12]. However, contributions to the optimal operation of MTO processes from a process systems engineering perspective are rare. Because of the complex reaction mechanism and the configuration of the reactor–regenerator system, optimal operation of the MTO process cannot be achieved by trial and error. Although interpretable first-principle models are the main approach for optimal operation of chemical processes [13–15], an accurate description of the MTO reactor–regenerator system by ordinary or partial differential equations is still under investigation. The aim of this work is to establish a data-driven framework for optimal operation of the industrial MTO reactor–regenerator system that fully utilizes the plentiful industrial data sets.

According to the reaction kinetic model proposed by Bos et al. [3], the product distribution is largely dependent on the coke deposited on the surface of the SAPO-34 catalyst. About 4% (mass) of the methanol is converted into complex aromatic compounds, which are also called "coke". Coke is formed in the pores of the molecular sieve catalyst, so the catalyst pore size becomes smaller. When the coke content in the catalyst is high, the steric selectivity effect lowers the selectivity towards C4+ products; in other words, the selectivity towards ethylene and propylene is high. Note that a uniform distribution of coke cannot be guaranteed. Hence, even under similar operating conditions, the product yield may be different. It is therefore important and interesting to predict the probability distribution of the yield with a data-driven approach that empirically correlates the product yield with the operating conditions.

Here, based on a relevance vector machine (RVM) [16–18], we propose a data-driven approach for modeling the industrial MTO reactor. Neural networks (NN) [19–21] are a very powerful supervised learning method, and backpropagation is used to train their multilayer architecture directly. Vapnik et al. proposed support vector machines (SVM) [22,23] to solve classification and regression problems. The SVM can handle nonlinear situations very well by introducing kernel functions. Compared with the NN and SVM, the most prominent advantage of the RVM is that it gives a distribution prediction, including the mean and standard deviation, rather than only a point prediction of the yield. The accuracy of the mean prediction is not lower than that of the point estimate given by the traditional methods. Additionally, the confidence interval can be described by the standard deviation, which is helpful for decision making. A smaller standard deviation represents a higher degree of confidence. For example, when the same yields are predicted under two distinct sets of operating conditions, the distributions predicted by the RVM model provide guidance for choosing between them. Furthermore, RVM-based approaches usually require fewer model parameters, which significantly benefits training of the model. The trained RVM model serves as the constraint of the optimization model with the objective of maximizing the ethylene yield. Nondominated sorting genetic algorithm II (NSGA-II) [24–26] is used to solve the established optimization model.

The rest of this paper is organized as follows. In Section 2, the RVM regression model and the optimization framework based on the RVM prediction model are introduced. In Section 3, the results of the MTO case are provided to demonstrate the effects of the optimization algorithm. Finally, the main conclusions are summarized in Section 4.

2.Method Description

2.1.RVM regression model

The RVM is a machine learning algorithm based on Bayesian theory proposed by Tipping et al. [27]. The RVM can tackle classification and regression problems. Predicting the product yield belongs to the regression category. The RVM and the SVM share several common features. For example, they have similar forms of linear functions, and both of them deal with nonlinear problems through kernel transformation. However, the RVM has several advantages over the SVM. First, rather than point estimates, the RVM provides distribution predictions, and the confidence interval is described by the standard deviation. Second, with similar fitting accuracy, the RVM is sparser and retains fewer sample points, so prediction for new samples is faster. Third, the RVM has fewer parameters, so the optimal parameter combination is easier to determine [28–30]. In particular, the RVM has only one main parameter, the kernel function parameter, which is convenient for model training.

As mentioned above, the RVM can give the probability distribution of the predicted target variable t for a sample input vector x:

where w is the weight vector and β is the noise precision, which are the model parameters to be trained. Similar to the SVM, the mean of the predicted distribution is a linear model:

where N is the training set size, x_n is the training set input data, w_n and b are the weights and bias obtained during model training, and k(x, x_n) is the kernel function. The inputs of the N observations of the data set are represented together as the data matrix X = (x_1, ···, x_N)^T, and the corresponding target values are t = (t_1, ···, t_N)^T. The likelihood function can be expressed as

A Gaussian distribution with mean of 0 and precision of α is introduced for the weight parameter w:

During training, the values of most of the hyperparameters α_i tend to infinity, indicating that the distributions of most weights are spikes with mean values of 0, which means that the weight values are likely to be 0. These sample points have no effect on the model. The RVM obtained after this extensive pruning is a sparse model. Because the prior distribution and likelihood function are both Gaussian, the posterior distribution of the weights according to Bayes' theorem is also Gaussian:

where m and Σ are the mean vector and covariance matrix of the posterior distribution of the weights. Φ is the design matrix with dimensions of N × M that is obtained by transforming the input data X through the kernel function. For the Gaussian kernel function used in this paper, the design matrix can be expressed as

where d is the width parameter of the Gaussian kernel function, which is also the only main parameter of the RVM.

The hyperparameters determine the parameters of the model, so training the model first requires the values of the hyperparameters. Here, type-II maximum likelihood estimation (also known as evidence maximization) is used to calculate the values of the hyperparameters. The problem can be expressed as

An iterative formula for determining the hyperparameters can be directly obtained by derivation:

where m_i is the ith element of the posterior mean m and Σ_i is the ith diagonal element of the covariance matrix Σ.
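For reference, the relations described in this subsection can be written in the standard RVM regression form as below. This is a reconstruction from the surrounding definitions and the usual sparse Bayesian formulation, not a copy of the paper's own typesetting. The matrix A = diag(α_i), the quantity γ_i, and in particular the factor 2 in the Gaussian kernel are assumptions; the paper may parameterize the width d differently.

$$
\begin{aligned}
p(t \mid \mathbf{x}, \mathbf{w}, \beta) &= \mathcal{N}\!\left(t \mid y(\mathbf{x}),\, \beta^{-1}\right) \\
y(\mathbf{x}) &= \sum_{n=1}^{N} w_n\, k(\mathbf{x}, \mathbf{x}_n) + b \\
p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) &= \prod_{n=1}^{N} p(t_n \mid \mathbf{x}_n, \mathbf{w}, \beta) \\
p(\mathbf{w} \mid \boldsymbol{\alpha}) &= \prod_{i=1}^{M} \mathcal{N}\!\left(w_i \mid 0,\, \alpha_i^{-1}\right) \\
p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \boldsymbol{\alpha}, \beta) &= \mathcal{N}(\mathbf{w} \mid \mathbf{m}, \boldsymbol{\Sigma}), \qquad
\mathbf{m} = \beta\, \boldsymbol{\Sigma} \boldsymbol{\Phi}^{T} \mathbf{t}, \qquad
\boldsymbol{\Sigma} = \left(\mathbf{A} + \beta\, \boldsymbol{\Phi}^{T} \boldsymbol{\Phi}\right)^{-1} \\
k(\mathbf{x}, \mathbf{x}_n) &= \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}_n \rVert^{2}}{2 d^{2}}\right) \\
\max_{\boldsymbol{\alpha},\, \beta}\ \ln p(\mathbf{t} \mid \mathbf{X}, \boldsymbol{\alpha}, \beta) &= \ln \int p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, \mathrm{d}\mathbf{w} \\
\alpha_i^{\text{new}} &= \frac{\gamma_i}{m_i^{2}}, \qquad
\gamma_i = 1 - \alpha_i \Sigma_i, \qquad
\left(\beta^{\text{new}}\right)^{-1} = \frac{\lVert \mathbf{t} - \boldsymbol{\Phi} \mathbf{m} \rVert^{2}}{N - \sum_i \gamma_i}
\end{aligned}
$$

Each element of the design matrix Φ is a kernel evaluation between a training input and a basis center, with one additional constant column when the bias b is absorbed into w.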

The trained model can provide the predicted distribution for a new sample input. Its mean is m^T φ(x) and its variance is β^(-1) + φ(x)^T Σ φ(x), where φ(x) is obtained by performing a kernel function operation on the new sample input x and the training set input X. The overall framework of the training and test process of the RVM is summarized in Fig. 1. It consists of four steps. The first step is the determination of the design matrix using kernel transformation. The second step is the calculation of the model hyperparameters using type-II maximum likelihood estimation. The third step is the determination of the model parameters based on the hyperparameters. The fourth step is the prediction of the probability distribution of the test samples.
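The four steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper (which reports MATLAB): the kernel form exp(-||x - x'||^2 / (2 d^2)), the pruning threshold `alpha_max`, the iteration count, and all function names are hypothetical choices, and the data are synthetic.

```python
import numpy as np

def design_matrix(X, centers, d):
    # Step 1: bias column plus Gaussian kernel columns.
    # Assumed kernel form: k(x, x') = exp(-||x - x'||^2 / (2 d^2)).
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-sq_dist / (2.0 * d ** 2))])

def posterior(Phi, alpha, beta, t):
    # Step 3: Gaussian posterior over the weights, N(w | m, Sigma).
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    m = beta * Sigma @ Phi.T @ t
    return m, Sigma

def fit_rvm(X, t, d, n_iter=500, alpha_max=1e9):
    Phi_full = design_matrix(X, X, d)
    N = len(t)
    keep = np.arange(Phi_full.shape[1])  # columns still in the model
    alpha = np.ones(keep.size)
    beta = 1.0 / np.var(t)
    for _ in range(n_iter):
        Phi = Phi_full[:, keep]
        m, Sigma = posterior(Phi, alpha, beta, t)
        # Step 2: type-II maximum likelihood re-estimation of alpha and beta.
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        alpha = gamma / np.maximum(m ** 2, 1e-300)
        resid = t - Phi @ m
        beta = max(N - gamma.sum(), 1e-6) / max(resid @ resid, 1e-12)
        # Prune basis functions whose precision has effectively diverged.
        mask = alpha < alpha_max
        keep, alpha = keep[mask], alpha[mask]
    m, Sigma = posterior(Phi_full[:, keep], alpha, beta, t)
    return {"keep": keep, "m": m, "Sigma": Sigma, "beta": beta, "d": d, "X": X}

def predict_rvm(model, X_new):
    # Step 4: predictive mean m^T phi(x) and variance 1/beta + phi^T Sigma phi.
    phi = design_matrix(X_new, model["X"], model["d"])[:, model["keep"]]
    mean = phi @ model["m"]
    var = 1.0 / model["beta"] + np.einsum("ij,jk,ik->i", phi, model["Sigma"], phi)
    return mean, np.sqrt(var)

# Synthetic one-dimensional example (not plant data).
rng = np.random.default_rng(0)
X = np.linspace(-5.0, 5.0, 80).reshape(-1, 1)
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)
model = fit_rvm(X, t, d=1.0)
mean, std = predict_rvm(model, X)
```

The surviving entries of `model["keep"]` play the role of the relevance vectors (index 0 is the bias column), and `std` is the per-sample predictive standard deviation that the optimization models in Section 2.2 rely on.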

2.2.Optimization framework based on the RVM prediction model

The trained RVM model provides the distribution of the predicted target variable. Next, we need to determine the input variables at which the target variable takes its maximum value to achieve optimal operation. In the optimization of practical problems with uncertainty, we hope to obtain a result with a high target value and high credibility, which can be described in probability theory as a high mean and a concentrated distribution. It is difficult to judge whether a traditional data-driven model is reliable, but for the RVM model based on Bayesian theory, the distribution prediction can be used to address the problem of reliability. When the mean values of two predicted distributions are equal, their predicted standard deviations can be compared. A smaller standard deviation represents higher confidence, which helps the decision maker choose. Specifically, the mean value of the predicted distribution can be used as the objective function, and the standard deviation can be restricted by an upper limit to ensure confidence. Such a single-objective optimization model can offer a reasonably reliable solution for actual process operation. This optimization model can be expressed as

Fig.1.Overall framework of the training and test process of the RVM model.

where v is the number of input variable features. The value of the kernel function parameter d and the constraint upper limit σ_u of the standard deviation of the predicted distribution need to be given, as well as the upper and lower limits x_k,max and x_k,min of each feature. The model constraint has three parts. The first part is the RVM model. Eqs. (9d)–(9g) describe the mapping relationship between the parameters m, Σ and β and the input data, including the kernel function parameter and design matrix, which are the second and third steps in Fig. 1. Eqs. (9b) and (9c) describe calculation of the mean and standard deviation of the predicted distribution after determining the model parameters. The second part is the upper and lower limit constraints on each feature of the input variable described by Eq. (9h). The third part is the upper limit constraint on the standard deviation of the predicted distribution described by Eq. (9i) to control the uncertainty to a certain extent and improve the reliability of the optimization results.
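A compact form of this single-objective model, reconstructed from the description above, is given below. It is a sketch: the sub-equation numbering (9a)–(9i) is not reproduced, the training relations of Eqs. (9d)–(9g) are compressed into a single line, and the symbols μ(x) and σ(x) for the predicted mean and standard deviation are notation assumed here.

$$
\begin{aligned}
\max_{\mathbf{x}} \quad & \mu(\mathbf{x}) \\
\text{s.t.} \quad & \mu(\mathbf{x}) = \mathbf{m}^{T} \boldsymbol{\phi}(\mathbf{x}) \\
& \sigma(\mathbf{x}) = \sqrt{\beta^{-1} + \boldsymbol{\phi}(\mathbf{x})^{T} \boldsymbol{\Sigma}\, \boldsymbol{\phi}(\mathbf{x})} \\
& \mathbf{m},\ \boldsymbol{\Sigma},\ \beta \ \text{obtained from RVM training with kernel parameter } d \\
& x_{k,\min} \le x_k \le x_{k,\max}, \qquad k = 1, \dots, v \\
& \sigma(\mathbf{x}) \le \sigma_u
\end{aligned}
$$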

The main disadvantage of the single-objective optimization model is that it is difficult to choose the value of σ_u. When solving this optimization problem with a genetic algorithm, new individuals must be generated repeatedly to meet the standard deviation constraint during selection, crossover and mutation. If σ_u is small, the evolutionary process takes a long time and it is not easy to escape from a local solution. If the decision maker has limited knowledge of the data set, it is difficult to choose a reasonable value for σ_u. Ideally, we would like a solution for which the mean value of the predicted distribution is as high as possible and the standard deviation is as low as possible. In practice, however, these two quantities cannot simultaneously reach their respective optimal values. Improving one often comes at the cost of the other, and a trade-off needs to be made. Such a problem is in essence a multiobjective optimization problem [31,32]. By placing the standard deviation in the objective function instead of the constraints, the multiobjective optimization model avoids having to choose the value of σ_u, which solves the problem of the single-objective optimization model. Such a multiobjective optimization model can be expressed as

where the meanings of the variables in the multiobjective optimization model are the same as those in the single-objective optimization model.
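Under the same assumed notation, with μ(x) and σ(x) denoting the mean and standard deviation of the RVM prediction at x, the bi-objective model can be sketched as follows. This is a reconstruction from the text rather than the paper's own equation; the RVM relations that define μ and σ are unchanged and only the σ_u constraint is dropped.

$$
\begin{aligned}
\max_{\mathbf{x}} \quad & \mu(\mathbf{x}) = \mathbf{m}^{T} \boldsymbol{\phi}(\mathbf{x}) \\
\min_{\mathbf{x}} \quad & \sigma(\mathbf{x}) = \sqrt{\beta^{-1} + \boldsymbol{\phi}(\mathbf{x})^{T} \boldsymbol{\Sigma}\, \boldsymbol{\phi}(\mathbf{x})} \\
\text{s.t.} \quad & x_{k,\min} \le x_k \le x_{k,\max}, \qquad k = 1, \dots, v
\end{aligned}
$$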

Because multiple objectives usually cannot be optimal simultaneously, the solution of a multiobjective optimization problem is a set of noninferior solutions, which is called the Pareto optimal set. This means that the decision maker only needs to select operating conditions from the Pareto optimal set, because these operating conditions ensure that the standard deviation is the smallest for a given mean of the yield distribution, or that the mean is the largest for a given standard deviation.
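The notion of a noninferior solution can be made concrete with a small filter over candidate (mean, standard deviation) pairs. The sketch below is illustrative only: the function name `pareto_front` and the candidate values are hypothetical and are not results from the paper.

```python
def pareto_front(points):
    """Return the nondominated (mean, std) pairs.

    A point dominates another if its mean is at least as high and its
    standard deviation at least as low, with at least one strict inequality.
    """
    front = []
    for i, (mu_i, sd_i) in enumerate(points):
        dominated = any(
            (mu_j >= mu_i and sd_j <= sd_i) and (mu_j > mu_i or sd_j < sd_i)
            for j, (mu_j, sd_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((mu_i, sd_i))
    return sorted(front)

# Hypothetical candidate operating points as (mean yield in %, std in %).
candidates = [(50.0, 0.30), (51.0, 0.50), (50.5, 0.60), (49.0, 0.20), (49.0, 0.25)]
front = pareto_front(candidates)
```

Here (50.5, 0.60) is discarded because (51.0, 0.50) has both a higher mean and a lower standard deviation, and (49.0, 0.25) is discarded in favor of (49.0, 0.20).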

3.Case Study

3.1.Data source and preprocessing

The data sets were obtained directly from an industrial MTO plant. First, data sets with missing data or large fluctuations were excluded. This left 179 days of data, with one data record per minute. To reduce overfitting, the average of the 60 records in each hour was taken as one sample point, giving 4296 sets of data (179 days × 24 h). The training and test sets were divided in a ratio of 4:1.
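The preprocessing arithmetic can be checked with a short sketch. The random shuffling before the 4:1 split is an assumption, since the text does not say how the split was drawn, and the array of minute records is synthetic stand-in data with twelve columns (ten operating variables and two product mass fractions).

```python
import numpy as np

def hourly_average(minute_records, per_hour=60):
    # Average each block of 60 one-minute records into one hourly sample.
    n_hours = minute_records.shape[0] // per_hour
    trimmed = minute_records[: n_hours * per_hour]
    return trimmed.reshape(n_hours, per_hour, -1).mean(axis=1)

def split_train_test(samples, train_fraction=0.8, seed=0):
    # Assumed: a random 4:1 split; the paper does not state the split method.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    n_train = int(round(train_fraction * len(samples)))
    return samples[order[:n_train]], samples[order[n_train:]]

# Synthetic stand-in for 179 days of one-minute records.
records = np.random.default_rng(1).normal(size=(179 * 24 * 60, 12))
samples = hourly_average(records)
train, test = split_train_test(samples)
```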

Each data set consisted of ten operating variables and two main product mass fractions. The operating variables, including the temperature at the top of the reactor, pressure at the outlet of the reactor, methanol feed flow, methanol feed temperature, mass flow rate of the spent catalyst, mass flow rate of the regenerated catalyst and openings of the four catalyst circulation slide valves in the reactor, can be directly or indirectly adjusted.

Although ethylene and propylene were the two main products, the following modeling and optimization process takes ethylene as an example. All examples were solved in MATLAB R2018a on a Dell PowerEdge T640 (Intel(R) Xeon(R) Gold 6126 CPU @ 2.60 GHz, 256 GB RAM).

Because of the confidentiality of business data, the values of all of the operating conditions are standardized, which also makes the features comparable and improves the accuracy and convergence speed of the model. The zero-mean normalization formula used in this paper is as follows:

where μ is the mean of the feature in the data set and σ is the standard deviation of the feature.
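With these definitions, zero-mean normalization takes the usual z-score form, where x is the raw value of a feature and x' (a symbol assumed here) is its standardized value:

$$
x' = \frac{x - \mu}{\sigma}
$$

A standardized value is therefore negative whenever the raw value lies below the feature mean.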

3.2.RVM-based prediction results

As shown in Fig. 1, the kernel parameter d is the main parameter that affects the performance of the RVM model. If d is too small, the number of relevance vectors is very large. In extreme cases, it is even close to the size of the training set. In this case, the model overfits and has poor generalization ability on the test set. Conversely, if d is too large, the model underfits and performs poorly on the training set, so it is important to choose an appropriate kernel function parameter. The model was trained with different kernel function parameters, and the resulting relationships between the kernel function parameter and both the number of relevance vectors and the root mean square error (RMSE) are shown in Fig. 2. The error on the test set first decreases and then increases with increasing d, which is consistent with structural risk minimization [33]. When d < 1, the number of relevance vectors rapidly increases, indicating that the model is too complex, which may cause a clear difference between the real and empirical risk and poor generalization ability on the test set.

Fig.2.Relationships between the number of relevance vectors and RMSE,and the kernel function parameter.

The kernel function parameter d = 1 was used to train the RVM model, and distribution prediction was performed on the training and test sets. The results are shown in Fig. 3, where the data are sorted from small to large for easy visualization.

The advantage of the RVM model is shown in Fig. 3. The output is not just a value but a Gaussian probability distribution, which means that each point has its own mean and standard deviation. Because the full probability distributions of many points cannot be shown in a single graph, the representation method of Fig. 3 is used.

To compare the fitting performance, we used a five-layer NN with three hidden layers to predict the yields of ethylene and propylene. The numbers of nodes in the hidden layers were 32, 16, and 8, and the ReLU activation function was used [34]. In addition, we trained SVM and Gaussian process regression (GPR) models for comparison. The RMSE, mean absolute percentage error (MAPE), and R² values of the different data-driven modeling methods of the MTO process are given in Table 1, Table 2, and Table 3, respectively. Smaller RMSE and MAPE values and an R² value closer to 1 indicate a better performing model. The RVM is significantly better than the other methods in terms of accuracy.
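The three metrics follow their standard definitions, sketched below. The function names and the four-point example are illustrative assumptions and are not taken from Tables 1–3.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean square error.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical yields in percent, for illustration only.
y_true = np.array([48.0, 49.0, 50.0, 51.0])
y_pred = np.array([48.5, 48.5, 50.5, 50.5])
scores = (rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```

For this toy example every error is 0.5 in magnitude, so the RMSE is 0.5 and R² is 1 - 1/5 = 0.8.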

Table 1 RMSE values of different data-driven methods

Table 2 MAPE values of different data-driven methods

Table 3 R² values of different data-driven methods

Table 4 Single-objective optimization results of the MTO process

Compared with the NN, the RVM has the following advantages. First, the RVM gives a distribution prediction, not just a point estimate, which is more instructive for processes with uncertainty. Second, the NN structure is difficult to determine [35], including the number of hidden layers, the numbers of hidden layer nodes and the choice of activation function, whereas the RVM has only one main model parameter, which is more convenient for training. Third, NN-based methods are poorly reproducible: the training results cannot be reproduced exactly because the parameters are randomly initialized, whereas the training results of the RVM model are unique, which is convenient for determining the model. In addition, as the number of NN layers increases, the complexity of directly representing the predicted output value in terms of the input variables increases, which increases the difficulty of operation optimization. Finally, the RVM prediction can be written explicitly in terms of the input variables, which is convenient for optimization.

3.3.Optimal operation of an industrial MTO reactor based on the RVM prediction model

Optimization of the operating parameters based on the trained RVM model was performed to determine the optimal operating conditions. The operation optimization model based on the RVM is described in Section 2.2. First, the single-objective optimization model was considered. Still taking the maximum ethylene yield as an example, the mean value of the predicted distribution of the ethylene yield was selected as the objective function, and the ten variables that can be directly or indirectly adjusted were selected as the optimized operation variables. This optimization problem is nonconvex and strongly nonlinear. For such problems, conventional methods, such as the sequential quadratic programming method [36] and the penalty function method [37], can easily fall into a local optimal solution, and the results are greatly affected by the initial value. A genetic algorithm [38–40] with global search capability can escape from a local optimal solution through the randomness of its selection, crossover and mutation processes and search for the optimal solution in the entire feasible region. Specifically, in the selection step, individuals with high objective function values have a high probability of being selected when generating the next generation. The "roulette" method [41] was used to select individuals, which were then crossed in pairs. In the crossover step, an operation variable is randomly selected, and this operation variable and the subsequent operation variables are exchanged between the two individuals of a pair. In the mutation step, a random operation variable of each individual is mutated with a certain probability, and a number randomly generated between the minimum and maximum values of the operation variable replaces the original value. The aim of these three steps is to explore as many feasible solutions as possible during evolution so as to escape from local optimal solutions more effectively. Each generation retains the best individuals after evolution. This elitist strategy ensures that the optimal value of the objective function does not decrease during the evolution process. Compared with other intelligent optimization algorithms, such as the simulated annealing algorithm [42,43] and the particle swarm optimization algorithm [44], the genetic algorithm has the advantages of simple implementation and easy parameter determination.
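The selection, crossover, mutation and elitism steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the population size, mutation probability, number of elites, the shift applied to the fitness before roulette selection, and the toy objective (a stand-in for the RVM predicted mean) are all assumptions.

```python
import numpy as np

def genetic_maximize(f, lower, upper, pop_size=40, n_gen=200,
                     p_mut=0.2, n_elite=2, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    n_var = lower.size
    pop = rng.uniform(lower, upper, size=(pop_size, n_var))
    history = []
    for _ in range(n_gen):
        fit = np.array([f(x) for x in pop])
        order = np.argsort(fit)[::-1]
        elite = pop[order[:n_elite]].copy()   # elitism: keep the best as-is
        history.append(fit[order[0]])
        # Roulette selection: probability proportional to shifted fitness.
        w = fit - fit.min() + 1e-12
        picks = rng.choice(pop_size, size=pop_size - n_elite, p=w / w.sum())
        children = pop[picks].copy()
        # Crossover: swap the chosen variable and all later ones within a pair.
        for i in range(0, len(children) - 1, 2):
            cut = rng.integers(0, n_var)
            children[[i, i + 1], cut:] = children[[i + 1, i], cut:]
        # Mutation: resample one random variable uniformly within its bounds.
        for child in children:
            if rng.random() < p_mut:
                k = rng.integers(0, n_var)
                child[k] = rng.uniform(lower[k], upper[k])
        pop = np.vstack([elite, children])
    fit = np.array([f(x) for x in pop])
    best = int(np.argmax(fit))
    history.append(fit[best])
    return pop[best], fit[best], history

# Toy objective with its maximum of 0 at x = (0.5, 0.5, 0.5).
toy = lambda x: -np.sum((x - 0.5) ** 2)
best_x, best_val, history = genetic_maximize(toy, [-2, -2, -2], [2, 2, 2])
```

Because the elite individuals are copied unchanged into the next generation, the entries of `history` never decrease, which mirrors the property of the elitist strategy stated in the text.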

Fig. 3. Predicted distribution of the ethylene yield in the (a) training set and (b) test set. The black dots are the actual data and the red dots are the mean of the predicted distribution. The blue dots are positions that are one standard deviation larger than the mean (μ+σ) and the green dots are positions that are one standard deviation smaller than the mean (μ−σ).

The genetic algorithm was called multiple times to solve this single-objective optimization problem. When the number of iterations is sufficient, it converges to the same result, as shown by the results given in Table 4. The operating condition data are standardized, so some of the values are negative. Analyzing the optimization results together with the MTO reaction kinetics model shows that the optimized operating conditions have a larger opening of the circulating slide valve, which means that more catalyst returns directly to the reactor without regeneration, resulting in a higher coke content in the catalyst. According to the steric selectivity effect, when the coke content of the catalyst is higher, the reaction is more inclined to produce small molecular products such as ethylene, so the operation optimization results are consistent with the trend determined by the reaction kinetic model. The maximum value of the objective function is 51.30%. This yield is 2.31 percentage points higher than the maximum ethylene yield of 48.99% in the data set. The optimization effect is clear, and it provides important guidance for production operations.

For the multiobjective optimization problem, the commonly used algorithm is NSGA-II, which uses a fast nondominated sorting algorithm to reduce the algorithm complexity. NSGA-II introduces an elitist strategy in the evolution process, combining the parent population with its offspring population to sort and compete to produce the next generation, which helps to retain the excellent individuals of the parent population. At the same time, the crowding distance is introduced as a comparison criterion between individuals with the same rank when sorting. This has the advantage that the population spreads evenly along the Pareto front, thereby ensuring the diversity of the population. The disadvantage of this algorithm is that it does not perform well when the number of objective functions is high, such as five or more. This disadvantage can be ignored here because the present problem has only two objective functions: the mean and the standard deviation. This multiobjective optimization problem was solved by NSGA-II. The population size was set to 100 and the evolution process ran for 10,000 generations. The results are shown in Fig. 4.
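The crowding distance used to compare individuals of the same rank can be sketched as below, following the usual NSGA-II definition: boundary points of each objective receive an infinite distance, and interior points accumulate the normalized gap between their two neighbours. The function name and the four (mean, standard deviation) pairs are hypothetical values for illustration.

```python
import numpy as np

def crowding_distance(objs):
    # objs: array of shape (n_points, n_objectives) for one nondominated front.
    objs = np.asarray(objs, float)
    n, m = objs.shape
    dist = np.zeros(n)
    for k in range(m):
        order = np.argsort(objs[:, k])
        span = objs[order[-1], k] - objs[order[0], k]
        # Boundary points are always preferred, so they get infinite distance.
        dist[order[0]] = dist[order[-1]] = np.inf
        if span == 0 or n < 3:
            continue
        # Interior points: normalized gap between the two neighbours.
        gaps = objs[order[2:], k] - objs[order[:-2], k]
        dist[order[1:-1]] += gaps / span
    return dist

# Hypothetical front of (mean yield in %, std in %) pairs.
front = [(49.0, 0.20), (50.0, 0.30), (50.2, 0.32), (51.0, 0.50)]
cd = crowding_distance(front)
```

Individuals with a larger crowding distance sit in sparser regions of the front and are preferred when ranks tie, which is what spreads the population along the Pareto front.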

Fig. 5. Comparison of the operating condition results with single-objective and multiobjective optimization. Operating conditions: 1-Temperature at the top of the reactor; 2-Pressure at the outlet of the reactor; 3-Methanol feed flow; 4-Methanol feed temperature; 5-Mass flow rate of the spent catalyst; 6-Mass flow rate of the regenerated catalyst; 7-Opening of the first catalyst circulation slide valve in the reactor; 8-Opening of the second catalyst circulation slide valve in the reactor; 9-Opening of the third catalyst circulation slide valve in the reactor; 10-Opening of the fourth catalyst circulation slide valve in the reactor.

The Pareto front is shown in Fig. 4(a). It is smooth and continuous, and the individuals are evenly distributed, which ensures the diversity of the population. The algorithm completed 10,000 evolutions in 3.05 s with fast convergence speed, showing that NSGA-II has good searching and solving ability for multiobjective optimization problems based on the RVM. Three noninferior solutions on the Pareto front are shown in Fig. 4(b). The minimum line has a low mean value but a more concentrated distribution, whereas the maximum line has a higher mean value but a wider confidence interval. On the Pareto front, as the mean value of the ethylene yield distribution increases, its standard deviation also increases, which means that a higher yield is accompanied by higher uncertainty, so the decision maker needs to make a trade-off between these two objectives. In terms of extreme points, the mean value of the ethylene yield distribution reaches 51.13% (purple line in Fig. 4(b)), which is very close to the 51.30% obtained by the single-objective optimization model. A comparison of the operating conditions in these two cases is shown in Fig. 5. The two lines almost coincide, which means that the two sets of operating conditions are almost the same. In summary, NSGA-II can capture the extreme points of the RVM-based multiobjective optimization problem, so the solutions are spread sufficiently widely.

4.Conclusions

Extensive contributions on the kinetic model and reactor development for methanol-to-olefins can be found elsewhere. Because of the complex reaction mechanism and the lack of a first-principle model, studies on the optimal operation of an industrial MTO reactor–regenerator system from the process systems engineering perspective are rare. This paper developed an RVM-driven optimization framework, which utilizes the plentiful industrial data sets, for promoting optimal operation, i.e., improving the ethylene yield of the industrial MTO reactor–regenerator system. Because the RVM provides a distribution prediction instead of a point estimate, the established model can provide guidance for actual production operations under uncertainty.

Based on the industrial data, the Bayesian-based RVM is used to correlate the product yield distribution with the operating conditions. Relationships between the kernel function parameter and the mean and standard deviation of the distribution predicted by the RVM model are investigated to improve the robustness of the model. These correlation models are then set as the constraints of the single-objective and multiobjective optimization models for optimal operation, which are solved by a genetic algorithm and NSGA-II, respectively. The Pareto front revealed the trade-off between the mean and the standard deviation of the ethylene yield distribution.

The optimal operating conditions, such as the temperature at the top of the reactor, mass flow rate of the regenerated catalyst, and methanol feed flow, are provided simultaneously. In the future, the proposed data-driven model will be coupled with the first-principle model currently under development to formulate a comprehensive optimization framework for industrial MTO plants.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors gratefully acknowledge the financial support for this work from the National Natural Science Foundation of China (21978150, 21706143).
