YANG Fuyunxiang, YANG Leping, ZHU Yanwei, and ZENG Xin
College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
Abstract: Recent successes in artificial intelligence have revitalized interest in neural networks and demonstrated their potential for solving spacecraft trajectory optimization problems. This paper presents a data-free deep neural network (DNN) based trajectory optimization method for intercepting non-cooperative maneuvering spacecraft in a continuous low-thrust scenario. Firstly, the problem is formulated as a standard constrained optimization problem through differential game theory and the minimax principle. Secondly, a new DNN is designed that integrates the interception dynamic model into the network and involves it in the gradient descent process, which endows the network with knowledge of the physical constraints and reduces its learning burden. The resulting DNN-based method completely eliminates the need for training datasets and improves generalization capacity. Finally, numerical results demonstrate the feasibility and efficiency of the proposed method.
Keywords: non-cooperative maneuvering spacecraft, neural network, differential game, trajectory optimization.
Recently, the trajectory optimization problem for intercepting non-cooperative maneuvering spacecraft has drawn increasing attention from scholars [1-3]. The problem can be considered a two-player zero-sum pursuit-evasion game, which is essentially a two-sided optimization problem with completely conflicting goals [4]. Finding the solution of such a problem generally requires solving a high-dimensional two-point boundary value problem (TPBVP) [5].
The traditional methods for this problem usually utilize collocation, swarm optimization, evolutionary algorithms, or a combination of them. Tuomas and Harri [6] divided the problem into two one-sided optimization problems using the direct collocation method. Because this method is sensitive to the initial solution guess, it has difficulty obtaining globally optimal solutions when the solutions are inappropriately initialized. Stupik et al. [7] proposed an approach based on particle swarm optimization (PSO) to obtain open-loop optimal trajectories, but computation errors were not well handled in some specific scenarios. Li et al. [8] presented a dimension-reduction method, where differential evolution (DE) was utilized to provide a good initial guess for Newton’s method. Horie and Conway [9] proposed a semi-direct method, where the optimal control laws of the two players are obtained in different ways: one is obtained by analyzing the necessary conditions and the other is solved by nonlinear programming. This method is computationally expensive and needs appropriate initialization. Pontani and Conway [4] improved this semi-direct method by adding a genetic algorithm preprocessor to obtain the initial guess and solved a more sophisticated three-dimensional case, but how the genetic algorithm parameters are selected is not discussed. Sun et al. [10] developed a semi-direct control parameterization (SDCP) method and a hybrid method that combines the proposed SDCP method with the multiple shooting method to solve scenarios in low Earth orbit.
Driven by the profound changes that deep neural networks (DNNs) have brought to practical applications, a new class of methods has emerged. Since neural networks have good approximation capacity [11], Wu et al. [12] proposed a method in which networks directly learn the optimal control law from datasets generated by traditional methods in order to derive the optimal interception trajectory. George [13] presented a reinforcement-learning-based method that combines networks with an evolutionary algorithm to solve for optimal trajectories in interception and rendezvous scenarios with three different thrust models. However, this type of method usually has the following three disadvantages. Firstly, large datasets are required for training, and situations beyond the training area are difficult to handle, especially for supervised learning methods. Secondly, data generated by orbital dynamics is subject to exact physical constraints, and it is challenging to enforce these constraints on neural network models without sacrificing exact physics. Thirdly, the learning burden of the network is too heavy: when faced with a trajectory optimization problem with complex physical constraints, the network must not only search for the optimum but also learn the physical constraints.
Thanks to recent achievements in network structure design, the above difficulties can be effectively addressed. It has been demonstrated that a large amount of information is captured by the structure of a network rather than by any learning ability [14], and this finding has quickly been applied to style transfer in fonts [15] and data upsampling in medical imaging [16]. The idea has also proven effective for complex dynamic systems beyond the natural image domain. Long et al. [17] designed a network structure called PDE-Net, which is used to learn partial differential equations (PDEs) and predict the future state of a system. Zhu et al. [18] employed a convolutional encoder-decoder neural network approach as well as a conditional flow-based generative model for the solution of PDEs, surrogate model construction, and uncertainty quantification tasks. Hoyer et al. [19] proposed a DNN-based method that greatly improves the optimization results in an analysis of 116 structural optimization cases.
In this study, a new DNN is designed to help generate optimal trajectories for intercepting non-cooperative maneuvering spacecraft, where the initial states of the two spacecraft and the interception trajectories serve as inputs and outputs, respectively. Orbital dynamics as well as other physical constraints are integrated into the network, expressed by a secondary structure composed of multiple layers of neurons. The predicted terminal condition is then calculated from the trajectories generated by the DNN and serves as a second set of outputs. The difference between the predicted terminal condition and the real terminal constraints, rather than training datasets, is utilized to guide the training. The proposed DNN-based method deepens the network's understanding of the physical constraints, reduces the learning burden of the network, improves the generalization capacity of the method, and completely eliminates the need for training data.
The rest of the paper is organized as follows. Section 2 describes the derivation of the dynamics of intercepting a non-cooperative maneuvering target as well as the process of formulating a standard optimization problem. Section 3 presents the proposed data-free DNN-based trajectory optimization method, followed by Section 4, where simulations are conducted to evaluate the effectiveness of the proposed method by comparison with previous methods. Finally, conclusions are summarized in Section 5.
First, the relative states of the two spacecraft are modeled using the Hill-Clohessy-Wiltshire (HCW) equations, and the problem is written as a TPBVP based on time-free differential game theory. Then, according to the minimax principle, the relationships between co-states and control laws are derived to transform the TPBVP into a standard constrained optimization problem.
Here, the interception scenario is set as two spacecraft orbiting near a circular reference orbit, as shown in Fig.1. Target denotes the target spacecraft and pursuer denotes the interception spacecraft. Since both spacecraft can maneuver, it is convenient to describe the relative translational motion in the Hill orbit frame, whose origin is attached to a virtual spacecraft moving on the reference orbit with no maneuvers. As illustrated in Fig.1, the Cartesian axes x, y, and z are aligned with the orbital radial direction, the orbital velocity vector, and the normal vector of the orbital plane, respectively.
Fig.1 Illustration of Hill frame
The linearized HCW equations of relative motion are used. The dynamics of the spacecraft can be given as follows:
where the subscript i denotes the identity tag (T represents the target and P represents the pursuer); u_i is the acceleration vector; the spacecraft state is denoted by x_i; ω = sqrt(μ/γ_ref^3) is the average angular velocity of the reference orbit; μ is the geocentric gravitational constant; γ_ref is the radius of the reference orbit; and the matrices A and B can be expressed as
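For reference, the textbook form of the linearized HCW model for the axes defined above (x radial, y along the orbital velocity, z normal to the orbital plane) is sketched here. It is assumed, not confirmed, to match the model used in this paper. The symbols x_i and u_i for the state and acceleration vectors and the state ordering (position first, then velocity) are assumptions made for this sketch.

```latex
\dot{\boldsymbol{x}}_i = A\boldsymbol{x}_i + B\boldsymbol{u}_i,\qquad
\boldsymbol{x}_i = [x_i,\ y_i,\ z_i,\ \dot{x}_i,\ \dot{y}_i,\ \dot{z}_i]^{\mathrm{T}},\qquad
i \in \{T, P\}

A = \begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
3\omega^2 & 0 & 0 & 0 & 2\omega & 0\\
0 & 0 & 0 & -2\omega & 0 & 0\\
0 & 0 & -\omega^2 & 0 & 0 & 0
\end{bmatrix},\qquad
B = \begin{bmatrix} 0_{3\times 3}\\ I_{3\times 3} \end{bmatrix}
```

Written out by component, this form reads ẍ = 3ω²x + 2ωẏ + u_x, ÿ = -2ωẋ + u_y, and z̈ = -ω²z + u_z, so the in-plane motion is coupled while the out-of-plane motion is a simple harmonic oscillator.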
Define a low, continuous, and constant thrust as shown in Fig.2, where α and β denote the thrust pointing angles and F_i denotes the constant thrust acceleration.
Fig.2 Illustration of acceleration vector
The acceleration can be expressed as
where F_T

Generally, the dynamics of the two players in a zero-sum differential game [21,22] is described by

where x denotes the state vector, and u and v denote the control sequences of the two players, respectively.

The terminal constraint function G is described by

where t_f, indicating the terminal time, is unknown.

Find (u*(t), v*(t)) for the objective functional, which combines the terminal performance index Φ and the integral performance index L.

It is equivalent to solving the following problem:

To solve the constrained functional extremum problem (7), the augmented functional (8) is introduced through the Lagrange multiplier method.

where κ and λ are two sets of Lagrange multipliers.

The constrained problem is transformed into an unconstrained functional extremum problem:

The Hamiltonian function is defined as

According to the minimax principle, the following equations are obtained under the condition that t_f is unknown:

where (11) is also called the canonical equation.

In this work, it can be determined that v = u_T and u = u_P by combining dynamics and differential game theory. The following state equation is then defined:

It is considered that when the interception mission ends, both spacecraft will be in the same position. Therefore, the terminal constraints are set as

The terminal time t_f, also known as the interception time, can be considered the primary indicator for evaluating the task [8]. Because the target is a non-cooperative maneuvering spacecraft, it will delay the rendezvous as long as possible, whereas the pursuer wants to complete the rendezvous as quickly as possible. Therefore, the objective functional is given as

Substituting (15)-(17) into (10)-(14), a TPBVP is obtained:

According to (2), (18), (20), and (21), the relationship between the control vector and the co-state is given as

For convenience, (25) and (26) are written as

Substituting (27) into (18)-(24) gives

Considering that (28) is determined by only four unknown parameters t_f, κ_1, κ_2, κ_3, a standard optimization problem with constraints is derived as

In the above sections, the problem is transformed into an optimization problem with strong physical constraints. To help the neural network handle these physical constraints, a new DNN structure is designed to generate optimal interception trajectories, where the physics model is represented by a secondary structure in the network.

Fig.3 gives an overview of the whole structure and of the data flow in forward propagation and the gradient backward pass. The inputs are the initial states of the two spacecraft, x_0. Both the interception trajectories x(t) (t∈[t_0, t_f]) and the predictive value h serve as outputs, where x(t) (t∈[t_0, t_f]) is the solution of the problem and h is used for network training.

Fig.3 Proposed network structure

There are three sections in this network, namely the parameterization section, the physical model section, and the constraints section. The parameterization section f_θ(σ) is a secondary structure formed by several layers of neurons, where θ and σ represent all trainable parameters in this secondary structure. The role of this section is to generate t_f, κ_1, κ_2, κ_3 autonomously for the physical model section. The physical model section enforces the dynamics on the neural network, playing the same role as the canonical equation constraints. Given the input, the initial state x_0, this section calculates the state x(t) and the co-state λ(t) with the help of the generated t_f, κ_1, κ_2, κ_3. While the trajectory x(t) (t∈[t_0, t_f]) is output, the terminal values λ(t_f) and x(t_f) are passed to the constraints section. In the constraints section, λ(t_f) and x(t_f) are substituted into (23) and (24) to compute the predictive value h, which is the other output of the network and guides the training process.
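The forward data flow through the physical model and constraints sections can be illustrated with a minimal sketch, under several simplifying assumptions that are not taken from the paper. The co-state propagation is omitted, and a constant acceleration with fixed pointing angles stands in for the co-state-derived control law. The angle convention, the geostationary radius value, the fixed-step RK4 integrator, and all function names (`hcw_rhs`, `rk4_step`, `physical_model`, `constraints`) are hypothetical choices made only for illustration.

```python
import math

MU = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
R_REF = 4.2164e7      # assumed geostationary reference radius, m
OMEGA = math.sqrt(MU / R_REF ** 3)  # mean angular velocity of the reference orbit, rad/s


def hcw_rhs(state, accel):
    """Linearized HCW dynamics (x radial, y along-track, z normal)."""
    x, y, z, vx, vy, vz = state
    ax, ay, az = accel
    return [vx, vy, vz,
            3.0 * OMEGA ** 2 * x + 2.0 * OMEGA * vy + ax,
            -2.0 * OMEGA * vx + ay,
            -OMEGA ** 2 * z + az]


def rk4_step(state, accel, dt):
    """One classical fourth-order Runge-Kutta step of length dt."""
    def shifted(k, h):
        return [s + h * ki for s, ki in zip(state, k)]
    k1 = hcw_rhs(state, accel)
    k2 = hcw_rhs(shifted(k1, dt / 2.0), accel)
    k3 = hcw_rhs(shifted(k2, dt / 2.0), accel)
    k4 = hcw_rhs(shifted(k3, dt), accel)
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def physical_model(x0, params, steps=200):
    """Propagate the state from t_0 = 0 to t_f; no trainable weights live here."""
    t_f, alpha, beta, magnitude = params
    # Assumed angle convention: alpha in the x-y plane, beta out of plane.
    accel = [magnitude * math.cos(beta) * math.cos(alpha),
             magnitude * math.cos(beta) * math.sin(alpha),
             magnitude * math.sin(beta)]
    dt = t_f / steps
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], accel, dt))
    return trajectory


def constraints(trajectory):
    """Predictive value h: here simply the terminal position components."""
    return trajectory[-1][:3]


# Illustrative call with made-up numbers: initial state and (t_f, alpha, beta, magnitude).
x0 = [1000.0, -5000.0, 200.0, 0.0, 0.0, 0.0]
h = constraints(physical_model(x0, (3000.0, 0.5, 0.1, 1e-3)))
```

Because the integrator contains no trainable weights, every trajectory it returns satisfies the dynamics by construction; only the parameters fed into it need to be adjusted during training.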
Orbital dynamics and other physical constraints are integrated into the network, expressed by a secondary structure composed of multiple layers of neurons, so that the network is endowed with knowledge of the physical constraints. The task of the network is thereby simplified: since all trajectories it generates accord with the dynamics without any training, the network only needs to focus on searching for the optimal trajectories, which also leads to a substantial reduction in training data.

With the new DNN defined, the optimization method can be proposed. The whole process is summarized in the following six steps, and an overview of the data flow is illustrated in Fig.4.

Fig.4 Schema of proposed approach

Step 1 Initialization. The threshold ε is determined and the true value of h is given according to terminal constraints (23) and (24). All trainable parameters in the network, θ and σ, are initialized.

Step 2 Forward propagation. The input x_0 is fed into the network. The predictive value h and the output x(t) (t∈[t_0, t_f]) are obtained.

Step 3 Loss computation. The difference between the predicted value h and the true value is defined as the loss function

where W is the weight matrix.

Step 4 Judgment. The resulting Loss is compared with the threshold ε. If Loss > ε, go to Step 5. If Loss ≤ ε, go to Step 6.

Step 5 Gradient backward pass. The gradient ∂(Loss)/∂h is calculated and passed back through the network to update the parameters θ and σ. After the parameters are updated, the network regenerates the predictive value h and the output x(t) (t∈[t_0, t_f]). Go to Step 3.

Step 6 Output. The optimal trajectory x*(t) (t∈[t_0, t_f]) is obtained and output.
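The six steps can be sketched as a short loop. This is a toy illustration under stated assumptions rather than the implementation used in the paper: the loss is assumed to be a weighted quadratic form with a diagonal weight matrix, gradients are taken by central finite differences instead of backpropagation, the learning rate is fixed rather than self-adaptive, and `toy_forward` is a hypothetical stand-in for the network's forward pass.

```python
def weighted_loss(h, h_true, w_diag):
    """Assumed loss: (h - h_true)^T W (h - h_true) with diagonal W."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(w_diag, h, h_true))


def numerical_gradient(loss_fn, params, step=1e-6):
    """Central finite differences; a stand-in for the gradient backward pass."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += step
        down[i] -= step
        grad.append((loss_fn(up) - loss_fn(down)) / (2.0 * step))
    return grad


def train(forward, params, h_true, w_diag, eps=1e-8, lr=0.1, max_epochs=10000):
    params = list(params)                              # Step 1: initialization

    def loss_fn(p):
        return weighted_loss(forward(p), h_true, w_diag)

    for epoch in range(max_epochs):
        loss = loss_fn(params)                         # Steps 2-3: forward pass and loss
        if loss <= eps:                                # Step 4: compare with threshold
            return params, loss, epoch                 # Step 6: output
        grad = numerical_gradient(loss_fn, params)     # Step 5: gradient and update
        params = [p - lr * g for p, g in zip(params, grad)]
    return params, loss_fn(params), max_epochs


def toy_forward(p):
    """Hypothetical forward map whose residual vanishes at p = [1, 1]."""
    return [p[0] + 2.0 * p[1] - 3.0, p[0] - p[1]]


solution, final_loss, epochs = train(toy_forward, [0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

The only supervision signal in this loop is the target value of h, which mirrors the idea of using terminal constraints in place of labeled training data.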
In the proposed method, terminal constraints serve as labeled data for network training, so that all information required for training comes from the interception trajectory optimization problem itself. By changing the training mechanism from using training datasets to using terminal constraints, this method completely eliminates the need for training data, which helps improve its generalization capacity. A stationary mapping is created between scenarios and optimal trajectories instead of a time-varying mapping between states and control strategies, which reduces the training data demand and makes the training process easier for the network. The physical model is embedded in the network structure as part of the network, so that the network is bound to the physical process, which also reduces the training difficulty. Since no training data is needed, training cannot be left incomplete because of insufficient data.

To verify that the DNN in the proposed method is endowed with knowledge of the dynamics, a simulation scenario without training data is used to compare the results of our method with those of a previous traditional method. In addition, ten cases are used to compare the generalization capacity of the proposed method with that of a previous DNN-based method. The threshold and weight matrix in the method are

The learning rate η is self-adaptive and its initial value is given:

The reference orbit is a geostationary orbit. The initial states are listed in Table 1.

Table 1 Scenario parameter setting

The thrust magnitudes can be given as

where g = 9.78 m/s^2 denotes the magnitude of the gravitational acceleration at sea level.
This interception scenario is solved independently by the proposed method and by the method in [8]. Our method is network-based: after the initial state is input, the network outputs the optimal trajectories directly after learning. The method in [8] is a traditional method, where Newton’s iteration method is used to find the accurate co-state vector solution from an initial guess found by a differential evolution algorithm, and the optimal trajectories are then generated from the accurate solution. In our method, the network generates the optimal trajectories after 58 epochs of learning and the pursuer intercepts the target successfully in 5 913.15 s, while the interception time obtained by the method in [8] is 5 912.24 s. The comparison of the results is illustrated in Fig.5-Fig.9.

Fig.5 Trajectories of two players in scenario

Fig.6 Time histories of the relative distance between two spacecraft

Fig.7 Time histories of the position elements

Fig.8 Time histories of the velocity elements

Fig.9 Time histories of the acceleration

The result curves generated by the two methods basically coincide, indicating that our method is effective. Integrating dynamics into the network is a feasible way to enforce physical constraints on neural network models while maintaining accuracy. The idea of using terminal constraints as labeled data to guide network training works in the absence of training datasets.
To show the efficiency and generalization capacity of the proposed method, it is further compared with the method proposed in [12]. The method in [12] is a supervised learning method that establishes a network with states and control strategies as inputs and outputs, respectively. Data generated by traditional methods is used to train the network. This method attempts to establish the time-sequence correspondence between states and control strategies, which is a time-varying relationship, whereas our method is an unsupervised learning method that generates the entire trajectory from the initial state without training data and establishes a stationary relationship between scenarios and optimal trajectories. For simplicity, the method in [12] is denoted by Method 1 and our proposed method is denoted by Method 2. The reference orbit is still a geostationary orbit and the thrust magnitudes are the same as in the previous section. Then, 500 cases are selected to generate the training dataset D for Method 1. The selection range is shown in Table 2. After the networks in Method 1 are trained, another ten cases are selected for comparison. Among the ten cases, Case 1-Case 7 are in D, but Case 8-Case 10 are not in D.

Table 2 Parameter selection range

The success rates of the different methods are compared. The number of successes is denoted by n_s, and the success rate is denoted by r_s. All the performance results are summarized in Table 3.
Table 3 Performance of different methods

The performance of Method 1 is unsatisfactory. Its poor handling of cases inside the training area indicates that the network neither fully understands the dynamic characteristics nor learns how to optimize trajectories well. On the one hand, 500 training cases are evidently not enough for network training; on the other hand, the learning burden of the dual task is excessive. The failure in cases beyond the training area demonstrates that Method 1 lacks generalization capacity. Method 2 successfully solves all ten cases with no training data, which indicates that integrating the physical constraints into the network and using terminal constraints as labeled data improve the generalization capacity of the method.

Concentrating on the trajectory optimization problem of intercepting non-cooperative maneuvering spacecraft, the paper presents a newly designed DNN and a data-free method based on it. Some useful conclusions are drawn as follows:
(i) Integrating dynamics into the neural network structure is an efficient way to reduce the learning burden of networks;
(ii) The results generated by our method show that orbital dynamics and other physical constraints are successfully enforced on neural network models without sacrificing exact physics;
(iii) The proposed DNN-based trajectory optimization method completely eliminates the need for training data, which helps improve the generalization capacity of the method.

In addition, there are still some points worth further study:
(i) The performance of the proposed method should be evaluated when the problem possesses a more complex physical model;
(ii) It is worth analyzing how to extend the one-to-one scenarios to many-to-many scenarios;
(iii) How to modify the proposed method so that it can be used to solve scenarios in which the spacecraft possess different thrust configurations.

2.2 Two-player zero-sum time-free differential game
2.3 Trajectory optimization model
3.Algorithm
4.Numerical simulation
4.1 Feasibility verification
4.2 Performance comparison
5.Conclusions
Journal of Systems Engineering and Electronics, 2022, Issue 2