Hongyang LI, Qinglai WEI
1 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
2 The State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
3 Institute of Systems Engineering, Macau University of Science and Technology, Macau 999078, China
Abstract: This paper presents a novel optimal synchronization control method for multi-agent systems with input saturation. Multi-agent game theory is introduced to transform the optimal synchronization control problem into a multi-agent nonzero-sum game. Then, the Nash equilibrium can be achieved by solving the coupled Hamilton–Jacobi–Bellman (HJB) equations with nonquadratic input energy terms. A novel off-policy reinforcement learning method is presented to obtain the Nash equilibrium solution without the system models, and critic neural networks (NNs) and actor NNs are introduced to implement the presented method. Theoretical analysis is provided, which shows that the iterative control laws converge to the Nash equilibrium. Simulation results show the good performance of the presented method.
Key words: Optimal synchronization control; Multi-agent systems; Nonzero-sum game; Adaptive dynamic programming; Input saturation; Off-policy reinforcement learning; Policy iteration
Multi-agent synchronization control has attracted much attention due to its high efficiency and computational performance (Wieland et al., 2011; Wei et al., 2018, 2020, 2021; Li JQ et al., 2021; Rehák and Lynnyk, 2021; Zhang KQ et al., 2021). Generally speaking, synchronization control problems require that the agents converge to the same value (Cao et al., 2015; Garcia et al., 2017; Yang JY et al., 2019) or track the trajectories of leaders (Du et al., 2014; Zhao et al., 2014) by designing distributed control laws. Because of the practical significance of multi-agent systems, many researchers have devoted themselves to tackling various synchronization control problems, including switching topologies (Thunberg et al., 2014), system faults (Ma and Yang, 2016), and so on (Han et al., 2013; Wei et al., 2015; He et al., 2018). Among the research works on synchronization control, optimal synchronization control, which requires each agent to minimize its own local performance index function, is a promising research direction. Multi-agent cooperative games provide an effective tool for studying multi-agent optimal control problems, and they rely on solving coupled Hamilton–Jacobi (HJ) equations (Vamvoudakis et al., 2012). However, coupled HJ equations are hard to solve, which limits the applications of cooperative game theory in synchronization control problems.
Reinforcement learning is an effective method for solving coupled HJ equations. The main idea of reinforcement learning is to solve the coupled HJ equations forward in time, which reduces the computational burden (Wang et al., 2009; Wei and Liu, 2014; Wei et al., 2014, 2016, 2017; Zhang HG et al., 2015; Yang N et al., 2019; Zhang LD et al., 2019). In recent years, reinforcement learning has been further developed to solve multi-agent cooperative game problems. Vamvoudakis et al. (2012) proposed an online policy iteration method for optimal synchronization control problems; however, external disturbances were not considered. In Jiao et al. (2016), a novel policy iteration method was proposed for the multi-agent zero-sum game problem, and disturbance rejection was achieved. In Wei et al. (2015), the graphical game was studied for heterogeneous multi-agent systems. An off-policy reinforcement learning method was proposed to solve multi-agent synchronization control problems by Li JN et al. (2017), and the input constraint was considered by Qin et al. (2019). However, there are few research results considering cooperative game problems with input saturation, which motivates our study.
In this paper, the multi-agent optimal synchronization control problem with input saturation is studied based on cooperative game theory and reinforcement learning. Compared with Qin et al. (2019), we consider coupled terms with neighboring agents in the performance index functions. The main contributions can be summarized as follows:
1. A novel off-policy reinforcement learning method is presented for cooperative game problems of multi-agent systems, without requiring information of the system models. The control constraint and the coupled terms in the performance index functions are considered, which broadens the application scope of the presented method.
2. The characteristics of the presented model-free off-policy reinforcement learning method, including convergence and optimality, are analyzed, showing that the solutions obtained from the presented method converge to the Nash equilibrium.
3. Critic neural networks (NNs) and actor NNs are used to implement the off-policy reinforcement learning algorithm. Simulation results verify the good performance of the presented method.
Let Gr = (V, ε, E) be a directed graph, where V = {v1, v2, ..., vN} denotes the nonempty finite vertex set. Furthermore, ε ⊆ V × V is the set of edges. An edge of graph Gr is denoted as εij, which means that agent j is a neighbor of agent i. E = [eij] ∈ RN×N is the adjacency matrix, where eij represents the weight of edge εij. If εij ∈ ε, then eij > 0; otherwise, eij = 0. Let the set of neighbors of agent i be Ni = {vj | (vj, vi) ∈ ε}. Define G = diag(gi) ∈ RN×N as the pinning matrix. If agent i has access to the leader, gi > 0; otherwise, gi = 0. Define the Laplacian matrix as L = D − E, where D = diag(di) and di = Σj∈Ni eij.
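As a minimal numerical sketch of these graph quantities (the adjacency weights and pinning gains below are illustrative, not those of the simulation example), the matrices D, L = D − E, and G can be constructed as follows:

```python
def graph_matrices(E, g):
    """Build the in-degree matrix D = diag(d_i), the Laplacian L = D - E,
    and the pinning matrix G = diag(g_i) from the adjacency matrix
    E = [e_ij] and the pinning gains g. Here d_i is the i-th row sum of E,
    i.e., the total weight of agent i's neighbors."""
    N = len(E)
    D = [[sum(E[i]) if i == j else 0.0 for j in range(N)] for i in range(N)]
    L = [[D[i][j] - E[i][j] for j in range(N)] for i in range(N)]
    G = [[g[i] if i == j else 0.0 for j in range(N)] for i in range(N)]
    return D, L, G
```

By construction, every row of L sums to zero, which is the property used later when relating the synchronization error to the tracking error dynamics.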
For i = 1, 2, ..., N, consider the following systems:
where xi ∈ Rn and ui ∈ Ui ⊂ Rm are the system state and control, respectively. Here, A and B are system matrices with suitable dimensions, and Ui = {ui | ui ∈ Rm, ‖ui‖∞ ≤ λi} (λi > 0 is a known constant). Let the leader dynamics be
where x0 ∈ Rn is the system state. Then, we can define the synchronization error as
Taking the derivative of Eq. (3), we have
For system (4), the performance index function can be given as
where the term u−i represents the policies of the neighbors of agent i, Qii > 0,
and Ψ−1 is the inverse function of the hyperbolic tangent function (i.e., Ψ−1(·) = arctanh(·), or equivalently, Ψ(·) ≜ tanh(·)). Then, Ri(ui) and Ri(uj) can be written as
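As a hedged numerical sketch of this nonquadratic input energy term in the scalar case, the integral 2r∫₀ᵘ λ arctanh(v/λ) dv admits the closed form 2rλ[u arctanh(u/λ) + (λ/2) ln(1 − (u/λ)²)] for |u| < λ; the function names below are illustrative:

```python
import math

def saturated_input_cost(u, lam, r=1.0):
    """Closed form of the scalar nonquadratic energy term
    2*r*integral_0^u lam*arctanh(v/lam) dv, valid for |u| < lam."""
    return 2.0 * r * lam * (u * math.atanh(u / lam)
                            + 0.5 * lam * math.log(1.0 - (u / lam) ** 2))

def saturated_input_cost_quad(u, lam, r=1.0, n=20000):
    """Trapezoidal quadrature of the same integral, as a cross-check."""
    h = u / n
    s = sum((0.5 if k in (0, n) else 1.0) * lam * math.atanh(k * h / lam)
            for k in range(n + 1))
    return 2.0 * r * h * s
```

The cost is nonnegative and grows steeply as u approaches the saturation bound λ, which is what penalizes control signals near the constraint boundary.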
For Eq. (5), the Nash equilibrium condition can be described as
For agent i, we define the iterative value function as
Then, we can obtain the Bellman equation as
with Vi(0) = 0. According to the stationary condition (Bertsekas, 2007), it can be derived that
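For reference, in constrained-input designs of this type (Abu-Khalaf and Lewis, 2005), the stationary condition yields a saturated control law of the following form; graph-dependent coefficients (e.g., involving di + gi) are omitted here as an assumption:

```latex
u_i = -\lambda_i \tanh\!\left(\frac{1}{2\lambda_i}\, R_{ii}^{-1} B^{\top} \nabla V_i(\delta_i)\right)
```

The tanh wrapping guarantees ‖ui‖∞ ≤ λi by construction, which is why the nonquadratic energy term above is paired with this class of control laws.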
Substituting Eq. (11) into Eq. (10), we can obtain the Hamilton–Jacobi–Bellman (HJB) equation as
with Vi(0) = 0, where
We would like to design ui such that the Nash equilibrium condition represented in inequality (8) and the following state synchronization condition are satisfied:
Remark 1  The performance index function considered by Qin et al. (2019) is defined as
Comparing Eqs. (5) and (14), it can be seen that the coupled terms with neighboring agents in the performance index function are considered in this study. In multi-agent systems, the behavior of agent i may have an impact on its neighboring agents. Therefore, the performance index function (5) is more natural for the optimal synchronization control of multi-agent systems.
Remark 2  Based on Eq. (3), it can be derived that
where "⊗" is the Kronecker product and In is the identity matrix of dimension n. According to Vamvoudakis et al. (2012), we have
where σmin(·) represents the minimum singular value of a matrix. Therefore, the stability of the tracking error dynamics represented in Eq. (4) guarantees the state synchronization condition denoted by Eq. (13).
A theorem is provided, which shows that the solution to the HJB equation (i.e., Eq. (12)) satisfies the Nash equilibrium condition (i.e., inequality (8)) under certain conditions.
Theorem 1  Assume that the optimal control law is given as shown in Eq. (11), and that Vi is the positive definite smooth solution to the HJB equation (12). Then, system (4) is asymptotically stable, the optimal control laws (i = 1, 2, ..., N) constitute the Nash equilibrium, and the solution Vi to the HJB equation (12) is the optimal value of the game, i.e.,
Proof  Choosing the iterative value function Vi as the Lyapunov function, it can be derived that
According to Eq. (12), we have
Because of the asymptotic stability of system (4) and the boundary condition Vi(0) = 0, it can be derived that Vi(δi(∞)) = 0. Then, substituting Eq. (12) into Eq. (19) and completing the squares, we can obtain
For Eq. (21), it can be derived that
where Ψ−1(·) = arctanh(·) is monotonically increasing, i.e., (Ψ−1)′ > 0. Therefore, based on the mean value theorem for integrals, it can be derived that
In the previous subsection, it was derived that the optimal control, represented in Eq. (11), can be calculated to construct the Nash equilibrium represented in inequality (8). However, the optimal control in Eq. (11) requires information that can only be obtained from the HJB equation (12). The HJB equation (12) is a nonlinear partial differential equation, which is hard to solve analytically. Therefore, a policy iteration method is provided (Algorithm 1) to solve the HJB equation (12) numerically. Then, a theorem can be provided, which shows the convergence of the presented policy iteration method.
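The evaluate/improve cycle of such a policy iteration can be sketched on an unconstrained scalar LQR problem (a simplification of the saturated multi-agent setting considered here; the system, cost, and function names below are assumptions for illustration):

```python
def policy_iteration_scalar(a, b, q, r, k0, iters=30):
    """Policy iteration for the scalar LQR problem
        dx/dt = a*x + b*u,  J = integral of (q*x^2 + r*u^2) dt,  u = -k*x,
    mirroring the evaluate/improve structure of Algorithm 1 without
    saturation or graph coupling. k0 must be stabilizing (a - b*k0 < 0)."""
    k = k0
    v = 0.0
    for _ in range(iters):
        # Policy evaluation: V(x) = v*x^2 solves the Lyapunov equation
        #   2*v*(a - b*k) + q + r*k^2 = 0  for the current policy.
        v = -(q + r * k ** 2) / (2.0 * (a - b * k))
        # Policy improvement via the stationary condition: k <- b*v/r.
        k = b * v / r
    return k, v
```

For a = b = q = r = 1, the iterates converge to the algebraic Riccati solution p = 1 + √2, matching the fixed point that the evaluate/improve cycle is designed to reach.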
Theorem 2  Assume that agent i and its neighbors update their control policies according to Algorithm 1. Then, the iterative policies converge to the Nash equilibrium, and the iterative value functions converge to the solution of the HJB equation (12).
Proof  Integrating along the system
we have
Based on Eq. (24), we have
Subtracting and adding Eqs. (24) and (28) to the right-hand side of Eq. (27), we have
For Eq. (29), we have
Therefore, we can rewrite Eq. (27) as
Based on Eq. (22) and inequality (23), it can be derived that
However, the system matrices are still required to solve Eqs. (24) and (25). A novel off-policy reinforcement learning method, which does not require information of the system matrices, is presented in the next subsection.
We can rewrite the tracking error dynamics, represented in Eq. (4), as follows:
Taking the derivative along system (34), we have
According to Eq. (25), we have
Then, substituting Eqs. (24) and (36) into Eq. (35), it can be derived that
Therefore, it can be seen that the system matrices are not included in Eq. (37). Based on the Weierstrass high-order approximation theorem (Abu-Khalaf and Lewis, 2005), the critic and actor NNs can be introduced as
where φi ∈ Rhv and φuil1 ∈ Rhul1 (l1 = 1, 2, ..., m) are activation functions and the corresponding weights are constant. Eq. (39) can be written in the following compact form:
Substituting Eqs. (38) and (39) into Eq. (37), we can obtain Eq. (41) (at the top of the next page), where the last term represents the residual error and the primed state denotes δi(t′). Then, Eq. (41) can be written in a simplified form as follows:
Based on the least squares approach, it can be obtained that
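The least-squares step can be sketched generically as solving the normal equations for a stacked data matrix; the actual regressor assembled from the collected data is problem-specific, so the helper below is a hypothetical stand-in:

```python
def lstsq(Phi, y):
    """Solve w = argmin ||Phi*w - y||^2 via the normal equations
    (Phi^T Phi) w = Phi^T y, using Gaussian elimination with
    partial pivoting. Phi is a list of rows (one per data sample)."""
    m, n = len(Phi), len(Phi[0])
    # Form the normal equations A w = b.
    A = [[sum(Phi[r][i] * Phi[r][j] for r in range(m)) for j in range(n)]
         for i in range(n)]
    b = [sum(Phi[r][i] * y[r] for r in range(m)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w
```

In practice, the rows of Phi would be built from the activation-function data gathered along the trajectories, and w collects the critic and actor weights being identified.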
The resulting model-free off-policy reinforcement learning method is given in Algorithm 2.
Lemma 1  For system (4), suppose that the iterative value functions and iterative control laws are designed as in Eqs. (38) and (39), respectively, where the weights (l1 = 1, 2, ..., m) are updated by Algorithm 2.
The proof can be found in Li JN et al. (2017) and Qin et al. (2019), and is thus omitted here.
Remark 3  In Algorithm 2, the selection of the control laws ui (i = 1, 2, ..., N) is the key to the convergence of the algorithm. Generally, the control laws are selected as ui = −Kiδi + ξi (i = 1, 2, ..., N), where ξi is the exploration noise and Ki is a stabilizing gain matrix.
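A minimal scalar sketch of such a behavior policy is given below; the sum-of-sinusoids exploration noise and the clipping to the saturation bound λi are common choices assumed here, not prescribed above:

```python
import math, random

def behavior_policy(delta, K, t, lam, n_freqs=10, seed=0):
    """Behavior control law u = -K*delta + xi (scalar sketch).
    The exploration noise xi is a sum of sinusoids, a common
    persistent-excitation choice; clipping the result to the
    saturation bound lam is an assumption made here so that the
    applied input respects the constraint set U_i."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.5, 5.0) for _ in range(n_freqs)]
    xi = sum(math.sin(w * t) for w in freqs) / n_freqs
    u = -K * delta + xi
    return max(-lam, min(lam, u))
```

Because the method is off-policy, the data generated by this exploratory law can still be used to evaluate and improve the target policies.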
Remark 4  For traditional on-policy integral reinforcement learning methods (Vrabie and Lewis, 2011; Liu et al., 2021), the performance index function is evaluated using inaccurate data, which causes a biased estimation. The presented off-policy reinforcement learning method avoids this problem and thus obtains results with higher accuracy.
In this section, a simulation example is provided to show the good performance of the presented method. The structure of the multi-agent systems is shown in Fig. 1, with the following dynamics:
Fig.1 Structure of the multi-agent systems
The Laplacian matrix and the pinning matrix are given as
We define the weight matrices of the performance index function, represented by Eq. (5), as follows:
The simulation is performed with x0(0) = [1, 1]^T, x1(0) = [0.5, −0.5]^T, x2(0) = [1, −0.5]^T, x3(0) = [2, −1]^T, λ1 = 2, λ2 = 1.5, and λ3 = 3. First, we collect the system data {δi, ui} every 0.01 s for i = 1, 2, 3. Then, we solve Eq. (43) iteratively based on the collected system data. The activation functions φi(δi) and φui(δi) are chosen as
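The data-collection step can be sketched by forward-Euler integration of the error dynamics under a behavior policy; the matrices, gain, and horizon below are illustrative, not those of this example:

```python
def collect_data(A, B, K, delta0, lam, dt=0.01, steps=500):
    """Forward-Euler simulation of 2-D error dynamics
    d(delta)/dt = A*delta + B*u under the behavior policy
    u = clip(-K*delta, [-lam, lam]). Returns the (delta, u) samples
    recorded every dt seconds, as would feed the least-squares stage."""
    delta = list(delta0)
    data = []
    for _ in range(steps):
        u = -sum(K[j] * delta[j] for j in range(2))
        u = max(-lam, min(lam, u))  # respect the saturation bound
        data.append((tuple(delta), u))
        ddot = [sum(A[i][j] * delta[j] for j in range(2)) + B[i] * u
                for i in range(2)]
        delta = [delta[i] + dt * ddot[i] for i in range(2)]
    return data
```

With a 0.01 s sampling period, 500 steps correspond to a 5 s data window, after which the collected pairs are stacked into the regression problem solved iteratively.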
The simulation results are shown in Figs. 2–6. The weights of the critic and actor NNs are shown in Figs. 2 and 3, respectively, which demonstrate the stability of Algorithm 2. The synchronization error curves are provided in Fig. 4, and the three-dimensional curves are provided in Fig. 5. From Figs. 4 and 5, it can be seen that optimal synchronization control is achieved. The control curves are shown in Fig. 6, verifying that the control constraint is satisfied.
Fig. 2  Weights of critic neural networks of the multi-agent systems
Fig. 3  Weights of actor neural networks of the multi-agent systems
Fig.4 Synchronization errors of the multi-agent systems
Fig.5 Three-dimensional curves of the multi-agent systems
Fig.6 Control laws of the multi-agent systems
The nonzero-sum game problem of multi-agent systems with input saturation has been studied based on a model-free off-policy reinforcement learning method. It has been shown that the presented off-policy reinforcement learning algorithm makes the iterative control laws converge to the Nash equilibrium without information of the system models. Simulation results have demonstrated the good performance of the presented method.
Contributors
Hongyang LI designed the method, conducted the simulation, and drafted the paper. Qinglai WEI revised and finalized the paper.
Compliance with ethics guidelines
Hongyang LI and Qinglai WEI declare that they have no conflict of interest.
Frontiers of Information Technology & Electronic Engineering, 2022, Issue 7