
Perspective: Existence and practice of gaming: thoughts on the development of multi-agent system gaming*

2022-07-26 02:19:00 Qi DONG, Zhenyu WU, Jun LU, Fengsong SUN, Jinyu WANG, Yanyu YANG, Xiaozhou SHANG

Frontiers of Information Technology & Electronic Engineering, 2022, Issue 7

Qi DONG, Zhenyu WU (2), Jun LU, Fengsong SUN (3), Jinyu WANG (3), Yanyu YANG, Xiaozhou SHANG

1 China Academy of Electronics and Information Technology, Beijing 100049, China

2 School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China

3 School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Gaming is a universal being in the universe. Starting with human understanding of the game process, we discuss the existence and practice of gaming, expound the challenges in multi-agent gaming, and put forward a theoretical framework for a multi-agent evolutionary game based on the idea of evolution and system theory. Taking the next-generation early warning and detection system as an example, we introduce the applications of the multi-agent evolutionary game. We construct a multi-agent self-organizing game decision-making model and develop a multi-agent method based on reinforcement learning, both of which are significant for studying organized and systematic game behaviors in a high-dimensional complex environment.

1 Introduction

Gaming is everywhere. From biological populations to human society, from tribal conflicts to superpower games, and from the exchange of goods to financial trade, all these scenarios are permeated with the idea of the game. The game has become a universal being in the universe (Heidegger, 2013). Human understanding of games has developed in depth over time. Early research on game theory mainly summarized experience from war, chess, and card games (Kant, 2020). Sun Tzu's Art of War is one of the earliest works on game theory. Von Neumann (1928) proved the minimax theorem for zero-sum games and established a mathematical research framework for game theory, marking the birth of classical game theory. In 1944, the epoch-making masterpiece Theory of Games and Economic Behavior laid a theoretical foundation for applying classical game theory in economics and represented a breakthrough in the basic assumptions and analysis paradigms of traditional economics (Von Neumann and Morgenstern, 2007). The Nash equilibrium theory (Nash, 1950) made game theory a widely used analysis tool.

After 1970, the ideas of incomplete information games and bounded rational decision-making were integrated, and the practical research and application scope of game theory expanded dramatically. With the development of computer technology, using trial-and-error data to establish decision-making methods has become a new approach to solving gaming problems. For example, AlphaGo, trained with reinforcement learning, defeated the world champion Lee Sedol in the game of Go (Silver et al., 2016). In recent years, game theory has been continuously improved and has gradually become an analytical framework for solving problems in many fields. For example, political struggle, military confrontation, economic analysis of market behavior (Abu Turab Rizvi, 2007), exploration of cooperative mechanisms in biological populations (Archetti and Pienta, 2019), and policy-making in social governance among significant powers (Wang et al., 2021) can all be analyzed using game theory. These games have complex institutionalized and systematized characteristics.

Institutionalized and systematized gaming research has expanded from individuals to groups. When groups reach a certain scale, they often exhibit characteristics that differ from those of individuals (Cavagna et al., 2010; Alsheikh et al., 2015; Hayat et al., 2016). How to analyze and exploit these characteristics has attracted great attention, and the concept of the multi-agent system (MAS) has been proposed, defined as a complex system composed of multiple agents that interact with their environment (Shoham and Leyton-Brown, 2008). MAS gaming (MASG) provides a theoretical framework for studying these issues.

Although MASG has improved significantly in the past decades, it still faces many challenges:

(1) The environment in which the system is located is complex and changeable. It is challenging to model the environment directly and to predict accurately how the environment responds to the agents' actions.

(2) The system is heterogeneous, and heterogeneous individuals have different decision spaces; coordinated control is therefore far from easy for complex systems to achieve.

(3) Due to the limitations of distance, power, and other factors, the perception of agents is limited. As a result, the generated situation is incomplete and inconsistent.

(4) The computing power of a single agent in a system is limited. It is therefore difficult to manage the large amount of data generated by the system during the gaming process and to generate the best decision in real time.

Evolutionary game theory (EGT) (Smith, 1982), inspired by Darwin's theory of evolution, provides efficient approaches to complex problems under incomplete information and bounded rationality. Evolutionary thought thus offers a new way of addressing the above issues. Multi-agent gaming (MAG) algorithms based on evolutionary thought have become a hotspot in gaming research (Nowak, 2006; Hilbe et al., 2018; Omidshafiei et al., 2019; Gupta et al., 2021). New algorithms, such as multi-agent reinforcement learning (MARL) (Shao et al., 2019), the ant colony optimization (ACO) algorithm, and the particle swarm optimization (PSO) algorithm (Liu ZA and Nishi, 2022), have achieved significant success and have been applied in many fields. For example, these algorithms have been used to study cooperation in society (Nowak, 2006; Hilbe et al., 2018), cooperative decision-making in multi-party games (Shao et al., 2019), cooperative control schemes in intelligent transportation (Li et al., 2019), and self-organizing game decision-making in distributed early warning and detection.

Although these algorithms have achieved significant results in many applications, their theoretical framework remains unclear. They have the correctness of formal logic but not the truth of dialectical reasoning. They are like water without a source or a tree without roots, and cannot reveal the truth behind multi-agent evolutionary gaming (MAEG) (Heidegger and Mörchen, 1988). Therefore, using the ternary system theory, this paper attempts to put forward a theoretical research framework for MAEG and apply it to early warning and detection. By exploring the essence of MAEG in practice, we hope to stimulate further thinking and research by other scholars and promote the development of this field.

2 Theoretical framework of multi-agent evolutionary gaming

In this section, we propose a theoretical framework for MAEG, as shown in Fig.1. According to the ternary system theory, system $S$ is an organic whole composed of elements $E$, relations $R$, and laws $L$ (Xu, 2000; Lu and Shan, 2020). We assume that the elements represent the collection of agents $N_t$ (which denotes the agent set, including homogeneous and heterogeneous agents) and environment $E_t$ in an MAS. The symbol $t$ indicates that the agents and the environment continuously evolve with time. Relations $R$ in an MAS represent the collection of interaction relationships, not only between agents but also between the system and the environment. Laws $L$ in an MAS are goal-oriented and include the internal and external power.

Fig.1 Theoretical framework of multi-agent evolutionary gaming (MAEG)

Under the constraint and drive of evolutionary game laws $L$, MAEG is directional in obtaining the overall income $C_t$ of the MAS. Through MAEG, the system can evolve from disorder to order, and the steady-state value of income $C_k$ can be obtained as follows:
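As an illustration of how an evolutionary game can settle from an arbitrary initial state into a steady-state income, the following minimal sketch uses the classical replicator dynamics. This is a stand-in chosen for the example and is not the income model of the MAEG framework; the Hawk-Dove payoff matrix, the step size, and the function names are all assumptions made for illustration.

```python
import numpy as np

def replicator_step(x, payoff, dt=0.01):
    """One Euler step of the replicator dynamics dx_i/dt = x_i * (f_i - f_bar)."""
    fitness = payoff @ x            # f_i: expected payoff of strategy i
    mean_fitness = float(x @ fitness)  # f_bar: population-average payoff ("income")
    x_new = x + dt * x * (fitness - mean_fitness)
    # Renormalize to guard against floating-point drift off the simplex.
    return x_new / x_new.sum(), mean_fitness

def evolve(x0, payoff, steps=5000, dt=0.01):
    """Iterate the dynamics and record the average income at every step."""
    x = np.asarray(x0, dtype=float)
    incomes = []
    for _ in range(steps):
        x, income = replicator_step(x, payoff, dt)
        incomes.append(income)
    return x, incomes

# Hypothetical Hawk-Dove game with resource value V = 2 and conflict cost C = 4.
# Rows and columns are ordered (hawk, dove).
payoff = np.array([[-1.0, 2.0],
                   [0.0, 1.0]])

# Start far from equilibrium: 90% hawks, 10% doves.
x_final, incomes = evolve([0.9, 0.1], payoff)
print(x_final, incomes[-1])
```

In this toy game the population shares approach the mixed equilibrium with a hawk share of V/C, and the recorded income flattens out, which is the kind of disorder-to-order behavior the framework describes.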

3 Practical research on MAEG

Early warning and detection is significant in rescue and relief work, urban public security, and other situations. Nowadays, a high-dimensional and complex environment places higher requirements on early warning and detection systems. In this section, we take early warning and detection as an example and focus on the theoretical exploration and practical application of MAEG.

The next-generation early warning and detection system will be distributed, unmanned, and intelligent (Liu M and Lu, 2015). It is composed of multiple cooperative, distributed nodes capable of independent sensing, decision-making, and action. The system achieves cooperative sensing and recognition of the environment through the coordinated control and information fusion of nodes. In a distributed early warning and detection system, each node must adapt to the unknown and changing external environment. At the same time, the system must allocate cooperative detection resources among nodes, including perception, communication, and computing. Therefore, a distributed early warning and detection system is a typical MAEG system. Based on the MAEG theory, we study multi-agent collaborative detection methods, construct a multi-agent self-organizing game decision-making model, and establish the "internal power" of system evolution using data-driven decision learning methods. In this way, a next-generation early warning and detection system is formed that can adapt to environmental changes, operates with high reliability, and can create an accurate and effective unified situation, as shown in Fig.2.

Fig.2 Applications of MAEG in early warning and detection

3.1 Multi-agent self-organizing game decision-making model

We construct a self-organizing game decision-making model in a distributed early warning and detection system using the MAEG theoretical framework, as shown in Fig.3. Considering a system composed of $N$ agents, the self-organizing game decision-making model is formalized by the tuple $(\mathcal{N}, S, G, A, F, \Pi)$, where $\mathcal{N} = \{1, 2, \cdots, N\}$ denotes the set of agents and $S$ is the system's situation information, with each agent holding its own current situation information. $G$ represents the time-varying communication topology of the agents in the self-organizing network architecture. $A$ is the product of the finite action spaces of all agents, known as the joint action space, and $A_i$ denotes the action set of agent $i$. $F = \{F_1, F_2, \cdots, F_N\}$ is the set of the agents' fitness in the system, where $F_i$ represents the adaptability of agent $i$ to the task requirements. $\Pi = \{\Pi_1, \Pi_2, \cdots, \Pi_N\}$ denotes the policy set of all agents, and $\Pi_i = \{\pi_{i1}, \pi_{i2}, \cdots, \pi_{ik}\}$ indicates the optional policy set of agent $i$.

The agents interact with the environment and update policies according to the following protocol. At time step $t$ ($t \in \mathbb{N}$), agent $i$ predicts the next situation information from its current situation information and then applies $F_i(\cdot)$ to evolve the current policy $\pi_{ij}$ ($i = 1, 2, \cdots, N$; $j = 1, 2, \cdots, k$). Considering the environmental information $o_i$, agent $i$ takes an action drawn from its policy $\pi_{ij}$, influenced by the environmental information, the current and predicted situation information, and the fitness function $F_i(\cdot)$. After multiple rounds of environmental interaction and sample accumulation, the agents perform policy updating and optimization.

In the early warning and detection system, the internal power is composed of the current situation, the next situation, and the agents' fitness $F_i(\cdot)$. At the same time, the environmental cognitive results $o_i$ constitute the external power of the system. The internal power and external power are aggregated to guide the agent iteration for policy evaluation and updating, drive the individuals to complete the self-organizing evolution of group actions based on the game strategy, and finally form the ability of the nodes to adapt to the environment and cooperate efficiently in the distributed early warning and detection system.
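The interaction protocol above can be sketched for a single agent as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agent` class, the persistence predictor, the dictionary encoding of a policy, and `toy_fitness` are all hypothetical names and choices introduced here, and the communication topology $G$ is left out for brevity.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Situation = Sequence[float]   # assumed encoding of situation information
Policy = Dict[str, float]     # assumed encoding: action -> selection probability

@dataclass
class Agent:
    actions: List[str]        # A_i, the action set of agent i
    policies: List[Policy]    # Pi_i = {pi_i1, ..., pi_ik}, the optional policies
    fitness: Callable[[Policy, Situation, Situation, float], float]  # F_i
    current: int = 0          # index j of the active policy pi_ij

    def predict(self, situation: Situation) -> Situation:
        # Placeholder predictor (assumption): the situation persists unchanged.
        return list(situation)

    def evolve_policy(self, situation: Situation, observation: float) -> int:
        # Internal power: current situation, predicted situation, and fitness.
        # External power: the environmental observation o_i.
        predicted = self.predict(situation)
        scores = [self.fitness(p, situation, predicted, observation)
                  for p in self.policies]
        self.current = max(range(len(scores)), key=scores.__getitem__)
        return self.current

    def act(self, rng: random.Random) -> str:
        # Draw an action from the currently active policy.
        policy = self.policies[self.current]
        weights = [policy[a] for a in self.actions]
        return rng.choices(self.actions, weights=weights)[0]

def toy_fitness(policy, situation, predicted, observation):
    # Hypothetical F_i: scanning pays off when the observed threat level is high.
    return policy["scan"] * observation + policy["idle"] * (1.0 - observation)

agent = Agent(
    actions=["scan", "idle"],
    policies=[{"scan": 0.9, "idle": 0.1}, {"scan": 0.2, "idle": 0.8}],
    fitness=toy_fitness,
)
rng = random.Random(0)
chosen_high = agent.evolve_policy(situation=[0.0, 1.0], observation=0.8)
action = agent.act(rng)
chosen_low = agent.evolve_policy(situation=[0.0, 1.0], observation=0.1)
```

Selecting the highest-scoring candidate is only one simple way to "evolve" a policy; a learned update over accumulated samples, as the text describes, would replace `evolve_policy` in a fuller treatment.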

Fig.3 Self-organizing game decision-making model

3.2 MAEG method based on reinforcement learning

Based on the self-organizing game decision-making model, we further propose an MAEG method based on reinforcement learning, as shown in Fig.4, which addresses multi-agent cooperative decision-making and effective real-time communication in a complex dynamic environment in the next-generation early warning and detection system. Considering the distributed cooperative strategy learning mechanism of MARL, we first construct the temporal difference optimization objective $L_{\mathrm{TD}}(\theta)$, based on the joint state value function $Q_{\mathrm{tot}}$ and the agent state value function $Q_i$, as follows:

Fig.4 A multi-agent evolutionary game method based on reinforcement learning

where $r$ is the reward function, $\gamma \in (0, 1)$ is the discount factor, and $\theta$ is the parameter vector of the agents' policy $\Pi$. At time step $t$ ($t \in \mathbb{N}$), agent $i$ executes the $\varepsilon$ strategy based on $Q_i$ and obtains action $a_i$, denoted as

which is used to improve the efficiency of multi-agent self-organizing games. This optimization objective addresses the collaborative planning, collaborative control, and real-time decision-making problems of large-scale agents. Furthermore, we establish a communication mechanism for the learning strategy based on information entropy. The minimum-information-entropy optimization goal $L_G(\theta_G)$ is constructed by establishing an intelligent communication model:

where $I_{\theta_G}$ refers to the mutual information, $m_{ij}$ is the message sent by agent $i$ to agent $j$, $H_{\theta_G}$ is the information entropy, and $\theta_G$ encodes the network parameters for messages. The subscript "G" refers to the topological network of the agents. $L_G(\theta_G)$ is used to maximize the mutual information between messages and action selection so that messages are fully expressive. In addition, to address the problem of unknown environmental information in complex dynamic environments, we use an incomplete-information prediction learning mechanism driven by both data and model. This allows us to establish a predictive evaluation model and construct a prediction learning objective driven by data and model. The prediction learning objective $L_\omega(\theta_\omega)$ takes the following form:

where $\omega$ is the prediction parameter, $D_{\mathrm{KL}}$ is the Kullback–Leibler (KL) divergence, $q_{\theta_\omega}(\cdot)$ is the environmental model prediction network, and $p(\cdot)$ refers to the dynamic environmental model. With these mechanisms, the system can realize environment modeling and evolutionary learning under incomplete information, and infer the task roles in the agent topology network. The agents can thus grasp as much comprehensive information as possible and make more effective decisions. After continual policy updating and optimization, the early warning and detection system can finally achieve self-organizing evolution and attain a system-level capability that differs from that of a single node.
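To make the ingredients of this section concrete, the sketch below shows commonly used forms of three of them: a squared temporal difference error on $Q_{\mathrm{tot}}$, action selection from $Q_i$, and a KL divergence between an environment model and a prediction. These are assumptions, not the authors' exact objectives: the additive mixing of $Q_i$ into $Q_{\mathrm{tot}}$, the reading of the $\varepsilon$ strategy as the usual $\varepsilon$-greedy rule, the direction of the KL divergence, and all function names are chosen here for illustration only. The information-entropy communication objective is not covered.

```python
import numpy as np

def additive_q_tot(agent_qs):
    """Assumed mixing: Q_tot is the sum of the per-agent values Q_i."""
    return np.sum(np.asarray(agent_qs, dtype=float), axis=-1)

def td_loss(q_tot, reward, q_tot_next_max, gamma):
    """Mean squared TD error: (r + gamma * max Q_tot(s', .) - Q_tot(s, a))^2."""
    q_tot = np.asarray(q_tot, dtype=float)
    target = np.asarray(reward, dtype=float) + gamma * np.asarray(q_tot_next_max, dtype=float)
    return float(np.mean((target - q_tot) ** 2))

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore uniformly, otherwise pick argmax Q_i."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions; eps avoids log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)

# Two agents, one transition: per-agent values for the chosen actions.
q_tot = additive_q_tot([[0.4, 0.6]])          # Q_tot(s, a) = 1.0
loss = td_loss(q_tot, reward=[1.0], q_tot_next_max=[2.0], gamma=0.5)

greedy_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0, rng=rng)
explore_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=1.0, rng=rng)

# Mismatch between a hypothetical environment model p and a prediction q.
mismatch = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

In a learning loop, `td_loss` would be minimized with respect to $\theta$ and `kl_divergence` with respect to the prediction network parameters; here both are evaluated on fixed numbers only to show their shape.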

4 Conclusions and prospects

Taking the widespread existence of games in the universe as the starting point, in this paper we describe the process of human understanding of gaming, expound the challenging problems of the multi-agent game process, put forward a theoretical research framework for MAEG, and introduce MAEG's practical applications using the next-generation early warning and detection system as an example.

MAEG theory is significant for studying institutionalized and systematized gaming in high-dimensional complex environments. Existing research has made a preliminary exploration in this field. However, recent research focuses on evolving games with faster convergence, more stability, and better performance based on definable criteria, in order to ensure formal logic correctness. The criteria on which concepts are formed and used for judgment and reasoning decisions are difficult to describe for systematic and organizational games. Existing methods have difficulty solving such problems, and the truth of dialectical logic needs to be studied to further explore game criteria. Human inquiry into thinking has never stopped, and noetic science is currently the key to tackling these challenges. Although studying thinking is difficult, we feel that noetic science is the direction of such research and the ultimate goal of games.

Our view aims to stimulate thinking and discussion about the multi-agent evolutionary game from different dimensions.

Contributors

Qi DONG and Jun LU designed the research. Zhenyu WU and Fengsong SUN drafted the paper. Yanyu YANG and Xiaozhou SHANG helped organize the paper. Jinyu WANG revised and finalized the paper.

Compliance with ethics guidelines

Qi DONG, Zhenyu WU, Jun LU, Fengsong SUN, Jinyu WANG, Yanyu YANG, and Xiaozhou SHANG declare that they have no conflict of interest.

