Borui Zhao,Qimei Cui,*,Shengyuan Liang,Jinli Zhai,Yanzhao Hou,Xueqing Huang,Miao Pan,Xiaofeng Tao
1National Engineering Laboratory for Mobile Network Technologies,Beijing University of Posts and Telecommunications,Beijing,100876,China
2Department of Computer Science,New York Institute of Technology,Old Westbury,NY,11568,USA
3Department of Electrical and Computer Engineering,University of Houston,Houston,TX,77204,USA
*The corresponding author,email: cuiqimei@bupt.edu.cn
Abstract: As Information, Communications, and Data Technology (ICDT) become deeply integrated, research on 6G is gathering momentum. Meanwhile, federated learning (FL), as a distributed artificial intelligence (AI) framework, is widely regarded as the most promising way to achieve “Native AI” in 6G. Although energy is emerging as a metric in AI and wireless networks, most studies still focus on obtaining high accuracy, with little consideration of the new features of future networks and their possible impact on energy consumption. To address this issue, this article focuses on green concerns in FL over 6G. We first analyze and summarize the major energy consumption challenges caused by the technical characteristics of FL and the dynamic heterogeneity of 6G networks, and model the energy consumption of FL over 6G in terms of computation and communication. We then classify and summarize the basic ways to reduce energy consumption, and present several feasible green designs for FL-based 6G network architecture from three perspectives. Based on the simulation results, we provide a guideline for researchers: different schemes should be used to achieve the minimum energy consumption at a reasonable cost in learning accuracy, depending on the network scenario and service requirements of the FL-based 6G network.
Keywords: 6G; native AI; federated learning; radio access network; green communications
With the large-scale commercial use of 5G networks,the development of artificial intelligence (AI),cloud computing,and edge computing has accelerated the deep integration of Information Technology,Communications Technology and Data Technology (ICDT).At the same time,the prelude to 6G research has been slowly opened,and commercial 6G is expected to launch around 2030[1].6G will further deepen the integration of AI and wireless networks of the existing 5G.As a new type of integrated strategic infrastructure,6G will promote the digitization and intelligence of vertical industries,such as the industrial internet,energy internet,intelligent transportation,and smart medicine[2].
6G also aims to meet users’higher-level intelligent communications and social interaction needs by providing AI application services.With diverse services and scenarios,the intelligent decision can be made timely at different locations,such as the cloud,edge,base station,and user devices.As 6GANA (6G Alliance of Network AI) points out,6G will spark an era featuring pervasive and ubiquitous intelligence,by building an open and integrated new network architecture with fully integrated connectivity,computing capability,data,etc.from both 6G for AI and AI for 6G [3].In other words,6G will realize “Native AI”and will serve as a distributed neural network,rather than a simple addition to the existing network.AI capabilities will penetrate endogenously to the edges and ends of the network,so that ubiquitous terminal nodes and distributed autonomous network elements will have native AI capabilities.
The traditional centralized computing architecture requires terminal nodes to upload their respective data to the cloud,which brings huge communication overhead and interaction delay,and serious risk of data breaches as well.Therefore,distributed computing and intelligence are becoming more and more important to build an end-to-end complete distributed AI training and inference environment within 6G networks.Network AI has significant advantages over Cloud AI in terms of the communications and computing efficiency,data security and privacy,and learning performance.
Among them,federated learning (FL),as a distributed machine learning framework,has attracted widespread attention from academia and industry[4].It allows users to use local datasets for model training,so the data can be kept private with only model parameters being exchanged over networks in the process of training.The FL system analyzes and learns the respective data of multiple users and obtains a model that is better than the individual training of each federated user.The FL may be the most suitable distributed architecture for 6G networks at present,for the following three reasons[5]:
(1) In the 6G wireless access network, communications and sensing will be integrated at network devices and nodes, e.g., cellular base stations, WiFi access points (APs), and mobile devices. Multidimensional network data and user data, which are sensed in scenarios such as the Internet of Things (IoT) and the Internet of Vehicles (IoV), are usually generated in a distributed form and stored in different devices and nodes. FL can support distributed data sharing and model training between devices with heterogeneous datasets, where the statistical characteristics of the datasets collected by different devices can vary significantly.
(2) With the popularity of smart devices,each terminal node in 6G networks will have certain computing capabilities.To utilize the large volume of data and computing resources distributed at the edge of the network,6G networks will gradually evolve from ultra-dense deployment of base stations to massive deployment of edge computing devices and nodes.The FL can support massive-scale nodes deployment and guarantee the convergence even if the number of devices participating in model training is far greater than the number of dataset samples available at each device.
(3) With the rapid advancement of big data research, protecting data security and privacy has become a worldwide consensus and trend. For example, the Cybersecurity Law of the People’s Republic of China took effect in 2017, the General Data Protection Regulation (GDPR) of the European Union became applicable in 2018, and the Civil Code of the People’s Republic of China was adopted in 2020. They all aim to protect user privacy and data security. In the FL training process, the data itself never leaves the user’s device and only the model parameter updates are shared, which can meet the requirements of data privacy, security, and supervision.
Therefore,the combination of FL and 6G networks can 1) make full use of both the multidimensional data held by user terminals and nodes and their respective computing resources,and 2) perform more extensive machine learning on the premise of helping users train high-performance AI models while protecting user data privacy.Both industry and academia generally believe that FL-based network architecture will be the most promising solution for 6G.
Nevertheless,different from the existing 5G and previous mobile networks,some new features of 6G networks will bring new challenges when deploying FL in 6G:
(1) With the massive deployment of distributed edge computing and smart terminal equipment in 6G networks,diversified network nodes will be highly autonomous and have differentiated communication,caching,computing capabilities.This will bring new challenges of efficient collaboration and high robustness when introducing FL in 6G.
(2)The intelligent service required by each node in 6G will be more differentiated and personalized,including not only intelligent application services at the user terminal level,but also AI applications such as intelligent wireless resource management and network parameter configuration optimization at the network level.Therefore,deploying FL in 6G will also face the problem of generalization and adaptability.
(3) Various types of network nodes in 6G will need to perform highly complex computing tasks and transmit high-dimensional data, both of which are very energy-intensive [6]. Different from 5G and previous networks, green communication in 6G should pay more attention to green AI and take a holistic approach to green communication and computing [7]. Thus, we believe that efforts towards analyzing and addressing new energy consumption issues in FL over 6G are necessary for a more sustainable future.
This article focuses exactly on green concerns in FL over 6G network.The main contributions of this article are listed as follows:
(1) We present,to the best of our knowledge,the first comprehensive analysis and summary of major energy consumption challenges caused by technical characteristics of FL and the dynamical heterogeneity of 6G networks.We reveal insights into the importance of paying enough attention to these challenges in the green design of intelligent network architecture in the future.
(2) We carry out the modeling and analysis of energy consumption in FL over 6G from aspects of computation and communications.Different from the energy efficiency(EE)considered in most research work,what we consider is the overall energy consumption of the FL process,which is more meaningful for achieving energy sustainability and environmental friendliness.
(3) According to the modeling and analysis above,we classify and summarize the basic ways to reduce the overall energy consumption,and present several feasible green designs for FL-based 6G network architecture from three perspectives,namely,multi-layer FL nodes deployment,device heterogeneity processing,and local FL model processing.
(4)We verify the feasibility and effectiveness of the green designs based on simulation results.Furthermore,through the in-depth analysis of these results,we provide a useful guideline to researchers that different schemes or their combinations should be used to achieve the minimum energy consumption at a reasonable cost of learning accuracy for different network scenarios and service requirements in FL-based 6G network.
The rest of the article is organized as follows. In Section II, the energy consumption challenges and problems brought by FL over 6G are presented. In Section III, the energy consumption model of FL over 6G is discussed, including the basic FL model, the computation energy consumption model, and the communication energy consumption model. In Section IV, several feasible green designs for FL-based 6G network architecture are proposed and analyzed from three perspectives. In Section V, simulations and discussions are carried out. In Section VI, conclusions are provided, followed by several open issues.
In this section,we introduce the energy consumption problems in wireless communications and several new challenges brought by the introduction of FL in 6G.
With the rapid development and continuous expansion of the information and communications industry, the mobile communications system is challenged by rapidly increasing energy consumption [8]. Statistics from the International Energy Agency show that the total energy consumed by the communications industry currently accounts for 2%-4% of total global energy consumption [9]. Meanwhile, wireless access networks account for about 70% of the total energy consumed by the entire mobile communications network. The power consumption of a single 5G base station, with its massive antenna array and wider spectrum, is 3 or 4 times higher than that of a 4G base station [10]. In addition, the higher carrier frequency and smaller coverage of the 5G base station require ultra-dense deployment of base stations for ubiquitous coverage. Since the energy consumption of base stations dominates that of the wireless access networks, the dense deployment of more energy-demanding base stations will keep driving up the energy consumption of wireless access networks [11].
The system capacity and peak data rate of 6G are expected to exceed that of 5G,so its energy-related challenge may become one of the main bottlenecks for the sustainable development of the communications industry [12].Statistics from world-renowned operators show that with the exponential growth of 5G traffic,major operators have to pay high energy bills every year,and electricity bills may exceed the profits of the wireless communications industry[13].
On the other hand,China has already committed to peak carbon dioxide emissions before 2030 and achieve carbon neutrality before 2060 [14].The current situation,that the energy consumption and exhaust emissions caused by the operation of mobile communication networks are becoming more and more serious,does not meet the appeal and requirements of building an energy-saving society.Therefore,solving energy consumption problems in wireless communications can not only reduce network costs,but also reflect a sense of social responsibility.It is a response to the sustainable development of human beings and the construction of a low-carbon and environmentally friendly society.Green communications will fully guarantee the realization of economic benefits,social benefits,and ecological benefits.
Green communications,as the name suggests,is to minimize the energy consumption of wireless networks through the combination of traditional wireless communications technology and the concept of green,low-carbon,and environmental protection.The most traditional understanding of green communications is high energy efficiency(EE),which is defined as the ratio of the total amount of data transmitted to the total energy consumed.But it will be inappropriate and insufficient in 6G,because increasing EE will not necessarily lead to a reduction in total energy consumption when the amount of data to be transmitted grows faster than EE [7].Green communication in 6G should be manifested in reducing the total energy consumption in order to achieve energy sustainability and environmental friendliness.
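The argument that higher EE does not guarantee lower total energy can be checked with simple arithmetic. The sketch below is illustrative only: it uses the EE definition given above (data transmitted divided by energy consumed), and the traffic volumes and EE values are hypothetical numbers, not measurements from this article.

```python
def total_energy_joules(data_bits: float, ee_bits_per_joule: float) -> float:
    """Total energy implied by the EE definition: EE = data / energy."""
    return data_bits / ee_bits_per_joule


# Hypothetical baseline: 1e12 bits carried at an EE of 1e6 bit/J.
baseline = total_energy_joules(data_bits=1e12, ee_bits_per_joule=1e6)

# Hypothetical future case: EE doubles, but traffic grows fourfold.
future = total_energy_joules(data_bits=4e12, ee_bits_per_joule=2e6)

# EE improved, yet the total energy is larger than before.
print(f"baseline: {baseline:.0f} J, future: {future:.0f} J")
```

In this made-up case the EE metric improves by a factor of two while the total energy also doubles, which is why the total energy consumption is the quantity tracked in the rest of the article.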
After the introduction of FL in 6G networks, new energy consumption challenges emerge because of the technical characteristics of FL and the dynamics and heterogeneity of the 6G network, and the existing traditional energy-saving solutions cannot be applied directly. The major energy consumption challenges include:
(1) Transmission of FL model parameters on the wireless communication link will bring high communication load and communication energy consumption.When FL is deployed in the 6G wireless access network,the communication load refers to the model parameters that FL needs to upload after AI training on local devices.The size of the model parameters is related to the user’s intelligent application,generally ranging from tens of megabytes to hundreds of megabytes.The size of the model parameters may be even larger with the development of deep learning models and the increasing demand for AI algorithm accuracy by intelligent applications.When conditions such as wireless bandwidth and channel state are fixed,a higher communication load will bring a higher communication energy consumption.
(2) Highly dynamic and heterogeneous wireless nodes will bring differential communication energy consumption in FL-based 6G architecture. The 6G RAN contains a large number of terminal devices (such as mobile phones, computers, smart cars, and drones) and various types of server equipment (such as base stations and edge nodes) with highly dynamic and heterogeneous communication capabilities and locations. Different terminal devices may need different transmission power to communicate with server equipment, and the server generally needs a larger transmission power for broadcasting in order to meet the requirements of wireless coverage. Therefore, the dynamic differences in the communication characteristics of 6G wireless nodes will bring large differences in the communication energy consumption of FL-based 6G networks.
(3) Highly complex computation tasks in FL will bring high local computation energy consumption on distributed devices.After the introduction of FL in 6G networks,various equipment such as base stations,edge nodes,and terminal devices in RAN will perform highly complex computation tasks,such as user behavior prediction,personalized service recommendation system,image classification,next-word prediction,gesture/voice recognition and anomaly detection.This will bring relatively high local computation energy consumption,which is different from uploading all the complex computation tasks to the server for processing in the traditional centralized learning.
(4) The heterogeneity of computing capabilities of servers and devices in 6G networks will bring differential computing energy consumption in FL-based 6G architecture.The performance of FL is closely related to the computing capability of the equipment in 6G,however,computing capabilities are highly heterogeneous because of the large number and variety of equipment [15].For instance,different equipment such as mobile phones,computers,smart cars,drones,and IoT terminals may have CPUs (Central Processing Units),GPUs(Graphic Processing Units),or even TPUs (Tensor Processing Units) and NPUs (Neural Network Processing Units) with different computing capabilities[16].This leads to large differences in the computation energy consumption in FL.At the same time,the computation energy consumption of some wireless device nodes is also limited by battery energy.
(5) The performance of FL in 6G networks is closely related to the availability and reliability of wireless network connections. Some mobile devices (such as mobile phones and drones) may shut down or stop communicating due to energy constraints. Devices may also suffer a high bit error rate or frequent disconnection due to the deterioration of the wireless channel between the device and the server. Moreover, devices may join or withdraw from FL as the needs of smart applications change. Therefore, FL in the 6G network cannot guarantee the participation of all devices, and the devices participating in each round may be different. This affects the learning performance of FL to a certain extent, increases the number of iterations needed to meet the performance requirements of smart applications, and thus affects the overall energy consumption of the system.
In this section,we introduce the energy consumption model according to the above analysis of energy consumption challenges brought by FL over 6G.It includes basic FL model in 6G,computation energy consumption model and communication energy consumption model.
In this work, we first consider the basic FL model in a 6G cellular network consisting of one server deployed in a base station (BS) and a set $\mathcal{M}=\{1,2,\dots,m,\dots,M\}$ of mobile devices, as shown in Figure 1. In our model, the server and all devices cooperatively train a shared machine learning (ML) model over wireless networks for one intelligent application. This shared model is called the global FL model and is represented by the parameter set $\omega$. The ML model trained by each device on its local dataset is called the local FL model.
Figure 1. Basic federated learning model in 6G.
For device $m$, let $D_m$ denote its local dataset and $|D_m|$ its size. We use $f(\omega; x_{sm}, y_{sm})$ to denote the loss function for each data sample $s \in D_m$, which represents the prediction error of the model $\omega$ on the training sample $x_{sm}$ with regard to its label $y_{sm}$. Then, the local loss function can be defined as
$$F_m(\omega) = \frac{1}{|D_m|} \sum_{s \in D_m} f(\omega; x_{sm}, y_{sm}).$$
Accordingly, the global loss function for $M$ devices on all the distributed datasets is given by
$$F(\omega) = \frac{\sum_{m=1}^{M} |D_m|\, F_m(\omega)}{\sum_{m=1}^{M} |D_m|}.$$
The objective of the FL process is to find the model parameter set $\omega$ that minimizes the global loss function $F(\omega)$, i.e., $\omega^* = \arg\min_{\omega} F(\omega)$.
The detailed procedure is as follows. In each global FL round, the server broadcasts the current global FL model $\omega^{(i)}$ and the selection indicators $\{\lambda_m\}$ to all devices, where $i \in \{1,\dots,N_{\mathrm{Global}}\}$ denotes the global iteration round and the indicator $\lambda_m \in \{0,1\}$ represents whether device $m$ is scheduled by the server. Suppose $K_i$ devices are scheduled for participation in the $i$-th round. Based on the received model $\omega^{(i)}$, each scheduled device $k$ trains its local FL model with its own dataset. The change of the model on each device $k$ after $N_{\mathrm{Local},k}$ local iteration rounds, obtained with a learning algorithm such as gradient descent, stochastic gradient descent, or stochastic gradient descent with momentum, is represented by $g_k^{(i)}$. After completing the local computation, the gradients $g_k^{(i)}$ are transmitted to the server. Then the server aggregates these model gradients by
$$g^{(i)} = \frac{\sum_{k=1}^{K_i} |D_k|\, g_k^{(i)}}{\sum_{k=1}^{K_i} |D_k|},$$
and updates the model parameters by
$$\omega^{(i+1)} = \omega^{(i)} - \eta\, g^{(i)},$$
where $\eta$ denotes the learning step size.
After $N_{\mathrm{Global}}$ global iterations, the global model parameters $\omega^{(N_{\mathrm{Global}})}$ at the server are taken as the solution of FL, i.e., $\omega^* \leftarrow \omega^{(N_{\mathrm{Global}})}$.
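The round structure described above (broadcast, device scheduling, local training, aggregation, update) can be sketched in a few lines. This is a minimal illustration, not the implementation used in this article: it assumes a one-parameter linear model with squared-error loss, a single local gradient step per round, dataset-size-weighted averaging of the reported gradients, and a descent update with step size `eta`. The toy datasets and the names `local_gradient` and `run_fl` are hypothetical.

```python
import random


def local_gradient(w, dataset):
    """Gradient of the mean loss 0.5*(w*x - y)^2 over one device's data."""
    return sum((w * x - y) * x for x, y in dataset) / len(dataset)


def run_fl(datasets, n_global=50, eta=0.1, k_per_round=2, seed=0):
    """Toy FL loop: schedule devices, collect gradients, aggregate, update."""
    rng = random.Random(seed)
    w = 0.0  # initial global model held by the server
    for _ in range(n_global):
        # Server picks the devices with selection indicator equal to 1.
        scheduled = rng.sample(range(len(datasets)), k_per_round)
        # Each scheduled device trains locally on the broadcast model.
        grads = {k: local_gradient(w, datasets[k]) for k in scheduled}
        # Assumed aggregation rule: average weighted by local dataset size.
        total = sum(len(datasets[k]) for k in scheduled)
        g = sum(len(datasets[k]) * grads[k] for k in scheduled) / total
        # Server updates the global model with step size eta.
        w -= eta * g
    return w


# Hypothetical local datasets, all generated by y = 2x.
datasets = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w_star = run_fl(datasets)
print(f"learned parameter: {w_star:.6f}")
```

Only gradients cross the device boundary in this loop; the raw `(x, y)` samples stay in each device's list, which mirrors the privacy property of FL discussed earlier.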
The time duration of each FL global round includes downlink broadcasting time,local computing time,and uplink uploading time.Since the aggregation operation is negligible for the server,the time duration and the energy consumption of server aggregation are ignored here.
The next-generation heterogeneous SoC (System-on-a-Chip) in 6G networks will be able to support multiple workloads such as communication, signal processing, learning, and inference [17], but most related work on FL assumes single-processor devices (such as a single CPU or a single GPU) [18,19]. In our model, the CPU and GPU of the device cooperate to complete a single task, in order to make full use of the computing resources of the two processors [20]. In addition, the heterogeneity of the computing capabilities of devices is also reflected in the difference between their CPU and GPU capabilities. For instance, mobile phones and computers may have CPUs and GPUs with similar computing capabilities or more powerful GPUs, while CPUs in smart cars and drones may be more powerful than their GPUs, and some IoT terminals may have only a CPU and no GPU.
(1)Computation workload model
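We assume that the number of floating-point operations (FLOPs) needed for each data sample is constant in each iteration, denoted as $n_{\mathrm{FLOPs}}$ [in FLOPs/sample]. Then the total workload of a computation task at device $k$ is given as $W_k = n_{\mathrm{FLOPs}} \times |D_k|$ [in FLOPs].
The local dataset $D_k$ at device $k$ is partitioned into two sub-datasets, $D_k^{\mathrm{CPU}}$ and $D_k^{\mathrm{GPU}}$, for workload partitioning using a data-parallel processing scheme. As a result, the partitioned workloads for the CPU and GPU at device $k$ can be written as $W_k^{\mathrm{CPU}} = n_{\mathrm{FLOPs}} \times |D_k^{\mathrm{CPU}}|$ and $W_k^{\mathrm{GPU}} = n_{\mathrm{FLOPs}} \times |D_k^{\mathrm{GPU}}|$, with $W_k^{\mathrm{CPU}} + W_k^{\mathrm{GPU}} = W_k$.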
(2)CPU and GPU energy consumption model
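In our model, the dynamic voltage and frequency scaling (DVFS) technique is applied to adaptively adjust the CPU/GPU frequency to match the computation demand [21]. The power consumption of the CPU/GPU is proportional to the square of the chip voltage and to the operating clock frequency, $f_k^{\mathrm{CPU}}$ and $f_k^{\mathrm{GPU}}$ [in cycle/s], while the voltage is approximately linear in the clock frequency [22]. Therefore, the power consumption at device $k$ for local computation by the CPU and GPU can be specified as $P_k^{\mathrm{CPU}} = \epsilon_k^{\mathrm{CPU}} (f_k^{\mathrm{CPU}})^3$ and $P_k^{\mathrm{GPU}} = \epsilon_k^{\mathrm{GPU}} (f_k^{\mathrm{GPU}})^3$, respectively, where $\epsilon_k^{\mathrm{CPU}}$ and $\epsilon_k^{\mathrm{GPU}}$ are constant coefficients that depend on the CPU/GPU chip architecture and characterize the computation efficiency of the CPU/GPU. We use $C_k^{\mathrm{CPU}}$ and $C_k^{\mathrm{GPU}}$ [in FLOPs/cycle] to denote the number of FLOPs within a CPU/GPU cycle at each device $k$. Then the local computing time durations for the workloads $W_k^{\mathrm{CPU}}$ and $W_k^{\mathrm{GPU}}$ are given by $t_k^{\mathrm{CPU}} = \frac{W_k^{\mathrm{CPU}}}{f_k^{\mathrm{CPU}} C_k^{\mathrm{CPU}}}$ and $t_k^{\mathrm{GPU}} = \frac{W_k^{\mathrm{GPU}}}{f_k^{\mathrm{GPU}} C_k^{\mathrm{GPU}}}$, respectively.
Therefore, the total CPU computation energy consumption of device $k$ per local iteration is
$$E_k^{\mathrm{CPU}} = P_k^{\mathrm{CPU}}\, t_k^{\mathrm{CPU}} = \frac{\epsilon_k^{\mathrm{CPU}}\, W_k^{\mathrm{CPU}}\, (f_k^{\mathrm{CPU}})^2}{C_k^{\mathrm{CPU}}}.$$
Similarly, the total GPU computation energy consumption of device $k$ per local iteration is
$$E_k^{\mathrm{GPU}} = P_k^{\mathrm{GPU}}\, t_k^{\mathrm{GPU}} = \frac{\epsilon_k^{\mathrm{GPU}}\, W_k^{\mathrm{GPU}}\, (f_k^{\mathrm{GPU}})^2}{C_k^{\mathrm{GPU}}}.$$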
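A short numerical sketch may help make the computation model concrete. It assumes the cubic power law that follows from the DVFS description (power proportional to voltage squared times frequency, with voltage linear in frequency), so that energy per local iteration equals power times computing time. The class name `Processor`, the coefficient values, the clock frequencies, and the rule that the local iteration lasts as long as the slower of the two processors are illustrative assumptions, not values or rules taken from this article.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    coeff: float            # chip-dependent power coefficient (assumed value)
    freq: float             # clock frequency [cycle/s]
    flops_per_cycle: float  # FLOPs completed per cycle

    def power(self) -> float:
        # Assumed cubic law: P = coeff * f^3.
        return self.coeff * self.freq ** 3

    def time(self, workload_flops: float) -> float:
        # FLOPs / (cycle/s * FLOPs/cycle) = seconds.
        return workload_flops / (self.freq * self.flops_per_cycle)

    def energy(self, workload_flops: float) -> float:
        return self.power() * self.time(workload_flops)


def local_iteration_energy(n_flops_per_sample, n_cpu_samples, n_gpu_samples,
                           cpu, gpu):
    """Energy and duration of one local iteration with a CPU/GPU data split."""
    w_cpu = n_flops_per_sample * n_cpu_samples
    w_gpu = n_flops_per_sample * n_gpu_samples
    energy = cpu.energy(w_cpu) + gpu.energy(w_gpu)
    # Assumption: both processors run in parallel, so the slower one
    # determines the duration of the iteration.
    duration = max(cpu.time(w_cpu), gpu.time(w_gpu))
    return energy, duration


# Hypothetical device: 500 samples split 100 (CPU) / 400 (GPU).
cpu = Processor(coeff=1e-27, freq=2e9, flops_per_cycle=4.0)
gpu = Processor(coeff=1e-27, freq=1e9, flops_per_cycle=16.0)
energy, duration = local_iteration_energy(1e6, 100, 400, cpu, gpu)
print(f"energy per local iteration: {energy:.4f} J, duration: {duration:.4f} s")
```

Because energy grows with the square of the clock frequency for a fixed workload, lowering the frequency saves energy at the price of a longer iteration, which is the trade-off DVFS exposes.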
According to the procedure of FL in 6G described in Subsection 3.1,communication energy consumption is mainly composed of the uplink energy consumption of devices and the downlink energy consumption of the server.Since the energy required to receive data is much less than the energy required to transmit data in any wireless equipment,we only consider the transmission energy consumption of the server and devices in FL system,and the receiving energy consumption is negligible.
(1)Uplink energy consumption model
We consider that the local model gradients are transmitted to the server through allocated sub-channels using Orthogonal Frequency Division Multiple Access (OFDMA), to avoid severe interference between devices. $B_k$ is defined as the bandwidth allocated to device $k$, subject to the bandwidth constraint $\sum_{k=1}^{K_i} B_k \le B$, where $B$ is the total uplink bandwidth.
Furthermore, let $P_k$ denote the transmit power of device $k$, and let $h_k = o_k d_k^{-\beta}$ denote the channel gain between device $k$ and the server, where $o_k$ is the Rayleigh fading parameter representing small-scale fading, $d_k$ is the distance between device $k$ and the server, and $\beta$ is the path loss exponent. Then, the achievable rate of this device is given by
$$r_k = B_k \log_2\left(1 + \frac{P_k h_k}{I_n + B_k N_0}\right),$$
where $N_0$ is the noise power spectral density and $I_n$ is the interference caused by wireless equipment located in other cells using the same channel.
Since the uploaded content is the model gradient $g_k$ of each device after local training, the uplink time duration can be written as $t_k^{\mathrm{UL}} = \frac{|g_k|}{r_k}$, where $|g_k|$ is the data size of $g_k$ [in bit]. Therefore, the uplink energy consumption of each device is given by
$$E_k^{\mathrm{UL}} = P_k\, t_k^{\mathrm{UL}} = \frac{P_k\, |g_k|}{r_k}.$$
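The uplink model can be evaluated numerically as follows. This is a sketch under stated assumptions: the channel gain is taken as the fading parameter times distance raised to the negative path loss exponent, the rate follows the Shannon formula with noise power equal to bandwidth times noise power spectral density plus co-channel interference, and energy is transmit power times upload time. All numeric values (gradient size, bandwidth, power, distances, noise density) and the function names are hypothetical.

```python
import math


def channel_gain(fading: float, distance_m: float, path_loss_exp: float) -> float:
    """Assumed gain model: small-scale fading times distance^(-beta)."""
    return fading * distance_m ** (-path_loss_exp)


def uplink_rate(bandwidth_hz, tx_power_w, gain, noise_psd, interference_w=0.0):
    """Achievable rate in bit/s from the Shannon formula."""
    sinr = tx_power_w * gain / (interference_w + bandwidth_hz * noise_psd)
    return bandwidth_hz * math.log2(1.0 + sinr)


def uplink_energy(grad_bits, bandwidth_hz, tx_power_w, gain, noise_psd,
                  interference_w=0.0):
    """Energy in joules to upload grad_bits: transmit power times upload time."""
    rate = uplink_rate(bandwidth_hz, tx_power_w, gain, noise_psd, interference_w)
    return tx_power_w * grad_bits / rate


# Hypothetical setting: 1 MB gradient, 1 MHz sub-channel, 0.1 W transmit power.
GRAD_BITS = 8e6
near = uplink_energy(GRAD_BITS, 1e6, 0.1, channel_gain(1.0, 50.0, 3.0), 4e-21)
far = uplink_energy(GRAD_BITS, 1e6, 0.1, channel_gain(1.0, 200.0, 3.0), 4e-21)
print(f"50 m: {near:.4f} J, 200 m: {far:.4f} J")
```

With bandwidth, power, and noise held fixed, the energy scales linearly with the gradient size and grows with distance, which matches the first two communication challenges listed earlier.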
(2)Downlink energy consumption model
We notice that most related work does not consider the downlink energy consumption, on the grounds that the server has a continuous power supply and therefore does not affect the optimization problem [23,24]. In fact, when an FL server (such as a BS) broadcasts model parameters to all devices, its transmit power is usually much greater than the upload power of a device in order to ensure signal coverage. In addition, the model parameters broadcast by the server are as large as the uploaded ones. Therefore, the transmission time and energy consumption of downlink broadcasting should not be ignored, as they are an important part of the communication energy consumption [25].
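Let $P_{\mathrm{DL}}$ denote the transmit power of the server, $h$ denote the broadcast channel gain, and $I_D$ denote the interference caused by other servers not participating in the FL algorithm. We assume that the bandwidth the server uses to broadcast the global FL model after aggregation is equal to the total uplink bandwidth $B$; then the data rate is given by
$$r^{\mathrm{DL}} = B \log_2\left(1 + \frac{P_{\mathrm{DL}}\, h}{I_D + B N_0}\right).$$
According to the procedure of FL, the broadcast content includes the global FL model parameters $\omega$ and the selection indicators $\{\lambda_m\}$. The data size of $\omega$ is similar to that of each device $k$'s local FL model gradient $g_k$, and the data size of $\{\lambda_m\}$ can be ignored. Hence, the downlink time duration can be specified as $t^{\mathrm{DL}} = \frac{|\omega|}{r^{\mathrm{DL}}}$, where $|\omega| \approx |g_k|$ is the data size of $\omega$ [in bit]. Therefore, the downlink energy consumption of the server is given by
$$E^{\mathrm{DL}} = P_{\mathrm{DL}}\, t^{\mathrm{DL}} = \frac{P_{\mathrm{DL}}\, |\omega|}{r^{\mathrm{DL}}}.$$
Therefore, the overall energy consumption of the basic FL model in 6G can be expressed by
$$E = \sum_{i=1}^{N_{\mathrm{Global}}} \left[ E^{\mathrm{DL}} + \sum_{k=1}^{K_i} \left( E_k^{\mathrm{UL}} + \sum_{j=1}^{N_{\mathrm{Local},k}} \left( E_k^{\mathrm{CPU}} + E_k^{\mathrm{GPU}} \right) \right) \right],$$
where $j$ indexes the local training rounds and $i$ indexes the global FL rounds.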
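The pieces of the model combine into a per-run energy total by summing, over global rounds, one downlink broadcast plus each scheduled device's upload and local computation. The sketch below is a hedged illustration: it assumes the per-iteration CPU and GPU energies of a device do not change across its local iterations, so the inner sum reduces to a multiplication, and it assumes a Shannon-type broadcast rate over the full bandwidth. The function names and all numbers are hypothetical.

```python
import math


def downlink_energy(model_bits, bandwidth_hz, p_dl_w, gain, noise_psd,
                    interference_w=0.0):
    """Server broadcast energy: transmit power times broadcast time."""
    sinr = p_dl_w * gain / (interference_w + bandwidth_hz * noise_psd)
    rate = bandwidth_hz * math.log2(1.0 + sinr)
    return p_dl_w * model_bits / rate


def overall_energy(rounds, e_downlink):
    """Sum energy over global rounds.

    rounds: one list per global round; each entry describes one scheduled
    device as (e_uplink, n_local, e_cpu_per_iter, e_gpu_per_iter).
    """
    total = 0.0
    for devices in rounds:
        total += e_downlink  # one broadcast per global round
        for e_ul, n_local, e_cpu, e_gpu in devices:
            # Assumption: identical energy in every local iteration.
            total += e_ul + n_local * (e_cpu + e_gpu)
    return total


# Hypothetical run: two global rounds with different scheduled devices.
rounds = [
    [(1.0, 2, 0.5, 0.25)],
    [(1.0, 2, 0.5, 0.25), (2.0, 1, 1.0, 1.0)],
]
total = overall_energy(rounds, e_downlink=3.0)
print(f"overall energy: {total:.2f} J")

# Hypothetical broadcast: 8e6 bits, 10 MHz, 20 W, gain 1e-9.
e_dl = downlink_energy(8e6, 1e7, 20.0, 1e-9, 4e-21)
print(f"one broadcast: {e_dl:.4f} J")
```

Writing the total this way makes the levers visible: fewer global rounds, fewer scheduled devices, smaller payloads, shorter links, or cheaper local iterations each reduce a distinct term.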
The 6G network needs to densely deploy heterogeneous access nodes and computing nodes in order to meet the exponential growth of data rates and the demands of dynamic, differentiated, and customized services. Therefore, it is challenging to design a green FL-based 6G network architecture that can not only effectively coordinate the data and computing capabilities of base stations and devices in line with the FL procedure, but also support the high dynamics and heterogeneity of 6G networks.
In this section,we propose that green FL-based 6G network architecture can be achieved through three perspectives: multi-layer FL nodes deployment over 6G,device heterogeneity processing in FL over 6G,and local FL model processing over 6G.We provide and analyze several feasible schemes respectively,including hierarchical heterogeneous FL,clustered federated learning combined with D2D communications,two-way access selection of heterogeneous devices in FL,bandwidth compensation of heterogeneous devices in FL,model quantization and sparsification in FL,and model training tricks and strategies in FL.We classify and summarize the basic ways to reduce the energy consumption of each scheme into Table 1.
Table 1. Green designs for FL-based 6G network architecture.
The 6G network has a large number of communication nodes with different functions.Different deployment methods of these nodes will have an impact on network performance.In this subsection,we will explain two multi-layer FL nodes deployment schemes over 6G,which can effectively reduce the energy consumption of the wireless access network,as shown in Figure 2.
Figure 2. Two multi-layer FL nodes deployment schemes over 6G:hierarchical heterogeneous FL over 6G,and clustered FL combined with D2D communications over 6G.
(1)Hierarchical heterogeneous FL over 6G
With the development of mobile communication networks, 6G must meet the high data-rate requirements of advanced-eMBB scenarios. On the one hand, it needs more contiguous large-bandwidth spectrum resources, for example through millimeter-wave communications, terahertz communications, and the use of unlicensed spectrum. On the other hand, it needs to further improve the spatial reuse of the spectrum, e.g., by flexibly deploying small cells (Smallcell-xNB) within the coverage of a macro cell (Macrocell-xNB) to form a hierarchical heterogeneous network with overlapping coverage. Here, xNB includes the eNB of 4G networks, the gNB of 5G networks, and the new type of base station with integrated communication and sensing that may appear in 6G networks.
When the smart devices distributed in each small cell participate in FL, it is desirable for all devices to be connected in order to obtain more extensive data sharing and achieve better learning performance. However, traditional FL is based on device-server interaction in a star topology, so it requires all devices to communicate with the central Macrocell-xNB in the 6G network. In order to ensure coverage, the transmit power of the Macrocell-xNB is much higher than that of other wireless access nodes. Therefore, if all devices in the macro cell communicate with it frequently for global aggregation, higher communication energy consumption will be generated.
Therefore, using an auxiliary access node (AAN) closer to the device to perform hierarchical FL has become a feasible scheme. There are some recent works on hierarchical FL. They mainly focused on specific three-layer client-edge-cloud network structures, and either studied only the delay without considering the overall energy consumption [26] or did not consider the heterogeneity of 6G networks [27,28]. The AANs in 6G networks can be Smallcell-xNBs, Wi-Fi APs, etc. Among them, the Smallcell-xNB has many variants: Microcell Base Station, Picocell Base Station, Femtocell Base Station, etc. By combining FL with the multi-layer topology of the 6G network, model aggregation can be carried out in the cloud core network, the Macrocell-xNB, or multiple Smallcell-xNBs, forming multi-layer heterogeneous FL.
For example, the Smallcell-xNB is used as the first-layer FL parameter server. After several rounds of local FL model training, mobile devices in the cell upload their respective model gradient values to the Smallcell-xNB. The Smallcell-xNB performs model aggregation, updates the first-layer global model according to the received model gradients, and sends it back to its associated devices so that they can continue local FL model training. After several rounds of first-layer FL, each Smallcell-xNB uploads its current first-layer global FL model to the Macrocell-xNB, where the second-layer global FL model aggregation is performed. Then, the Macrocell-xNB broadcasts the second-layer global FL model parameters to all devices in the macro cell. In order to aggregate model parameters trained by more devices, after several rounds of second-layer FL, a higher-layer global FL model aggregation can also be performed at the central server of the cloud core network.
The basic reason for the reduction in energy consumption is that low-layer model aggregations at the edge of the network are much closer to the devices that hold the data, so they have lower communication costs. The convergence of FL within each cell will also be quite fast, while the high-layer global FL model aggregation achieves more extensive data sharing. Therefore, the hierarchical heterogeneous FL method can reduce the frequency of long-distance communication in high-layer global aggregation, thereby reducing the overall energy consumption of the system.
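The energy argument for the hierarchical scheme can be illustrated with a simple count of transmissions. The sketch below compares a flat star topology, in which every device uploads to the Macrocell-xNB in every round, with a two-layer scheme in which devices upload to their Smallcell-xNB every round and the macro-layer aggregation happens only once every `tau` rounds. The per-transmission energies, the value of `tau`, and the assumption that both schemes run for the same number of rounds are hypothetical simplifications; in practice the number of rounds needed for a given accuracy may differ between the two schemes.

```python
def flat_energy(n_rounds, n_devices, e_dev_to_macro, e_macro_bcast):
    """Star topology: every device talks to the macro cell each round."""
    return n_rounds * (n_devices * e_dev_to_macro + e_macro_bcast)


def hierarchical_energy(n_rounds, n_cells, devs_per_cell, tau,
                        e_dev_to_small, e_small_bcast,
                        e_small_to_macro, e_macro_bcast):
    """Two layers: edge aggregation each round, macro aggregation every tau."""
    # First layer: short-range uploads plus one small-cell broadcast per cell.
    edge = n_rounds * n_cells * (devs_per_cell * e_dev_to_small + e_small_bcast)
    # Second layer: small cells report upward only once every tau rounds.
    n_macro_rounds = n_rounds // tau
    macro = n_macro_rounds * (n_cells * e_small_to_macro + e_macro_bcast)
    return edge + macro


# Hypothetical per-transmission energies in joules.
flat = flat_energy(n_rounds=100, n_devices=20,
                   e_dev_to_macro=1.0, e_macro_bcast=10.0)
hier = hierarchical_energy(n_rounds=100, n_cells=4, devs_per_cell=5, tau=10,
                           e_dev_to_small=0.1, e_small_bcast=1.0,
                           e_small_to_macro=0.5, e_macro_bcast=10.0)
print(f"flat: {flat:.0f} J, hierarchical: {hier:.0f} J")
```

Under these made-up numbers the saving comes from two sources: each device's upload travels a shorter distance, and the expensive macro-cell broadcast happens ten times less often.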
(2)Clustered FL combined with D2D communications over 6G
Devices in hotspots in 6G networks tend to be densely distributed, especially large-scale IoT terminal devices in industrial and other scenarios. When many smart devices participate in FL due to smart application requirements, a large amount of communication energy is consumed, especially when these devices are far away from the server. The rapid development of D2D communication technology in the 5G era can not only alleviate the lack of spectrum resources in wireless communication systems, but also enable communications between large-scale devices in a low-power and low-cost manner. Some related works have explored clustered FL under the traditional network architecture [29-31]. Based on the above analysis, clustered FL combined with D2D communications will be a feasible scheme in green designs for FL-based 6G network architecture.
For example, large-scale devices that want to participate in FL over the 6G network are first divided into several clusters according to different clustering rules (such as actual location, possible transmission range, or mutual correlation), and one of the devices in each cluster is selected as the cluster-head. The clustering scheme can either be fixed until the end of FL, or be changed dynamically in each FL round. After all devices perform multiple rounds of local FL model training based on the global model of the previous round, they upload their respective model gradient values to the cluster-head of their cluster through D2D communications. The cluster-head collects and preliminarily averages the model gradients of all devices in the cluster, and then uploads the average to the associated base station. The base station performs model aggregation over the gradient averages from all cluster-heads and updates the global FL model. The base station then starts the next global round by broadcasting the updated global FL model to each of its associated devices.
It can be seen that unlike traditional FL,which needs to upload model gradients from all devices to the base station for model aggregation,clustered FL combined with D2D communications only needs to upload one model gradient per cluster.So this method can save uplink communication resources and reduce the frequency of long-distance communication between the device and the base station.For the devices in the cluster,they only need to transmit their model gradient to the cluster-head instead of the base station far away.In general,the energy consumption of short-distance D2D communications between devices is much less than that between the device and the base station.Therefore,clustered FL combined with D2D communications can effectively reduce the energy consumption of the FL system.
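The cluster-head averaging described above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and weighting each cluster average by its cluster size at the base station is an assumption made here so that the result matches a plain average over all devices (the text does not specify the weighting).

```python
import numpy as np

def clustered_aggregate(gradients, cluster_of_device):
    """Average gradients at each cluster-head, then aggregate at the base station.

    Returns the aggregated gradient and the number of uplink transmissions
    to the base station (one per cluster instead of one per device).
    """
    gradients = np.asarray(gradients, dtype=float)
    labels = np.asarray(cluster_of_device)
    head_means, head_counts = [], []
    for c in np.unique(labels):
        members = gradients[labels == c]          # received over D2D links
        head_means.append(members.mean(axis=0))   # preliminary average at the head
        head_counts.append(len(members))
    # Assumed: the base station weights each cluster average by cluster size.
    global_grad = np.average(np.stack(head_means), axis=0, weights=head_counts)
    return global_grad, len(head_means)
```

For five devices in two clusters, the base station receives two uploads rather than five, while the aggregated gradient is unchanged.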
The 6G network contains a large number and variety of equipment, usually with a high degree of heterogeneity, and solving the energy consumption problems caused by this heterogeneity is challenging. In this subsection, we explain two basic schemes for handling device heterogeneity in FL over 6G, which can effectively reduce the energy consumption of the wireless access network.
(1)Two-way access selection of heterogeneous devices in FL over 6G
In 6G networks,the dataset sizes,computing capabilities and channel conditions of each device may change dynamically,and it brings a high degree of heterogeneity between devices.The traditional FL process may encounter problems such as long time duration,difficulty in convergence,and poor performance,resulting in higher energy consumption,due to the lack of consideration of the heterogeneity.Considering that the improvement of the final performance of the system by each device in each round is actually very limited,the access of some devices with weak computing and communication capabilities can be selectively reduced,in order to effectively reduce energy consumption[32,33].Therefore,the two-way access selection scheme can be used to avoid the energy consumption problem caused by frequent access of“weak”devices.
The heterogeneity of devices is ultimately reflected in differences in the time each device needs. For example, the computing capability of a device affects the required computing time, and the channel condition affects the uplink communication time. Therefore, we mark a device as “weak” when the total time it requires far exceeds the average level of other devices or reaches a certain threshold.
For example, the server dynamically predicts the time each device needs to complete local computing and communication in the next global FL round, based on the information obtained from each device in the previous round. This information includes the dataset size related to the current training task, computing capabilities (such as how much CPU and GPU resource can be used for FL), wireless channel conditions, etc. The server then selects as many devices as possible with similar time duration, while excluding “weak” devices with weak computing and communication capabilities, and broadcasts selection indicators $\{\lambda_m\}$ to notify devices whether they are scheduled for participation. After that, the selected devices continue to perform local computing and model gradient uploading according to the regular FL process, while the unselected devices suspend local computing and wait for the global model broadcast by the server in the next round, as shown in Figure 3.
Figure 3. Two device heterogeneity processing schemes in FL over 6G:two-way access selection of heterogeneous devices,and bandwidth compensation of heterogeneous devices.
In addition,due to the dense deployment of base stations,each device may be under the coverage of multiple servers.If a device receives the broadcast information of different servers allowed to participate in its respective FL,it can select the nearest server or the server with the best channel condition in order to reduce the uplink energy consumption.And one device can also switch to different servers according to a certain strategy,to interact with more different devices to obtain greater data sharing and improve learning performance.What’s more,the server may receive incorrect model parameters due to the deterioration of the wireless channel during the upload process of some devices.This will seriously affect the effect of model aggregation,and even cause model collapse.Therefore,the server needs to distinguish and discard wrong model parameters according to additional strategies to ensure good learning performance.
Through the above-mentioned server and device two-way access selection,the influence of“weak”devices and deterioration of channel conditions can be avoided to a large extent.After“weak”devices know that they cannot participate in this FL round through selection indicators,they can suspend the local computing and wait for the next global model.As can be seen from Formula(11)of the overall energy consumption,the device access selection can effectively reduce the energy consumption of local computing and uplink communication of these“weak”devices in each global round.
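A minimal sketch of the server-side selection rule is given below. The concrete rule is an assumption made for illustration: a device is treated as “weak” when its predicted round time exceeds `ratio` times the median predicted time, or an optional absolute limit `hard_limit`. Both parameters and the use of the median are hypothetical choices, since the text only requires that the time "far exceeds the average level" or "reaches a certain threshold".

```python
import numpy as np

def select_devices(predicted_time, ratio=1.5, hard_limit=None):
    """Return selection indicators (1 = scheduled, 0 = weak device, skipped).

    predicted_time[i] is the predicted computing plus uplink time of device i
    for the next global round.
    """
    t = np.asarray(predicted_time, dtype=float)
    weak = t > ratio * np.median(t)       # far slower than typical devices
    if hard_limit is not None:
        weak |= t > hard_limit            # or beyond an absolute threshold
    return (~weak).astype(int)
```

Devices that receive indicator 0 can skip both local computing and uplink transmission for that round, which is where the energy saving in this scheme comes from.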
(2)Bandwidth compensation of heterogeneous devices in FL over 6G
Devices are usually densely distributed in some 6G scenarios, such as densely populated hotspots and large-scale IoT, which leads to limited communication bandwidth (BW). The impact of this problem on FL over 6G is obvious, because FL generally requires a large uplink bandwidth to support the transmission of a large number of model parameters. At the same time, the high heterogeneity of devices in terms of dataset sizes, computing capabilities, channel conditions, etc. brings challenges to the application of traditional bandwidth allocation and scheduling in FL.
The two-way access selection mentioned previously may not be applicable when the heterogeneity among devices is large but has not yet reached the discard threshold or the devices cannot be discarded due to limitations in certain scenarios.At this time,bandwidth compensation of heterogeneous devices suitable for FL will be a feasible scheme to reduce system energy consumption.
For example, the server can dynamically adjust the uplink bandwidth allocation in each round, subject to the bandwidth constraint expressed in Subsection 3.3. A dynamic bandwidth compensation scheme based on a greedy algorithm can be adopted here to reduce the overall energy consumption. As shown in Algorithm 1, all available uplink bandwidth $B$ is divided into small allocation units $b$, one per bandwidth compensation step, and the device with the best energy saving effect is always selected in each step, so as to achieve the optimal result when all available bandwidth is allocated.
The model gradients of all devices are uploaded according to the optimal bandwidth compensation scheme after multiple rounds of local FL model training. Then the server performs model aggregation and continues to adjust the bandwidth allocation for the next global FL round. In this way, the dynamic bandwidth allocation and scheduling scheme can effectively reduce the uplink energy consumption in each global round while adapting to the constantly changing network environment, thereby reducing the overall energy consumption of the system according to Formula (11).
Algorithm 1. Bandwidth compensation algorithm.
Input: number of devices $K$; available BW for uplink $B$; minimum BW allocation unit $b$
Output: optimal BW compensation scheme $x^*$
1: Initial BW compensation scheme $x_0$: $B_k = B_0$, $(k = 1, 2, ..., K)$;
2: for each step $s = 1$ to $(B - K \cdot B_0)/b$ do
3:   for device $k = 1, 2, ..., K$ do
4:     computing $E_k^{UL}$ according to Formula (8);
5:     $B_k' = B_k + b$;
6:     computing $E_k^{UL\prime}$ according to Formula (8);
7:   end for
8:   selecting device $k$ with $\max(E_k^{UL} - E_k^{UL\prime})$;
9:   $B_k = B_k + b$
10: end for
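The greedy loop of Algorithm 1 can be sketched in a few lines. Formula (8) is not reproduced in this part of the article, so the uplink energy model below is an assumption: transmit power multiplied by the upload time under a Shannon-rate link, $E = P \cdot S / (B \log_2(1 + P h / (N_0 B)))$. The function names and the payload size `bits` are likewise hypothetical.

```python
import numpy as np

def uplink_energy(bw_hz, power_w, gain, bits, n0_w_per_hz):
    """Assumed stand-in for Formula (8): transmit power times upload time."""
    rate = bw_hz * np.log2(1.0 + power_w * gain / (n0_w_per_hz * bw_hz))
    return power_w * bits / rate

def greedy_bandwidth_compensation(total_bw, unit, base_bw,
                                  power_w, gain, bits, n0_w_per_hz):
    """Give each device base_bw, then hand out the rest one unit at a time
    to the device whose uplink energy drops the most (Algorithm 1)."""
    power_w = np.asarray(power_w, dtype=float)
    gain = np.asarray(gain, dtype=float)
    bw = np.full(len(power_w), float(base_bw))
    steps = int(round((total_bw - len(bw) * base_bw) / unit))
    for _ in range(steps):
        before = uplink_energy(bw, power_w, gain, bits, n0_w_per_hz)
        after = uplink_energy(bw + unit, power_w, gain, bits, n0_w_per_hz)
        bw[np.argmax(before - after)] += unit   # best energy saving this step
    return bw
```

Under this assumed model the per-device energy is convex and decreasing in bandwidth, which is the condition under which unit-by-unit greedy allocation is known to reach the best allocation on the discrete grid; with a different Formula (8) that property would need to be rechecked.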
The deep integration of 6G and AI has made the network pay more and more attention to the algorithms,parameters,performance,and costs of ML models.Local ML model training is the most important part of FL,so operations and processing for the ML model also affect the communication network,including energy consumption.In this subsection,we will explain several schemes to reduce the energy consumption from the perspective of local FL model processing over 6G,as shown in Figure 4.
Figure 4. Several local FL model processing schemes(take uplink as examples): gradient quantization,sparsification,dynamic batch sizes,model pruning,etc.
(1)Model quantization and sparsification in FL over 6G
To meet the higher demands of intelligent services in 6G, the ML algorithms in AI continue to develop. The number of model parameters that FL needs to transmit over the 6G network keeps growing and will exceed millions or tens of millions. A high communication load brings high communication energy consumption, so model compression is a simple and feasible solution to this problem [34,35].
For example, quantization can be used to compress model parameters. When each device completes local FL model training, the obtained model gradients can be quantized before being uploaded to the server. When the server completes model aggregation, the updated global model can also be quantized before being broadcast to devices. Quantization reduces the storage precision of the original data because fewer bits are used to represent it, thus significantly reducing the communication load per round. More aggressive quantization brings a lower communication load and lower energy consumption in each FL round. However, the loss of data precision sacrifices the learning performance of the FL system. To achieve a given learning accuracy, more global FL rounds may be needed, thereby increasing the overall energy consumption. Therefore, the gradient quantization level needs to be controlled according to a certain strategy to reduce the overall energy consumption.
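As an illustration of the trade-off described above, the sketch below applies uniform min-max quantization to a gradient vector. The article does not state which quantizer is used, so this particular scheme and the function names are assumptions; it only shows how fewer bits shrink the payload while introducing a bounded rounding error.

```python
import numpy as np

def quantize(grad, bits):
    """Uniformly quantize a gradient to integer codes in [0, 2**bits - 1]."""
    grad = np.asarray(grad, dtype=np.float64)
    lo, hi = grad.min(), grad.max()
    if hi == lo:                       # constant vector: a single level suffices
        return np.zeros(grad.shape, dtype=np.int64), lo, 1.0
    scale = (hi - lo) / (2 ** bits - 1)
    codes = np.round((grad - lo) / scale).astype(np.int64)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstruct approximate gradient values at the receiver."""
    return lo + codes * scale

def payload_ratio(bits, full_bits=32):
    """Fraction of the original payload that is transmitted per value."""
    return bits / full_bits
```

With 8-bit codes each value costs a quarter of a 32-bit value on the uplink, and the reconstruction error per entry is at most half a quantization step.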
Similar to quantization,gradient sparsification can also effectively reduce communication energy consumption through model compression[36].For example,after the local FL model training is done,all devices can upload only a fraction of gradients with significant magnitudes according to a certain sparsification strategy(such as setting the sparsification threshold or sparsification percentage).Gradient sparsification also reduces the communication energy consumption of each round by reducing the communication load.But it faces the same problem that excessive sparsification will reduce the learning efficiency of FL and increase the number of global FL rounds.
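A percentage-based sparsification strategy can be sketched as follows. The function names are hypothetical, and keeping the largest-magnitude fraction of entries is one common choice, used here as an assumption rather than as the strategy of any cited work.

```python
import numpy as np

def sparsify_top_fraction(grad, fraction):
    """Keep only the largest-magnitude `fraction` of gradient entries.

    Returns the kept indices (sorted) and their values; only these need
    to be uploaded.
    """
    flat = np.asarray(grad, dtype=float).ravel()
    k = max(1, int(np.ceil(fraction * flat.size)))
    idx = np.sort(np.argpartition(np.abs(flat), -k)[-k:])
    return idx, flat[idx]

def densify(idx, values, size):
    """Rebuild a dense gradient at the server, with zeros for dropped entries."""
    out = np.zeros(size)
    out[idx] = values
    return out
```

Note that the indices must be sent alongside the values, so the actual communication saving is somewhat smaller than the sparsification percentage alone suggests.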
In addition,the heterogeneity and dynamics of the devices participating in the FL can be further considered in terms of computing and communication.A more flexible model compression control strategy can be used to determine the compression parameters of each device.
(2)Model training tricks and strategies in FL over 6G
The research on ML models and algorithms in AI has made rapid progress,so some tricks and strategies in ML model training can also help solve the energy consumption problem of FL in 6G networks[37-39].
For example, dynamic batch size adjustment can better adapt to the intrinsic states of local FL model training. In the early stage, a small batch size helps the training avoid being trapped in a local minimum, since it corresponds to large-scale random fluctuations. In the later stage, a large batch size, corresponding to small-scale fluctuations, helps the training fine-tune the parameters toward the optimal solution. Therefore, with a well-designed increment strategy, dynamic batch size adjustment can meet FL performance requirements faster, thereby helping reduce the number of global FL rounds and the overall energy consumption.
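One possible increment strategy is a stepwise geometric schedule, sketched below. All default values (`initial`, `growth`, `every`, `maximum`) are hypothetical and chosen only for illustration; the article does not prescribe a specific schedule.

```python
def batch_size_schedule(round_idx, initial=32, growth=2, every=10, maximum=512):
    """Small batches early (more gradient noise), larger batches later
    (finer convergence): multiply by `growth` every `every` rounds, capped."""
    return min(maximum, initial * growth ** (round_idx // every))
```

For instance, with these defaults the batch size stays at 32 for the first ten rounds, doubles every ten rounds afterwards, and saturates at 512.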
In addition to adjusting the batch size,adjusting the gradient descent strategy,epoch,learning rate,etc.in model training based on experience and the latest AI technology may also affect the energy consumption of FL by reducing communication rounds.
On the other hand, according to the computation energy consumption expressions (5) and (6), AI technical solutions that use the capabilities of CPU/GPU hardware chips and parallel computing more efficiently can be further developed to reduce the computation energy consumption. Moreover, pre-classification and pre-processing of training data, optimization of datasets, and more reasonable workload allocation according to certain strategies can also significantly reduce computation energy consumption.
In this section,we simulate schemes to reduce energy consumption mentioned in Section IV to prove their feasibility and effectiveness.
In our simulations, we take handwritten digit recognition as the intelligent application task in FL. We use the well-known MNIST dataset, which includes 60,000 training samples and 10,000 test samples, with each picture being 28×28 pixels. The local ML model is implemented by each device as a CNN model, where the learning rate is set to 0.001 and the batch size is set to 100. We set different types of base stations in 6G as the FL server in different scenarios. To reflect the heterogeneity of the data, communication capabilities and computing capabilities of the network nodes, the training dataset for each device is non-i.i.d. and imbalanced, which means the proportion of samples of different categories differs, and the number of samples on different devices also varies. The following parameters are all set to random values within a certain range. The transmit power of each device is $P_k \in [20\ \text{dBm}, 29\ \text{dBm}]$. The channel gains $h_k$ are modeled as Rayleigh fading, where $o_k$ follows an exponential distribution $o_k \sim \mathrm{Exp}(1)$ and $\beta \in [1, 5]$. An OFDMA system is considered with bandwidth $B = 60$ MHz and noise power spectral density $N_0 = -174$ dBm/Hz [40]. The computing parameters $f_k^{CPU}$ and $f_k^{GPU}$ of the devices take values in [2.5 GHz, 3 GHz], and $C_k^{CPU}$ and $C_k^{GPU}$ take values in [100, 160].
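For readers who want to reproduce a comparable setup, the random parameters listed above could be drawn as in the sketch below. Uniform sampling within each range is an assumption (the text only says "random values within a certain range"), and the function and key names are hypothetical. The dBm-to-watt conversion itself is standard.

```python
import numpy as np

def dbm_to_watt(dbm):
    """Convert power from dBm to watts: P[W] = 10^(dBm/10) / 1000."""
    return 10.0 ** (np.asarray(dbm, dtype=float) / 10.0) / 1000.0

# Noise power spectral density of -174 dBm/Hz expressed in W/Hz.
NOISE_PSD_W_PER_HZ = float(dbm_to_watt(-174.0))

def sample_device_parameters(num_devices, seed=0):
    """Draw per-device parameters; uniform ranges are an assumed choice."""
    rng = np.random.default_rng(seed)
    return {
        "tx_power_w": dbm_to_watt(rng.uniform(20.0, 29.0, num_devices)),
        "fading_o": rng.exponential(1.0, num_devices),   # o_k ~ Exp(1)
        "beta": rng.uniform(1.0, 5.0, num_devices),
        "f_cpu_hz": rng.uniform(2.5e9, 3.0e9, num_devices),
        "f_gpu_hz": rng.uniform(2.5e9, 3.0e9, num_devices),
        "c_cpu": rng.uniform(100.0, 160.0, num_devices),
        "c_gpu": rng.uniform(100.0, 160.0, num_devices),
    }
```

In linear units, the transmit power range of 20 to 29 dBm corresponds to roughly 0.1 W to 0.79 W, and the noise density to about 4e-21 W/Hz.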
(1)Hierarchical heterogeneous FL
For the simulation of hierarchical heterogeneous FL, we deploy 30 devices randomly in a square area of 1100 m × 1100 m with one Macrocell-xNB located at its center. Four Smallcell-xNBs are added to form a three-tier hierarchical FL, in which the second layer performs one aggregation after every 10 first-layer aggregations, and the baseline is traditional FL without Smallcell-xNBs. The transmit power of the Macrocell-xNB is set to 40 W, and the transmit power of each Smallcell-xNB is set to 5 W.
Figure 5 shows the performance comparison between hierarchical FL and non-hierarchical FL. From Figure 5(a), we observe that the accuracy of the FL model in both schemes gradually converges as the number of first-layer global FL rounds increases. The hierarchical FL converges more slowly and shows a step-up shape every 10 rounds. The accuracy of the hierarchical FL is 97.57% after 100 rounds, which is slightly lower than the 97.70% of the non-hierarchical FL. The relationship between the overall energy consumption and the accuracy is shown in Figure 5(b). For the same energy consumption, the accuracy of hierarchical FL is higher. At 96% accuracy, the overall energy consumption of hierarchical FL is about 13 J, which is 27.78% lower than the 18 J consumed by non-hierarchical FL. When finally reaching 97.57% accuracy (after 100 rounds), hierarchical FL achieves a 39.44% overall energy reduction, from 71 J to 43 J, compared to non-hierarchical FL.
Figure 5. The comparison between hierarchical FL and non-hierarchical FL.The performance of the accuracy versus first-layer global FL rounds is shown in (a),and the performance of the accuracy versus the overall energy consumption is shown in(b).
Simulation results agree with the preceding discussion in Subsection 4.1. Hierarchical heterogeneous FL forms multi-layer aggregation by introducing auxiliary access nodes, which reduces the frequency of high-layer global aggregation with its high communication costs, so it can effectively reduce the energy consumption. However, there is less data sharing in the early stage, so the convergence is slower. After several high-layer model aggregations, the accuracy gradually approaches the baseline. This shows that hierarchical heterogeneous FL can achieve a large reduction in energy consumption at a small cost in FL model learning performance.
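As a quick check, the percentage reductions quoted for Figure 5(b) follow directly from the reported energy values:

```latex
\frac{18\,\mathrm{J} - 13\,\mathrm{J}}{18\,\mathrm{J}} \approx 27.78\%,
\qquad
\frac{71\,\mathrm{J} - 43\,\mathrm{J}}{71\,\mathrm{J}} \approx 39.44\%.
```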
(2)Clustered FL combined with D2D communications
For the simulation of clustered FL combined with D2D communications, we deploy 10, 20, and 30 devices randomly in a circle with a radius of 500 m, with a base station located at its center. The transmit power of the base station is set to 1 W. The K-means algorithm is used for clustering. For each number of clusters, we run 100 simulations with randomly placed devices, and each simulation contains 200 global FL rounds.
Figure 6 shows the overall energy consumption under different numbers of clusters. The three lines represent 10, 20, and 30 devices in the FL system, respectively. The right endpoint of each line represents the original FL, where no clustering is performed. For the case of 10 devices (blue line), we observe that when the number of clusters gradually increases from 1 to 10, the overall energy consumption first drops and then rises. When the number of clusters is 4, the energy consumption reaches its minimum value, 0.83 J lower than that of the original FL without clustering. For the case of 20 devices (red line), the overall energy consumption also first drops and then rises. The energy consumption is minimal when the number of clusters is 6, with an energy consumption reduction of 2.25 J. For the case of 30 devices (yellow line), the optimal number of clusters is 9, and the corresponding energy consumption reduction is 4.19 J.
Figure 6. The overall energy consumption under a different number of clusters.Three lines represent 10,20,and 30 devices in the FL system respectively.
The simulation results show that clustered FL combined with D2D communications can effectively reduce the overall energy consumption of the FL system, which is consistent with the previous analysis. This is because devices in a cluster only need to transmit their model gradients to the cluster-head through short-distance D2D communications with low energy consumption, instead of to the distant base station. A reasonable clustering method and number of clusters are also very important: with too many or too few clusters, it is difficult to achieve the ideal energy consumption reduction.
(3)Heterogeneous device access selection in FL
For the simulation of device access selection in FL over 6G, we deploy 15 devices randomly in a circular area with a radius of 500 m, with one base station located at its center. To simulate a wireless communication scenario with limited bandwidth, we set the bandwidth $B = 3$ MHz here and in the following simulations. In each round, we randomly select one-third of the devices as “weak” devices whose channels are worse than the others. In the FL system with the device access selection scheme, the five “weak” devices are not allowed to access the base station. In the baseline, the “weak” devices still access the base station.
Figure 7 shows the performance comparison between the device access selection FL and the baseline FL. From Figure 7(a), we observe that the accuracy of the FL model in both schemes gradually converges as the number of FL rounds increases. The accuracy of the device access selection FL is 95.26% after 50 rounds, which is slightly lower than the 95.83% of the baseline FL. The relationship between the overall energy consumption and the accuracy is shown in Figure 7(b). For the same energy consumption, the accuracy of the device access selection FL is higher in most cases. As the accuracy increases, the energy-saving gain of the device access selection FL first gradually increases, then gradually decreases until it disappears. At 90% accuracy, the overall energy consumption of the device access selection FL is about 13 J, which is 31.58% lower than the 19 J consumed by the baseline FL.
Figure 7. The comparison between the device access selection FL and the baseline FL.The performance of the accuracy versus FL rounds is shown in(a),and the performance of the accuracy versus the overall energy consumption is shown in(b).
Simulation results are in line with the preceding discussion in Subsection 4.2.FL system with access selection reduces the energy consumption of uplink transmission and local computing by preventing the access of“weak”devices.However,due to the reduction of training devices compared with the baseline,the accuracy is slightly lower than the baseline FL in the later stages of training.This indicates that the access selection FL can achieve relatively significant reductions in energy consumption under most accuracy performance constraints.
(4)Dynamic bandwidth compensation in FL
For the simulation of the FL system with dynamic bandwidth allocation and scheduling, we deploy 3-15 devices randomly in a circular area with a radius of 220 m, with one base station located at its center. The transmit power of the base station is set to 5 W. To validate the dynamic bandwidth compensation scheme based on Algorithm 1, we choose average allocation and random allocation for comparison.
Figure 8 shows the simulation results. From this figure, we observe that the overall energy consumption increases with the number of devices. Regardless of the number of devices, the FL system with dynamic bandwidth compensation consumes the least energy, and the random bandwidth allocation scheme consumes the most.
Figure 8. The overall energy consumption comparison between the dynamic bandwidth compensation,average bandwidth allocation and random bandwidth allocation as the number of devices increases.
The simulation results are consistent with the previous discussion: dynamic bandwidth compensation can work in a constantly changing network environment and effectively reduce the uplink energy consumption in each round of FL, thereby reducing the overall energy consumption of the system.
(5)Model quantization and sparsification in FL
To simulate the FL system with model compression,we deploy 5-30 devices in a circular area with a radius of 220m,in which a base station is located at the center and the power is set to 5W.We use three different gradient quantization levels(8-bit,16-bit,and 24-bit)in our simulation,while the original data needs to be represented by 32 bits.We also simulate the scheme of model gradient sparsification.
Figure 9(a) shows the accuracy comparison between FL systems with quantization/sparsification and without any model compression when 15 devices are deployed. The accuracy of the FL model in all five schemes gradually converges as the number of FL rounds increases. As the quantization precision decreases (the compression rate increases), the FL accuracy decreases from 98.61% (no quantization or sparsification) to 98.33% (8-bit quantization), and the accuracy with model sparsification is 98.15%. From Figure 9(b), we observe that the overall energy consumption of the FL system with quantization or sparsification after 200 rounds is lower than that of the traditional FL model without any model processing. Again taking 15 devices as an example, the FL system with 8-bit/16-bit/24-bit quantization achieves a 17.52%/11.68%/5.84% energy reduction compared to the traditional FL, respectively. The overall energy consumption of the FL system with sparsification is 12.64% lower than that of the traditional FL.
Figure 9. The comparison between FL systems with quantization/sparsification and without any model compression.The performance of the accuracy versus FL rounds is shown in(a),and the performance of the overall energy consumption versus the number of devices is shown by the bar graph in(b).
Simulation results agree with the preceding discussion in Subsection 4.3.Quantization or sparsification reduces the size of data transmitted in FL,thereby significantly reducing the overall energy consumption.However,model compression will inevitably lead to the reduction of model accuracy to a certain extent,so a reasonable model compression level needs to be set in practical application.
In this article, we focus on green concerns in FL over 6G networks and highlight the importance of green design for intelligent network architectures, given that FL-based 6G network architecture is generally believed to be the most promising solution to achieve “Native AI”.
We first analyze and summarize five major energy consumption challenges and problems brought by the introduction of FL in 6G,and model the FL energy consumption issues from aspects of computation and communication.We then propose green designs for FL-based 6G network architecture to reduce energy consumption from three perspectives.Several feasible schemes are provided and analyzed respectively.After that,simulation results prove the feasibility and effectiveness of these green designs.
We can conclude that a desirable energy reduction can be achieved at the cost of a small loss in learning accuracy. Based on the in-depth analysis of these results, we provide a useful guideline to researchers: different schemes or their combinations should be used to achieve the minimum energy consumption at a reasonable cost in learning accuracy for different network scenarios and service requirements in FL-based 6G networks. We also show that further research in green communications can give researchers significant insights when designing intelligent 6G networks.
As open issues, here are some possible schemes not mentioned above, which are also future research directions. First, base station sleeping can reduce the standby power and hence the energy consumption of a base station under a low communication load. Furthermore, asynchronous model updates can be introduced to increase the training efficiency of the equipment. Additionally, improving the performance-to-power ratio of the hardware chip can also reduce the computing energy consumption of AI tasks. Moreover, new methods in the AI domain such as model pruning and knowledge distillation can reduce the amount of model computation, thereby reducing the overall energy consumption. Last but not least, it may be possible to train all the coefficients involved in FL over 6G automatically by combining FL with transfer learning or meta-learning, so that the optimal parameter configuration for energy saving can be found. All of these deserve thorough investigation as 6G research progresses.
ACKNOWLEDGEMENT
This research was supported by the National Key Research and Development Program of China(Grant No.2020YFB1806804),and the U.S.National Science Foundation(Grant US CNS-1801925,CNS-2029569,and CNS-2107057).