

        Distributed Momentum-Based Frank-Wolfe Algorithm for Stochastic Optimization

IEEE/CAA Journal of Automatica Sinica, 2023, Issue 3

Jie Hou, Xianlin Zeng, Gang Wang, Jian Sun, and Jie Chen

Abstract—This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms are basically developed for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver, by judiciously combining Nesterov's momentum and gradient tracking techniques for stochastic convex and nonconvex optimization over networks. Convergence rate guarantees are established for both the convex case and the nonconvex case, the latter at a rate of O(1/log2(k)). The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.

I. INTRODUCTION

DISTRIBUTED stochastic optimization is a basic problem that arises widely in diverse engineering applications, including unmanned systems [1]–[3], distributed machine learning [4], and multi-agent reinforcement learning [5]–[7], to name a few. The goal is to minimize a shared objective function, which is defined as the expectation of a set of stochastic functions subject to general convex constraints, by means of local computations and information exchanges between working agents.
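In a standard form of such a distributed stochastic program, with $n$ agents, local stochastic objectives $f_i$, and a common convex constraint set $\mathcal{X}$, the problem reads as follows (a representative formulation; the exact statement of (1) may differ in details):

$$\min_{x \in \mathcal{X}} \; F(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} F_i(x), \qquad F_i(x) \;=\; \mathbb{E}_{\xi_i}\big[f_i(x, \xi_i)\big],$$

where each agent $i$ can only query stochastic gradients $\nabla f_i(\cdot, \xi_i)$ of its local objective and exchange information with its neighbors over the network.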

A popular approach to solving problem (1) is the projected stochastic gradient descent (pSGD) [12]–[14]. In pSGD and its variants, the iteration variable is projected back onto X after taking a step in the direction of the negative stochastic gradient [15]–[20]. Such algorithms are efficient when the computational cost of performing the projection is low, e.g., projecting onto a hypercube or simplex. In many practical situations of interest, however, the cost of projecting onto X can be high, e.g., when dealing with a trace norm ball or a base polytope X in submodular minimization [21].
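For concreteness, a minimal sketch of pSGD with a cheap projection, here a coordinate-wise clip onto the hypercube [-1, 1]^d, is given below; the toy quadratic objective, step size, and all names are illustrative and not part of the paper's setup.

import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    # Projection onto the hypercube [lo, hi]^d reduces to a coordinate-wise clip.
    return np.clip(x, lo, hi)

def psgd(stoch_grad, x0, steps=1000, lr=0.05):
    # Projected SGD: step along the negative stochastic gradient, then project back onto X.
    x = x0.copy()
    for _ in range(steps):
        x = project_box(x - lr * stoch_grad(x))
    return x

# Toy example: minimize E[0.5 * ||x - (a + noise)||^2] over the box [-1, 1]^5.
rng = np.random.default_rng(0)
a = rng.normal(size=5)
x_hat = psgd(lambda x: x - (a + 0.1 * rng.normal(size=5)), x0=np.zeros(5))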

An alternative for tackling problem (1) is the class of projection-free methods, including the Frank-Wolfe (FW) method [22] and conditional gradient sliding [23]. In this paper, we focus on the FW algorithm, which is also known as the conditional gradient method [22]. Classical FW methods circumvent the projection step by first solving a linear minimization subproblem over the constraint set X to obtain a conditional gradient $\theta_k$, which is followed by updating $x_{k+1}$ through a convex combination of the current iterate $x_k$ and $\theta_k$. On top of this idea, a number of modifications have been proposed to improve or accelerate the FW method in algorithm design or convergence analysis, see, e.g., [8]–[10] and [24]–[31].
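To make the linear-minimization step concrete, the following sketch runs the classical (deterministic) FW iteration over an l1-norm ball, where the linear subproblem in each round is solved in closed form by a signed, scaled coordinate vector; the toy objective and the standard step size 2/(k+2) are illustrative choices, not the paper's.

import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    # Linear minimization oracle over {theta : ||theta||_1 <= radius}:
    # the minimizer of <grad, theta> sits at a signed vertex of the ball.
    theta = np.zeros_like(grad)
    idx = int(np.argmax(np.abs(grad)))
    theta[idx] = -radius * np.sign(grad[idx])
    return theta

def frank_wolfe(grad_fn, x0, steps=200, radius=1.0):
    # Classical FW: compute the conditional gradient theta_k via the LMO,
    # then move to a convex combination of x_k and theta_k.
    x = x0.copy()
    for k in range(steps):
        theta = lmo_l1_ball(grad_fn(x), radius)
        eta = 2.0 / (k + 2)
        x = (1.0 - eta) * x + eta * theta
    return x

# Toy quadratic: minimize 0.5 * ||x - b||^2 over the l1 ball of radius 1.
b = np.array([0.3, -0.7, 0.2])
x_hat = frank_wolfe(lambda x: x - b, x0=np.zeros(3))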

Nonetheless, most existing pSGD and Frank-Wolfe variants are designed for constrained centralized problems, and they cannot directly handle distributed problems. Therefore, it is necessary to develop distributed stochastic projection-free methods for problem (1). In addition, stochastic FW methods may not converge even in the centralized convex case without increasing the batch size [8]. In this context, a natural question arises: is it possible to develop a distributed FW method for problem (1) that uses an arbitrary fixed batch size, while enjoying a convergence rate comparable to that of centralized stochastic FW methods? In this paper, we answer this question affirmatively by carefully designing a distributed stochastic FW algorithm, which converges for any fixed batch size (as small as 1) and enjoys a convergence rate comparable to that in the centralized stochastic setting.

A. Related Work

Although considerable results have been reported for distributed FW in deterministic settings, they cannot be directly applied to and/or generalized to stochastic settings. The reason is twofold: 1) FW may diverge due to the non-vanishing variance in gradient estimates; and 2) the convergence rate of FW for stochastic optimization is not guaranteed to be comparable to that of pSGD, even in the centralized setting.

To address these challenges, the present paper puts forth a distributed stochastic version of the celebrated FW algorithm for stochastic optimization over networks. The main idea behind our proposal is a judicious combination of the recursive momentum [39] and Nesterov's momentum [40]. On the theory side, it is shown that the proposed algorithm not only attenuates the noise in the gradient approximation, but also achieves a convergence guarantee comparable to that of pSGD in the convex case. A comparison of the proposed algorithm with related work is provided in Table I.
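For intuition, the recursive momentum estimator of [39] maintains a gradient surrogate of the generic form

$$y^k \;=\; \nabla f(x^k, \xi^k) \;+\; (1-\gamma_k)\big(y^{k-1} - \nabla f(x^{k-1}, \xi^k)\big),$$

where the same sample $\xi^k$ is used at both $x^k$ and $x^{k-1}$; with $\gamma_k = 1$ it reduces to a plain stochastic gradient, and for $\gamma_k < 1$ the correction term progressively damps the gradient noise. This is the generic template behind the momentum update (3) of Algorithm 1 below.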

        TABLE I CONVERGENCE RATE FOR STOCHASTIC OPTIMIZATION

B. Our Contributions

        In succinct form, the contributions of this work are summarized as follows.

1) We propose a projection-free algorithm, referred to as the distributed momentum-based Frank-Wolfe (DMFW) algorithm, for convex and nonconvex stochastic optimization over networks. Compared with the centralized FW methods [8]–[10], [28] and [31], DMFW is considerably different in algorithm design and convergence analysis.

II. PRELIMINARIES AND ALGORITHM DESIGN

A. Notation and Preliminaries

B. Algorithm Design

To solve problem (1), we propose a distributed momentum-based Frank-Wolfe algorithm, which is summarized in Algorithm 1.

Algorithm 1 Distributed Momentum-Based Frank-Wolfe
Input: Number of iterations $K$, initial conditions $x_i^1 \in \mathcal{X}$ and $y_i^1 = \nabla f_i(\bar{x}_i^1, \xi_i^1) = s_i^1$ for all $i \in \mathcal{N}$.
1: for $k = 1, 2, \ldots, K$ do
2: Average consensus:
$$\bar{x}_i^k = \sum_{j \in \mathcal{N}_i} c_{ij} x_j^k \qquad (2)$$
where $\mathcal{N}_i$ is the set of neighbors of node $i$.
3: Momentum update:
$$y_i^k = (1-\gamma_k) y_i^{k-1} + \nabla f_i(\bar{x}_i^k, \xi_i^k) - (1-\gamma_k) \nabla f_i(\bar{x}_i^{k-1}, \xi_i^k) \qquad (3)$$
where $\gamma_k \in (0, 1]$ is a step size.
4: Gradient tracking:
$$s_i^k = \sum_{j \in \mathcal{N}_i} c_{ij} s_j^{k-1} + y_i^k - y_i^{k-1} \qquad (4)$$
$$p_i^k = \sum_{j \in \mathcal{N}_i} c_{ij} s_j^k \qquad (5)$$
5: Frank-Wolfe step:
$$\theta_i^k \in \arg\min_{\theta \in \mathcal{X}} \langle p_i^k, \theta \rangle \qquad (6)$$
$$x_i^{k+1} = \bar{x}_i^k + \eta_k (\theta_i^k - \bar{x}_i^k) \qquad (7)$$
where $\eta_k \in (0, 1]$ is a step size.
6: end for
7: return $x_i^{K+1}$ for all $i \in \mathcal{N}$.
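As a sanity check on the update order, the following Python sketch simulates Algorithm 1 from a central vantage point (all agents stepping synchronously through a dense weight matrix C). The callbacks sample, grad, and lmo, as well as the handling of the very first round, are illustrative simplifications rather than the paper's implementation.

import numpy as np

def dmfw(C, sample, grad, lmo, x_init, K, gamma, eta):
    # C       : (n, n) doubly stochastic weight matrix (Assumption 1)
    # sample  : sample(i) -> a random sample xi for agent i
    # grad    : grad(i, x, xi) -> stochastic gradient of f_i at x under sample xi
    # lmo     : lmo(p) -> argmin_{theta in X} <p, theta>
    # x_init  : (n, d) array of initial iterates, one row per agent, each in X
    # gamma, eta : step-size sequences with gamma[k], eta[k] in (0, 1]
    n, d = x_init.shape
    x = x_init.copy()
    x_bar_prev = C @ x                                   # bar{x}_i^1
    # Initialization: y_i^1 = s_i^1 = grad f_i(bar{x}_i^1, xi_i^1)
    y = np.stack([grad(i, x_bar_prev[i], sample(i)) for i in range(n)])
    s = y.copy()
    for k in range(K):
        x_bar = C @ x                                    # (2) average consensus
        y_new = np.empty_like(y)
        for i in range(n):
            xi = sample(i)                               # one fresh sample xi_i^k per agent
            # (3) momentum update: the SAME sample is used at bar{x}_i^k and bar{x}_i^{k-1}
            y_new[i] = ((1.0 - gamma[k]) * y[i]
                        + grad(i, x_bar[i], xi)
                        - (1.0 - gamma[k]) * grad(i, x_bar_prev[i], xi))
        s = C @ s + y_new - y                            # (4) gradient tracking
        p = C @ s                                        # (5)
        theta = np.stack([lmo(p[i]) for i in range(n)])  # (6) local FW linear subproblems
        x = x_bar + eta[k] * (theta - x_bar)             # (7) FW update
        x_bar_prev, y = x_bar, y_new
    return x

In this sketch, the sums over $\mathcal{N}_i$ in (2), (4) and (5) are realized as matrix products with C, so a zero entry $c_{ij}$ simply excludes non-neighbors; with a sparse C the same code exchanges information only between neighbors.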

III. MAIN RESULTS

In this section, we establish the convergence results of the proposed algorithm for convex and nonconvex problems, respectively. Before providing the results, we outline some standing assumptions and facts.

A. Assumptions and Facts

Assumption 1 (Weight rule): The weighted adjacency matrix C is a doubly stochastic matrix, i.e., every row and every column of C sums to 1.

Assumption 1 indicates that, in each round of the Average consensus step of Algorithm 1, every agent takes a weighted average of the values received from its neighbors according to C.
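One common way to obtain a weight matrix satisfying Assumption 1 from the graph alone is the Metropolis-Hastings rule sketched below; this is a standard construction and not necessarily the one used in the paper's experiments.

import numpy as np

def metropolis_weights(adj):
    # adj: (n, n) symmetric 0/1 adjacency matrix of a connected graph, zero diagonal.
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                C[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        C[i, i] = 1.0 - C[i].sum()   # remaining mass goes to the self-weight
    return C                          # symmetric with unit row sums, hence doubly stochastic

# Example: a 4-node ring.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
C = metropolis_weights(ring)
assert np.allclose(C.sum(axis=0), 1.0) and np.allclose(C.sum(axis=1), 1.0)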

Assumption 2 (Connectivity): The network G is connected.

B. Convergence Rate for Convex Stochastic Optimization

This subsection is dedicated to the performance analysis of Algorithm 1. Let us start by defining the following auxiliary vectors:
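A standard choice, consistent with the averaged quantities $\bar{p}_k$ and $\bar{y}_k$ that appear in Fig. 2 and Appendix F, is the set of network averages of the local iterates, e.g.,

$$\bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k, \qquad \bar{y}_k = \frac{1}{n}\sum_{i=1}^{n} y_i^k, \qquad \bar{p}_k = \frac{1}{n}\sum_{i=1}^{n} p_i^k.$$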

C. Convergence Rate for Nonconvex Optimization

This subsection provides the convergence rate of the proposed DMFW for problem (1) with nonconvex objective functions. To show the convergence performance of DMFW in the nonconvex case, we introduce the FW-gap, defined as follows.
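A standard form of this quantity, widely used as a stationarity measure in the Frank-Wolfe literature, is

$$\mathcal{G}(x) \;=\; \max_{\theta \in \mathcal{X}} \,\big\langle \nabla F(x), \, x - \theta \big\rangle,$$

which is nonnegative on $\mathcal{X}$ and vanishes exactly at stationary points of (1); the nonconvex rate is then stated in terms of this gap (or a closely related variant) evaluated along the iterates.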

IV. NUMERICAL TESTS

A. Binary Classification

        TABLE II REAL DATA FOR BLACK-BOX BINARY CLASSIFICATION

Fig. 1. Multi-agent communication topology.

Fig. 2. The error $\|\bar{p}_k - \bar{y}_k\|$ of DMFW on the a9a dataset.

Fig. 4 shows the FW-gap of SFW, MSHFW, DeFW and DMFW when solving the nonconvex problem (16). From the results, it can be observed that the stochastic algorithms (SFW, DMFW and MSHFW) perform better than the deterministic algorithm DeFW on all tested datasets. This indicates that stochastic FW algorithms are more efficient than deterministic FW algorithms in solving the nonconvex problem (16). Comparing DMFW with the centralized algorithms MSHFW and SFW, DMFW slightly outperforms SFW, especially on the a9a dataset, but is slower than MSHFW.

B. Stochastic Ridge Regression

V. CONCLUSIONS

Fig. 3. The comparison between SFW, MSHFW, DeFW and DMFW on three datasets. (a) covtype.binary dataset; (b) a9a dataset; (c) w8a dataset.

Fig. 4. The comparison between SFW, MSHFW, DeFW and DMFW on three datasets. (a) covtype.binary dataset; (b) a9a dataset; (c) w8a dataset.

Fig. 5. The comparison between SFW, MSHFW and DMFW. (a) l1-norm ball constraint; (b) l2-norm ball constraint; (c) l5-norm ball constraint.

APPENDIX A PROOF OF LEMMA 1

        Before proving Lemma 1, we first give the following technical lemmas.

        APPENDIX B TECHNICAL LEMMAS

        APPENDIX C PROOF OF LEMMA 2

        APPENDIX D PROOF OF LEMMA 3

        APPENDIX E PROOF OF LEMMA 4

APPENDIX F PROOF OF THEOREM 1

Proof: Adding and subtracting $\bar{p}_k$ and $\bar{y}_k$ in $\|\nabla F(\bar{x}_k) - p_i^k\|^2$, we have

        APPENDIX G PROOF OF THEOREM 2

Proof: It follows from the definition of the FW-gap (15) that:
