
        Distributed Momentum-Based Frank-Wolfe Algorithm for Stochastic Optimization

IEEE/CAA Journal of Automatica Sinica, 2023, Issue 3

Jie Hou, Xianlin Zeng, Gang Wang, Jian Sun, and Jie Chen

Abstract—This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms are basically developed for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver, by judiciously combining Nesterov's momentum and gradient tracking techniques, for stochastic convex and nonconvex optimization over networks. Convergence rates are established for both the convex and nonconvex cases; in particular, the rate is O(1/log2(k)) for nonconvex optimization. The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.

        I.INTRODUCTION

DISTRIBUTED stochastic optimization is a basic problem that arises widely in diverse engineering applications, including unmanned systems [1]–[3], distributed machine learning [4], and multi-agent reinforcement learning [5]–[7], to name a few. The goal is to minimize a shared objective function, which is defined as the expectation of a set of stochastic functions subject to general convex constraints, by means of local computations and information exchanges between working agents.
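For reference, problem (1) presumably takes the standard form of a constrained stochastic program matching the description above; the following is a sketch under that assumption, where the symbols $n$, $F_i$, $f_i$, $\xi_i$, and $X$ are our notation rather than quoted from the paper.

```latex
% Presumed form of problem (1): n agents minimize the average of their
% expected local losses over a common convex constraint set X.
\min_{x \in X} \; F(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} F_i(x),
\qquad F_i(x) \;=\; \mathbb{E}_{\xi_i}\!\left[ f_i(x, \xi_i) \right].
```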

A popular approach to solving problem (1) is projected stochastic gradient descent (pSGD) [12]–[14]. In pSGD and its variants, the iteration variable is projected back onto X after taking a step in the direction of the negative stochastic gradient [15]–[20]. Such algorithms are efficient when the computational cost of performing the projection is low, e.g., when projecting onto a hypercube or a simplex. In many practical situations of interest, however, the cost of projecting onto X can be high, e.g., when dealing with a trace-norm ball or a base polytope X in submodular minimization [21].
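To make the projection step concrete, here is a minimal sketch (not the algorithms of [12]–[20]) of a single pSGD iteration; the l2-ball constraint, radius, and step size are illustrative assumptions. It highlights why pSGD is cheap when the projection has a closed form and expensive when it does not (e.g., a trace-norm ball requires a singular value decomposition).

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Cheap projection onto an l2-norm ball: rescale the point if it lies outside."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def psgd_step(x, stochastic_grad, step_size, project=project_l2_ball):
    """One projected SGD step: move along the negative stochastic gradient,
    then project back onto the feasible set X."""
    return project(x - step_size * stochastic_grad)

# Toy usage with a hypothetical stochastic gradient sample.
x = np.array([2.0, -1.0, 0.5])
g = np.array([0.4, -0.2, 0.1])
x_next = psgd_step(x, g, step_size=0.1)
```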

An alternative for tackling problem (1) is the class of projection-free methods, including Frank-Wolfe (FW) [22] and conditional gradient sliding [23]. In this paper, we focus on the FW algorithm, which is also known as the conditional gradient method [22]. Classical FW methods circumvent the projection step by first solving a linear minimization subproblem over the constraint set X to obtain a conditional gradient θ_k, and then updating x_{k+1} as a convex combination of the current iterate x_k and θ_k. On top of this idea, a number of modifications have been proposed to improve or accelerate the FW method in algorithm design or convergence analysis; see, e.g., [8]–[10] and [24]–[31].
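As a concrete illustration of the classical FW iteration just described, the sketch below uses an l1-norm ball, for which the linear minimization subproblem has a closed-form solution; the radius, the toy objective, and the step-size schedule η_k = 2/(k+2) are illustrative choices rather than settings taken from the cited works.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle over the l1 ball:
    argmin_{||theta||_1 <= radius} <grad, theta> is a signed, scaled coordinate vector."""
    theta = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    theta[i] = -radius * np.sign(grad[i])
    return theta

def frank_wolfe_step(x, grad, eta, radius=1.0):
    """One classical FW update: get the conditional gradient theta_k from the LMO,
    then form the convex combination x_{k+1} = x_k + eta * (theta_k - x_k)."""
    theta = lmo_l1_ball(grad, radius)
    return x + eta * (theta - x)

# Toy usage on a quadratic with the common schedule eta_k = 2/(k+2).
x = np.zeros(3)
for k in range(10):
    grad = 2 * (x - np.array([0.3, -0.2, 0.5]))   # gradient of ||x - t||^2
    x = frank_wolfe_step(x, grad, eta=2.0 / (k + 2))
```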

Nonetheless, most existing pSGD and Frank-Wolfe variants are designed for constrained centralized problems, and they cannot directly handle distributed problems. Therefore, it is necessary to develop distributed stochastic projection-free methods for problem (1). In addition, stochastic FW methods may not converge, even in the centralized convex case, without increasing the batch size [8]. In this context, a natural question arises: is it possible to develop a distributed FW method for problem (1) that works with any fixed batch size, while enjoying a convergence rate comparable to that of centralized stochastic FW methods? In this paper, we answer this question affirmatively by carefully designing a distributed stochastic FW algorithm that converges for any fixed batch size (which can be as small as 1) and enjoys a convergence rate comparable to that in the centralized stochastic setting.

        A.Related Work

Although considerable results have been reported for distributed FW in deterministic settings, they cannot be directly applied to and/or generalized to stochastic settings. The reason is twofold: 1) FW may diverge due to the non-vanishing variance in gradient estimates; and 2) the convergence rate of FW for stochastic optimization is not guaranteed to be comparable to that of pSGD, even in the centralized setting.

To address these challenges, the present paper puts forth a distributed stochastic version of the celebrated FW algorithm for stochastic optimization over networks. The main idea behind our proposal is a judicious combination of the recursive momentum [39] and Nesterov's momentum [40]. On the theory side, it is shown that the proposed algorithm not only attenuates the noise in the gradient approximation, but also achieves a convergence guarantee comparable to that of pSGD in the convex case. A comparison of the proposed algorithm with existing alternatives is provided in Table I.
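For intuition on the recursive momentum ingredient, the following is a sketch of the STORM-type estimator of [39] that the momentum update of DMFW adapts; the function and variable names are ours, and the per-agent indexing is omitted.

```python
def recursive_momentum(y_prev, grad_new, grad_old, gamma):
    """Recursive momentum gradient estimator:
    y_k = (1 - gamma_k) * y_{k-1} + grad(x_k, xi_k) - (1 - gamma_k) * grad(x_{k-1}, xi_k).
    The correction term evaluates the *same* sample xi_k at the previous iterate,
    which attenuates the estimation variance without increasing the batch size."""
    return (1.0 - gamma) * y_prev + grad_new - (1.0 - gamma) * grad_old
```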

        TABLE I CONVERGENCE RATE FOR STOCHASTIC OPTIMIZATION

        B.Our Contributions

        In succinct form, the contributions of this work are summarized as follows.

1) We propose a projection-free algorithm, referred to as the distributed momentum-based Frank-Wolfe (DMFW) algorithm, for convex and nonconvex stochastic optimization over networks. Compared with the centralized FW methods [8]–[10], [28], and [31], DMFW is considerably different in algorithm design and convergence analysis.

        II.PRELIMINARIES AND ALGORITHM DESIGN

        A.Notation and Preliminaries

        B.Algorithm Design

To solve problem (1), we propose a distributed momentum-based Frank-Wolfe algorithm, which is summarized in Algorithm 1.

Algorithm 1 Distributed Momentum-Based Frank-Wolfe (DMFW)

Input: Number of iterations $K$; initial conditions $x_i^1 \in X$ and $y_i^1 = \nabla f_i(\hat{x}_i^1, \xi_i^1) = s_i^1$ for all $i \in \mathcal{N}$.
1: for $k = 1, 2, \ldots, K$ do
2:   Average consensus:
         $\hat{x}_i^k = \sum_{j \in \mathcal{N}_i} c_{ij} x_j^k$   (2)
     where $\mathcal{N}_i$ is the set of neighbors of node $i$.
3:   Momentum update:
         $y_i^k = (1 - \gamma_k) y_i^{k-1} + \nabla f_i(\hat{x}_i^k, \xi_i^k) - (1 - \gamma_k) \nabla f_i(\hat{x}_i^{k-1}, \xi_i^k)$   (3)
     where $\gamma_k \in (0, 1]$ is a step size.
4:   Gradient tracking:
         $s_i^k = \sum_{j \in \mathcal{N}_i} c_{ij} s_j^{k-1} + y_i^k - y_i^{k-1}$   (4)
         $p_i^k = \sum_{j \in \mathcal{N}_i} c_{ij} s_j^k$   (5)
5:   Frank-Wolfe step:
         $\theta_i^k \in \arg\min_{\theta \in X} \langle p_i^k, \theta \rangle$   (6)
         $x_i^{k+1} = \hat{x}_i^k + \eta_k (\theta_i^k - \hat{x}_i^k)$   (7)
     where $\eta_k \in (0, 1]$ is a step size.
6: end for
7: return $x_i^{K+1}$ for all $i \in \mathcal{N}$.
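To connect the update equations (2)-(7), here is a single-machine simulation sketch of Algorithm 1 in Python. The stochastic gradient oracle signature stoch_grads[i](x, k), the step-size schedules gamma(k) and eta(k), and the initial-sample bookkeeping are illustrative assumptions; the lmo argument can be, for example, the l1-ball oracle sketched earlier.

```python
import numpy as np

def dmfw(stoch_grads, C, lmo, K, d, gamma, eta):
    """Single-machine simulation of Algorithm 1 (DMFW).

    stoch_grads[i](x, k) -- stochastic gradient of f_i at x using sample xi_i^k
    C                    -- n x n doubly stochastic weight matrix (Assumption 1)
    lmo(p)               -- solves the linear subproblem (6): argmin_{theta in X} <p, theta>
    gamma(k), eta(k)     -- step-size schedules taking values in (0, 1]
    """
    n = len(stoch_grads)
    x = np.zeros((n, d))                                  # x_i^1 (assumes 0 lies in X)
    x_hat_prev = x.copy()
    y = np.stack([stoch_grads[i](x[i], 0) for i in range(n)])   # y_i^1
    s = y.copy()                                          # s_i^1 = y_i^1
    for k in range(1, K + 1):
        x_hat = C @ x                                     # (2) average consensus
        y_new = np.empty_like(y)
        for i in range(n):
            g_new = stoch_grads[i](x_hat[i], k)           # grad f_i(x_hat_i^k, xi_i^k)
            g_old = stoch_grads[i](x_hat_prev[i], k)      # same sample, previous iterate
            y_new[i] = (1 - gamma(k)) * y[i] + g_new - (1 - gamma(k)) * g_old   # (3)
        s = C @ s + y_new - y                             # (4) gradient tracking
        p = C @ s                                         # (5)
        theta = np.stack([lmo(p[i]) for i in range(n)])   # (6) Frank-Wolfe direction
        x = x_hat + eta(k) * (theta - x_hat)              # (7)
        x_hat_prev, y = x_hat, y_new
    return x
```

Note that, provided X is convex, (7) keeps every local iterate feasible: $\hat{x}_i^k$ is a convex combination of feasible points by Assumption 1 and $\theta_i^k \in X$, so $x_i^{k+1} \in X$ for any $\eta_k \in (0, 1]$.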

        III.MAIN RESULTS

In this section, we establish the convergence results of the proposed algorithm for convex and nonconvex problems, respectively. Before providing the results, we outline some standing assumptions and facts.

        A.Assumptions and Facts

Assumption 1 (Weight rule): The weighted adjacency matrix C is a doubly stochastic matrix, i.e., the row sums and the column sums of C are all equal to 1.

Assumption 1 indicates that, in each round of the Average consensus step of Algorithm 1, each agent takes a weighted average of the values from its neighbors according to C.
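Assumption 1 can be satisfied constructively. For example (this construction is not claimed to be the one used in the paper), Metropolis-Hastings weights built from the graph's adjacency structure yield a symmetric, doubly stochastic C; a sketch:

```python
import numpy as np

def metropolis_weights(adj):
    """Build a doubly stochastic weight matrix C from the 0/1 adjacency matrix
    of an undirected, connected graph using Metropolis-Hastings weights:
    c_ij = 1 / (1 + max(deg_i, deg_j)) for neighbors, c_ii = 1 - sum_{j != i} c_ij."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j] and i != j:
                C[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        C[i, i] = 1.0 - C[i].sum()
    return C

# Ring of 4 agents: rows and columns of C both sum to 1 (Assumption 1).
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
C = metropolis_weights(adj)
assert np.allclose(C.sum(axis=0), 1) and np.allclose(C.sum(axis=1), 1)
```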

Assumption 2 (Connectivity): The network G is connected.

        B.Convergence Rate for Convex Stochastic Optimization

This subsection is dedicated to the performance analysis of Algorithm 1. Let us start by defining the following auxiliary vectors:

        C.Convergence Rate for Nonconvex Optimization

This subsection provides the convergence rate of the proposed DMFW for (1) with nonconvex objective functions. To show the convergence performance of DMFW in the nonconvex case, we introduce the FW-gap, which is defined as
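Assuming (15) follows the standard definition, the FW-gap at a point x is $G(x) = \max_{\theta \in X} \langle \nabla F(x), x - \theta \rangle$, which serves as the stationarity measure for constrained nonconvex problems. A minimal sketch of its computation via the linear minimization oracle:

```python
import numpy as np

def fw_gap(x, grad, lmo):
    """Standard FW-gap: G(x) = max_{theta in X} <grad, x - theta>
                             = <grad, x> - min_{theta in X} <grad, theta>.
    G(x) is always nonnegative and vanishes exactly at stationary points of the
    constrained problem, playing the role that ||grad||^2 plays without constraints."""
    theta = lmo(grad)                 # solves min_{theta in X} <grad, theta>
    return float(np.dot(grad, x - theta))
```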

        IV.NUMERICAL TESTS

        A.Binary Classification

        TABLE II REAL DATA FOR BLACK-BOX BINARY CLASSIFICATION

Fig. 1. Multi-agent communication topology.

Fig. 2. The error ‖p̄_k − ȳ_k‖ of DMFW on the a9a dataset.

Fig. 4 shows the FW-gap of SFW, MSHFW, DeFW, and DMFW for solving the nonconvex problem (16). From the results, it can be observed that the stochastic algorithms (SFW, DMFW, and MSHFW) perform better than the deterministic algorithm DeFW on all tested datasets. This indicates that stochastic FW algorithms are more efficient than deterministic FW algorithms in solving the nonconvex problem (16). Comparing DMFW with the centralized algorithms MSHFW and SFW, DMFW slightly outperforms SFW, especially on the a9a dataset, but is slower than MSHFW.

        B.Stochastic Ridge Regression

        V.CONCLUSIONS

Fig. 3. The comparison between SFW, MSHFW, DeFW, and DMFW on three datasets: (a) covtype.binary dataset; (b) a9a dataset; (c) w8a dataset.

Fig. 4. The comparison between SFW, MSHFW, DeFW, and DMFW on three datasets: (a) covtype.binary dataset; (b) a9a dataset; (c) w8a dataset.

Fig. 5. The comparison between SFW, MSHFW, and DMFW: (a) l1-norm ball constraint; (b) l2-norm ball constraint; (c) l5-norm ball constraint.

APPENDIX A PROOF OF LEMMA 1

        Before proving Lemma 1, we first give the following technical lemmas.

        APPENDIX B TECHNICAL LEMMAS

        APPENDIX C PROOF OF LEMMA 2

        APPENDIX D PROOF OF LEMMA 3

        APPENDIX E PROOF OF LEMMA 4

Proof: Adding and subtracting $\bar{p}_k$ and $\bar{y}_k$ to $\|\nabla F(\bar{x}_k) - p_i^k\|^2$, we have

        APPENDIX G PROOF OF THEOREM 2

Proof: It follows from the definition of the FW-gap in (15) that:
