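ZENG Ni, CHEN Jun-hao, FU Qing-shuang
Abstract: To address how a single-objective player should plan the best course of action when only the current day's weather is known, a dynamic programming strategy based on a greedy algorithm is proposed. By analyzing the player's state-transition process, a method is developed that obtains shortest paths with the Floyd algorithm and evaluates the optimal expected outcome of subsequent decisions with a greedy algorithm. The expected value of the final payoff is analyzed in order to select the best strategy. The weather is then sampled by Monte Carlo simulation, and the route that appears with the highest probability is taken as the best route for comparison and verification. The results show that, under general combinations of unknown weather, the strategy enables the player to select the best route and maximize the final funds.
Keywords: dynamic programming model; Monte Carlo simulation; greedy algorithm; Floyd algorithm; decision model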
CLC number: TP391.9     Document code: A
Article ID: 1009-3044(2021)20-0141-03
Dynamic Programming Strategy Based on Greedy Algorithm
ZENG Ni1, CHEN Jun-hao2, FU Qing-shuang3
(1. School of Science, Jiangxi University of Science and Technology, Ganzhou 341000, China; 2. School of Civil and Surveying Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China; 3. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)
Abstract: To address how a single-objective player should plan the best strategy when only the current day's weather is known, this paper proposes a dynamic programming strategy based on a greedy algorithm. By analyzing the player's state-transition process, a resource state function model and an optimal fund decision model are established. Shortest paths are obtained with the Floyd algorithm, and the optimal expected outcome of subsequent decisions is evaluated with a greedy algorithm. The expected final payoff of the subsequent decisions is analyzed in order to choose the best course of action. The weather is then sampled by Monte Carlo simulation, and the route that appears with the highest probability is taken as the best route for comparison and verification. The results show that the strategy enables the player to choose the best route under general combinations of unknown weather, so that the final funds are maximized.
Key words: dynamic programming model; Monte Carlo simulation; greedy algorithm; Floyd algorithm; decision-making model
1 Introduction
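In recent years, a growing number of explorers have chosen to cross deserts on foot, both to take in the spectacular scenery and to test their own perseverance. To make the explorers' travel decisions easier to study, the process is modeled as a desert-crossing game. The journey from the starting point of the desert to the planned destination is constrained by many factors, and the explorer wants to reach the destination within the expected time at the lowest possible cost, so deciding what to do along the way is a challenge.
Cheng Kai et al. [1] digitized the map and traversed all paths to the mines and villages, obtaining the optimal solutions for Level 1 and Level 2 when the weather is known. For the case of unknown weather, they used maximum likelihood estimation to obtain a distribution function for future weather and used it to predict the weather, but they did not arrive at a definite best strategy. Zang Yang et al. [2] drew on the Bellman-Ford algorithm and the shortest-path idea, determined an objective function and constraints, and built a linear programming model that yields the optimal strategy for each case with known weather, but they gave no specific analysis for a single player facing unknown weather.
For a single-objective player who knows only the current day's weather, the authors use a greedy algorithm to compare the expected outcomes of subsequent decisions and select the strategy with the best expectation. The weather is then sampled by Monte Carlo simulation, and the route that appears with the highest probability is taken as the best route for comparison, which verifies the feasibility of the selection strategy.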
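The two building blocks named above, Floyd shortest paths and Monte Carlo sampling of the weather, can be illustrated with a minimal Python sketch. It is not the authors' model. The five-node map, the weather probabilities, the daily costs, the bonus at node 2, and the movement rules (stay on sandstorm days at base cost, walk otherwise at twice the base cost) are all hypothetical assumptions chosen only to make the example runnable.

```python
import random
from collections import Counter

INF = float("inf")

# Hypothetical base daily consumption (money units) per weather type.
BASE_COST = {"sunny": 1, "hot": 2, "sandstorm": 3}
# Hypothetical weather probabilities, assumed for illustration only.
WEATHER_PROBS = {"sunny": 0.4, "hot": 0.5, "sandstorm": 0.1}


def floyd_warshall(n, edges):
    """All-pairs shortest path lengths (days of walking) on an undirected map."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def route_cost(steps, weather):
    """Cost of walking `steps` regions; None if the days run out first.

    Assumed rules: stay put on sandstorm days (base cost),
    walk on all other days (twice the base cost).
    """
    cost = 0
    for w in weather:
        if steps == 0:
            break
        if w == "sandstorm":
            cost += BASE_COST[w]
        else:
            cost += 2 * BASE_COST[w]
            steps -= 1
    return cost if steps == 0 else None


def monte_carlo_best_route(routes, days, trials, seed=0):
    """Count how often each candidate route gives the highest net payoff."""
    rng = random.Random(seed)
    kinds = list(WEATHER_PROBS)
    weights = [WEATHER_PROBS[k] for k in kinds]
    tally = Counter()
    for _ in range(trials):
        weather = rng.choices(kinds, weights=weights, k=days)
        best_name, best_payoff = None, -INF
        for name, (steps, bonus) in routes.items():
            cost = route_cost(steps, weather)
            if cost is not None and bonus - cost > best_payoff:
                best_name, best_payoff = name, bonus - cost
        tally[best_name] += 1  # key None means no route was feasible
    return tally


# Hypothetical map: node 0 is the start, node 4 the destination.
dist = floyd_warshall(5, [(0, 1), (1, 2), (2, 4), (0, 3), (3, 4)])
routes = {
    "direct": (dist[0][4], 0),                   # shortest path, no bonus
    "via_node_2": (dist[0][2] + dist[2][4], 3),  # detour with assumed bonus 3
}
tally = monte_carlo_best_route(routes, days=10, trials=1000)
print(tally.most_common())
```

The route with the largest count in `tally` plays the role of the "route that appears with the highest probability". In this toy setting each route is fixed before the weather is drawn, whereas the strategy described in the paper decides day by day from the current weather.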
2 Model Building
2.1 Problem Statement