Establishment of History Dependent Decision Models and Construction of the Corresponding Processes

2017-11-28 17:15:23 MO Xiao-yun, ZHOU Jie-ming, JIN Fang

Journal of Natural Science of Hunan Normal University, 2017, Issue 5

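Abstract The history dependent decision model (HDDM) and the history dependent decision process (HDDP) are the general case of decision models and their corresponding decision processes. The Markov decision model (MDM) and the Markov decision process (MDP) are special cases of HDDM and HDDP. This paper rigorously establishes the history dependent decision model and proves the existence of the corresponding history dependent decision process; the proof is constructive. As a special case of HDDM and HDDP, the Markov decision model (MDM) is established and the corresponding Markov decision process (MDP) is constructed.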

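Keywords establishment of history dependent decision models; existence and construction of history dependent decision processes; Markov decision model and Markov decision process; Markov process

CLC number O212.5 Document code A Article ID 1000-2537(2017)05-0088-07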

Establishment of History Dependent Decision Models and Construction of Corresponding Processes

MO Xiao-yun 1,2, ZHOU Jie-ming 2, JIN Fang 3*

(1. College of Mathematics and Statistics, Hunan University of Finance and Economics, Changsha 410205, China;

2. College of Mathematics and Computer Science, Key Laboratory of High Performance Computing and Stochastic Information Processing, Ministry of Education of China, Hunan Normal University, Changsha 410081, China;

3. College of Mathematics and Computing Science, Hunan City University, Yiyang 413000, China)

Abstract The History Dependent Decision Model (HDDM) and the History Dependent Decision Process (HDDP) are the most general cases of decision models and their corresponding processes. The Markov Decision Model (MDM) and the Markov Decision Process (MDP) are special cases of HDDM and HDDP. In this work, the history dependent decision model is established and the existence of the corresponding history dependent decision process is proved. The proof is constructive. As special cases of HDDM and HDDP, the Markov decision model is established and the Markov decision process is constructed.

Key words history dependent decision model; Markov decision model; Markov decision process; Markov process

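In a decision control system described by a Markov decision model (MDM) and the corresponding Markov decision process (MDP), the future state of the system depends only on the current state of the system and the decision action taken now. If the future state depends on the system's historical states and historical decision actions, we have a history dependent decision model (HDDM) and the corresponding history dependent decision process (HDDP). Because HDDM and HDDP are so general, they are hard to study in depth. Markov decision models and their processes, by contrast, have been studied thoroughly and have produced a rich body of results [1-5]. Many monographs and papers on Markov decision models and their processes mention history dependent decision models and processes only briefly, without giving a detailed and precise account of how the model is established or how the corresponding process is constructed. It is therefore necessary to carry out this establishment and construction. We have already studied the construction of many similar models and their processes [6-10], and this paper draws on the ideas and methods of [6-11].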
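The distinction between the two kinds of model can be written compactly. The notation here is an assumption made for illustration and is not taken from the paper: $x_n$ is the state at time $n$, $a_n$ the action taken at time $n$, $h_n = (x_0, a_0, \dots, x_{n-1}, a_{n-1}, x_n)$ the history up to time $n$, and $Q_n$ a transition law for the next state $X_{n+1}$.

```latex
\begin{align*}
\text{MDM:}\quad  & P(X_{n+1} \in B \mid h_n, a_n) = Q_n(B \mid x_n, a_n), \\
\text{HDDM:}\quad & P(X_{n+1} \in B \mid h_n, a_n) = Q_n(B \mid h_n, a_n).
\end{align*}
```

In the first line the transition law reads only the last entry $x_n$ of the history; in the second it may read all of $h_n$, which is why the Markov case is a special case of the history dependent one.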

1 History Dependent Decision Model

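Consider a system controlled by a decision maker, whose state depends on time, on the system's historical states, and on the decision maker's historical decision actions. Time may be continuous, but discrete time is closer to actual practice. Assume that time takes the values n = 0, 1, 2, …, N, where N is a positive integer, also called the terminal time. Suppose that at some time the system is in a state x and that the decision maker can take a decision action a at that time; at the next time, the state of the system moves from x to some state y. If the decision maker takes one decision action at every time n ∈ {0, 1, 2, …, N-1}, these N actions together form a decision policy. A policy is not the same thing as an action. One goal of studying decision models is to choose the best policy, so that some index of the system is optimized. For example, consider an investor: he is the decision maker, the state of the system is his wealth, and if he wants his wealth at the terminal time to be as large as possible, then how he invests is his policy.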
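The setup above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the construction given in the paper: the names `simulate`, `momentum_kernel` and `cautious_policy`, the returns 1.2 and 0.9, and the investment fractions 0.25 and 0.75 are all hypothetical. The state is the investor's wealth, the action is the fraction of wealth invested, and both the policy and the transition rule are allowed to read the whole history.

```python
import random
from typing import Callable, List

# A history is the list [x0, a0, x1, a1, ..., xn] of states and actions so far.
History = List[float]
Policy = Callable[[int, History], float]
Kernel = Callable[[int, History, float, random.Random], float]


def simulate(x0: float, horizon: int, policy: Policy, kernel: Kernel,
             rng: random.Random) -> History:
    """Simulate one path over the times n = 0, 1, ..., horizon."""
    history: History = [x0]
    for n in range(horizon):
        action = policy(n, list(history))                   # decision at time n
        next_state = kernel(n, list(history), action, rng)  # state at time n + 1
        history += [action, next_state]
    return history


def momentum_kernel(n: int, history: History, action: float,
                    rng: random.Random) -> float:
    """Hypothetical kernel: the up-probability depends on all past wealth moves."""
    states = history[0::2]
    ups = sum(1 for prev, cur in zip(states, states[1:]) if cur > prev)
    p_up = (1 + ups) / (1 + len(states))  # always strictly between 0 and 1
    gross_return = 1.2 if rng.random() < p_up else 0.9
    wealth = states[-1]
    # The uninvested part is kept as cash; the invested part earns the return.
    return wealth * (1 - action) + wealth * action * gross_return


def cautious_policy(n: int, history: History) -> float:
    """Hypothetical history dependent policy: invest less after any past loss."""
    states = history[0::2]
    had_loss = any(cur < prev for prev, cur in zip(states, states[1:]))
    return 0.25 if had_loss else 0.75


path = simulate(100.0, 5, cautious_policy, momentum_kernel, random.Random(0))
terminal_wealth = path[-1]
```

A Markov policy or kernel in this sketch would be one that reads only `history[-1]`, the current state, and ignores the rest of the list.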

Theorem 6 shows that, for a history dependent decision process, if only its value function is to be studied, it suffices to study Markov decision processes.

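Acknowledgements We thank the teachers of the "Risk Theory and Stochastic Control" seminar for proposing the research problem and for their valuable suggestions.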

References:

[1] BÄUERLE N, RIEDER U. Markov decision processes with applications to finance [M]. Berlin: Springer-Verlag, 2011.

[2] GUO X P, HERNÁNDEZ-LERMA O. Continuous-time Markov decision processes [M]. Berlin: Springer-Verlag, 2009.

[3] GUO X P, HERNÁNDEZ-LERMA O, PRIETO-RUMEAU T. A survey of recent results on continuous-time Markov decision processes [J]. Top, 2006, 14(2): 177-246.

[4] HINDERER K. Foundations of non-stationary dynamic programming with discrete time parameter [M]. Berlin: Springer-Verlag, 1970.

[5] YAN J A. Lectures on measure theory (2nd ed.) [M]. Beijing: Science Press, 2004.

[6] MO X Y. An assembly method for constructing dependent random variables using independent product spaces [J]. Journal of Natural Science of Hunan Normal University, 2010, 33(2): 3-6.

[7] MO X Y, OU H, ZHOU J M. Equivalence theorem and probabilistic construction of the Markov dependent risk model [J]. Economic Mathematics, 2012, 29(1): 61-64.

[8] MO X Y, YANG X Q. Criterion of semi-Markov dependent risk model [J]. Acta Math Sin, 2014, 30B(7): 1237-1280.

[9] MO X Y, ZHOU J M, OU H, et al. Double Markov risk model [J]. Acta Math Sci, 2013, 33B(2): 330-340.

[10] MO X Y, YANG X Q. Path characterization and probabilistic construction of the Markov-modulated risk model [J]. Acta Mathematicae Applicatae Sinica, 2012, 35(3): 385-394.

[11] ZHOU J M, MO X Y, OU H, et al. Expected present value of total dividends in the compound binomial model with delayed claims and random income [J]. Acta Math Sci, 2013, 33B(6): 1639-1651.