The Four Elements of Reinforcement Learning

Reinforcement learning has four main elements:

  1. Policy. A policy is a function or rule that takes the state of the environment as input and outputs an action. (Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.)
  2. Reward. The reward is the feedback the environment gives the agent after an action; it depends on both the environment state and the action. The reward represents the immediate payoff. (On each time step, the environment sends to the reinforcement learning agent a single number, a reward. The agent’s sole objective is to maximize the total reward it receives over the long run. The reward signal thus defines what are the good and bad events for the agent.)
  3. Value function. The value function represents long-term return. It is usually written v(s) and denotes the expected future reward of an agent starting from state s. (Roughly speaking, the value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.) A state's reward can be low while its value is high, because the states reachable from it can yield high rewards. An analogy: (To make a human analogy, rewards are somewhat like pleasure (if high) and pain (if low), whereas values correspond to a more refined and farsighted judgment of how pleased or displeased we are that our environment is in a particular state.) When choosing an action, prefer the one that leads to states of higher value. (We seek actions that bring about states of highest value, not highest reward, because these actions obtain the greatest amount of reward for us over the long run.) Estimating the value function of states is at the core of reinforcement learning.
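  4. Model of the environment. A model simulates the environment: given the current state and the chosen action, it predicts the next state and the reward the agent will receive. Models are mainly used for planning. Methods that rely on a model, meaning we know how the environment works, are called model-based. The alternative is model-free, where the agent has to estimate values through repeated trial and error.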
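To make the four elements concrete, here is a minimal sketch in Python. The tiny environment (the states `start`, `hall`, `goal`, `end`, the action names, the reward numbers, and the discount factor `GAMMA`) is entirely hypothetical and chosen only for illustration. The sketch assumes a deterministic model and uses value iteration, one common model-based planning method, to estimate v(s) and then derive a greedy policy.

```python
# Hypothetical toy environment, invented for illustration only.
# MODEL plays the role of the "model of the environment":
# MODEL[state][action] = (next_state, reward), deterministic transitions.
GAMMA = 0.9  # assumed discount factor for future rewards

MODEL = {
    "start": {"left": ("end", 1.0), "right": ("hall", 0.0)},
    "hall": {"forward": ("goal", 0.0)},
    "goal": {"finish": ("end", 10.0)},
    "end": {},  # terminal state: no actions available
}


def value_iteration(model, gamma, sweeps=50):
    """Estimate v(s) by repeatedly backing up values through the model."""
    v = {s: 0.0 for s in model}
    for _ in range(sweeps):
        for s, actions in model.items():
            if actions:  # terminal states keep value 0
                v[s] = max(r + gamma * v[s2] for s2, r in actions.values())
    return v


def greedy_policy(model, v, gamma):
    """Policy: map each non-terminal state to the action with the best backup."""
    return {
        s: max(actions, key=lambda a: actions[a][1] + gamma * v[actions[a][0]])
        for s, actions in model.items()
        if actions
    }


values = value_iteration(MODEL, GAMMA)
policy = greedy_policy(MODEL, values, GAMMA)

for state in MODEL:
    print(state, round(values[state], 3), policy.get(state))
```

In this toy setup, going `left` from `start` pays an immediate reward of 1 and then ends the episode, while going `right` pays nothing at first but leads toward the large reward at `goal`. The greedy policy therefore picks `right`: the `hall` state has zero immediate reward but a high value, which is the distinction between reward and value described in item 3. A model-free method would have to reach similar estimates by sampling transitions instead of reading them from `MODEL`.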