Markov decision process

A Markov decision process (MDP), also called a stochastic dynamic program or stochastic control problem, is a model for sequential decision making when outcomes are uncertain.^[1]
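As an illustration of the definition above, the following is a minimal sketch of a toy MDP in Python. It is not taken from the cited sources: the states ("ok", "broken"), the actions ("run", "repair"), the probabilities, the rewards, and the function names are all hypothetical and chosen only to show a decision maker repeatedly picking an action whose outcome is random.

```python
import random

# Hypothetical toy MDP: a machine that is either "ok" or "broken".
# TRANSITIONS[state][action] lists (probability, next_state, reward) outcomes.
# All numbers are illustrative assumptions, not data from any source.
TRANSITIONS = {
    "ok": {
        "run":    [(0.9, "ok", 1.0), (0.1, "broken", 0.0)],
        "repair": [(1.0, "ok", -0.5)],
    },
    "broken": {
        "run":    [(1.0, "broken", 0.0)],
        "repair": [(0.8, "ok", -0.5), (0.2, "broken", -0.5)],
    },
}


def step(state, action, rng):
    """Sample one uncertain outcome (next_state, reward) of taking an action."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def expected_reward(state, action):
    """Expected immediate reward of an action, averaged over its outcomes."""
    return sum(p * r for p, _, r in TRANSITIONS[state][action])


def simulate(policy, start, horizon, seed=0):
    """Follow a fixed policy (a state -> action mapping) for `horizon` steps
    and return the total reward collected along the sampled trajectory."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for _ in range(horizon):
        action = policy[state]
        state, reward = step(state, action, rng)
        total += reward
    return total
```

Calling `simulate({"ok": "run", "broken": "repair"}, "ok", 20)` samples one possible 20-step history under that policy. The next state depends only on the current state and the chosen action, which is the Markov property that gives the model its name.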

Originating in operations research in the 1950s,^[2]^[3] MDPs have since been applied in a variety of fields, including ecology, economics, healthcare, telecommunications and reinforcement learning.^[4]

[1] Puterman, Martin L. (1994). Markov decision processes: discrete stochastic dynamic programming. Wiley series in probability and mathematical statistics. Applied probability and statistics section. New York: Wiley. ISBN 978-0-471-61977-2.
[2] Schneider, S.; Wagner, D. H. (1957-02-26). "Error detection in redundant systems". Papers presented at the February 26-28, 1957, western joint computer conference: Techniques for reliability. IRE-AIEE-ACM '57 (Western). New York, NY, USA: Association for Computing Machinery: 115–121. doi:10.1145/1455567.1455587. ISBN 978-1-4503-7861-1.
[3] Bellman, Richard (1958-09-01). "Dynamic programming and stochastic control processes". Information and Control. 1 (3): 228–239. doi:10.1016/S0019-9958(58)80003-0. ISSN 0019-9958.
[4] Sutton, Richard S.; Barto, Andrew G. (2018). Reinforcement learning: an introduction. Adaptive computation and machine learning series (2nd ed.). Cambridge, Massachusetts: The MIT Press. ISBN 978-0-262-03924-6.