Reward hacking

Specification gaming or reward hacking occurs when an AI optimizes an objective function—achieving the literal, formal specification of an objective—without actually achieving an outcome that the programmers intended. DeepMind researchers have analogized it to the human behavior of finding a "shortcut" when being evaluated: "In the real world, when rewarded for doing well on a homework assignment, a student might copy another student to get the right answers, rather than learning the material—and thus exploit a loophole in the task specification."[1]

  1. ^ "Specification gaming: the flip side of AI ingenuity". DeepMind. 21 April 2020. Retrieved 21 June 2020.