Reinforcement learning is a type of problem in which a machine must discover the solution for itself. The machine usually accomplishes this by trial and error: whichever path returns the largest reward is taken to be the best. It must account for all the rewards along a path, not just the immediate ones. Any method for solving such a problem is called a reinforcement learning method.
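The point about weighing all rewards along a path, rather than only the immediate one, can be sketched in a few lines. The two paths and their reward values below are purely hypothetical, chosen so that the path with the weaker start wins on total reward:

```python
# Two hypothetical paths through a problem, each a list of rewards
# collected along the way (illustrative values only).
path_a = [10, 0, 0]   # large immediate reward, nothing afterward
path_b = [1, 5, 8]    # small start, larger rewards later

def total_reward(rewards):
    """Sum every reward along a path, not just the first one."""
    return sum(rewards)

# Judged by immediate reward alone, path A looks best (10 vs. 1),
# but judged by total reward along the whole path, path B wins.
best = max([path_a, path_b], key=total_reward)
print(best is path_b)  # prints True
```

A greedy agent that looked only one step ahead would pick path A and end up with less overall, which is exactly the mistake the paragraph above warns against.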
Reinforcement learning is often used in decision-making tasks, and an automated chess player probably makes use of it. The computer evaluates the moves available and chooses the one it expects to yield the greatest reward, which in this case is the move most likely to let it win the game. Newborn children and animals use it as well. As they have nothing else to go on, they use trial and error to figure out how to walk, eat, and throw things. Their efforts become more and more sophisticated as they recall attempts that have worked in the past and experiment from there to find even better ways to accomplish their goals.
Computer programs that use reinforcement learning contain four specific elements: a policy (how the agent chooses its actions), a reward function (the immediate payoff of each action), a value function (an estimate of long-term reward), and a model (the agent's internal picture of how the environment responds).
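A minimal sketch can show how the four elements fit together. The problem below is an invented two-action example, and all names and reward values are illustrative assumptions, not part of any standard library:

```python
# A toy problem with two actions and assumed payoffs (illustrative only).
actions = ["left", "right"]

def reward(action):
    """Reward function: the immediate payoff of each action."""
    return {"left": 1.0, "right": 2.0}[action]

# Value function: a running estimate of each action's long-term worth.
value = {a: 0.0 for a in actions}

# Model: the agent's learned prediction of what reward each action yields.
model = {}

def policy(step):
    """Policy: try each action once, then pick the highest-valued one."""
    if step < len(actions):
        return actions[step]
    return max(actions, key=lambda a: value[a])

for step in range(100):
    a = policy(step)
    r = reward(a)
    model[a] = r                      # model learns to predict the reward
    value[a] += 0.1 * (r - value[a])  # nudge the value estimate toward r

print(max(value, key=value.get))  # prints "right"
```

After a brief trial-and-error phase, the value estimate for "right" overtakes "left", and the policy settles on the better action, which is the loop the paragraph above describes.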
The process of trial and error is as old as time, but reinforcement learning is a relatively recent development. It has ties to the 1950s field known as "optimal control," which defines a value function (the optimal return) and seeks a controller that attains it. The trial-and-error thread of reinforcement learning is referenced as early as 1911, in Thorndike's "Law of Effect," and was finally applied to computation in 1954, separately by Marvin Minsky and by Farley and Clark. Many psychologists and computational researchers experimented with it throughout the 1960s and 1970s, though it wasn't until the 1980s that its growth really took off.
Reinforcement learning is sometimes combined with supervised learning, the approach behind many neural networks. However, supervised learning relies on labeled examples provided by a teacher, while reinforcement learning relies on rewards and punishments discovered through trial and error.