Reinforcement learning is the paradigm where an agent learns a policy by interacting with an environment and updating its strategy based on a reward signal. Sutton and Barto's 1980s work laid the foundation; AlphaGo brought it to a global audience. In the LLM era, variants like RLHF and DPO play a central role in aligning models with human preferences. Recent Reasoning Models such as OpenAI o1 and DeepSeek R1 also pick up their long-form thinking ability through RL.
MEVZU N°124ISTANBULYEAR I — VOL. III
Glossary · Intermediate · 1980
Reinforcement Learning
A machine learning paradigm in which an agent learns a policy by interacting with an environment and optimizing a reward signal.
- EN — English term
- Reinforcement Learning
- TR — Turkish term
- Pekiştirmeli Öğrenme