Reinforcement Learning

A machine learning paradigm in which an agent learns a policy by interacting with an environment and optimizing a reward signal.

EN — English term: Reinforcement Learning
TR — Turkish term: Pekiştirmeli Öğrenme

Reinforcement learning is the paradigm where an agent learns a policy by interacting with an environment and updating its strategy based on a reward signal. Sutton and Barto's 1980s work laid the foundation; AlphaGo brought it to a global audience. In the LLM era, variants like RLHF and DPO play a central role in aligning models with human preferences. Recent Reasoning Models such as OpenAI o1 and DeepSeek R1 also pick up their long-form thinking ability through RL.