Top-P, also called nucleus sampling, was proposed by Holtzman et al. in the 2019 paper 'The Curious Case of Neural Text Degeneration'. Unlike top-k, it doesn't fix the candidate count — instead it forms the smallest set of tokens whose cumulative probability crosses a threshold P (say 0.9) and samples from that 'nucleus'. This adapts to the model's confidence: where the next-token distribution is sharp, the nucleus is small; where it is flat, the nucleus widens, which usually produces more natural-feeling output. In practice most teams tune temperature and top-p together while leaving top-k disabled.
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2019
Top-P (Nucleus) Sampling
A sampling method that draws from the smallest set of candidates whose cumulative probability exceeds P.
- EN — Top-P (Nucleus) Sampling
- TR — Top-P (Nucleus) Örnekleme
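The nucleus selection described above can be sketched in a few lines. This is an illustrative implementation, not the one from the paper; the function name `top_p_sample` and its parameters are assumptions for this example.

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Sample a token index via top-p (nucleus) sampling.

    Illustrative sketch: names and defaults are this example's, not a library's.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Softmax (shifted by max for numerical stability)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens by probability, descending
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    # Smallest prefix whose cumulative probability reaches p
    # (+1 so the token that crosses the threshold is included)
    cutoff = int(np.searchsorted(cum, p)) + 1
    nucleus = order[:cutoff]
    # Renormalize over the nucleus and sample
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

Note how the nucleus size adapts: with sharp logits like `[10, 0, 0, 0]` the nucleus collapses to a single token, while a flat distribution keeps all candidates in play.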