MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Advanced · 2022
Model FLOPs Utilization (MFU)
The fraction of the hardware's theoretical peak FLOPs that a training run actually delivers; a key efficiency metric.
MFU (Model FLOPs Utilization) measures how much of a hardware platform's theoretical peak FLOPs a training run actually delivers. Google's 2022 PaLM paper introduced the metric and made it widely known; 40-50% MFU is considered strong on large-scale runs. Unlike HFU (Hardware FLOPs Utilization), MFU counts only the FLOPs the model theoretically requires, excluding extra work such as activation recomputation. High MFU signals well-tuned memory management, a sound distributed parallelism strategy (tensor, pipeline, and data parallelism; FSDP), communication overlap, and an efficient data-loading pipeline; low MFU means millions of dollars of compute are leaking in real time. On modern frontier training runs, 50-60% MFU is considered excellent, and reaching it is itself a serious infrastructure engineering achievement.
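A minimal sketch of how MFU is typically estimated for a dense transformer, assuming the standard ~6 FLOPs-per-parameter-per-token rule of thumb; the model size, throughput, and per-GPU peak figure below are illustrative assumptions, not numbers from this entry.

```python
def estimate_mfu(n_params: float, tokens_per_sec: float,
                 n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Estimate Model FLOPs Utilization for a dense transformer.

    Assumes training costs ~6 FLOPs per parameter per token
    (2 for the forward pass, 4 for the backward pass).
    """
    model_flops_per_token = 6 * n_params               # ~6N rule of thumb
    achieved_flops_per_sec = model_flops_per_token * tokens_per_sec
    peak_flops_per_sec = n_gpus * peak_flops_per_gpu   # theoretical ceiling
    return achieved_flops_per_sec / peak_flops_per_sec

# Hypothetical run: a 70B-parameter model on 1,024 GPUs, each with a
# 989 TFLOP/s dense BF16 peak (H100-class), sustaining 1.1M tokens/s.
print(f"MFU = {estimate_mfu(70e9, 1.1e6, 1024, 989e12):.1%}")  # MFU = 45.6%
```

Because throughput is measured end to end, the estimate automatically charges any idle time, communication stalls, or data-loading gaps against utilization.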
- EN — English term: Model FLOPs Utilization (MFU)
- TR — Turkish term: Model FLOPs Kullanımı (MFU)