MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Advanced · 2022
Model FLOPs Utilization (MFU)
The fraction of the hardware's theoretical peak FLOPs that a training run actually delivers; a key efficiency metric.
MFU (Model FLOPs Utilization) measures how much of a hardware platform's theoretical peak FLOPs a training run actually delivers. Google's 2022 PaLM paper introduced the metric and made it widely known; 40-50% MFU is considered strong on large-scale runs. Unlike HFU (Hardware FLOPs Utilization), MFU counts only the FLOPs the model theoretically requires, excluding extra work such as activation recomputation. High MFU signals well-tuned memory management, a sound distributed parallelism strategy (tensor, pipeline, and data parallelism; FSDP), communication overlap, and an efficient data-loading pipeline; low MFU means millions of dollars of compute are leaking in real time. On modern frontier training runs, 50-60% MFU is considered excellent, and reaching it is itself a serious infrastructure engineering achievement.
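A minimal sketch of how MFU is typically estimated for a dense transformer, assuming the standard ~6 FLOPs-per-parameter-per-token rule of thumb; the model size, throughput, and per-GPU peak figure below are illustrative assumptions, not numbers from this entry.

```python
def estimate_mfu(n_params: float, tokens_per_sec: float,
                 n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Estimate Model FLOPs Utilization for a dense transformer.

    Assumes training costs ~6 FLOPs per parameter per token
    (2 for the forward pass, 4 for the backward pass).
    """
    model_flops_per_token = 6 * n_params               # ~6N rule of thumb
    achieved_flops_per_sec = model_flops_per_token * tokens_per_sec
    peak_flops_per_sec = n_gpus * peak_flops_per_gpu   # theoretical ceiling
    return achieved_flops_per_sec / peak_flops_per_sec

# Hypothetical run: a 70B-parameter model on 1,024 GPUs, each with a
# 989 TFLOP/s dense BF16 peak (H100-class), sustaining 1.1M tokens/s.
print(f"MFU = {estimate_mfu(70e9, 1.1e6, 1024, 989e12):.1%}")  # MFU = 45.6%
```

Because throughput is measured end to end, the estimate automatically charges any idle time, communication stalls, or data-loading gaps against utilization.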
- EN — English term: Model FLOPs Utilization (MFU)
- TR — Turkish term: Model FLOPs Kullanımı (MFU)