LoRA (Low-Rank Adaptation), published by Microsoft Research in 2021, is a parameter-efficient technique that reshaped how teams approach fine-tuning. It freezes the original model weights and trains only small low-rank matrices added alongside them, slashing both the number of trainable parameters and the GPU memory required. This made it feasible to specialise 7B-70B-parameter models on consumer-grade hardware; the follow-up QLoRA pushed costs down further with 4-bit quantisation. Because LoRA adapters are small and portable, it is operationally easy to maintain dozens of specialised variants around a single base model.
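The core idea can be sketched in a few lines: the frozen weight matrix W is augmented with a trainable low-rank product B·A, scaled by alpha/r. The sketch below uses plain Python with hypothetical names and toy dimensions; real implementations sit inside a deep-learning framework, and the zero-initialisation of B (so the adapter starts as a no-op) follows the original paper.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B train."""
    base = matvec(W, x)                      # frozen base projection
    low_rank = matvec(B, matvec(A, x))       # rank-r update path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, low_rank)]

# Toy example: d_out=2, d_in=3, rank r=1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]       # frozen base weights (2x3)
A = [[0.1, 0.2, 0.3]]       # trainable down-projection (1x3)
B = [[0.0], [0.0]]          # trainable up-projection (2x1), zero-initialised
x = [1.0, 2.0, 3.0]

# With B at zero, the adapter contributes nothing at the start of training.
print(lora_forward(W, A, B, x, alpha=8, r=1))  # [1.0, 2.0]
```

The parameter saving comes from training only A and B: r·(d_in + d_out) values instead of the d_in·d_out values in W, which at low rank is a tiny fraction of the original.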