llama.cpp is an LLM inference library written in pure C/C++ with minimal external dependencies, started by Georgi Gerganov in early 2023, and arguably the single biggest catalyst behind the local-LLM explosion. It runs almost anywhere, including CPU-only, with optional GPU acceleration, and uses quantization (packaged in its GGUF model format) to shrink large models enough to fit on commodity hardware. The ecosystem layered on top of it (Ollama, LM Studio, KoboldCpp, text-generation-webui) forms the substrate of today's "I run an 8B model on my laptop" reality. What started as a side project is now the most widely deployed local-inference runtime; in 2026 it remains one of the open-source community's most influential creations.
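For a concrete sense of what running a quantized model looks like in practice, here is a minimal sketch using llama-cpp-python, one of the ecosystem's Python bindings over llama.cpp. The model filename and the generation parameters are illustrative placeholders, not specific recommendations; any quantized GGUF file works the same way.

```python
# Minimal sketch: load a quantized GGUF model and generate text with
# llama-cpp-python (a community binding over llama.cpp's C API).
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,        # context window size in tokens
    n_gpu_layers=-1,   # offload all layers to GPU if one is available; 0 = pure CPU
)

out = llm(
    "Q: Why can an 8B model run on a laptop? A:",
    max_tokens=64,
    stop=["Q:"],       # stop before the model starts a new question
)
print(out["choices"][0]["text"])
```

The key knob here is `n_gpu_layers`: the same binary and the same GGUF file serve both the CPU-only and GPU-accelerated cases, which is much of why the format travels so well across the tools built on top of llama.cpp.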
External Links