Ollama is a tool built on top of llama.cpp that reduces running an LLM locally to a single command: 'ollama run llama3' downloads the model and serves it as a REST API. With a Docker-like user experience, it brought the local-LLM ecosystem into the mainstream and lowered the 'let's try a small model' barrier to near zero. Its model library is large (Llama 3, Mistral, Qwen, Phi, DeepSeek, and many specialised fine-tunes), and an OpenAI-compatible API layer makes integration with existing tooling trivial. It runs on macOS, Linux, and Windows, and is one of the most visible faces of the on-device LLM era.
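The OpenAI-compatible layer mentioned above can be sketched as follows: a minimal Python example that builds a chat-completion request against an Ollama server, assuming the default local address (localhost:11434) and a model already pulled with 'ollama run llama3'. The helper name `build_chat_request` is illustrative, not part of any Ollama SDK.

```python
import json
from urllib.request import Request, urlopen

# Ollama exposes an OpenAI-compatible endpoint at /v1/chat/completions
# on localhost:11434 by default. This builds (but does not send) a
# standard chat-completion request; the model name assumes "llama3"
# has already been downloaded via `ollama run llama3`.
def build_chat_request(model: str, prompt: str) -> Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        "http://localhost:11434/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("llama3", "Say hello in one word.")
# To actually call a running server:
#   with urlopen(req) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
```

Because the request shape matches OpenAI's Chat Completions API, existing clients can usually be pointed at Ollama just by swapping the base URL.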
External Links