C/C++ engine for running LLMs on consumer hardware
llama.cpp is the foundational C/C++ project that enabled running large language models on consumer hardware. It implements efficient quantized inference using its GGUF model format, running LLMs on CPUs, Apple Silicon, and consumer GPUs. The project spawned an entire ecosystem of local AI tools and remains the performance baseline for edge LLM deployment.