C/C++ engine for running LLMs on consumer hardware
llama.cpp is the foundational C/C++ project that enabled running large language models on consumer hardware. It implements efficient quantized inference using its GGUF model format, running LLMs on CPUs, Apple Silicon, and consumer GPUs. The project spawned an entire ecosystem of local AI tools and remains the performance baseline for edge LLM deployment.