



A single-file executable that bundles LLM weights with the llama.cpp runtime, so you can distribute and run LLMs locally with no installation, including embedding generation via the built-in server.
llamafile lets you distribute and run LLMs with a single file. It combines llama.cpp with Cosmopolitan Libc into one framework that collapses LLM complexity into a single-file executable that runs locally on most computers with no installation.
Start embeddings server:
./model.llamafile --server --nobrowser --embedding
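Once the server is running, embeddings can be requested over HTTP. A minimal stdlib-only Python sketch, assuming the server listens on http://localhost:8080 and exposes a llama.cpp-style `/embedding` endpoint that accepts `{"content": ...}` and returns `{"embedding": [...]}` (check your llamafile's `--help` output to confirm):

```python
import json
import urllib.request


def build_request(text: str, base_url: str = "http://localhost:8080"):
    """Build a POST request for a llama.cpp-style /embedding endpoint.
    The endpoint path and payload shape are assumptions; verify them
    against your llamafile's server documentation."""
    payload = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embedding",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def parse_embedding(response_json: dict) -> list:
    """Extract the embedding vector from the server's JSON response."""
    return response_json["embedding"]


# Offline demonstration with a mock response; a real call would be:
#   with urllib.request.urlopen(build_request("hello")) as r:
#       vec = parse_embedding(json.load(r))
mock = {"embedding": [0.1, -0.2, 0.3]}
print(parse_embedding(mock))
```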
Available embedding models:
LlamafileEmbeddings class
GitHub: mozilla-ai/llamafile
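Whatever client you use, the server returns plain float vectors, and a common next step is comparing them with cosine similarity. A stdlib-only sketch (the vectors below are illustrative, not real model output):

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Illustrative vectors only; real embeddings come from the server.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```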
./model.llamafile
Pre-built llamafiles available for:
Free and open-source: