



ruvllm is a local LLM inference engine that runs GGUF models with hardware acceleration on Metal, CUDA, ANE, and WebGPU. It supports Flash Attention, MicroLoRA, RoPE, quantization (Q4-Q8, π-Quantization), MoE routing, and streaming token output for browser and edge deployment, enabling local AI without cloud APIs.
npm install @ruvector/ruvllm        # Node.js
npm install @ruvector/ruvllm-wasm   # browser (WASM)
cargo add ruvllm                    # Rust
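Streaming token output typically means the engine yields tokens as the model decodes them, rather than returning the full completion at once. The sketch below shows the consumption pattern using an async iterator; the `generate` function here is a hypothetical stand-in, not the actual @ruvector/ruvllm API.

```typescript
// Hypothetical sketch of token streaming; `generate` is a stand-in,
// NOT the real @ruvector/ruvllm API.
async function* generate(prompt: string): AsyncGenerator<string> {
  // A real engine would yield each token as the model decodes it;
  // here we simulate decoding with a fixed token sequence.
  for (const tok of ["Hello", ",", " world", "!"]) {
    yield tok;
  }
}

async function collect(prompt: string): Promise<string> {
  let out = "";
  // for-await consumes tokens one at a time, so a UI can render
  // partial output long before the full completion is ready.
  for await (const tok of generate(prompt)) {
    out += tok;
  }
  return out;
}
```

In a browser or edge deployment, each yielded token would be appended to the UI instead of accumulated into a string.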
Free and open-source.