



ruvllm is a local LLM inference engine that runs GGUF models with hardware acceleration on Metal, CUDA, ANE, and WebGPU. It supports Flash Attention, MicroLoRA, RoPE, quantization (Q4-Q8, π-Quantization), MoE routing, and streaming token output for browser and edge deployment, enabling local AI without cloud APIs.
npm install @ruvector/ruvllm        # Node.js
npm install @ruvector/ruvllm-wasm   # browser (WASM)
cargo add ruvllm                    # Rust
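Streaming token output typically means the engine yields tokens as the model decodes them, rather than returning the full completion at once. The sketch below shows the consumption pattern using an async iterator; the `generate` function here is a hypothetical stand-in, not the actual @ruvector/ruvllm API.

```typescript
// Hypothetical sketch of token streaming; `generate` is a stand-in,
// NOT the real @ruvector/ruvllm API.
async function* generate(prompt: string): AsyncGenerator<string> {
  // A real engine would yield each token as the model decodes it;
  // here we simulate decoding with a fixed token sequence.
  for (const tok of ["Hello", ",", " world", "!"]) {
    yield tok;
  }
}

async function collect(prompt: string): Promise<string> {
  let out = "";
  // for-await consumes tokens one at a time, so a UI can render
  // partial output long before the full completion is ready.
  for await (const tok of generate(prompt)) {
    out += tok;
  }
  return out;
}
```

In a browser or edge deployment, each yielded token would be appended to the UI instead of accumulated into a string.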
Free and open-source.