FlashRank

An ultra-lite, fast Python reranking library built on state-of-the-art cross-encoders and LLMs. It runs on CPU, has no PyTorch dependency, and ships a ~4MB model that the project describes as the tiniest reranking model in the world.

Overview

FlashRank, created by Prithiviraj Damodaran, is a lightweight Python library for adding reranking to existing search and retrieval pipelines. It is built on state-of-the-art (SoTA) cross-encoders and LLMs.

Key Features

Lightweight Design

No Torch or Transformers needed
Runs on CPU
Boasts the tiniest reranking model in the world, ~4MB
ONNX-optimized for very fast performance on CPU

Model Support

Supports SoTA listwise and pairwise reranking:

Cross-encoder based pairwise/pointwise rerankers (Max tokens = 512)
LLM-based listwise rerankers (Max tokens = 8192)
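
To make the pairwise/pointwise case concrete, here is a minimal sketch of the general technique, not FlashRank's implementation. Each (query, passage) pair is scored on its own, inputs are cut to a token budget, and passages are returned in descending score order. The names `truncate`, `overlap_score`, and `rerank_pointwise` are hypothetical, the term-overlap scorer is only a stand-in for a cross-encoder, and whitespace splitting stands in for a real tokenizer.

```python
from typing import Callable, List

MAX_TOKENS = 512  # token budget quoted above for cross-encoder rerankers


def truncate(text: str, budget: int) -> str:
    """Keep at most `budget` whitespace-separated tokens.

    Simplification: a real cross-encoder applies its limit to the
    query and passage together, using a subword tokenizer.
    """
    return " ".join(text.split()[:budget])


def overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in scorer: share of query terms found in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def rerank_pointwise(
    query: str,
    passages: List[str],
    score: Callable[[str, str], float] = overlap_score,
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Score every passage independently, then sort by score, highest first."""
    scored = [(score(query, truncate(p, max_tokens)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [p for _, p in scored]
```

A listwise reranker differs in that the model sees all candidates at once and emits an ordering, which is why it needs the larger token budget listed above.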

Performance Benefits

Designed as a very lightweight and fast reranking library
Leverages smaller, optimized transformer models (often distilled or pruned versions)
Aims for the lowest cost per invocation in serverless deployments
Shorter cold start times and quicker re-deployments
Smaller package size reduces Lambda/serverless costs

Integration

FlashRank integrates with various frameworks including:

LangChain
The rerankers library
Custom search pipelines
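
For a custom search pipeline, the call pattern might look like the sketch below. It assumes the `Ranker` and `RerankRequest` classes from the `flashrank` package, passages given as dicts with `id` and `text` keys, and a `rerank` result that is a list of dicts sorted by a `score` key; check these against the version you install. The helper names `to_passages` and `rerank_with_flashrank` are hypothetical.

```python
from __future__ import annotations

from typing import Any


def to_passages(texts: list[str]) -> list[dict[str, Any]]:
    """Wrap raw strings in the dict shape assumed here: an id plus the text."""
    return [{"id": i, "text": t} for i, t in enumerate(texts)]


def rerank_with_flashrank(
    query: str, texts: list[str], top_k: int = 3
) -> list[dict[str, Any]]:
    """Rerank candidate texts for a query and keep the best `top_k`."""
    # Imported lazily so this module still loads where flashrank is absent.
    from flashrank import Ranker, RerankRequest

    ranker = Ranker()  # default model; weights are fetched on first use
    request = RerankRequest(query=query, passages=to_passages(texts))
    # Assumption: results come back sorted by "score", highest first.
    results = ranker.rerank(request)
    return results[:top_k]
```

In practice the `texts` argument would be the top candidates from a first-stage retriever (BM25 or a vector index), with the reranker reordering only that short list. The package is installed with `pip install flashrank`.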

Use Cases

Improving search relevance in RAG systems
Re-ranking retrieval results
Production deployments where cost and latency matter
Serverless and edge computing environments

Pricing

Free and open-source, available on GitHub and PyPI.

Information

Website: github.com

Published: Mar 13, 2026

Tags

#reranking #lightweight #open-source

Similar Products

ClickHouse

ClickHouse is a columnar OLAP database with vector indexes (ANN via AMM, brute-force), supporting SQL queries over vectors + structured data at petabyte scale. Excels in aggregations with vectors. For analytics workloads with embeddings; faster ingestion than Postgres pgvector for big data.

Meilisearch

Open-source search engine with support for vector and hybrid search for fast semantic retrieval.

embedded-vector-db

Lightweight Node.js library for low-latency on-device vector similarity search using HNSW and BM25 hybrid, with CRUD, metadata filtering, and persistence for edge RAG pipelines. Enables real-time semantic search without servers; more lightweight than cloud Qdrant.

nano-vectordb-rs

Minimal Rust library for fast on-device cosine similarity search with Rayon parallelism and embedded persistence, ideal for low-latency prototyping on edge hardware. Supports quick inserts/queries for real-time AI; lighter than full DBs like Qdrant edge.

tinyvector

Pure Rust embedding database as lightweight Axum server for low-latency on-device vector search scaling to 100M+ vectors in memory. High accuracy/speed for edge RAG; simpler than Qdrant edge.

ChromaDB

Chroma is an open-source embedding database optimized for LLM apps, with in-memory/persistent storage and simple Python API. Features: HNSW indexing, automatic batching, metadata filtering, integrations with LangChain/LlamaIndex. Ideal for local dev, prototyping RAG; vs pgvector, easier for Python users; vs full DBs like Milvus, lighter but less scalable.
