Late Interaction

Retrieval paradigm in which queries and documents are encoded separately into token-level vectors and their interactions are computed at search time, combining the efficiency of bi-encoders with the expressiveness of cross-encoders.

Surveys

Information

Website: arxiv.org

Published: Mar 13, 2026

Tags

#retrieval #colbert #neural-search

Similar Products

Multi-Vector Embeddings

Embedding approach where documents/images are represented by multiple vectors (one per token/patch) rather than a single vector, enabling fine-grained semantic matching.

Dense Retrieval

An information retrieval approach using dense vector representations (embeddings) to encode queries and documents. Unlike sparse methods such as BM25, dense retrieval captures semantic meaning in continuous vector spaces, enabling neural search and forming the foundation of modern RAG systems.

Late Interaction Retrieval

A retrieval paradigm where query and document encodings are kept separate until a late interaction stage, enabling more expressive and efficient similarity computations. Pioneered by ColBERT and extended by ColPali and ColQwen, this approach maintains fine-grained representations while enabling fast retrieval.

ColBERT

State-of-the-art late interaction retrieval model that produces multi-vector token-level representations, enabling efficient and effective passage search with rich contextual understanding.

ASMR Technique

Agentic Search and Memory Retrieval technique by Supermemory that uses parallel reader agents and search agents and achieved ~99% accuracy on the LongMemEval benchmark.

Cascading Retrieval

Advanced retrieval approach combining dense vectors, sparse vectors, and reranking in a multi-stage pipeline, achieving up to 48% better performance than single-method retrieval.

How Late Interaction Works

Encoding Phase

Encode the query into multiple token vectors (e.g., 32 tokens → 32 vectors)

Encode each document into multiple token vectors (e.g., 180 tokens → 180 vectors)

Store the document vectors in an index ahead of time, so each document is encoded only once, offline
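
The encoding phase can be sketched in Python as follows. This is a minimal illustration under loud assumptions: `toy_token_vector` is a hypothetical stand-in for a real contextual encoder (it hashes each token to a fixed vector, so it captures no meaning and ignores context), tokens are split on whitespace rather than into subwords, and `DIM`, `encode`, and `build_index` are names invented for this example. What it does show is the shape of the data: one L2-normalized vector per token, computed once per document and stored in an index, while the query is encoded only when a search arrives.

```python
import hashlib
import math

DIM = 8  # toy size; ColBERT's default output dimension is 128


def toy_token_vector(token: str) -> list[float]:
    """HYPOTHETICAL stand-in for a contextual encoder: derives a deterministic
    vector from a SHA-256 hash of the token and L2-normalizes it.
    It carries no meaning and ignores surrounding context."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    raw = [b - 127.5 for b in digest[:DIM]]  # center bytes around zero (never exactly 0)
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def encode(text: str) -> list[list[float]]:
    """One vector per whitespace token (real models use subword tokens)."""
    return [toy_token_vector(tok) for tok in text.lower().split()]


def build_index(docs: dict[str, str]) -> dict[str, list[list[float]]]:
    """Offline step: encode each document once and keep ALL of its token vectors."""
    return {doc_id: encode(text) for doc_id, text in docs.items()}


index = build_index({
    "d1": "late interaction keeps token vectors",
    "d2": "bm25 uses sparse term weights",
})
query_vecs = encode("token vectors")  # the query is encoded at search time
print(len(index["d1"]), len(query_vecs))  # 5 2
```

Because every document keeps one vector per token, a late-interaction index holds far more vectors than a single-vector index over the same collection.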

Retrieval Phase

For each query token vector, find its maximum similarity with any document token vector

Sum these maximum similarities across all query tokens

The resulting sum is the query-document relevance score, often called the MaxSim score after the operator ColBERT uses
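
The retrieval steps above can be written as a short scoring function. This plain-Python sketch assumes token vectors are already L2-normalized, so each dot product is a cosine similarity; `dot` and `maxsim_score` are illustrative names, not a library API. With the example sizes from the encoding phase, scoring a single document means 32 × 180 = 5,760 dot products before the per-query-token maxima are taken.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Similarity of two token vectors (cosine, if both are L2-normalized)."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """For each query token vector, take its maximum dot product with any
    document token vector, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-D unit vectors: 2 query tokens x 3 document tokens = 6 dot products.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.6, 0.8], [0.8, 0.6], [0.0, 1.0]]
# Query token 1 best matches doc token 2 (0.8); query token 2 best matches doc token 3 (1.0).
print(maxsim_score(query, doc))  # 1.8
```

The order of document tokens does not affect the score, and a document is credited only with its best match for each query token, not with how many of its tokens are loosely related.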

Key Advantages

Better Than Bi-Encoders

Captures fine-grained token-level interactions

Higher accuracy for complex queries

Better handling of multi-aspect queries
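
A contrived toy example of the multi-aspect point, meant to show the mechanism rather than serve as evidence of real-world accuracy. It assumes a bi-encoder that mean-pools token vectors into one vector and compares the pooled vectors by cosine similarity (mean pooling is one common choice; other models use a [CLS] vector). Document A matches each query aspect exactly but also contains unrelated tokens; document B has a single token halfway between the two aspects. Pooling blurs A's precise matches, so the single-vector score prefers B, while MaxSim rewards A for matching each aspect separately. All function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_pool(vecs):
    """Collapse token vectors into a single vector (one common bi-encoder pooling choice)."""
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def single_vector_score(query_vecs, doc_vecs):
    """Bi-encoder style: pool each side to one vector, then compare once."""
    return cosine(mean_pool(query_vecs), mean_pool(doc_vecs))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction style: best document match per query token, summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


r = 1 / math.sqrt(2)
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two distinct aspects
doc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],  # exact match for each aspect
         [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]  # plus unrelated tokens
doc_b = [[r, r, 0.0]]  # one token halfway between the aspects

for name, doc in [("A", doc_a), ("B", doc_b)]:
    print(name, round(single_vector_score(query, doc), 3), round(maxsim_score(query, doc), 3))
# A 0.577 2.0
# B 1.0 1.414
```

Real encoders produce contextual, high-dimensional vectors, so the effect is rarely this stark.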

Faster Than Cross-Encoders

Pre-compute document representations

No need to encode query-document pairs at search time

Can leverage vector search infrastructure
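
The last point can be sketched as a two-stage search, loosely modeled on the candidate-generation-then-rerank scheme described for ColBERT's end-to-end retrieval. Stage 1 treats every stored document token vector as one entry in a vector index: each query token fetches its `k_per_token` nearest entries, and their parent documents become candidates. Stage 2 runs exact MaxSim only on those candidates. Brute-force nearest-neighbor search stands in for a real approximate nearest-neighbor library, and `build_token_table`, `search`, `k_per_token`, and `top_n` are hypothetical names. The only encoder work at query time is encoding the query itself; a cross-encoder would instead need one forward pass per query-document pair.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def build_token_table(index):
    """Flatten all stored document token vectors into one table, remembering which
    document owns each row (roughly what a vector index would store). Built offline."""
    vectors, owners = [], []
    for doc_id, doc_vecs in index.items():
        for v in doc_vecs:
            vectors.append(v)
            owners.append(doc_id)
    return vectors, owners


def search(query_vecs, index, table, k_per_token=2, top_n=10):
    vectors, owners = table
    candidates = set()
    # Stage 1: each query token pulls its k nearest document tokens.
    # Brute force here; an ANN library would replace this loop at scale.
    for q in query_vecs:
        nearest = heapq.nlargest(k_per_token, range(len(vectors)),
                                 key=lambda i: dot(q, vectors[i]))
        candidates.update(owners[i] for i in nearest)
    # Stage 2: exact MaxSim, but only for the candidate documents.
    scored = [(maxsim_score(query_vecs, index[d]), d) for d in candidates]
    return sorted(scored, reverse=True)[:top_n]


index = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[0.8, 0.6]],
    "c": [[-0.6, -0.8]],
}
table = build_token_table(index)
results = search([[1.0, 0.0], [0.0, 1.0]], index, table)
print([doc_id for _, doc_id in results])  # ['a', 'b']
```

Document `c` is never scored, because none of its tokens is among the nearest neighbors of any query token; skipping such documents is where the savings over exhaustive scoring come from.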
