
    ConstBERT

    A novel approach to reducing the storage footprint of multi-vector retrieval by encoding each document with a fixed, smaller set of learned embeddings. It reduces index sizes by over 50% compared to ColBERT while retaining most of its effectiveness.


    Information

    Website: github.com
    Published: Mar 26, 2026

    Categories

    Research Papers & Surveys

    Tags

    #multi-vector #compression #colbert

    Similar Products


    Accelerating ANNS in Hierarchical Graphs via Shortcuts

    VLDB 2025 paper proposing efficient level navigation with shortcuts for accelerating approximate nearest neighbor search in hierarchical graph indexes, improving traversal speed across multi-layer graph structures.

    Accelerating Graph-based ANNS with Adaptive Awareness

    SIGKDD 2025 paper proposing adaptive awareness capabilities for graph-based approximate nearest neighbor search, enabling the search algorithm to dynamically adjust its strategy based on local graph characteristics and query properties.

    Accelerating Graph Indexing for ANNS on Modern CPUs

    SIGMOD 2025 paper proposing optimizations for graph-based approximate nearest neighbor search indexing on modern CPU architectures, leveraging SIMD instructions and cache-aware algorithms for improved index construction performance.

    ACORN

    ACORN is a performant and predicate-agnostic search system for vector embeddings and structured data, enhancing the capability of vector databases to handle complex queries over high-dimensional data efficiently.

    A Brief Survey of Vector Databases

    This survey paper provides an overview of the landscape, technologies, and applications of vector databases, making it a valuable resource for understanding the field.

    A Comprehensive Survey on Vector Database

    A comprehensive academic survey that explores the architecture, storage, retrieval techniques, and challenges associated with vector databases. It categorizes algorithmic approaches to approximate nearest neighbor search (ANNS) and discusses how vector databases can be integrated with large language models, offering valuable insights and foundational knowledge for understanding and building vector database systems.

    Overview

    ConstBERT is a novel approach to reducing the storage footprint of multi-vector retrieval by encoding each document with a fixed, smaller set of learned embeddings. It compresses token-level BERT embeddings into a fixed number (C) of document-level vectors using a learned linear projection.
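    The idea above can be sketched in a few lines. This is an illustrative toy, not the official implementation: it assumes token embeddings are padded to a fixed length T and pooled into C slots by a learned projection matrix W, which is random here for demonstration.

    ```python
    import numpy as np

    # Toy sketch of fixed-size document encoding (assumptions: pad/truncate
    # token embeddings to length T, then pool into C slots with a learned
    # linear projection W; W would be trained end-to-end in practice).
    rng = np.random.default_rng(0)
    T, C, dim = 256, 32, 128

    tokens = rng.standard_normal((180, dim))      # 180 token vectors for this doc
    padded = np.zeros((T, dim))
    padded[: len(tokens)] = tokens                # pad to the fixed length T

    W = rng.standard_normal((C, T)) / np.sqrt(T)  # learned in practice, random here
    doc_vecs = W @ padded                         # (C, dim): fixed-size representation

    print(doc_vecs.shape)                         # (32, 128) regardless of token count
    ```

    However many tokens a document has, the stored representation is always C vectors of the same dimension.
    
    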

    Problem Addressed

    Multi-vector retrieval methods, exemplified by the ColBERT architecture, have shown substantial promise for retrieval. However, they come at a high cost in terms of storage since a (potentially compressed) vector needs to be stored for every token in the input collection.
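    A back-of-the-envelope calculation makes the storage gap concrete. All numbers below are assumptions chosen for illustration, not figures from the paper.

    ```python
    # Illustrative storage comparison: ColBERT stores one vector per token,
    # ConstBERT stores a fixed C vectors per document. Numbers are assumed.
    num_docs = 8_800_000      # roughly MSMARCO-passage scale (assumption)
    avg_tokens = 60           # assumed average tokens per passage
    dim, bytes_per = 128, 2   # 128-dim vectors at 2 bytes per dimension
    C = 32                    # fixed vectors per document (assumption)

    colbert_gb = num_docs * avg_tokens * dim * bytes_per / 1e9
    constbert_gb = num_docs * C * dim * bytes_per / 1e9
    print(f"ColBERT ~{colbert_gb:.0f} GB, ConstBERT ~{constbert_gb:.0f} GB")
    ```

    With these assumed numbers the per-token index is roughly twice the size of the fixed-size one; the actual ratio depends on the corpus and the chosen C.
    
    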

    Key Benefits

    Storage Efficiency

    Unlike ColBERT, whose index scales linearly with the number of token embeddings per document, ConstBERT maintains a consistent index size by using a fixed number of embeddings. This efficiency extends across the BEIR datasets, with ConstBERT consistently reducing index sizes by over 50% compared to ColBERT at equivalent effectiveness.

    Better Memory Management

    Document representations have a fixed size on disk, allowing the operating system to manage paging more effectively.
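    Why fixed-size records help: any document's vector block can be located by arithmetic alone, without an auxiliary offset table. A minimal sketch, with illustrative sizes:

    ```python
    # Fixed-size records: every document occupies the same number of bytes,
    # so a document's block in a flat index file is found by arithmetic.
    # C, dim, and byte width are illustrative assumptions.
    C, dim, bytes_per = 32, 128, 2
    record_size = C * dim * bytes_per        # identical for every document

    def doc_offset(doc_id: int) -> int:
        """Byte offset of a document's vector block in a flat index file."""
        return doc_id * record_size

    print(doc_offset(0), doc_offset(1000))   # 0 8192000
    ```

    Uniform record sizes also align naturally with OS page boundaries, which is what makes paging easier to manage.
    
    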

    Computational Efficiency

    Instead of iterating over dozens or hundreds of vectors per document, ConstBERT enables efficient late interaction scoring across a compact set of learned vectors. This results in lower query latency and better computational efficiency.
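    Late-interaction (MaxSim) scoring over a compact vector set can be sketched as follows. This is a hedged illustration of the general ColBERT-style scoring rule, with arbitrary shapes; real systems normalize embeddings and batch this computation.

    ```python
    import numpy as np

    # Sketch of late-interaction (MaxSim) scoring: for each query token,
    # take the max similarity to any document vector, then sum over tokens.
    # Shapes are illustrative assumptions.
    rng = np.random.default_rng(1)
    q = rng.standard_normal((16, 128))   # query token embeddings (16 tokens)
    d = rng.standard_normal((32, 128))   # ConstBERT-style doc vectors (C = 32)

    def maxsim_score(q: np.ndarray, d: np.ndarray) -> float:
        """Sum over query tokens of the max similarity to any doc vector."""
        sims = q @ d.T                   # (num_query_tokens, C) similarity matrix
        return float(sims.max(axis=1).sum())

    score = maxsim_score(q, d)
    print(round(score, 2))
    ```

    Because the inner loop runs over C vectors rather than every token in the document, the scoring cost per document is constant.
    
    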

    Performance

    Through experiments using the MSMARCO passage corpus and BEIR with the ColBERT-v2 architecture, the research finds that passages can be effectively encoded into a fixed number of vectors while retaining most of the original effectiveness.

    Availability

    Code: https://github.com/pisa-engine/ConstBERT
    Presented at ECIR 2025 (European Conference on Information Retrieval)
    Published: April 2025

    Pricing

    Free and open-source research implementation.