Copyright © 2025 Awesome Vector Databases. All rights reserved.

    CSRv2

Contrastive Sparse Representation learning approach for ultra-sparse embeddings that achieves a 7x speedup over Matryoshka Representation Learning and up to 300x improvements in compute and memory efficiency.


    Information

Website: arxiv.org
Published: Mar 24, 2026

    Categories

    1 Item
    Machine Learning Models

    Tags

    3 Items
#sparse-embeddings #efficiency #research

    Similar Products

6 results

    RaDeR

    RaDeR (Reasoning-aware Dense Retrieval) is a research model specifically trained on datasets that require reasoning, enabling it to learn how to retrieve relevant theorems and principles during intermediate reasoning steps. This approach allows the retriever to better generalize to diverse reasoning-intensive retrieval tasks.

    MUVERA

    Multi-Vector Retrieval Algorithm that reduces multi-vector similarity search to single-vector similarity search via Fixed Dimensional Encodings. Achieves 10% improved recall with 90% lower latency compared to existing approaches.

    Featured

    Matryoshka Embeddings

    Representation learning approach encoding information at multiple granularities, allowing embeddings to be truncated while maintaining performance. Enables 14x smaller sizes and 5x faster search.
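The truncation idea behind Matryoshka embeddings can be sketched in a few lines: keep only the first d coordinates of a full embedding and re-normalize. This is a minimal illustration, not the reference implementation; the 768-dimensional input is a hypothetical example.

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize to unit length,
    as a Matryoshka-style embedding is designed to allow."""
    truncated = emb[:dim]
    return truncated / np.linalg.norm(truncated)

# A hypothetical 768-d embedding truncated to 64 dimensions (12x smaller).
rng = np.random.default_rng(0)
full = rng.standard_normal(768)
small = truncate_embedding(full, 64)
print(small.shape)  # (64,)
```

Cosine similarity on the truncated vectors then costs a fraction of the full-dimensional comparison, which is where the smaller-index and faster-search gains come from.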

    Featured

    REAPER

    REAPER (Reasoning based Retrieval Planning for Complex RAG Systems) is a research framework that addresses multi-step retrieval planning in complex Retrieval-Augmented Generation scenarios. It enables retrieval systems to plan and execute reasoning-aware retrieval strategies rather than relying on simple similarity-based matching.

    Exploring Distributed Vector Databases Performance on HPC Platforms

    SC'25 Workshop paper characterizing Qdrant vector database performance on high-performance computing platforms, bridging AI and HPC workloads.

    The Novel Vector Database

Research paper proposing a decoupled storage architecture for vector databases that improves update speed by 10.05x for insertions and 6.89x for deletions.

    Overview

    CSRv2 is a principled training approach designed to make ultra-sparse embeddings viable, stabilizing sparsity learning through progressive k-annealing and enhancing representational quality via supervised contrastive objectives.
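Progressive k-annealing means starting training with a loose sparsity budget and gradually tightening it toward the target k. The exact schedule CSRv2 uses is not given here; the linear schedule below is an assumed stand-in, with `k_start` and `k_final` as hypothetical parameters.

```python
def k_schedule(step: int, total_steps: int, k_start: int = 64, k_final: int = 2) -> int:
    """Anneal the sparsity level k from k_start down to k_final over training.
    A linear schedule is assumed here for illustration only."""
    frac = min(step / total_steps, 1.0)
    k = round(k_start + frac * (k_final - k_start))
    return max(k, k_final)

# Early in training the code is loosely sparse; by the end it is k=2 sparse.
print([k_schedule(s, 1000) for s in (0, 250, 500, 750, 1000)])
```

Annealing k rather than fixing it from the start is what stabilizes learning: the model is never forced to route all information through only two active dimensions before it has learned useful features.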

    Contrastive Sparse Representation (CSR)

CSR (Contrastive Sparse Representation Learning) combines contrastive retrieval and reconstructive autoencoding objectives to preserve the original feature semantics. CSR takes a pretrained encoder with frozen weights and trains a simple sparse autoencoder on top of it, mapping the pretrained dense embeddings into sparse embeddings with at most k non-zero elements (i.e., k-sparse).
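A minimal sketch of that sparse autoencoder, assuming a standard top-k sparsification step (forward pass only; the contrastive and reconstruction losses, and all dimensions, are illustrative):

```python
import numpy as np

def top_k_sparsify(z: np.ndarray, k: int) -> np.ndarray:
    """Zero out all but the k largest-magnitude entries of each row."""
    idx = np.argpartition(np.abs(z), -k, axis=-1)[..., -k:]
    mask = np.zeros_like(z, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, z, 0.0)

class KSparseAutoencoder:
    """Maps frozen dense embeddings to k-sparse codes and back (forward only)."""
    def __init__(self, d_in: int, d_hidden: int, k: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
        self.W_dec = self.W_enc.T.copy()  # tied weights, a common simplification
        self.k = k

    def encode(self, x: np.ndarray) -> np.ndarray:
        # ReLU then top-k: each code has at most k non-zero entries.
        return top_k_sparsify(np.maximum(x @ self.W_enc, 0.0), self.k)

    def decode(self, z: np.ndarray) -> np.ndarray:
        return z @ self.W_dec
```

In training, the decoder output would feed a reconstruction loss against the frozen dense embedding, while the sparse codes feed the contrastive retrieval objective.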

    Key Benefits

Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in both accuracy and retrieval speed, often by large margins, while also cutting training time to a fraction of that required by MRL. Under the same compute budget, CSR outperforms MRL by 9%, 15%, and 7% on ImageNet classification, MTEB text retrieval, and MS COCO retrieval, respectively.

    CSRv2 (2026)

    Major Improvements

    • CSRv2 effectively reduces dead neurons from >80% to ~20%
    • Delivers a 14% accuracy gain at k=2 compared to prior methods
    • Yields up to 300x improvements in compute and memory efficiency relative to dense embeddings and achieves a 7x speedup over Matryoshka Representation Learning (MRL)
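The compute savings follow from how k-sparse similarity is evaluated: comparing two k-sparse vectors touches at most 2k coordinates instead of the full dimensionality d. A minimal sketch, with the 4096-d ambient space and the index/value pairs as hypothetical examples:

```python
def sparse_dot(a_idx, a_val, b_idx, b_val):
    """Dot product of two sparse vectors given as (sorted indices, values).
    Cost is O(len(a_idx) + len(b_idx)), independent of the ambient dimension."""
    i = j = 0
    total = 0.0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            i += 1
        else:
            j += 1
    return total

# Two k=2 sparse vectors in a hypothetical 4096-d space: only the shared
# coordinate (107) contributes, so the work is ~4 operations, not 4096.
print(sparse_dot([3, 107], [0.5, 1.0], [107, 902], [2.0, 0.25]))  # 2.0
```

At k=2 versus a 4096-d dense embedding, that is roughly a 1000x reduction in both multiply-adds and stored values per vector, which is the mechanism behind the order-of-magnitude efficiency claims above.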

    Performance

With only 2 active dimensions, CSRv2 matches the performance of Matryoshka Representation Learning (MRL) at 16 dimensions. CSRv2 achieves a 7%/4% improvement over CSR at k=4, and the gap widens to 14%/6% at k=2, on text/vision representations respectively.

    The research was published at the 2026 International Conference on Learning Representations (ICLR).

    Pricing

    Open-source research implementation.