



Training technique in CSRv2 that stabilizes sparsity learning by gradually increasing sparsity constraints, reducing dead neurons from >80% to ~20%.
Progressive K-Annealing is a training technique used in CSRv2 that stabilizes sparsity learning by gradually increasing the sparsity constraint (reducing k) during training.
Instead of starting with ultra-sparse representations (k=2 or k=4), training begins with higher k values and progressively reduces k over the course of training. This lets the model first learn good representations under a loose sparsity budget before they are compressed to the final target sparsity.
The annealing schedule typically reduces k in stages as training progresses, tightening the sparsity constraint gradually rather than all at once.
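A minimal sketch of the idea, assuming a linear annealing schedule and a simple top-k sparsification step (the schedule shape, the start/end values `k_start=32` and `k_final=4`, and both function names are illustrative assumptions, not the CSRv2 implementation):

```python
import numpy as np

def k_schedule(step, total_steps, k_start=32, k_final=4):
    # Hypothetical linear schedule: interpolate k from a loose initial
    # budget (k_start) down to the final target (k_final), then hold.
    frac = min(step / total_steps, 1.0)
    return max(round(k_start + frac * (k_final - k_start)), k_final)

def top_k_sparsify(z, k):
    # Keep only the k largest-magnitude activations; zero the rest.
    idx = np.argsort(np.abs(z))[-k:]
    out = np.zeros_like(z)
    out[idx] = z[idx]
    return out

# During training, the current k comes from the schedule:
# k = k_schedule(step, total_steps); z_sparse = top_k_sparsify(z, k)
```

Because early steps keep many activations alive, far fewer units are starved of gradient signal than when training starts directly at the final ultra-sparse k, which is the mechanism behind the reported drop in dead neurons.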
CSRv2 with progressive k-annealing achieves up to 300x improvements in compute and memory efficiency relative to dense embeddings and 7x speedup over Matryoshka Representation Learning.
Research technique, open-source implementation.