

Library built on Sentence Transformers for flexible training, inference, and retrieval with state-of-the-art ColBERT models. Features FastPLAID index for efficient multi-vector late interaction retrieval with 10x storage compression and sub-200ms latency.
PyLate is a library built on top of Sentence Transformers, designed to simplify and optimize fine-tuning, inference, and retrieval with state-of-the-art ColBERT models.
Unlike traditional bi-encoders, which pool all token representations into a single vector, ColBERT models keep one embedding per token and use late interaction (MaxSim) to compute query/document similarity.
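As a rough illustration of late interaction, the pure-Python sketch below scores a query against two documents with MaxSim. Each query token embedding is matched to its most similar document token embedding, and those per-token maxima are summed. The toy 2-dimensional vectors, the helper names (`dot`, `maxsim`), and the plain dot product used as the similarity are illustrative assumptions, not PyLate's API. Real ColBERT models produce much higher-dimensional, typically normalized, token embeddings.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score (hypothetical helper).

    For every query token, take its maximum similarity over all document
    tokens, then sum these maxima across the query tokens.
    """
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy token embeddings (assumed values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # only matches the first query token

if __name__ == "__main__":
    print("doc_a:", maxsim(query, doc_a))
    print("doc_b:", maxsim(query, doc_b))
```

Because each query token keeps its own vector, `doc_a`, which matches both query tokens, scores 2.0, while `doc_b`, which matches only the first one, scores 1.0.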
PyLate has enabled the development of state-of-the-art models including GTE-ModernColBERT and Reason-ModernColBERT, demonstrating its practical utility for both research and production environments.
Available on PyPI:
pip install pylate
Released under the MIT license.
Free and open-source.