



Neural reranking architecture that examines full query-document pairs simultaneously for deeper semantic understanding, achieving higher accuracy than bi-encoders at the cost of computational efficiency.
Cross-encoders are neural models that perform full attention over an input pair, attending to the query and document jointly. They achieve higher accuracy than bi-encoders (like Sentence-BERT) but are far slower at inference, since every pair must pass through the full model.
Cross-encoders produce an output value between 0 and 1 indicating the similarity of a sentence pair, but do not produce sentence embeddings. Because the model reads both sentences at once, the interpretation of each sentence can be conditioned on the other.
The BERT cross-encoder takes the two sentences A and B, separated by a [SEP] token, as a single input; a feedforward layer on top of the [CLS] representation outputs the similarity score.
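As a concrete illustration, here is a minimal scoring sketch using the Sentence-Transformers CrossEncoder class; the STS checkpoint named below is one real example whose scores fall roughly in the 0-1 range, but any cross-encoder trained for your task can be swapped in.

```python
# Minimal sketch: scoring sentence pairs with a cross-encoder.
# The checkpoint is an example STS model; scores land roughly in [0, 1].
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-distilroberta-base")

# Each pair is fed to BERT as one sequence: [CLS] A [SEP] B [SEP].
scores = model.predict([
    ("A man is eating food.", "A man is eating a piece of bread."),
    ("A man is eating food.", "A plane is taking off."),
])
print(scores)  # one similarity score per pair; no embeddings are produced
```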
Clustering 10,000 sentences with a cross-encoder requires scoring about 50 million sentence pairs (roughly 65 hours of inference), whereas a bi-encoder can embed all 10,000 sentences in about 5 seconds and compare the embeddings cheaply. Cross-encoders therefore achieve higher accuracy than bi-encoders, but they do not scale to large datasets.
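The 50 million figure is simply the number of unordered pairs; a quick check of the arithmetic:

```python
# Comparing n sentences pairwise takes n-choose-2 forward passes,
# versus n forward passes for a bi-encoder that embeds each sentence once.
import math

n = 10_000
print(math.comb(n, 2))  # 49,995,000 -> the "about 50 million" pairs above
print(n)                # a bi-encoder runs only n forward passes
```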
In semantic search scenarios, the usual pattern is retrieve-and-rerank: a bi-encoder retrieves candidates quickly, and a cross-encoder reranks them. This gives you the speed of vector search plus the accuracy of cross-encoders: you retrieve 50 chunks fast, rerank to find the best 5, and pass only high-quality context to the LLM.
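A minimal sketch of that pipeline with Sentence-Transformers; the two checkpoint names are real example models, while `corpus` and `query` are placeholder data standing in for your own chunks and user question.

```python
# Retrieve-and-rerank sketch: bi-encoder for recall, cross-encoder for precision.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

corpus = [
    "Cross-encoders jointly attend over query and document.",
    "Bi-encoders embed each text independently for fast vector search.",
    "Paris is the capital of France.",
]  # in practice: thousands of chunks
corpus_embeddings = bi_encoder.encode(corpus, convert_to_tensor=True)

query = "How do rerankers differ from embedding models?"

# Stage 1: fast vector search over precomputed embeddings (e.g. top 50).
query_embedding = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=50)[0]

# Stage 2: the cross-encoder rescores each (query, chunk) pair jointly.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)

# Keep only the best few chunks as context for the LLM.
reranked = sorted(zip(hits, scores), key=lambda x: x[1], reverse=True)
top_chunks = [corpus[hit["corpus_id"]] for hit, _ in reranked[:5]]
print(top_chunks)
```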
Recent findings show that embeddings from earlier layers of cross-encoders can be used within information retrieval pipelines.
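A hedged sketch of what that could look like with Hugging Face Transformers; the checkpoint, the choice of layer 3, and mean pooling are illustrative assumptions, not a recipe prescribed by those findings.

```python
# Pulling a representation from an earlier layer of a cross-encoder.
# Checkpoint, layer index, and pooling are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, output_hidden_states=True
)

inputs = tokenizer(
    "how do planes fly?", "lift is generated by the wings",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states[0] is the embedding layer; hidden_states[k] is layer k's output.
early_layer = outputs.hidden_states[3]  # assumed intermediate layer
embedding = early_layer.mean(dim=1)     # mean-pool over tokens
print(embedding.shape)                  # (1, hidden_size)
```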
Various open-source implementations are available (Sentence-Transformers, Hugging Face Transformers, etc.).