



A RAG optimization technique that compresses retrieved documents by extracting only the most relevant portions relative to the query. Reduces token usage and improves LLM response quality by removing irrelevant context.
Contextual Compression is a technique that improves RAG by compressing retrieved documents, extracting only the parts most relevant to the user's query. This reduces context length, lowers costs, and often improves answer quality.
In LangChain, contextual compression wraps a base retriever in a ContextualCompressionRetriever, which passes each retrieved chunk through a compressor before returning it:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# `llm`, `vector_retriever`, and `query` are assumed to be defined earlier
# (e.g. a chat model and a vector-store retriever).
compressor = LLMChainExtractor.from_llm(llm)  # uses the LLM to extract query-relevant passages
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_retriever,
)

# Each returned document contains only the passages relevant to the query.
compressed_docs = compression_retriever.get_relevant_documents(query)
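To make the idea concrete outside of any framework, here is a minimal sketch of contextual compression where a toy word-overlap score stands in for the LLM extraction step; the `compress` function and the sample text are purely illustrative:

```python
def compress(document: str, query: str) -> str:
    """Keep only the sentences that share a content word with the query."""
    def content_words(text: str) -> set[str]:
        # crude tokenization; words of 3 characters or fewer act as stopwords
        return {w.lower().strip(".,?!") for w in text.split()
                if len(w.strip(".,?!")) > 3}

    query_words = content_words(query)
    kept = [s for s in document.split(". ") if content_words(s) & query_words]
    return ". ".join(s.rstrip(".") for s in kept) + ("." if kept else "")

doc = ("Paris is the capital of France. The Eiffel Tower opened in 1889. "
       "France uses the euro as its currency.")
print(compress(doc, "When did the Eiffel Tower open?"))
# → The Eiffel Tower opened in 1889.
```

Only the sentence about the Eiffel Tower survives; the other two are dropped before the document ever reaches the LLM, which is exactly the token saving the technique targets.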
Advantages: shorter context sent to the LLM, lower token costs, and often higher answer quality, since irrelevant passages are removed before generation.
Costs:
Implementation-dependent. LLM-based compressors such as LLMChainExtractor make one extra LLM call per retrieved document, which adds latency and a compression cost of its own, but the much shorter final prompt typically reduces overall generation cost.