



Parent Document Retriever is a RAG (retrieval-augmented generation) technique that separates what you index from what you retrieve. It indexes small, focused chunks for precise matching but returns the larger parent documents those chunks came from, giving the LLM comprehensive context. Decoupling indexing granularity from context size balances retrieval precision against context completeness.
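Large Chunks: give the LLM comprehensive context, but match queries less precisely.
Small Chunks: match queries precisely, but give the LLM too little surrounding context on their own.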
When a small chunk matches, return its entire parent document.
Parent Doc: [A B C D E F G H I J]
↓ split into
Child Chunks: [A B] [C D] [E F] [G H] [I J]
↓ embed and index
Vector DB: stores child chunk embeddings with parent doc IDs
Query → matches [C D]
↓ retrieve parent
Return: [A B C D E F G H I J]
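The flow above can be sketched in plain Python without any library. This is a minimal illustration, not a production implementation: the `ToyParentRetriever` class, the word-count "embedding", and the sample documents are all hypothetical stand-ins for a real embedding model, vector DB, and document store.

```python
import math
from collections import Counter

def split_words(text, size):
    """Split text into chunks of `size` words (stand-in for a real text splitter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: bag-of-words counts (an assumption, not a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyParentRetriever:
    def __init__(self, child_size=4):
        self.child_size = child_size
        self.docstore = {}  # parent_id -> full parent text
        self.index = []     # (child embedding, child text, parent_id)

    def add_documents(self, docs):
        for parent_id, text in docs.items():
            self.docstore[parent_id] = text
            # Only the child chunks are embedded; each keeps its parent's ID.
            for child in split_words(text, self.child_size):
                self.index.append((embed(child), child, parent_id))

    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.index, key=lambda e: cosine(q, e[0]), reverse=True)
        parent_ids = []
        for _, _, pid in ranked[:k]:
            if pid not in parent_ids:  # several children may share one parent
                parent_ids.append(pid)
        # Return the parents, not the matched children.
        return [self.docstore[pid] for pid in parent_ids]

docs = {
    "doc-1": "Solar panels convert sunlight into electricity. Inverters turn direct current into alternating current.",
    "doc-2": "Sourdough bread needs a starter. The starter ferments flour and water over several days.",
}
retriever = ToyParentRetriever(child_size=4)
retriever.add_documents(docs)
results = retriever.retrieve("starter ferments flour", k=1)
```

Here the query matches one four-word child chunk of `doc-2`, and `results` holds the full text of `doc-2`. Deduplicating parent IDs matters because several top-ranked children often belong to the same parent.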
from langchain.retrievers import ParentDocumentRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Child splitter: produces the small chunks that are embedded and indexed
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

# Parent splitter: produces the larger chunks that are returned at query time
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)

# vectorstore and docstore are assumed to be created beforehand:
# a vector store for the child embeddings and a document store for the parents
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
Advantages:
Costs:
Implementation-dependent. Requires a vector DB for the child embeddings plus a document store for the parents; both can live in the same database.
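As a rough sketch of the single-database option, the example below keeps parents and child embeddings in two tables of one SQLite database. The schema, the function names, the hand-written two-dimensional vectors, and the brute-force dot-product search are all assumptions for illustration; a real deployment would use a proper vector index and a real embedding model.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (id TEXT PRIMARY KEY, content TEXT NOT NULL);
    CREATE TABLE children (
        id INTEGER PRIMARY KEY,
        parent_id TEXT NOT NULL REFERENCES parents(id),
        content TEXT NOT NULL,
        embedding TEXT NOT NULL  -- JSON-encoded vector
    );
""")

def add_parent(parent_id, content, child_chunks):
    """Store the parent once, plus one row per (chunk_text, vector) child."""
    conn.execute("INSERT INTO parents VALUES (?, ?)", (parent_id, content))
    conn.executemany(
        "INSERT INTO children (parent_id, content, embedding) VALUES (?, ?, ?)",
        [(parent_id, text, json.dumps(vec)) for text, vec in child_chunks],
    )

def parent_for_best_child(query_vec):
    """Brute-force dot-product search over children, then look up the parent."""
    best_parent, best_score = None, float("-inf")
    for parent_id, emb in conn.execute("SELECT parent_id, embedding FROM children"):
        score = sum(q * e for q, e in zip(query_vec, json.loads(emb)))
        if score > best_score:
            best_parent, best_score = parent_id, score
    row = conn.execute(
        "SELECT content FROM parents WHERE id = ?", (best_parent,)
    ).fetchone()
    return row[0] if row else None

# Hand-made vectors stand in for real embeddings.
add_parent("doc-1", "A B C D", [("A B", [1.0, 0.0]), ("C D", [0.0, 1.0])])
add_parent("doc-2", "E F G H", [("E F", [-1.0, 0.0]), ("G H", [0.0, -1.0])])
found = parent_for_best_child([0.1, 0.9])
```

The query vector is closest to child `C D`, so `found` is the full parent text `A B C D`. Each parent is stored once regardless of how many children point to it, so the extra storage is roughly one copy of the corpus on top of the child embeddings.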