



Components in LLM frameworks that fetch and parse data from various sources (PDFs, websites, databases) into a standardized format for processing. Essential first step in RAG pipelines for converting raw data into processable documents.
Document Loaders are components in LLM frameworks (LangChain, LlamaIndex, Haystack) that fetch data from various sources and convert it into a standardized document format for downstream processing in RAG pipelines.
Loaders typically output documents with:
{
  "page_content": "The actual text content",
  "metadata": {
    "source": "file.pdf",
    "page": 1,
    "author": "...",
    "created_at": "..."
  }
}
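To make that structure concrete, here is a minimal sketch of what a loader does, written with only the Python standard library. The `Document` class and the `load_text_file` function are hypothetical stand-ins invented for illustration, not any framework's actual API; real loaders add format-specific parsing and richer metadata.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Document:
    """Hypothetical stand-in for a framework's document type."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> list[Document]:
    """Read a plain-text file and wrap it in a single Document.

    The only metadata recorded is the file name, stored under "source"
    to mirror the structure shown above.
    """
    file_path = Path(path)
    text = file_path.read_text(encoding=encoding)
    return [Document(page_content=text, metadata={"source": file_path.name})]
```

Calling `load_text_file("notes.txt")` on an existing file would return a one-element list whose `page_content` holds the file's text. Returning a list even for a single file keeps the interface the same for sources that produce many documents, such as a PDF with one document per page.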
Tools like Unstructured.io and LlamaParse:
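# Recent LangChain releases ship loaders in langchain-community; PyPDFLoader also needs pypdf.
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("document.pdf")
documents = loader.load()  # by default, one Document per PDF page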
Most loaders are free and open-source. Commercial options: