



Unstructured is an open-source library for preprocessing unstructured documents (PDFs, Word files, HTML, images) into formats suitable for RAG and LLM applications. It handles extraction, chunking, and cleaning across diverse document types.
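To make the three stages concrete, here is a minimal standard-library sketch of an extract, clean, chunk pipeline over plain text. It is not the Unstructured API: the `Element` class, the `extract`, `clean`, and `chunk` functions, the title heuristic, and the `max_chars` limit are all hypothetical names and assumptions chosen for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one extracted document element."""
    category: str  # "Title" or "NarrativeText"
    text: str


def extract(raw: str) -> list[Element]:
    """Split raw text on blank lines and label each block."""
    elements = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        # Assumed heuristic: a short block without a final period is a title.
        is_title = len(block) < 40 and not block.rstrip().endswith(".")
        elements.append(Element("Title" if is_title else "NarrativeText", block))
    return elements


def clean(element: Element) -> Element:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return Element(element.category, re.sub(r"\s+", " ", element.text).strip())


def chunk(elements: list[Element], max_chars: int = 200) -> list[str]:
    """Group elements into chunks.

    A new chunk starts at each title, or when adding the next element
    would push the current chunk past max_chars. A single element longer
    than max_chars is kept whole rather than split.
    """
    chunks, current = [], ""
    for el in elements:
        starts_section = el.category == "Title"
        too_long = len(current) + len(el.text) + 1 > max_chars
        if current and (starts_section or too_long):
            chunks.append(current)
            current = ""
        current = f"{current}\n{el.text}" if current else el.text
    if current:
        chunks.append(current)
    return chunks


raw = "Intro\n\nUnstructured   data is\neverywhere.\n\nSetup\n\nInstall the   package."
chunks = chunk([clean(e) for e in extract(raw)])
for c in chunks:
    print(repr(c))
```

Starting a new chunk at each title keeps a section heading together with the text that follows it, which tends to give retrieval systems more coherent passages than fixed-size splitting alone.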
Extraction:
Processing:
Open-source Python library
Managed API service