    Unstructured

    Open-source library for preprocessing unstructured documents (PDFs, Word, HTML, images) for RAG and LLM applications. Handles extraction, chunking, and cleaning of diverse document types.

    About this tool

    Overview

    Unstructured is an open-source library for preprocessing unstructured documents into formats suitable for RAG and LLM applications.

    Supported Formats

    • PDF
    • Word (DOCX)
    • HTML
    • Images (with OCR)
    • PowerPoint
    • Markdown
    • Email (MSG, EML)
    • CSV/Excel

    Features

    Extraction:

    • Text extraction
    • Table detection
    • Image handling
    • Layout analysis

    Processing:

    • Semantic chunking
    • Metadata extraction
    • Element classification
    • Cleaning and normalization
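    The processing steps listed above can be illustrated with a small standard-library sketch. It is not Unstructured's API or implementation: the `Element` class, `clean_text`, and `chunk_by_heading` are hypothetical names, and the heading-based rule is a simplified stand-in for semantic chunking. It only shows the general idea of cleaning typed elements and grouping them into chunks at section boundaries.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a classified document element."""
    category: str  # e.g. "Title", "NarrativeText", "ListItem"
    text: str
    metadata: dict = field(default_factory=dict)


def clean_text(text: str) -> str:
    """Strip leading bullet characters, then collapse whitespace runs."""
    text = re.sub(r"^[\s•\-\*]+", "", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk_by_heading(elements: list[Element], max_chars: int = 200) -> list[str]:
    """Start a new chunk at each Title, or when max_chars would be exceeded."""
    chunks: list[str] = []
    current = ""
    for el in elements:
        text = clean_text(el.text)
        if not text:
            continue  # skip elements that are empty after cleaning
        starts_section = el.category == "Title"
        too_long = len(current) + len(text) + 1 > max_chars
        if current and (starts_section or too_long):
            chunks.append(current)
            current = ""
        current = f"{current} {text}".strip()
    if current:
        chunks.append(current)
    return chunks


elements = [
    Element("Title", "  Overview "),
    Element("NarrativeText", "Unstructured   preprocesses\ndocuments."),
    Element("Title", "Features"),
    Element("ListItem", "• Table detection"),
]
chunks = chunk_by_heading(elements)
for chunk in chunks:
    print(chunk)
```

    Splitting at headings keeps each chunk tied to a single section, which tends to help retrieval. The `max_chars` limit is an arbitrary illustrative value.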

    Use Cases

    • RAG document ingestion
    • Knowledge base building
    • Document search indexing
    • Data pipeline preprocessing

    Integration

    • LangChain
    • LlamaIndex
    • Haystack
    • Custom pipelines
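    For the custom pipeline case, one common pattern is to serialize chunks with their metadata as JSON Lines so that a downstream indexer can load them. The sketch below uses only the standard library. The record fields (`id`, `text`, `metadata`) and the helper names are assumptions chosen for illustration, not a format defined by Unstructured or by any of the frameworks listed above.

```python
import json


def to_records(chunks: list[str], source: str) -> list[dict]:
    """Attach minimal metadata to each chunk (field names are hypothetical)."""
    return [
        {
            "id": f"{source}#{i}",
            "text": chunk,
            "metadata": {"source": source, "chunk_index": i},
        }
        for i, chunk in enumerate(chunks)
    ]


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


records = to_records(["Overview text.", "Features text."], source="report.pdf")
jsonl = to_jsonl(records)
print(jsonl)
```

    Keeping the source file and chunk index in the metadata lets a retrieval system trace an answer back to the document it came from.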

    Availability

    • Open-source Python library
    • Managed API service

    Information

    Website: unstructured.io
    Published: Mar 20, 2026

    Categories

    Tools

    Tags

    #Document Processing #ETL #RAG #Open Source

    Similar Products

    ARES

    RAG evaluation framework that trains lightweight judges for retrieval and generation scoring, refining evaluation by training specialized LLM judges on synthetic datasets to provide more reliable, confidence-aware judgments.

    BGE Reranker Base

    Open-source cross-encoder reranking model from BAAI that enhances RAG retrieval quality by examining query-document pairs individually. Self-hostable with Apache 2.0 licensing for cost-effective production deployments.

    ARES

    Automatic RAG Evaluation System - a framework for assessing RAG system quality through automated evaluation of retrieval relevance and generation accuracy without human labels.

    Agentic RAG
    Featured

    An advanced RAG architecture where an AI agent autonomously decides which questions to ask, which tools to use, when to retrieve information, and how to aggregate results. Represents a major trend in 2026 for more intelligent and adaptive retrieval systems.

    BGE-VL
    Featured

    State-of-the-art multimodal embedding model from BAAI supporting text-to-image, image-to-text, and compositional visual search. Trained on the MegaPairs dataset with over 26 million retrieval triplets.

    Apache Cassandra Vector Search
    Featured

    Distributed NoSQL database with vector search capabilities via Storage-Attached Indexes (SAI) in Cassandra 5.0+. Uses Lucene HNSW for approximate nearest neighbor search. This is an OSS database under Apache 2.0 license.

    All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this repository, related repositories, and associated websites are for identification purposes only. The use of these names, logos, and brands does not imply endorsement, affiliation, or sponsorship. This directory may include content generated by artificial intelligence.
    Copyright © 2025 Awesome Vector Databases. All rights reserved.