txtai
Category: SDKs & Libraries
Tags: open-source, semantic-search, vector-databases, ai
Description
txtai is an open-source, all-in-one AI framework for semantic search, LLM orchestration, and language model workflows. It provides an embeddings database that combines vector indexes (sparse and dense), graph networks, and relational databases, enabling advanced vector search and serving as a powerful knowledge source for large language model (LLM) applications.
Features
- Vector Search: Combines vector search with SQL, object storage, topic modeling, graph analysis, and multimodal indexing.
- Embeddings: Create embeddings for text, documents, audio, images, and video.
- LLM-Powered Pipelines: Run prompts, question-answering, labeling, transcription, translation, summarization, and more using language models.
- Workflows: Join pipelines together and aggregate business logic; a workflow can run as a simple microservice or as a multi-model workflow.
- Autonomous Agents: Build agents that intelligently connect embeddings, pipelines, workflows, and other agents to solve complex problems.
- APIs: Web and Model Context Protocol (MCP) APIs; bindings available for JavaScript, Java, Rust, and Go.
- Batteries Included: Comes with sensible defaults for quick setup.
- Deployment: Can be run locally or scaled out using container orchestration.
- Integration: Built with Python 3.10+ on top of Hugging Face Transformers, Sentence Transformers, and FastAPI.
- Model Support: Recommended models for tasks like embeddings, image captions, zero-shot/fixed labeling, LLMs, summarization, text-to-speech, transcription, and translation.
- Retrieval Augmented Generation (RAG): Enables RAG pipelines, including citation and advanced graph traversal for data retrieval.
- Semantic Search: Build search systems that understand natural language meaning, not just keywords.
- Language Model Workflows: Connects various language models for tasks such as summarization, transcription, and translation.
- Example Notebooks: Over 60 example notebooks and applications covering all major functionality.
- Open Source: Licensed under Apache 2.0.
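The ranking idea behind the vector search and semantic search features above can be illustrated without txtai itself. The sketch below is not txtai's API: it ranks documents by cosine similarity between a query vector and document vectors, using only the Python standard library. The `cosine` and `search` helpers, the sample texts, and the three-dimensional vectors are all made up for illustration; a real embeddings model would produce much larger vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, index, limit=1):
    """Return the top `limit` (text, score) pairs, best match first."""
    scored = [(text, cosine(query_vector, vector)) for text, vector in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical hand-written vectors standing in for embeddings model output.
index = [
    ("Lottery winner claims prize", [0.9, 0.1, 0.0]),
    ("Arctic ice shelf collapses", [0.0, 0.2, 0.9]),
    ("Hospital reports new cases", [0.1, 0.9, 0.1]),
]

# Assumed vector for a query such as "climate change".
results = search([0.1, 0.1, 0.95], index, limit=1)
print(results[0][0])
```

The query shares no keywords with the matching document; the match comes entirely from the vectors pointing in a similar direction, which is what separates this approach from keyword search.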
Use Cases
- Semantic/similarity/vector/neural search applications
- LLM orchestration and RAG (retrieval augmented generation)
- Knowledge base construction and querying
- Autonomous agent-based workflows
- Multimodal search (text, image, audio, video)
- Language model pipelines for QA, summarization, translation, etc.
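The RAG use case listed above follows a retrieve-then-prompt pattern, which can be sketched with the standard library alone. This is not txtai's API: `retrieve` uses naive keyword overlap as a stand-in for vector search, and `build_prompt` is a hypothetical helper that numbers each retrieved passage so an answer can cite it. No language model is called; the sketch stops at the prompt.

```python
def retrieve(question, passages, limit=2):
    """Rank passages by words shared with the question (a stand-in for vector search)."""
    terms = set(question.lower().split())
    scored = [
        (len(terms & set(passage.lower().split())), position, passage)
        for position, passage in enumerate(passages)
    ]
    # Highest overlap first; ties keep the original passage order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(position, passage) for score, position, passage in scored[:limit] if score > 0]


def build_prompt(question, context):
    """Assemble a prompt whose context passages are numbered for citation."""
    lines = ["Answer using only the numbered context. Cite sources like [1]."]
    for number, (_, passage) in enumerate(context, start=1):
        lines.append(f"[{number}] {passage}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Illustrative passages only.
passages = [
    "txtai is licensed under Apache 2.0.",
    "txtai requires Python 3.10 or later.",
    "Workflows join pipelines together.",
]
question = "Which Python version does txtai require?"

context = retrieve(question, passages)
prompt = build_prompt(question, context)
print(prompt)
```

Numbering the passages inside the prompt is one simple way to support citation: the model's answer can refer back to `[1]` or `[2]`, and the caller can map those numbers to the original sources.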
Installation
- Install via pip:
pip install txtai
- Python 3.10+ required
- Optional dependencies and container support available
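Because Python 3.10+ is required, checking the interpreter version before installing can save a failed install. A minimal sketch using only the standard library; the `meets_requirement` helper name is made up here.

```python
import sys

MINIMUM = (3, 10)  # Minimum Python version stated in the requirements above


def meets_requirement(version=None, minimum=MINIMUM):
    """Return True when the (major, minor) version is at least `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


print("OK" if meets_requirement() else "Python 3.10 or later is needed")
```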
Pricing
- txtai is open-source and free to use under the Apache 2.0 license.
Powered Applications
- rag: Retrieval Augmented Generation application
- ragdata: Knowledge base builder for RAG
- paperai: Semantic search and workflows for medical/scientific papers
- annotateai: Automatic annotation of papers with LLMs
License: Apache-2.0