DataFusion
A general-purpose analytical query engine with a vectorized execution engine. It is well suited to traditional analytical workloads and handles column-at-a-time (vectorized) operations efficiently. It is an example of a vector engine.
About this tool
Apache DataFusion
Apache DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. It provides libraries and binaries for developers to build fast and feature-rich database and analytic systems customized to particular workloads.
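Using Arrow as the in-memory format means data is held column by column rather than row by row. The sketch below is a toy, stdlib-only illustration of that idea. It is not the Arrow format itself and not the pyarrow or DataFusion API. `Column`, `column_from_pylist` and `rows_to_columns` are hypothetical names. The validity bitmap (one bit per row, least-significant bit first, 1 meaning non-null) mirrors the convention Arrow uses. Real Arrow arrays use typed, contiguous buffers instead of Python lists.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Column:
    """Toy nullable column: a values list plus a validity bitmap.

    Simplification: real Arrow arrays use typed, contiguous memory buffers;
    here `values` is an ordinary Python list.
    """
    values: list
    validity: bytearray  # bit i (LSB-first within each byte) is 1 when row i is non-null

    def is_valid(self, i: int) -> bool:
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def get(self, i: int) -> Optional[Any]:
        return self.values[i] if self.is_valid(i) else None


def column_from_pylist(items: list, placeholder: Any = 0) -> Column:
    """Build a Column; null slots hold `placeholder` and are masked out by the bitmap."""
    validity = bytearray((len(items) + 7) // 8)
    values = []
    for i, item in enumerate(items):
        if item is None:
            values.append(placeholder)
        else:
            values.append(item)
            validity[i // 8] |= 1 << (i % 8)
    return Column(values, validity)


def rows_to_columns(rows: list[dict]) -> dict[str, Column]:
    """Transpose row-oriented records into one Column per field name."""
    names = list(rows[0]) if rows else []
    return {name: column_from_pylist([row.get(name) for row in rows]) for name in names}


if __name__ == "__main__":
    rows = [{"id": 1, "price": 9.5}, {"id": 2, "price": None}, {"id": 3, "price": 4.0}]
    batch = rows_to_columns(rows)
    print(batch["price"].values)                      # [9.5, 0, 4.0]
    print([batch["price"].get(i) for i in range(3)])  # [9.5, None, 4.0]
```

A scan over `price` now touches a single list instead of every record. This access pattern is what columnar engines exploit.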
Features
- Extensible Query Engine: Written in Rust, utilizing Apache Arrow as its in-memory format.
- APIs: Offers both SQL and DataFrame APIs.
- Performance: High performance, as highlighted in published benchmarks.
- Built-in Format Support: Supports CSV, Parquet, JSON, and Avro data formats out of the box.
- Customization: Extensive customization options, allowing for additional data sources, query languages, functions, and custom operators.
- Query Planner: Features a full query planner.
- Execution Engine: Includes a columnar, streaming, multi-threaded, vectorized execution engine.
- Partitioned Data Sources: Supports partitioned data sources.
- Python Bindings: Available for using DataFusion from Python.
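The execution model described in the features above (columnar, streaming, multi-threaded, vectorized, over partitioned sources) can be sketched in plain Python. This is a hypothetical illustration of the technique, not DataFusion's implementation or API. `scan`, `filter_batch`, `partial_sum` and `run_query` are made-up names. Each partition is read as a stream of small columnar batches. The filter evaluates its predicate over a whole column at once and applies the resulting mask to every column. The aggregate keeps only running totals, so a partition is never fully materialized. Partitions run as separate tasks, and their partial results are merged at the end. CPython threads show the structure only, because the GIL limits real CPU parallelism for pure-Python code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator

Batch = dict[str, list]  # column name -> values; every list in a batch has the same length


def scan(rows: list[tuple], names: list[str], batch_size: int) -> Iterator[Batch]:
    """Yield fixed-size columnar batches from row tuples (stand-in for a file reader)."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {name: [row[i] for row in chunk] for i, name in enumerate(names)}


def filter_batch(batch: Batch, column: str, pred: Callable[[object], bool]) -> Batch:
    """Evaluate the predicate over one whole column, then apply the mask to every column."""
    mask = [pred(v) for v in batch[column]]
    return {name: [v for v, keep in zip(values, mask) if keep] for name, values in batch.items()}


def partial_sum(batches: Iterable[Batch], column: str) -> tuple[float, int]:
    """Streaming aggregate: consumes one batch at a time, keeping only (sum, count)."""
    total, count = 0.0, 0
    for batch in batches:
        total += sum(batch[column])
        count += len(batch[column])
    return total, count


def run_query(partitions: list[list[tuple]], names: list[str], batch_size: int = 2) -> tuple[float, int]:
    """Roughly: SELECT SUM(amount), COUNT(*) FROM t WHERE amount > 10, one task per partition."""
    def pipeline(rows: list[tuple]) -> tuple[float, int]:
        batches = scan(rows, names, batch_size)
        filtered = (filter_batch(b, "amount", lambda v: v > 10) for b in batches)
        return partial_sum(filtered, "amount")

    # Threads illustrate the structure; the GIL limits true parallelism for pure Python.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(pipeline, partitions))
    # Final step: merge the per-partition partial aggregates.
    return sum(p[0] for p in partials), sum(p[1] for p in partials)
```

For example, `run_query([[(1, 5.0), (2, 20.0), (3, 15.0)], [(4, 30.0), (5, 8.0)]], ["id", "amount"])` returns `(65.0, 3)`.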
Related Subprojects
- DataFusion Python: Provides a Python interface for SQL and DataFrame queries.
- DataFusion Ray: Offers a distributed version of DataFusion that scales out on Ray clusters.
- DataFusion Comet: An accelerator for Apache Spark based on DataFusion.
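Scaling a query out across machines, as DataFusion Ray does on Ray clusters, generally relies on repartitioning data between stages. The sketch below shows one common pattern, two-phase grouped aggregation with a hash shuffle, as a generic illustration. It is an assumption for teaching purposes and does not describe how DataFusion Ray is actually implemented. All function names are hypothetical. Each input partition is pre-aggregated locally. Partial results are then routed by a deterministic key hash so that every key lands in exactly one output bucket, and each bucket is merged independently.

```python
from collections import defaultdict
from zlib import crc32


def stable_hash(key: str) -> int:
    # Built-in hash() on str is randomized per process, so use a deterministic
    # hash that every worker would compute identically.
    return crc32(key.encode("utf-8"))


def partial_agg(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Pre-aggregate rows by key (for SUM, the final merge is the same operation)."""
    acc = defaultdict(float)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def shuffle(partials: list[dict[str, float]], n_out: int) -> list[list[tuple[str, float]]]:
    """Route each partial result to the output bucket that owns its key."""
    buckets: list[list[tuple[str, float]]] = [[] for _ in range(n_out)]
    for part in partials:
        for key, value in part.items():
            buckets[stable_hash(key) % n_out].append((key, value))
    return buckets


def distributed_group_sum(inputs: list[list[tuple[str, float]]], n_out: int = 2) -> dict[str, float]:
    """Roughly: SELECT key, SUM(value) FROM t GROUP BY key, in two stages."""
    partials = [partial_agg(p) for p in inputs]  # stage 1: one task per input partition
    result: dict[str, float] = {}
    for bucket in shuffle(partials, n_out):      # exchange: all partials for a key meet in one bucket
        result.update(partial_agg(bucket))       # stage 2: merge; keys are disjoint across buckets
    return result
```

Pre-aggregating before the shuffle shrinks the amount of data that has to cross the network. This is the usual reason for splitting the aggregate into two phases.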
Pricing
As an Apache Software Foundation project, DataFusion is open-source and free to use.
Similar Products
Qdrant is an open-source vector database designed for high-performance similarity search and AI applications such as RAG, recommendation systems, advanced semantic search, anomaly detection, and AI agents. It provides scalable storage and retrieval of vector embeddings with features like filtering, hybrid search, and production-grade APIs for integrating with machine learning workloads.
A distributed vector database designed for scalable and efficient vector similarity search. It is purpose-built for handling large-scale vector data and search workloads.
ClickHouse is an open-source column-oriented database that supports vectorized computation and now offers vector search features. Its architecture enables efficient real-time analytics and vector operations, making it a relevant choice for vector database use cases.
Cottontail DB is an open-source vector database for storing and searching high-dimensional data, with features geared towards research and production environments.
Trieve provides an all-in-one infrastructure for vector search, recommendations, retrieval-augmented generation (RAG), and analytics, accessible via API for seamless integration.
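Several of the products above center on vector similarity search with filtering. The sketch below is a minimal, exact (brute-force) version of that operation, under stated assumptions. It uses cosine similarity as the metric and applies the payload filter before scoring (pre-filtering). It is not the API of any product listed here, and `search`, `cosine` and `Point` are hypothetical names. Production vector databases typically replace the linear scan with an approximate index such as HNSW.

```python
import heapq
import math
from typing import Callable, Optional, Sequence

Point = tuple[str, Sequence[float], dict]  # (id, embedding, payload)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points: list[Point], query: Sequence[float], k: int,
           where: Optional[Callable[[dict], bool]] = None) -> list[tuple[str, float]]:
    """Exact top-k by cosine similarity, with an optional payload pre-filter."""
    candidates = (p for p in points if where is None or where(p[2]))
    scored = ((cosine(vec, query), pid) for pid, vec, _ in candidates)
    # nlargest keeps only k items in a heap instead of sorting every score.
    return [(pid, score) for score, pid in heapq.nlargest(k, scored)]
```

With points `doc1 = [1, 0]`, `doc2 = [0, 1]` and `doc3 = [0.7, 0.7]`, the query `[1.0, 0.1]` ranks `doc1` then `doc3`. Adding a filter that excludes `doc3` yields `doc1` then `doc2`.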