DataFusion
DataFusion is a general-purpose analytical query engine with a built-in vectorized execution engine. It is well suited to traditional analytical workloads and to efficient batch-at-a-time processing of columnar data. It is listed here as an example of a vector engine.
About this tool
Apache DataFusion
Apache DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. It provides libraries and binaries for developers to build fast and feature-rich database and analytic systems customized to particular workloads.
Features
- Extensible Query Engine: Written in Rust, utilizing Apache Arrow as its in-memory format.
- APIs: Offers both SQL and DataFrame APIs.
- Performance: Strong query performance, as reported in the project's benchmarks.
- Built-in Format Support: Supports CSV, Parquet, JSON, and Avro data formats out of the box.
- Customization: Extension points for adding custom data sources, query languages, functions, and operators.
- Query Planner: Features a full query planner.
- Execution Engine: Includes a columnar, streaming, multi-threaded, vectorized execution engine.
- Partitioned Data Sources: Supports partitioned data sources.
- Python Bindings: Python bindings are available for using DataFusion from Python programs.
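
To make "columnar, streaming, vectorized" more concrete, here is a minimal sketch in plain Python. It is not DataFusion code and does not use the DataFusion API. The names `scan`, `filter_price_at_least`, `project_revenue`, and `total_revenue`, as well as the sample data, are hypothetical and exist only for illustration. The sketch shows the general idea: data flows through operators as small column-oriented batches, each operator works on a whole column of a batch at once, and no operator needs the full table in memory. Multi-threading is left out to keep the example short.

```python
from typing import Dict, Iterator, List

# One batch: column name -> list of values for that column (columnar layout).
Batch = Dict[str, List[float]]


def scan(rows_per_batch: int) -> Iterator[Batch]:
    """Hypothetical data source that yields a tiny table in columnar batches."""
    price = [10.0, 25.0, 40.0, 5.0, 60.0]
    qty = [1.0, 2.0, 3.0, 4.0, 5.0]
    for start in range(0, len(price), rows_per_batch):
        end = start + rows_per_batch
        yield {"price": price[start:end], "qty": qty[start:end]}


def filter_price_at_least(batches: Iterator[Batch], threshold: float) -> Iterator[Batch]:
    """Filter operator: builds a boolean mask per batch, then applies it to every column."""
    for batch in batches:
        keep = [p >= threshold for p in batch["price"]]
        yield {
            name: [value for value, k in zip(column, keep) if k]
            for name, column in batch.items()
        }


def project_revenue(batches: Iterator[Batch]) -> Iterator[Batch]:
    """Projection operator: computes price * qty over whole columns of each batch."""
    for batch in batches:
        yield {"revenue": [p * q for p, q in zip(batch["price"], batch["qty"])]}


def total_revenue(rows_per_batch: int = 2, threshold: float = 20.0) -> float:
    """Chains the operators lazily; batches stream through one at a time."""
    pipeline = project_revenue(filter_price_at_least(scan(rows_per_batch), threshold))
    return sum(sum(batch["revenue"]) for batch in pipeline)


if __name__ == "__main__":
    print(total_revenue())  # 470.0
```

The result does not depend on the batch size: `total_revenue(rows_per_batch=5)` returns the same value as the default, because each operator treats a batch as an independent unit of work. A production engine applies the same pattern to typed Arrow arrays rather than Python lists.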
Related Subprojects
- DataFusion Python: Provides a Python interface for SQL and DataFrame queries.
- DataFusion Ray: Offers a distributed version of DataFusion that scales out on Ray clusters.
- DataFusion Comet: An accelerator for Apache Spark based on DataFusion.
Pricing
As an Apache Software Foundation project, DataFusion is open source and free to use.