DataFusion

A general-purpose analytical engine with built-in vector processing capabilities. It excels at traditional analytical workloads while also handling vector operations efficiently, making it an example of a vector engine.
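Assuming "vector processing" here means vectorized (batch-at-a-time) execution, as in DataFusion's vectorized execution engine, the minimal Python sketch below contrasts that style with row-at-a-time processing on a simple filter. The function names and the tiny `BATCH_SIZE` are hypothetical illustrations, not DataFusion's API. Pure Python will not show the speedup that compiled engines get from tight loops over batches; the sketch only shows the shape of the computation.

```python
from typing import List

BATCH_SIZE = 4  # hypothetical; real engines typically use thousands of rows per batch


def filter_row_at_a_time(prices: List[float], threshold: float) -> List[float]:
    """Row-at-a-time style: test and emit one value per iteration."""
    out: List[float] = []
    for p in prices:
        if p > threshold:
            out.append(p)
    return out


def filter_vectorized(prices: List[float], threshold: float,
                      batch_size: int = BATCH_SIZE) -> List[float]:
    """Batch-at-a-time style: for each batch, first compute a selection vector
    (indices that pass the predicate), then gather those values."""
    out: List[float] = []
    for start in range(0, len(prices), batch_size):
        batch = prices[start:start + batch_size]
        # Pass 1: evaluate the predicate over the whole batch.
        selection = [i for i, p in enumerate(batch) if p > threshold]
        # Pass 2: gather the selected values using the selection vector.
        out.extend(batch[i] for i in selection)
    return out
```

Both functions return the same rows; the vectorized version splits each batch into a predicate pass and a gather pass, which is the structure that compiled engines can run as tight, cache-friendly loops.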

About this tool

Apache DataFusion

Apache DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. It provides libraries and binaries for developers to build fast and feature-rich database and analytic systems customized to particular workloads.
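To make "Arrow as its in-memory format" more concrete: Arrow stores each column as contiguous buffers, with a validity bitmap marking which slots are null. The sketch below is a toy, standard-library-only model of that idea; `Int64Column` and `to_columns` are hypothetical names, and the code neither uses nor reproduces the real Arrow libraries.

```python
from array import array
from typing import Dict, List, Optional


class Int64Column:
    """Toy nullable 64-bit integer column: one contiguous values buffer plus a
    validity bitmap (bit i set means slot i is non-null, least-significant bit first)."""

    def __init__(self, items: List[Optional[int]]):
        self.length = len(items)
        # Null slots still occupy space in the values buffer; their content is ignored.
        self.values = array("q", (0 if v is None else v for v in items))
        self.validity = bytearray((self.length + 7) // 8)
        for i, v in enumerate(items):
            if v is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_valid(self, i: int) -> bool:
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def null_count(self) -> int:
        return sum(1 for i in range(self.length) if not self.is_valid(i))

    def total(self) -> int:
        # An aggregate touches only this column's buffers, never other fields.
        return sum(self.values[i] for i in range(self.length) if self.is_valid(i))


def to_columns(rows: List[Dict[str, Optional[int]]]) -> Dict[str, Int64Column]:
    """Pivot row-oriented records into one column per field."""
    names = list(rows[0].keys()) if rows else []
    return {name: Int64Column([row.get(name) for row in rows]) for name in names}
```

For example, with rows `{"id": 1, "amount": 10}`, `{"id": 2, "amount": None}` and `{"id": 3, "amount": 32}`, summing the `amount` column reads only its values buffer and bitmap, whereas a row-oriented layout would step over every field of every record.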

Features

  • Extensible Query Engine: Written in Rust, utilizing Apache Arrow as its in-memory format.
  • APIs: Offers both SQL and DataFrame APIs.
  • Performance: Delivers strong performance, as demonstrated in benchmarks.
  • Built-in Format Support: Supports CSV, Parquet, JSON, and Avro data formats out of the box.
  • Customization: Extensive customization options, allowing for additional data sources, query languages, functions, and custom operators.
  • Query Planner: Features a full query planner.
  • Execution Engine: Includes a columnar, streaming, multi-threaded, vectorized execution engine.
  • Partitioned Data Sources: Supports partitioned data sources.
  • Python Bindings: Python bindings are available for integration with Python applications.
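Several of the features above (partitioned data sources and a streaming, multi-threaded execution engine) fit together in a common pattern: each partition is aggregated independently in streamed batches, and the partial results are then merged. The following Python sketch shows that two-phase (partial, then final) grouped aggregation. All names are hypothetical; the thread pool stands in for an engine's worker threads, and because of Python's GIL the sketch illustrates the structure rather than real parallel speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def stream_batches(partition: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield a partition in fixed-size batches instead of processing it all at once."""
    for start in range(0, len(partition), batch_size):
        yield partition[start:start + batch_size]


def partial_sum(partition: List[Row], batch_size: int = 2) -> Dict[str, int]:
    """Partial phase: sum one partition independently, batch by batch."""
    acc: Dict[str, int] = defaultdict(int)
    for batch in stream_batches(partition, batch_size):
        for key, value in batch:
            acc[key] += value
    return dict(acc)


def final_sum(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Final phase: merge the per-partition partial states."""
    total: Dict[str, int] = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def group_sum(partitions: List[List[Row]], workers: int = 4) -> Dict[str, int]:
    """Run the partial phase on a thread pool, one task per partition, then merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return final_sum(partials)
```

Splitting the work this way keeps each partition's state small and independent, so partitions can be processed concurrently and only the compact partial results need to be combined at the end.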

Related Subprojects

  • DataFusion Python: Provides a Python interface for SQL and DataFrame queries.
  • DataFusion Ray: Offers a distributed version of DataFusion that scales out on Ray clusters.
  • DataFusion Comet: An accelerator for Apache Spark based on DataFusion.
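Distributed query engines commonly split a plan into stages and redistribute rows between stages by hashing a key, so that every row for a given key reaches the same worker. The single-process Python sketch below simulates that shuffle for a grouped sum, assuming a stage-based design. It is not DataFusion Ray's implementation, and `hash_partition`, `shuffle`, and `distributed_group_sum` are hypothetical names.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def hash_partition(rows: List[Row], num_workers: int) -> List[List[Row]]:
    """Route each row to a bucket chosen by hashing its key.

    zlib.crc32 is used instead of the built-in hash(), because str hashing is
    randomized per process and separate workers must agree on the routing.
    """
    buckets: List[List[Row]] = [[] for _ in range(num_workers)]
    for key, value in rows:
        buckets[zlib.crc32(key.encode("utf-8")) % num_workers].append((key, value))
    return buckets


def shuffle(input_partitions: List[List[Row]], num_workers: int) -> List[List[Row]]:
    """Stage boundary: every input partition sends each row to its target worker."""
    received: List[List[Row]] = [[] for _ in range(num_workers)]
    for partition in input_partitions:
        for worker, bucket in enumerate(hash_partition(partition, num_workers)):
            received[worker].extend(bucket)
    return received


def distributed_group_sum(input_partitions: List[List[Row]], num_workers: int) -> Dict[str, int]:
    """Second stage: each worker sums its own keys. Key sets are disjoint across
    workers, so the per-worker results can simply be combined."""
    result: Dict[str, int] = {}
    for bucket in shuffle(input_partitions, num_workers):
        local: Dict[str, int] = defaultdict(int)
        for key, value in bucket:
            local[key] += value
        result.update(local)
    return result
```

Because routing depends only on the key, adding workers spreads keys across more buckets without any worker needing to see another worker's rows for the same key.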

Pricing

As an Apache Software Foundation project, DataFusion is open-source and free to use.

Information

Publisher: Fox
Published: Jul 1, 2025
