Databench

Benchmark iterations per second, by task, for each benchmark suite.

[Charts: Full Stats, at row counts of 1, 10, 100, 1,000, 10,000, 100,000, and 1,000,000. Y-axis: operations per second (higher is better).]

[Charts: Parse, at row counts of 1, 10, 100, 1,000, 10,000, 100,000, and 1,000,000. Y-axis: operations per second (higher is better).]

Takeaways:

  • Raw JavaScript was the fastest on small datasets. Somewhere between 1,000 and 10,000 rows, the Rust-based options become markedly faster.
  • Apache DataFusion was hard to use, but it was lightning fast on massive datasets. On 1M rows, it parsed 25x faster and ran statistical operations almost 2,000x faster than raw JS.
  • Apache Arrow-based solutions performed well on large datasets.
  • JS-based dataframe libraries struggled with large datasets.
  • DuckDB landed in the middle of the pack in most cases, but it was nice to use!
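The charts report throughput in operations per second. As a rough illustration only, here is a minimal TypeScript sketch of how such a number can be obtained for a raw-JS baseline. The page does not say what the "Full Stats" task computes or how its harness works, so `fullStats` (count, min, max, mean, and population standard deviation over one numeric column) and `opsPerSecond` are hypothetical stand-ins, not the benchmark's actual code.

```typescript
// Hypothetical "full stats" task: count, min, max, mean, and population
// standard deviation over a single numeric column, in plain TypeScript/JS.
interface ColumnStats {
  count: number;
  min: number; // Infinity for an empty column
  max: number; // -Infinity for an empty column
  mean: number; // NaN for an empty column
  stdDev: number; // NaN for an empty column
}

function fullStats(values: number[]): ColumnStats {
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  // First pass: extremes and sum.
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
    sum += v;
  }
  const count = values.length;
  const mean = count > 0 ? sum / count : NaN;
  // Second pass: sum of squared deviations from the mean.
  let sqDiff = 0;
  for (const v of values) {
    sqDiff += (v - mean) ** 2;
  }
  const stdDev = count > 0 ? Math.sqrt(sqDiff / count) : NaN;
  return { count, min, max, mean, stdDev };
}

// Run `task` repeatedly for at least `durationMs` milliseconds and return
// how many times it completed per second (higher is better).
function opsPerSecond(task: () => void, durationMs = 100): number {
  let ops = 0;
  const start = Date.now();
  let elapsed = 0;
  while (elapsed < durationMs) {
    task();
    ops++;
    elapsed = Date.now() - start;
  }
  return ops / (elapsed / 1000);
}

// Usage: time the raw-JS stats task at a few row counts.
for (const rows of [1, 1_000, 100_000]) {
  const data = Array.from({ length: rows }, (_, i) => i % 97);
  const rate = opsPerSecond(() => fullStats(data));
  console.log(`${rows} rows: ${Math.round(rate)} ops/sec`);
}
```

Counting completed runs inside a fixed time window keeps the metric comparable across row counts. A real harness would typically also add warm-up runs and repeated samples to reduce JIT and timer noise.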