Databench
Benchmark iterations by task for each benchmark suite.
[Charts: Full Stats and Parse benchmarks, one chart per row count (1, 10, 100, 1000, 10000, 100000, 1000000). Metric: operations per second (higher is better).]
Takeaways:
- Raw JavaScript is the fastest on small datasets. Somewhere between 1,000 and 10,000 rows, the Rust-based options become markedly faster.
- Apache DataFusion was hard to use but lightning fast on massive datasets. On 1M rows, it parsed 25x faster and ran statistical operations almost 2000x faster than raw JS.
- Apache Arrow-based solutions are pretty great for large datasets.
- JS-based dataframe libraries cannot handle large datasets well.
- DuckDB was sort of middle of the pack in most cases. It was nice to use though!
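
To make the "operations per second" metric concrete, here is a minimal sketch of how a raw-JS baseline for the Parse and Full Stats tasks could be timed. It is not the actual Databench harness: the CSV schema (`id,value`), the helper names (`makeCsv`, `parseValues`, `summarize`, `opsPerSecond`), the choice of min/max/mean as the statistics, and the time budget are all assumptions made for illustration.

```typescript
// Hypothetical helpers for illustration; not taken from the Databench source.

// Build a CSV string with a header and `rows` data rows (assumed schema: id,value).
function makeCsv(rows: number): string {
  const lines: string[] = ["id,value"];
  for (let i = 0; i < rows; i++) {
    lines.push(`${i},${i * 2}`);
  }
  return lines.join("\n");
}

// "Parse" task: turn the CSV text into an array of numeric values.
function parseValues(csv: string): number[] {
  const values: number[] = [];
  // slice(1) skips the header line.
  for (const line of csv.split("\n").slice(1)) {
    const comma = line.indexOf(",");
    values.push(Number(line.slice(comma + 1)));
  }
  return values;
}

// "Stats" task (assumed): min, max, and mean computed in a single pass.
function summarize(values: number[]): { min: number; max: number; mean: number } {
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
    sum += v;
  }
  return { min, max, mean: values.length > 0 ? sum / values.length : NaN };
}

// Run `task` repeatedly for at least `budgetMs` milliseconds and
// return the measured rate in operations per second.
// Date.now() is coarse (1 ms) but available in every JS runtime.
function opsPerSecond(task: () => void, budgetMs: number = 50): number {
  task(); // one warm-up call so the first timed iteration is not an outlier
  let iterations = 0;
  const start = Date.now();
  let elapsed = 0;
  do {
    task();
    iterations++;
    elapsed = Date.now() - start;
  } while (elapsed < budgetMs);
  return (iterations / elapsed) * 1000;
}

// Small usage loop over a few of the row counts used in the charts.
for (const rowCount of [1, 10, 100, 1000]) {
  const csv = makeCsv(rowCount);
  const values = parseValues(csv);
  const parseRate = opsPerSecond(() => { parseValues(csv); });
  const statsRate = opsPerSecond(() => { summarize(values); });
  console.log(`rows=${rowCount} parse=${parseRate.toFixed(0)} ops/s stats=${statsRate.toFixed(0)} ops/s`);
}
```

A speedup such as "25x faster" is then just the ratio of two such rates at the same row count. A real harness would likely use a higher-resolution timer and multiple samples per task to reduce noise.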