This benchmark compares DuckDB, Polars, Pandas, Dask, and Zig against Ireland's Property Price Register: 783K residential property transactions, 92 MB, spanning 2010 to May 2026 (fresh from the weekly cron job). Each engine runs 11 identical analytical queries under a 4-core constraint on a Proxmox LXC container (16 GiB RAM, 512 MB swap).
Three categories of stress: standard aggregations (Q1–Q6: GROUP BY, window functions, string ops), compute-intensive (Q7–Q8: self-joins, YoY shifts via lag), and IO/hash-table intensive (Q9–Q11: 5D GROUP BY, quantile distributions, dense ranking). The set is designed to push each engine to its architectural limits: Python-boundary overhead, distributed shuffle costs, in-memory vs. mmap tradeoffs.
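The per-engine measurement can be pictured with a minimal timing harness. This is an illustrative sketch, not the actual benchmark script: the function and variable names are assumptions, but the shape (ingest once, then run all 11 queries back-to-back, recording ingest and query time separately) matches the numbers reported below.

```python
# Illustrative sketch of a per-engine timing harness (names are hypothetical,
# not from the real benchmark script). Ingest and query phases are timed
# separately, matching the Ingest / Queries / Total columns in the table.
import time

def run_engine(name, ingest, queries):
    """ingest: () -> data; queries: list of (data -> result) callables."""
    t0 = time.perf_counter()
    data = ingest()
    t_ingest = time.perf_counter() - t0

    t0 = time.perf_counter()
    for q in queries:
        q(data)                      # run all queries back-to-back
    t_query = time.perf_counter() - t0

    return {"engine": name, "ingest": t_ingest,
            "queries": t_query, "total": t_ingest + t_query}

# Toy usage with stand-in callables:
timings = run_engine("toy",
                     ingest=lambda: list(range(1000)),
                     queries=[sum, max, min])
```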
Key takeaways: Zig dominates (0.46s), compiled to a native binary with zero overhead. Polars leads the Python engines (1.03s). DuckDB is 3× slower than Zig but 8× more memory efficient than Pandas. Dask is 24× slower than Zig; at 92 MB, distributed compute is pure overhead. The memory watchdog prevented yesterday's swap-crash from recurring.
| Rank | Engine | Ingest | Queries | Total | × Zig | Peak RSS |
|---|---|---|---|---|---|---|
| 🥇 | Zig | 0.305s | 0.158s | 0.464s | 1.0× | 0.8 GiB |
| 🥈 | Polars | 0.096s | 0.929s | 1.025s | 2.2× | 0.5 GiB |
| 🥉 | DuckDB | 0.265s | 1.207s | 1.472s | 3.2× | 0.1 GiB |
| 4 | Pandas | 1.373s | 3.177s | 4.550s | 9.8× | 1.0 GiB |
| 5 | Dask | 1.085s | 10.082s | 11.168s | 24.1× | 0.8 GiB |

| # | Type | Description | Stress Target |
|---|---|---|---|
| Q1–Q5 | Standard | Avg by county, monthly volume, top-10 median, price histogram, county × year pivot | Baseline aggregation |
| Q6 | String + agg | Address normalization + duplicate detection | String ops, Python boundary |
| Q7 | Self-join | Price turnaround: first vs last sale per property | CTE + self-join on 690K rows |
| Q8 | Window | Year-over-year price change per county | Window shift/lag over partitions |
| Q9 | 5D GROUP BY | County × year × quarter × bucket × size class | Hash table with 75K+ groups |
| Q10 | Quantile | Percentile distribution (P10–P90) per county/year | Ordered aggregation × 5 quantiles |
| Q11 | Dense rank | Top-5 and bottom-5 per county per year | Window sort per partition |
Zig dominates at 0.464s: 2.2× faster than Polars and 3.2× faster than DuckDB. All 11 queries run in 0.158s of pure computation. Zig compiles to a single binary with zero runtime overhead, no query planner, no garbage collector: just hash maps and arrays. Ingest is slower than Polars (0.305s vs 0.096s) because the CSV parsing is hand-written; a faster parser would push Zig below 0.3s total.
Polars leads the Python engines at 1.025s: Rust-native CSV ingestion (0.096s) keeps it ahead of DuckDB. The query time gap (0.929s vs 1.207s) is narrow; Polars wins on faster parsing.
DuckDB dominates memory efficiency: 0.1 GiB RSS vs Polars' 0.5 GiB. DuckDB processes everything in a single in-process SQL engine with no intermediate copies. Pandas peaks at 1.0 GiB due to .copy() calls and intermediate DataFrames.
Dask is 24× slower than Zig: 11.2s with the threaded scheduler. Even with processes=False, Dask's shuffle scheduler triggers on every GROUP BY; data is re-partitioned even within a single process. At 92 MB, Dask's distributed overhead is ~9s of pure tax. On a 10 GB+ dataset this would amortize; at this scale it's the wrong tool.
Yesterday's crash: running Dask with 4 workers and no memory cap on a 16 GiB LXC with 512 MB swap ended in an OOM kill. Lost ~5 in-progress expanded queries that were never saved to disk.
Fixes applied to the benchmark script:
- RLIMIT_AS set to 4× physical RAM (not 1×: DuckDB uses mmap for virtual address space, and a tight cap causes spurious "memory allocation of N bytes failed" crashes even with 14 GiB free)
- psutil watchdog checks RSS after every engine and aborts at 85% (MemoryError before swap)
- Dask: 2 workers, memory_limit='4GiB', which kills a worker before it OOMs the machine
- gc.collect() between engines to prevent memory accumulation
- Backup first: always `cp script.py script.py.bak.$(date +%Y%m%d_%H%M%S)` before running. The expanded 11-query version was lost when the machine swap-crashed.
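The first two guards can be sketched together. This is an illustrative fragment, not the benchmark script: the 4× RLIMIT_AS multiplier and 85% RSS threshold mirror the bullets above, while the function name and structure are assumptions.

```python
# Sketch of the memory guards: a loose virtual-address-space cap (mmap-safe)
# plus an RSS watchdog that raises MemoryError before the box starts swapping.
# Thresholds mirror the fixes above; names are illustrative.
import resource
import psutil

phys = psutil.virtual_memory().total
# 4x physical RAM: loose enough for DuckDB's mmap'd virtual address space.
resource.setrlimit(resource.RLIMIT_AS, (4 * phys, 4 * phys))

def watchdog(threshold=0.85):
    """Called after each engine: abort if RSS exceeds 85% of physical RAM."""
    rss = psutil.Process().memory_info().rss
    if rss > threshold * phys:
        raise MemoryError(
            f"RSS {rss / 2**30:.1f} GiB exceeds {threshold:.0%} of RAM")

watchdog()
```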
Q9 (5D GROUP BY): DuckDB's hash table handles this natively. Polars' grouped aggregation is vectorized. Pandas creates a single wide table. Dask shuffles across workers.
Q10 (Percentiles): DuckDB's QUANTILE_CONT is fastest (the Q7-CTE-style approach worked cleanly). Polars' .quantile() per group is slightly slower. Pandas uses a lambda per group, which is slower still. Dask must compute to Pandas for quantiles.
Q11 (Dense rank): all engines handle ranking well. DuckDB: DENSE_RANK() OVER(). Polars: .rank("dense").over(). Dask: .rank() triggers a shuffle. This query stresses the sort/partition path.