PPR Engine Shootout – 11 Queries

Full national dataset · Stress queries target worst-case engine performance · 5 engines
🏆 Zig 0.464s · 🥈 Polars 1.025s · 🥉 DuckDB 1.472s · 783,755 transactions · 92 MB · 2010 – May 2026

📋 What This Benchmark Tests

This benchmark compares DuckDB, Polars, Pandas, Dask, and Zig against Ireland's Property Price Register: 783K residential property transactions, 92 MB, spanning 2010 to May 2026 (fresh from the weekly cron job). Each engine runs 11 identical analytical queries under a 4-core constraint on a Proxmox LXC container (16 GiB RAM, 512 MB swap).

Three categories of stress: standard aggregations (Q1–Q6: GROUP BY, window functions, string ops), compute-intensive (Q7–Q8: self-joins, YoY shifts via lag), and IO/hash-table intensive (Q9–Q11: 5D GROUP BY, quantile distributions, dense ranking). The set is designed to push each engine to its architectural limits: Python-boundary overhead, distributed shuffle costs, and in-memory vs mmap tradeoffs.
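As a rough illustration of how each engine is measured, the harness follows the pattern below (the bench helper and the per-engine callables are hypothetical names; the real script also enforces the 4-core constraint and tracks peak RSS separately):

```python
import time
import psutil

def bench(name, ingest_fn, query_fns):
    """Time ingest and the 11 queries for one engine, then report RSS."""
    proc = psutil.Process()

    t0 = time.perf_counter()
    data = ingest_fn()                      # load the 92 MB PPR CSV
    ingest_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    for q in query_fns:                     # Q1..Q11, identical logic per engine
        q(data)
    query_s = time.perf_counter() - t0

    rss_gib = proc.memory_info().rss / 2**30
    print(f"{name:<8} ingest={ingest_s:.3f}s queries={query_s:.3f}s "
          f"total={ingest_s + query_s:.3f}s rss={rss_gib:.2f} GiB")
```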

Key takeaways: Zig dominates (0.46s), compiled to a native binary with zero overhead. Polars leads the Python engines (1.03s). DuckDB is 3× slower than Zig but 8× more memory efficient than Pandas. Dask is 24× slower than Zig: at 92 MB, distributed compute is pure overhead. The memory watchdog prevented yesterday's swap-crash from recurring.

783,755 transactions · 92 MB CSV data · 11 queries · 700K repeat-sale pairs · 2010–2026 date range (fresh weekly cron) · 16 GiB RAM (LXC) · 512 MB swap

11-Query Benchmark · Full national PPR (dashboard cron, fresh through May 2026)

| Rank | Engine | Ingest | Queries | Total | × Zig | Peak RSS |
|------|--------|--------|---------|-------|-------|----------|
| 🥇 | Zig | 0.305s | 0.158s | 0.464s | 1.0× | 0.8 GiB |
| 🥈 | Polars | 0.096s | 0.929s | 1.025s | 2.2× | 0.5 GiB |
| 🥉 | DuckDB | 0.265s | 1.207s | 1.472s | 3.2× | 0.1 GiB |
| 4 | Pandas | 1.373s | 3.177s | 4.550s | 9.8× | 1.0 GiB |
| 5 | Dask | 1.085s | 10.082s | 11.168s | 24.1× | 0.8 GiB |

[Charts: Total Time by Engine (11 Queries) · Ingest Time · Query Execution Time]

Query Breakdown – Stress Profile

| # | Type | Description | Stress Target |
|---|------|-------------|---------------|
| Q1–Q5 | Standard | Avg by county, monthly volume, top-10 median, price histogram, county×year pivot | Baseline aggregation |
| Q6 | String + agg | Address normalization + duplicate detection | String ops, Python boundary |
| Q7 | Self-join | Price turnaround: first vs last sale per property | CTE + self-join on 690K rows |
| Q8 | Window | Year-over-year price change per county | Window shift/lag over partitions |
| Q9 | 5D GROUP BY | County × year × quarter × bucket × size class | Hash table with ~75K+ groups |
| Q10 | Quantile | Percentile distribution (P10–P90) per county/year | Ordered aggregation × 5 quantiles |
| Q11 | Dense rank | Top-5 and bottom-5 per county per year | Window sort per partition |
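To make the stress profile concrete, Q9's rough shape in the DuckDB dialect is sketched below; the column names (date_of_sale, price, property_size_description) and the €100k bucketing are assumptions, not the exact production query:

```python
import duckdb

# Five grouping dimensions -> ~75K+ hash-table groups.
q9 = duckdb.sql("""
    SELECT county,
           year(date_of_sale)                         AS yr,
           quarter(date_of_sale)                      AS qtr,
           CAST(price / 100000 AS INTEGER)            AS price_bucket,   -- assumed €100k buckets
           COALESCE(property_size_description, 'n/a') AS size_class,
           count(*)                                   AS n,
           round(avg(price))                          AS avg_price
    FROM read_csv_auto('ppr.csv')
    GROUP BY ALL
""").df()
```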

⚡ Key Findings

Zig dominates at 0.464s, 2.2× faster than Polars and 3.2× faster than DuckDB. All 11 queries run in 0.158s of pure computation. Zig compiles to a single binary with zero runtime overhead: no query planner, no garbage collector, just hash maps and arrays. Ingest is slower than Polars (0.305s vs 0.096s) because the CSV parsing is hand-written; a faster parser would push Zig below 0.3s total.

Polars leads the Python engines at 1.025s: Rust-native CSV ingestion (0.096s) keeps it ahead of DuckDB. The query-time gap (0.929s vs 1.207s) is narrow; Polars wins on faster parsing.
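The ingest step itself is a single Rust-native call; a minimal sketch (file path and date parsing assumed) looks like:

```python
import polars as pl

# Parallel CSV parse happens entirely in Rust; Python only holds the handle.
df = pl.read_csv("ppr.csv", try_parse_dates=True)
```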

DuckDB dominates memory efficiency: 0.1 GiB RSS vs Polars' 0.5 GiB. DuckDB processes everything in a single in-process SQL engine with no intermediate copies. Pandas peaks at 1.0 GiB due to .copy() calls and intermediate DataFrames.
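A minimal sketch of that in-process pattern (paths and column names assumed):

```python
import duckdb

con = duckdb.connect()  # in-process engine: no server, no client/server copies
con.execute("CREATE TABLE ppr AS SELECT * FROM read_csv_auto('ppr.csv')")

# Each query runs entirely inside the engine; only the small result set
# crosses the Python boundary.
rows = con.execute(
    "SELECT county, round(avg(price)) AS avg_price "
    "FROM ppr GROUP BY county ORDER BY avg_price DESC"
).fetchall()
```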

Dask is 24× slower than Zig: 11.2s with the threaded scheduler. Even with processes=False, Dask's shuffle path triggers on every GROUP BY, so data is re-partitioned even within a single process. At 92 MB, Dask's distributed overhead is ~9s of pure tax. On a 10 GB+ dataset this would amortize; at this scale it is the wrong tool.
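For context, the single-machine Dask setup under test is roughly the sketch below (block size and column names are assumptions); even in one process, each group-by is split per partition and re-combined, and the heavier ones trigger a shuffle:

```python
import dask
import dask.dataframe as dd

dask.config.set(scheduler="threads")              # processes=False: threaded scheduler
ddf = dd.read_csv("ppr.csv", blocksize="16MB")    # a handful of partitions for 92 MB

# The per-partition split + combine (and shuffles on heavier group-bys)
# is where most of the ~9s of overhead goes at this data size.
avg_by_county = ddf.groupby("county")["price"].mean().compute()
```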

πŸ›‘οΈ Memory Safety β€” Lessons Learned (the hard way)

Yesterday's crash: running Dask with 4 workers and no memory cap on a 16 GiB LXC with 512 MB swap → OOM kill. Lost ~5 in-progress expanded queries that were never saved to disk.

Fixes applied to the benchmark script (a minimal sketch of the key guards follows the list):

• RLIMIT_AS set to 4× physical RAM (not 1×: DuckDB uses mmap for virtual address space, and a tight cap causes spurious "memory allocation of N bytes failed" crashes even with 14 GiB free)

• psutil watchdog checks RSS after every engine and aborts at 85% (MemoryError before swap)

• Dask: 2 workers, memory_limit='4GiB' (kills a worker before it OOMs the machine)

• gc.collect() between engines to prevent memory accumulation

• Backup first: always cp script.py script.py.bak.$(date +%Y%m%d_%H%M%S) before running. The expanded ~11-query version was lost when the machine swap-crashed.
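A minimal sketch of the first two guards (address-space cap plus RSS watchdog, with gc.collect() folded in); the 85% threshold mirrors the bullets above, while the helper name and exact checks are assumptions:

```python
import gc
import resource
import psutil

PHYS = psutil.virtual_memory().total

# Cap virtual address space at 4x physical RAM: loose enough for DuckDB's
# mmap'd address space, tight enough to stop a truly runaway allocation.
resource.setrlimit(resource.RLIMIT_AS, (4 * PHYS, 4 * PHYS))

def watchdog(threshold=0.85):
    """Called after each engine: fail fast before the box starts swapping."""
    gc.collect()                                   # drop the previous engine's frames
    used = psutil.virtual_memory().percent / 100
    if used > threshold:
        raise MemoryError(f"watchdog tripped at {used:.0%} of RAM")
```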

📊 Query Stress Analysis

Q9 (5D GROUP BY): DuckDB's hash table handles this natively. Polars' grouped aggregation is vectorized. Pandas creates a single wide table. Dask shuffles across workers.

Q10 (Percentiles): DuckDB's QUANTILE_CONT is fastest (the Q7-style CTE approach worked cleanly). Polars' .quantile() per group is slightly slower. Pandas falls back to a lambda per group, which is slower again. Dask has to materialize to Pandas for the quantiles.
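For reference, DuckDB can return all five percentiles per group from a single ordered aggregate (column names and the exact percentile set are assumptions):

```python
import duckdb

pct = duckdb.sql("""
    SELECT county,
           year(date_of_sale) AS yr,
           quantile_cont(price, [0.10, 0.25, 0.50, 0.75, 0.90]) AS pctiles
    FROM read_csv_auto('ppr.csv')
    GROUP BY ALL
""").df()
```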

Q11 (Dense rank): all engines handle ranking well. DuckDB: DENSE_RANK() OVER(). Polars: .rank("dense").over(). Dask: .rank() triggers a shuffle. This query stresses the sort/partition path.
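A sketch of Q11's top-5 ranking in the two fastest engines (column names such as date_of_sale and address are assumptions):

```python
import duckdb
import polars as pl

# DuckDB: dense rank within each county/year partition, keep the top 5.
top5_sql = duckdb.sql("""
    SELECT * FROM (
        SELECT county, year(date_of_sale) AS yr, address, price,
               dense_rank() OVER (
                   PARTITION BY county, year(date_of_sale)
                   ORDER BY price DESC
               ) AS rnk
        FROM read_csv_auto('ppr.csv')
    ) t
    WHERE rnk <= 5
""").df()

# Polars: the same ranking via .rank("dense").over(...).
df = pl.read_csv("ppr.csv", try_parse_dates=True).with_columns(
    pl.col("date_of_sale").dt.year().alias("yr")
)
top5_pl = (
    df.with_columns(
        pl.col("price").rank("dense", descending=True).over(["county", "yr"]).alias("rnk")
    )
    .filter(pl.col("rnk") <= 5)
)
```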