@lmmx
Created August 12, 2025 20:14
Polars profiling
import polars as pl

# Two integer columns of one million rows each
lf = pl.LazyFrame({"a": [*range(1_000_000)], "b": [*range(1_000_000)]})
# Shuffle "a", then derive "c" as the product of "a" and "b"
perm = lf.with_columns(pl.col("a").shuffle()).with_columns(
    (pl.col("a") * pl.col("b")).alias("c")
)
# Sort on the index column, then aggregate over overlapping integer windows
agg = (
    perm.sort(by="a")
    .group_by_dynamic("a", every="10000i", period="50000i")
    .agg([
        pl.col("b").sum().alias("b_sum"),
        pl.col("c").std().alias("c_std"),
        pl.col("c").quantile(0.95).alias("c_p95"),
    ])
)
# profile() runs the query and returns (result, per-node timings in microseconds)
result, profile_df = agg.profile()

# Add duration and percentage calculations
timings = profile_df.with_columns([
    (pl.col("end") - pl.col("start")).alias("duration")
]).with_columns([
    (pl.col("duration") / pl.col("duration").sum() * 100).alias("percent_total")
])
print(timings)
# shape: (5, 5)
# ┌──────────────────┬───────┬─────────┬──────────┬───────────────┐
# │ node             ┆ start ┆ end     ┆ duration ┆ percent_total │
# │ ---              ┆ ---   ┆ ---     ┆ ---      ┆ ---           │
# │ str              ┆ u64   ┆ u64     ┆ u64      ┆ f64           │
# ╞══════════════════╪═══════╪═════════╪══════════╪═══════════════╡
# │ optimization     ┆ 0     ┆ 94      ┆ 94       ┆ 0.008489      │
# │ with_column(a)   ┆ 94    ┆ 8160    ┆ 8066     ┆ 0.728446      │
# │ with_column(c)   ┆ 8170  ┆ 10283   ┆ 2113     ┆ 0.190826      │
# │ sort(a)          ┆ 10287 ┆ 34732   ┆ 24445    ┆ 2.207644      │
# │ group_by_dynami) ┆ 34737 ┆ 1107308 ┆ 1072571  ┆ 96.864595     │
# └──────────────────┴───────┴─────────┴──────────┴───────────────┘
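
The `duration` and `percent_total` columns above are plain arithmetic on `start` and `end`, so they can be sanity-checked without Polars. The sketch below is not part of the original gist: it copies the printed `start`/`end` values by hand into a hypothetical `profile_rows` list and recomputes both columns in plain Python, assuming the timings are in microseconds as Polars documents for `profile()`.

```python
# (node, start, end) copied by hand from the printed profile; times in microseconds.
# `profile_rows` is an illustrative name, not something Polars produces.
profile_rows = [
    ("optimization", 0, 94),
    ("with_column(a)", 94, 8160),
    ("with_column(c)", 8170, 10283),
    ("sort(a)", 10287, 34732),
    ("group_by_dynami)", 34737, 1107308),
]

# duration = end - start, per node
durations = {node: end - start for node, start, end in profile_rows}

# percent_total is relative to the sum of node durations, not to the last `end`
total = sum(durations.values())
percent_total = {node: 100 * d / total for node, d in durations.items()}

# The node that dominates the run
slowest = max(durations, key=durations.get)

# Print nodes from slowest to fastest, with durations converted to milliseconds
for node, d in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node:<16} {d / 1000:>9.3f} ms {percent_total[node]:>7.3f} %")
```

Note that the durations sum to 1107289, slightly less than the final `end` of 1107308, because of the small gaps between consecutive nodes. The percentages therefore describe each node's share of the measured node time rather than of wall-clock time.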
lmmx commented Aug 12, 2025

Screenshot from 2025-08-12 21-11-57
