Louis Maddox lmmx
import datetime as dt
import random
from functools import wraps
import polars as pl
from narlogs import print_step
def load_dog_data():
    """Load sample dog registry data"""
@lmmx
lmmx / lf_profile.py
Created August 12, 2025 20:14
Polars profiling
import polars as pl
lf = pl.LazyFrame({"a": [*range(1_000_000)], "b": [*range(1_000_000)]})
perm = lf.with_columns(pl.col("a").shuffle()).with_columns((pl.col("a") * pl.col("b")).alias("c"))
agg = (perm.sort(by="a").group_by_dynamic("a", every="10000i", period="50000i")
       .agg([pl.col("b").sum().alias("b_sum"),
             pl.col("c").std().alias("c_std"),
             pl.col("c").quantile(0.95).alias("c_p95")
lmmx / hf-upload-verification.md
Last active August 9, 2025 16:59
HuggingFace upload verification

How do you ensure the upload integrity of a dataset of maybe 1M+ files?

Imagine you are processing 10k source files by faceting them into 500+ languages.

  • You have 10,000 files of 2,000 IDs each, so about 20M IDs in total.
  • Each row contains one or more languages (the field is never null, so every ID is kept when you facet by language).
  • Rare languages make up 0.01% of the IDs, while some, like English, make up 1%.
  • You partition your dataset into subsets by language and send them to a reliable uploader CLI tool.
  • You don't get any info back about whether the upload succeeded.
  • You can write any record you want to assist your verification of the multipart upload.
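One possible shape for such a record is a local manifest written before upload and compared afterwards with whatever the remote side reports. The sketch below is only an illustration under assumptions: the function names `sha256_of`, `build_manifest` and `diff_manifests` are hypothetical, the per-language directory layout is assumed, and how the observed manifest is obtained from the hub is left open because the source does not specify it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, dict]:
    """Map each file path (relative to root) to its size and SHA-256.

    Assumes one subdirectory per language subset under `root`.
    """
    return {
        p.relative_to(root).as_posix(): {
            "size": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(expected: dict, observed: dict) -> tuple[list[str], list[str]]:
    """Return (missing, mismatched) paths of `observed` relative to `expected`."""
    missing = sorted(expected.keys() - observed.keys())
    mismatched = sorted(
        path
        for path in expected.keys() & observed.keys()
        if expected[path] != observed[path]
    )
    return missing, mismatched
```

Both lists being empty means every file recorded before upload is present remotely with the same size and hash. Row-level checks, such as the union of IDs across language subsets equalling the source IDs, would need a separate record.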
inherits = "heisenberg"
"comment" = { fg = "gray", modifiers = ["italic"] }
# "comment" = { fg = "teddy_bear_pink_intense", modifiers = ["italic"] }
# "comment.line" = { fg = "teddy_bear_pink", modifiers = ["italic"] }
"comment.block" = { fg = "hazmat_yellow" }
"comment.block.documentation" = { fg = "chili_powder_red" }
"comment.line.documentation" = { fg = "teddy_bear_pink_intense" }
# "type.enum.variant" = { fg = "chili_powder_red" }
"type.enum.variant.builtin" = { fg = "chili_powder_red" }
lmmx / README.md
Last active May 30, 2025 16:52
batcmd: Separate stdout and stderr with syntax highlighting

batcmd - Separate stdout and stderr with syntax highlighting

A bashrc shell function that runs any command and displays stdout and stderr as separate streams with bat syntax highlighting. Uses file descriptor swapping to cleanly separate the streams without temporary files or running the command twice.

Features:

  • Configurable syntax highlighting language
  • Clean separation with "STDOUT" and "STDERR" headers
  • Works with any command
  • No temporary files or command duplication
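For comparison, the same idea of running a command once and showing the two streams under separate headers can be sketched in Python. This is not the gist's bash implementation: it swaps file descriptor swapping for `subprocess.run` with `capture_output=True`, omits the bat highlighting step, and the helper names `run_split` and `format_streams` are hypothetical.

```python
import subprocess
import sys


def run_split(cmd: list[str]) -> tuple[str, str, int]:
    """Run `cmd` once and return (stdout, stderr, exit code) separately."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout, result.stderr, result.returncode


def format_streams(stdout: str, stderr: str) -> str:
    """Join the two streams under "STDOUT" and "STDERR" headers."""
    return f"STDOUT\n{stdout}\nSTDERR\n{stderr}"


if __name__ == "__main__":
    # Example command that writes to both streams.
    demo = [sys.executable, "-c", "import sys; print('hi'); print('oops', file=sys.stderr)"]
    out, err, _ = run_split(demo)
    print(format_streams(out, err))
```

Unlike a shell pipeline, this buffers both streams in memory until the command exits, so it suits short-lived commands better than long-running ones.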
lmmx / fbb.rs
Last active May 26, 2025 15:36
Demo of the tracing crate to automatically trace function entry and exit (each function must have the `#[instrument]` attribute)
#!/usr/bin/env rust-script
//! Automatic function call tracing with parameters
//!
//! ```cargo
//! [dependencies]
//! tracing = "0.1"
//! tracing-subscriber = { version = "0.3", features = ["fmt"] }
//! ```
use tracing::instrument;
lmmx / guide.md
Last active May 23, 2025 11:49
DeepGEMM uv installation issue repro guide

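Requires an NVIDIA Hopper architecture GPU (sm_90a must be supported).

That is, sm_86 (RTX 30x0 series) won't work. Note that RTX 40x0 series cards are Ada Lovelace (sm_89), not Hopper, so they do not provide sm_90a either; a Hopper GPU such as the H100 is needed.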

Installation:

  • Make two copies of the repo; run `uv venv` in one and `conda create` in the other (use Python 3.11.11 for both).
mkdir deepgemm && cd deepgemm
lmmx / uninstall-old-rust-nightly.sh
Last active May 19, 2025 13:22
Uninstall old Rust nightly versions
# Keep the latest nightly and remove older dated nightlies
LATEST_DATE=$(rustup toolchain list | grep 'nightly-[0-9]' | sed 's/nightly-\([0-9-]*\).*/\1/' | sort -r | head -n 1)
# If we found dated nightlies, remove all except the most recent one
if [ -n "$LATEST_DATE" ]; then
    for toolchain in $(rustup toolchain list | grep 'nightly-[0-9]' | grep -v "$LATEST_DATE"); do
        echo "Removing old nightly: $toolchain"
        rustup toolchain uninstall "$toolchain"
    done
fi
lmmx / main.rs
Last active May 16, 2025 20:32
spez! macro monomorphisation to detect type parameterised Spans
//! ```cargo
//! [dependencies]
//! spez = "0.1.2"
//! ```
use spez::spez;
use core::marker::PhantomData;
#[derive(Debug)] pub enum Cooked {}
#[derive(Debug)] pub enum Raw {}