Blog
Writing and notes
-
Triton Kernels for Evo 2 Long-Context Inference
Three opt-in Triton kernels for Evo 2's Hyena blocks: 131K-token inference now runs on a single 80GB H100 where the stock path runs out of memory, sequence lengths that already fit run up to 1.73x faster, and the same kernels carry over to the 40B checkpoint.
-
Profiling Rust: A Flamegraph vs PGO, BOLT, and Native CPU Targeting
PGO gave a 15% gain on unoptimised code but added nothing after source-level profiling: the full breakdown with real numbers
-
SeqPacker: 11 Bin-Packing Algorithms in Rust for LLM Sequence Packing
A high-performance sequence packing library with 11 bin-packing algorithms, written in Rust with Python bindings