Blog

Writing and notes

Jun 8, 2026
Triton Kernels for Evo 2 Long-Context Inference

Three opt-in Triton kernels for Evo 2's Hyena blocks: 131K-token inference now runs on a single 80GB H100 where the stock path runs out of memory, the lengths that already fit are up to 1.73x faster, and the same kernels carry to the 40B checkpoint.
Apr 7, 2026
Profiling Rust: A Flamegraph vs PGO, BOLT, and Native CPU Targeting

PGO gave 15% on unoptimised code but added nothing after source-level profiling. The full breakdown, with real numbers.
Mar 30, 2026
SeqPacker: 11 Bin-Packing Algorithms in Rust for LLM Sequence Packing

A high-performance sequence packing library with 11 bin-packing algorithms, written in Rust with Python bindings.