Technical writing

Technical articles on AI engineering

Code-first technical articles where the implementation, assumptions, metrics, and limitations stay visible. Conceptual tutorials will appear here as they are published.

Latest

Article May 26, 2026

What Actually Speeds Up Transformer Inference?

Profiling and optimizing a small autoregressive transformer with JAX, KV caching, batching, graph compilation, and low-bit inference.

Best same-batch speedup: 16.23x. Best throughput: 6,225 tok/s.

Tags: JAX, Inference, KV cache, Profiling


More technical writing

Training a 10M-Parameter Transformer to Learn 3-Digit Arithmetic