
Technical writing

Technical articles on AI engineering

Code-first technical articles where the implementation, assumptions, metrics, and limitations stay visible. Conceptual tutorials will appear here as they are published.

Latest

Article

What Actually Speeds Up Transformer Inference?

Profiling and optimizing a small autoregressive transformer with JAX, KV caching, batching, graph compilation, and low-bit inference.

Best same-batch speedup: 16.23x. Best throughput: 6,225 tok/s.
Tags: JAX, Inference, KV cache, Profiling

More technical writing