Xue J. Zhao
Machine Learning
Mathematics
Systems
Theory
Architecture
Algorithm
Mar
Mar 26
Mamba
architecture
Mar 24
Mathematics of Diffusion Models
discrete diffusion
score matching
flow matching
Mar 10
Training Infrastructure
Megatron Core
MoE training
Systems
Mar 7
Multimodal Pretraining
multimodal foundation model
pretraining
paper reading
Mar 5
Insights on Rotation-Based Position Embedding
context extension
YaRN
RoPE
Mar 4
Asynchronous RL
off-policy RL
post-training
reasoning
Mar 3
Optimizers
optimizers
memory efficiency
training infra
Mar 2
6D Parallelism for Distributed Training
parallelism
distributed training
infra
Feb
Feb 27
LLM Inference Optimizations
inference
systems
optimization
Feb 23
Engram and LLM Memory
emerging architecture
scaling law for memory
paper reading
Feb 14
Blackwell GEMM
gpu kernels
cutlass
cute dsl
Feb 12
Programming Blackwell GPU
gpu kernels
cutlass
cute dsl
Feb 1
Backward Pass Through an LLM
theory
Jan
Jan 19
Low Precision LLM Pre-training with NVFP4
mixed-precision
quantization
engineering
Jan 17
Time Reversal SDE in Diffusion Models
ml-theory
diffusion model
sde
Jan 16
The Fokker-Planck Equation
ml-theory
diffusion model
old-blog
Jan 15
Optimal Transportation and Diffusion Models
ml-theory
diffusion model
old-blog
Jan 12
Primal-Dual Langevin Monte Carlo Algorithm
ml-theory
optimization
old-blog
Xue J. Zhao © 2026