Xue J. Zhao Blog

Mathematics of Diffusion Models

discrete diffusion

Training Infrastructure

Insights on Rotation Based Position Embedding

context extension

Asynchronous RL

6D Parallelism for Distributed Training

distributed training

Backward Pass Through LLM

The math behind LLM training, a prerequisite for designing optimized training kernels.

The Fokker-Planck Equation

Switching lenses between the SDE and operator views of the Fokker-Planck equation in diffusion models.

diffusion model

Optimal Transportation and Diffusion Models

diffusion model

Low Precision LLM Pre-training with NVFP4

mixed-precision

LLM Inference Optimizations

memory efficiency

Programming Blackwell GPU

Engram and LLM Memory

emerging architecture

scaling law for memory

Multimodal-Pretraining

multimodal foundation model

Time Reversal SDE in Diffusion Models

Heuristic for reversing time in a diffusion process.

diffusion model

Matrix Calculus

Matrix derivative, Laplacian, polar body, convexity theorems

Operator Identities

Concerning extreme eigenvalues of some linear operators between Euclidean spaces.

Primal Dual Langevin Monte Carlo Algorithm

Xue J. Zhao © 2026