Triton Kernels - Fused Softmax - 2

Worklog: Performance debugging Triton Kernel

Triton Kernels - Fused Softmax

Fused Softmax Triton kernel exploration

RMS Normalization Triton kernel implementation for LLMs

Visualizer for MXFP4 quantization

KV Caching: Training vs Inference in Multi-Head Attention

Load OpenAI's reference implementation

Testing out the Hugging Face version of gpt-oss-20b locally on consumer hardware

Regularization to improve duplication penalty loss

How Triton Compiler Works Under the Hood!

Enough MLIR to be dangerous - how Triton uses MLIR passes to progressively lower IR