Understanding MXFP4 Quantization
Visualizer for MXFP4 quantization
KV Caching: Training vs Inference in Multi-Head Attention
Load OpenAI's reference implementation
Testing the Hugging Face version of gpt-oss-20b locally on consumer hardware
Regularization to improve duplication penalty loss
How the Triton Compiler Works Under the Hood!
Enough MLIR to be dangerous - how Triton uses MLIR passes to progressively lower IR
Improve model to reduce duplicates and improve training performance
Exploring a simple transformer model for sequence modelling in recommender systems
What happens when triton.compile is called in the frontend?