GPU Mode - Triton Internals Talk
How Triton Compiler Works Under the Hood!
Enough MLIR to be dangerous - how Triton uses MLIR passes to progressively lower IR
Improve the model to reduce duplicates and boost training performance
Exploring a simple transformer model for sequence modelling in recommender systems
What happens when triton.compile is called in the frontend?
The missing tutorial on how a Triton program gets converted into CUDA kernels under the hood
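Since these two entries cover what triton.compile produces, here is a minimal, hedged sketch of the idea: launch a toy vector-add kernel and peek at the intermediate IRs the compiler emitted. The kernel name, sizes, and the layout of the compiled handle's .asm attribute are assumptions and vary across Triton versions.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)

# Launching the kernel is what triggers triton.compile under the hood.
# Recent Triton versions return the compiled handle from the launch; its .asm
# dict holds the progressively lowered IRs (ttir -> ttgir -> llir -> ptx -> cubin).
compiled = add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
print(compiled.asm.keys())
```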
Benchmarking our own GPT2 model against the Huggingface GPT2 model
Writing GPT2 from scratch and loading the weights from the pre-trained Huggingface model
Another article in the wild on writing transformers from scratch
Performance-focused talk on using torch.compile to generate fused kernels and learning Triton along the way
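As a small illustration of that workflow (the function and shapes below are made up), one can compile a pointwise chain and have PyTorch dump the Triton kernels Inductor generates for it:

```python
import torch


def fused_op(x, y):
    # A pointwise chain that TorchInductor will typically fuse into a single Triton kernel.
    return torch.nn.functional.gelu(x) * y + 1.0


compiled_op = torch.compile(fused_op)

x = torch.randn(1024, 1024, device="cuda")
y = torch.randn(1024, 1024, device="cuda")
out = compiled_op(x, y)

# Run with TORCH_LOGS="output_code" to print the generated Triton source,
# which is a convenient way to pick up Triton by reading kernels Inductor writes.
```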
Train an LSTM on Animal Farm and generate new text
Use convolutional neural networks for image compression
Use transfer learning on VGG-16 to detect dog breeds
Train a Convolutional Neural Network to Detect Dog Breeds
How to Solve the Dynamic Discovery Problem in ZeroMQ