Deep Dive into Triton Internals (Part 1)
The missing tutorial on how a Triton program gets converted to CUDA kernels under the hood
Benchmarking our own GPT2 model against the Hugging Face GPT2 model
Writing GPT2 from scratch and assigning weights from the pre-trained Hugging Face model
Another article in the wild on writing transformers from scratch
Performance-focused talk on using torch.compile to generate fused kernels and learning Triton along the way
Train an LSTM on Animal Farm and generate new text
Use convolutional neural networks for image compression
Use transfer learning on VGG-16 to detect dog breeds
Train a Convolutional Neural Network to Detect Dog Breeds
How to Solve the Dynamic Discovery Problem in ZeroMQ