Transformers for Recommender Systems - Part 3
Regularization to improve duplication penalty loss
How the Triton Compiler Works Under the Hood!
Enough MLIR to be dangerous - how Triton uses MLIR passes to progressively lower IR
Improve the model to reduce duplicates and speed up training
Exploring a simple transformer model for sequence modelling in recommender systems
What happens when triton.compile is called in the frontend?
Missing tutorial on how a Triton program gets converted to CUDA kernels under the hood
Benchmarking our own GPT2 model against the Hugging Face GPT2 model
Writing GPT2 from scratch and assigning weights from a pre-trained Hugging Face model
Another article in the wild on writing transformers from scratch