Post

Deep Dive into Triton Internals (Part 1)

Missing tutorial on how triton program gets converted to cuda kernels under the hood

Recently, I have been exploring Triton, a high-level language for GPU programming. Triton is designed to make it easier to write high-performance GPU code by providing a high-level abstraction that can be compiled to efficient CUDA native code. In this post, I will dive into the internals of Triton and explore how Triton programs are compiled to CUDA kernels (CUBIN to be precise) under the hood. The information on this is fairly sparse and I hope to provide a good summary of overall triton compilation process.

This is part 1 of the deep dive. Other posts in the series:

Cuda Compilation

Before we dive into triton, it is useful to understand the cuda compilation process using nvcc. Below diagram from NVIDIA docs depicts the whole process of how cuda is compiled to executable code.

Cuda compilation

  • The CUDA compiler (nvcc) preprocesses the input program for device compilation. This involves handling directives like #include and macros relevant to device code.
  • The preprocessed device code is compiled into CUDA binary (cubin) and/or PTX (Parallel Thread Execution) intermediate code.
    • PTX is a low-level virtual machine and assembly language for CUDA. It allows for optimizations and is architecture-independent.
    • Cubin is a GPU-specific binary code optimized for the target architecture.
  • Both cubin and PTX are placed into a fatbinary, which contains multiple versions of the code optimized for different GPU architectures.

Let us look at a simple example of cuda -> ptx. You can do that pretty easily using Compiler Explorer. It is a silly example for demonstration purposes.

1
2
3
4
5
6
7
8
9
10
11
#include <stdio.h>

// Size of array
#define N 1024

// Kernel
__global__ void add_vectors(double *a, double *b, double *c)
{
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    if(id < N) c[id] = a[id] + b[id];
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
.visible .entry add_vectors(double*, double*, double*)(
        .param .u64 add_vectors(double*, double*, double*)_param_0,
        .param .u64 add_vectors(double*, double*, double*)_param_1,
        .param .u64 add_vectors(double*, double*, double*)_param_2
)
{

        ld.param.u64    %rd1, [add_vectors(double*, double*, double*)_param_0];
        ld.param.u64    %rd2, [add_vectors(double*, double*, double*)_param_1];
        ld.param.u64    %rd3, [add_vectors(double*, double*, double*)_param_2];
        mov.u32         %r2, %ntid.x;
        mov.u32         %r3, %ctaid.x;
        mov.u32         %r4, %tid.x;
        mad.lo.s32      %r1, %r2, %r3, %r4;
        setp.gt.s32     %p1, %r1, 1023;
        @%p1 bra        $L__BB0_2;

        cvta.to.global.u64      %rd4, %rd1;
        mul.wide.s32    %rd5, %r1, 8;
        add.s64         %rd6, %rd4, %rd5;
        cvta.to.global.u64      %rd7, %rd2;
        add.s64         %rd8, %rd7, %rd5;
        ld.global.f64   %fd1, [%rd8];
        ld.global.f64   %fd2, [%rd6];
        add.f64         %fd3, %fd2, %fd1;
        cvta.to.global.u64      %rd9, %rd3;
        add.s64         %rd10, %rd9, %rd5;
        st.global.f64   [%rd10], %fd3;

$L__BB0_2:
        ret;

}

Triton Compiler

Let’s start with the original paper that showed a high level overview of triton jit process.

Triton compiler from paper

Though at high level, it still holds but the backend infrastructure was completely swapped to use MLIR at the end of 2022.

Triton MLIR PR Merge

Documentation on the updated triton compilation process is sparse to non-existent. The best resources that I found that went into some details on the compilation are below:

Triton Compiler Talk (Pytorch Conference 2023)

Triton conference 2023 by Ian Bearman (Microsoft)

Another Talk that is helpful and goes into MLIR details:

Couple of relevant slides that covers high-level compilation process

Triton Compiler Middle Layer

Triton Compiler Pytorch Conf

After reading through code, unit tests and watching these talks, here is my general summary on how the process works end to end.

  • Triton takes the Python code written in its DSL and parses it. This code typically involves specifying tensor operations, kernels, and other high-level abstractions.
  • The parsed Python code is translated into an intermediate representation. This IR is designed to capture the high-level structure of the computation while being amenable to various optimizations.
  • Various optimization passes are applied to the IR. These optimizations can include loop unrolling, memory access optimizations, constant folding, CSE, etc to improve the efficiency of the generated code.
  • Progressive IR generation is done via MLIR dialects - Triton, TritonGPU, TritonNVIDIAGPU.
  • The optimized Triton IR is then lowered to LLVM IR. Triton leverages LLVM to perform further optimizations and to facilitate the generation of machine code.
  • The LLVM IR is then used to generate CUDA code. This involves translating the LLVM IR to PTX code, which is the intermediate representation used by NVIDIA’s CUDA compiler (nvcc).
  • The generated PTX code is then compiled Just-In-Time (JIT) into a CUDA binary (CUBIN). This compilation is done using NVIDIA’s JIT compiler, which converts the PTX code into machine code that can be executed on the GPU.

Zoomed in view: Triton IR -> LLVM IR

Triton to LLVM

Optimization Passes

MLIR general optimizations

  • CSE, DCE, Inlining, …

TritonGPU specific optimizations

  • Pipeline
  • Prefetch
  • Matmul accelerate
  • Coalesce
  • Remove layout

TritonNVIDIAGPU specific optimizations

  • TMA Materialization
  • TMA Multicast
  • Async Dot
  • Warp Specialization

Python -> Cuda, Triton IR, PTX, CUBIN

Next, we look into how to get the Triton IR, PTX, CUBIN, et al from Python code. Let’s use the vector_add kernel from the triton examples.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
import torch

import triton
import triton.language as tl
from triton.tools.disasm import get_sass


@triton.jit
def add_kernel(x_ptr,  # *Pointer* to first input vector.
               y_ptr,  # *Pointer* to second input vector.
               output_ptr,  # *Pointer* to output vector.
               n_elements,  # Size of the vector.
               BLOCK_SIZE: tl.constexpr,  # Number of elements each program should process.
               # NOTE: `constexpr` so it can be used as a shape value.
               ):
    # There are multiple 'programs' processing different data. We identify which program
    # we are here:
    pid = tl.program_id(axis=0)  # We use a 1D launch grid so axis is 0.
    # This program will process inputs that are offset from the initial data.
    # For instance, if you had a vector of length 256 and block_size of 64, the programs
    # would each access the elements [0:64, 64:128, 128:192, 192:256].
    # Note that offsets is a list of pointers:
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    # Create a mask to guard memory operations against out-of-bounds accesses.
    mask = offsets < n_elements
    # Load x and y from DRAM, masking out any extra elements in case the input is not a
    # multiple of the block size.
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    # Write x + y back to DRAM.
    tl.store(output_ptr + offsets, output, mask=mask)

Triton -> Cuda Code

1
2
3
4
5
$ python3 python/triton/tools/compile.py \
  --kernel-name add_kernel \ 
  --signature "*fp32,*fp32,*fp32,i32,64" \
  --grid=1024,1024,1024 \
  ../ml-projects/triton_internals/vector_add.py

This should dump a .c and .h file with the cuda kernel code.

1
2
3
$ ll | grep add_kernel
-rw-rw-r--  1 ksharma ksharma  36K Aug  1 22:55 add_kernel.9969bdda_0123.c
-rw-rw-r--  1 ksharma ksharma  501 Aug  1 22:55 add_kernel.9969bdda_0123.h

Generated Cuda Kernel

1
2
3
4
5
6
7
8
9
10
11
12
13
14
#ifndef TT_KERNEL_INCLUDES
#define TT_KERNEL_INCLUDES

#include <cuda.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#endif

void unload_add_kernel_9969bdda_0123(void);
void load_add_kernel_9969bdda_0123(void);
// tt-linker: add_kernel_9969bdda_0123:CUdeviceptr x_ptr, CUdeviceptr y_ptr, CUdeviceptr output_ptr, int32_t n_elements:64_warps1xstages3
CUresult add_kernel_9969bdda_0123(CUstream stream, CUdeviceptr x_ptr, CUdeviceptr y_ptr, CUdeviceptr output_ptr, int32_t n_elements);
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
/* clang-format off */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>
#include <cuda.h>


// helpers to check for cuda errors
#define CUDA_CHECK(ans) {\
    gpuAssert((ans), __FILE__, __LINE__);\
  }\

static inline void gpuAssert(CUresult code, const char *file, int line) {
  if (code != CUDA_SUCCESS) {
    const char *prefix = "Triton Error [CUDA]: ";
    const char *str;
    cuGetErrorString(code, &str);
    char err[1024] = {0};
    strcat(err, prefix);
    strcat(err, str);
    printf("%s\\n", err);
    exit(code);
  }
}

// globals
#define CUBIN_NAME add_kernel_9969bdda_0123_cubin
CUmodule add_kernel_9969bdda_0123_mod = NULL;
CUfunction add_kernel_9969bdda_0123_func = NULL;
unsigned char CUBIN_NAME[11472] = { 0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x33, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0xbe, 0x00, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x15, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x59, 0x05, 0x59, 0x00, 0x40, 0x00, 0x38, 0x00, 0x03, 0x00, 0x40, 0x00, 0x11, 0x00, 0x01, 0x00, 0x00, 0x2e, 0x73, 0x68, 0x73, 0x74, 0x72, 0x74, 0x61, 0x62, 0x00, 0x2e, 0x73, 0x74, 0x72, 0x74, 0x61, 0x62, 0x00, 0x2e, 0x73, 0x79, 0x6d, 0x74, 0x61, 0x62, 0x00, 0x2e, 0x73, 0x79, 0x6d, 0x74, 0x61, 0x62, 0x5f, 0x73, 0x68, 0x6e, 0x64, 0x78, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x69, 0x6e, 0x66, 0x6f, 0x00, 0x2e, 0x74, 0x65, 0x78, 0x74, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x69, 0x6e, 0x66, 0x6f, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x73, 0x68, 0x61, 0x72, 0x65, 0x64, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x63, 0x6f, 0x6e, 0x73, 0x74, 0x61, 0x6e, 0x74, 0x30, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x6e, 0x76, 0x2e, 0x63, 0x6f, 0x6e, 0x73, 0x74, 0x61, 0x6e, 0x74, 0x30, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x66, 0x72, 0x61, 0x6d, 0x65, 0x00, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x69, 0x6e, 0x65, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x69, 0x6e, 0x65, 0x00, 0x2e, 0x6e, 0x76, 0x5f, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x73, 0x61, 0x73, 0x73, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x6e, 0x76, 0x5f, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x73, 0x61, 0x73, 0x73, 0x00, 0x2e, 0x6e, 0x76, 0x5f, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x70, 0x74, 0x78, 0x5f, 0x74, 0x78, 0x74, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x66, 0x72, 0x61, 0x6d, 0x65, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x61, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x66, 0x72, 0x61, 0x6d, 0x65, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x63, 0x61, 0x6c, 0x6c, 0x67, 0x72, 0x61, 0x70, 0x68, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x74, 0x79, 0x70, 0x65, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x61, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x00, 0x00, 0x2e, 0x73, 0x68, 0x73, 0x74, 0x72, 0x74, 0x61, 0x62, 0x00, 0x2e, 0x73, 0x74, 0x72, 0x74, 0x61, 0x62, 0x00, 0x2e, 0x73, 0x79, 0x6d, 0x74, 0x61, 0x62, 0x00, 0x2e, 0x73, 0x79, 0x6d, 0x74, 0x61, 0x62, 0x5f, 0x73, 0x68, 0x6e, 0x64, 0x78, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x69, 0x6e, 0x66, 0x6f, 0x00, 0x2e, 0x74, 0x65, 0x78, 0x74, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x69, 0x6e, 0x66, 0x6f, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x73, 0x68, 0x61, 0x72, 0x65, 0x64, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x6e, 0x76, 0x2e, 0x63, 0x6f, 0x6e, 0x73, 0x74, 0x61, 0x6e, 0x74, 0x30, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x63, 0x6f, 0x6e, 0x73, 0x74, 0x61, 0x6e, 0x74, 0x30, 0x2e, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x66, 0x72, 0x61, 0x6d, 0x65, 0x00, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x69, 0x6e, 0x65, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x69, 0x6e, 0x65, 0x00, 0x2e, 0x6e, 0x76, 0x5f, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x73, 0x61, 0x73, 0x73, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x6e, 0x76, 0x5f, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x69, 0x6e, 0x65, 0x5f, 0x73, 0x61, 0x73, 0x73, 0x00, 0x2e, 0x6e, 0x76, 0x5f, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x70, 0x74, 0x78, 0x5f, 0x74, 0x78, 0x74, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x66, 0x72, 0x61, 0x6d, 0x65, 0x00, 0x2e, 0x72, 0x65, 0x6c, 0x61, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x66, 0x72, 0x61, 0x6d, 0x65, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x63, 0x61, 0x6c, 0x6c, 0x67, 0x72, 0x61, 0x70, 0x68, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x74, 0x79, 0x70, 0x65, 0x00, 0x2e, 0x6e, 0x76, 0x2e, 0x72, 0x65, 0x6c, 0x2e, 0x61, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x00, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x32, 0x00, 0x00, 0x00, 0x03, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8a, 0x00, 0x00, 0x00, 0x03, 0x00, 0x0f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xa3, 0x00, 0x00, 0x00, 0x03, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb0, 0x00, 0x00, 0x00, 0x03, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xcc, 0x00, 0x00, 0x00, 0x03, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf8, 0x00, 0x00, 0x00, 0x03, 0x00, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x01, 0x00, 0x00, 0x03, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x49, 0x01, 0x00, 0x00, 0x03, 0x00, 0x0b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x58, 0x01, 0x00, 0x00, 0x12, 0x10, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0x24, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x03, 0x00, 0x04, 0x7c, 0xff, 0xff, 0xff, 0xff, 0x0f, 0x0c, 0x81, 0x80, 0x80, 0x28, 0x00, 0x08, 0xff, 0x81, 0x80, 0x28, 0x08, 0x81, 0x80, 0x80, 0x28, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0x34, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x04, 0x00, 0x00, 0x00, 0x04, 0x58, 0x00, 0x00, 0x00, 0x0c, 0x81, 0x80, 0x80, 0x28, 0x00, 0x04, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb5, 0x00, 0x00, 0x00, 0x02, 0x00, 0x6c, 0x00, 0x00, 0x00, 0x01, 0x01, 0xfb, 0x0e, 0x0a, 0x00, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, 0x00, 0x01, 0x2f, 0x68, 0x6f, 0x6d, 0x65, 0x2f, 0x6b, 0x73, 0x68, 0x61, 0x72, 0x6d, 0x61, 0x2f, 0x64, 0x65, 0x76, 0x2f, 0x67, 0x69, 0x74, 0x2f, 0x6c, 0x65, 0x61, 0x72, 0x6e, 0x69, 0x6e, 0x67, 0x2d, 0x74, 0x72, 0x69, 0x74, 0x6f, 0x6e, 0x2f, 0x2e, 0x2e, 0x2f, 0x6d, 0x6c, 0x2d, 0x70, 0x72, 0x6f, 0x6a, 0x65, 0x63, 0x74, 0x73, 0x2f, 0x74, 0x72, 0x69, 0x74, 0x6f, 0x6e, 0x5f, 0x69, 0x6e, 0x74, 0x65, 0x72, 0x6e, 0x61, 0x6c, 0x73, 0x00, 0x00, 0x76, 0x65, 0x63, 0x74, 0x6f, 0x72, 0x5f, 0x61, 0x64, 0x64, 0x2e, 0x70, 0x79, 0x00, 0x01, 0x95, 0x92, 0xb1, 0xb5, 0x06, 0xde, 0x0e, 0x00, 0x00, 0x09, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x01, 0x03, 0x08, 0x01, 0x03, 0x0f, 0x02, 0x10, 0x01, 0xf4, 0x03, 0x75, 0x02, 0x30, 0x01, 0x03, 0x0b, 0x02, 0x10, 0x01, 0xea, 0x03, 0x02, 0x02, 0x20, 0x01, 0xf3, 0x03, 0x7a, 0x02, 0x10, 0x01, 0xf1, 0xf2, 0x03, 0x01, 0x02, 0x20, 0x01, 0xee, 0xf0, 0xf0, 0xf1, 0x03, 0x7e, 0x02, 0x20, 0x01, 0xf1, 0x02, 0xa0, 0x02, 0x00, 0x01, 0x01, 0x7b, 0x00, 0x00, 0x00, 0x02, 0x00, 0x10, 0x00, 0x00, 0x00, 0x01, 0x01, 0xfb, 0x0e, 0x0a, 0x00, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x09, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x03, 0x11, 0x01, 0x03, 0x15, 0x02, 0x10, 0x01, 0x03, 0x0a, 0x02, 0x10, 0x01, 0xf4, 0x03, 0x5c, 0x02, 0x10, 0x01, 0x03, 0x0e, 0x02, 0x10, 0x01, 0x03, 0x1b, 0x02, 0x10, 0x01, 0x03, 0x6d, 0x02, 0x10, 0x01, 0xf1, 0xf2, 0x03, 0x12, 0x02, 0x10, 0x01, 0x03, 0x6c, 0x02, 0x10, 0x01, 0xf2, 0xf2, 0xf4, 0x03, 0x0e, 0x02, 0x10, 0x01, 0x03, 0x77, 0x02, 0x10, 0x01, 0x03, 0x0e, 0x02, 0x10, 0x01, 0xf3, 0xf2, 0xf4, 0x03, 0x79, 0x02, 0x10, 0x01, 0x03, 0x0b, 0x02, 0x10, 0x01, 0x03, 0x03, 0x02, 0x20, 0x01, 0x02, 0x80, 0x02, 0x00, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x2e, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x20, 0x38, 0x2e, 0x34, 0x00, 0x2e, 0x74, 0x61, 0x72, 0x67, 0x65, 0x74, 0x20, 0x73, 0x6d, 0x5f, 0x38, 0x39, 0x00, 0x2e, 0x61, 0x64, 0x64, 0x72, 0x65, 0x73, 0x73, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x20, 0x36, 0x34, 0x00, 0x00, 0x00, 0x00, 0x2e, 0x76, 0x69, 0x73, 0x69, 0x62, 0x6c, 0x65, 0x20, 0x2e, 0x65, 0x6e, 0x74, 0x72, 0x79, 0x20, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x28, 0x00, 0x2e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x20, 0x2e, 0x75, 0x36, 0x34, 0x20, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x30, 0x2c, 0x00, 0x2e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x20, 0x2e, 0x75, 0x36, 0x34, 0x20, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x31, 0x2c, 0x00, 0x2e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x20, 0x2e, 0x75, 0x36, 0x34, 0x20, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x32, 0x2c, 0x00, 0x2e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x20, 0x2e, 0x75, 0x33, 0x32, 0x20, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x33, 0x00, 0x29, 0x00, 0x2e, 0x72, 0x65, 0x71, 0x6e, 0x74, 0x69, 0x64, 0x20, 0x33, 0x32, 0x2c, 0x20, 0x31, 0x2c, 0x20, 0x31, 0x00, 0x7b, 0x00, 0x2e, 0x72, 0x65, 0x67, 0x20, 0x2e, 0x70, 0x72, 0x65, 0x64, 0x20, 0x09, 0x25, 0x70, 0x3c, 0x37, 0x3e, 0x3b, 0x00, 0x2e, 0x72, 0x65, 0x67, 0x20, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x09, 0x25, 0x72, 0x3c, 0x31, 0x34, 0x3e, 0x3b, 0x00, 0x2e, 0x72, 0x65, 0x67, 0x20, 0x2e, 0x66, 0x33, 0x32, 0x20, 0x09, 0x25, 0x66, 0x3c, 0x37, 0x3e, 0x3b, 0x00, 0x2e, 0x72, 0x65, 0x67, 0x20, 0x2e, 0x62, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x3c, 0x31, 0x31, 0x3e, 0x3b, 0x00, 0x00, 0x24, 0x4c, 0x5f, 0x5f, 0x66, 0x75, 0x6e, 0x63, 0x5f, 0x62, 0x65, 0x67, 0x69, 0x6e, 0x30, 0x3a, 0x00, 0x00, 0x00, 0x6c, 0x64, 0x2e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x2e, 0x75, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x37, 0x2c, 0x20, 0x5b, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x30, 0x5d, 0x3b, 0x00, 0x6c, 0x64, 0x2e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x2e, 0x75, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x38, 0x2c, 0x20, 0x5b, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x31, 0x5d, 0x3b, 0x00, 0x24, 0x4c, 0x5f, 0x5f, 0x74, 0x6d, 0x70, 0x30, 0x3a, 0x00, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x75, 0x33, 0x32, 0x20, 0x25, 0x72, 0x31, 0x2c, 0x20, 0x25, 0x63, 0x74, 0x61, 0x69, 0x64, 0x2e, 0x78, 0x3b, 0x00, 0x00, 0x00, 0x73, 0x68, 0x6c, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x09, 0x25, 0x72, 0x38, 0x2c, 0x20, 0x25, 0x72, 0x31, 0x2c, 0x20, 0x36, 0x3b, 0x00, 0x6c, 0x64, 0x2e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x2e, 0x75, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x39, 0x2c, 0x20, 0x5b, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x32, 0x5d, 0x3b, 0x00, 0x6c, 0x64, 0x2e, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x2e, 0x75, 0x33, 0x32, 0x20, 0x09, 0x25, 0x72, 0x39, 0x2c, 0x20, 0x5b, 0x61, 0x64, 0x64, 0x5f, 0x6b, 0x65, 0x72, 0x6e, 0x65, 0x6c, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6d, 0x5f, 0x33, 0x5d, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x75, 0x33, 0x32, 0x20, 0x09, 0x25, 0x72, 0x31, 0x30, 0x2c, 0x20, 0x25, 0x74, 0x69, 0x64, 0x2e, 0x78, 0x3b, 0x00, 0x61, 0x6e, 0x64, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x20, 0x09, 0x25, 0x72, 0x31, 0x31, 0x2c, 0x20, 0x25, 0x72, 0x31, 0x30, 0x2c, 0x20, 0x33, 0x31, 0x3b, 0x00, 0x00, 0x6f, 0x72, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x20, 0x09, 0x25, 0x72, 0x31, 0x32, 0x2c, 0x20, 0x25, 0x72, 0x38, 0x2c, 0x20, 0x25, 0x72, 0x31, 0x31, 0x3b, 0x00, 0x6f, 0x72, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x20, 0x09, 0x25, 0x72, 0x31, 0x33, 0x2c, 0x20, 0x25, 0x72, 0x31, 0x32, 0x2c, 0x20, 0x33, 0x32, 0x3b, 0x00, 0x00, 0x73, 0x65, 0x74, 0x70, 0x2e, 0x6c, 0x74, 0x2e, 0x73, 0x33, 0x32, 0x20, 0x09, 0x25, 0x70, 0x31, 0x2c, 0x20, 0x25, 0x72, 0x31, 0x32, 0x2c, 0x20, 0x25, 0x72, 0x39, 0x3b, 0x00, 0x73, 0x65, 0x74, 0x70, 0x2e, 0x6c, 0x74, 0x2e, 0x73, 0x33, 0x32, 0x20, 0x09, 0x25, 0x70, 0x32, 0x2c, 0x20, 0x25, 0x72, 0x31, 0x33, 0x2c, 0x20, 0x25, 0x72, 0x39, 0x3b, 0x00, 0x00, 0x6d, 0x75, 0x6c, 0x2e, 0x77, 0x69, 0x64, 0x65, 0x2e, 0x73, 0x33, 0x32, 0x20, 0x09, 0x25, 0x72, 0x64, 0x31, 0x30, 0x2c, 0x20, 0x25, 0x72, 0x31, 0x32, 0x2c, 0x20, 0x34, 0x3b, 0x00, 0x61, 0x64, 0x64, 0x2e, 0x73, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x31, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x37, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x31, 0x30, 0x3b, 0x00, 0x61, 0x64, 0x64, 0x2e, 0x73, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x32, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x31, 0x2c, 0x20, 0x31, 0x32, 0x38, 0x3b, 0x00, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x75, 0x33, 0x32, 0x20, 0x25, 0x72, 0x32, 0x2c, 0x20, 0x30, 0x78, 0x30, 0x3b, 0x00, 0x40, 0x25, 0x70, 0x31, 0x20, 0x6c, 0x64, 0x2e, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x7b, 0x20, 0x25, 0x72, 0x32, 0x20, 0x7d, 0x2c, 0x20, 0x5b, 0x20, 0x25, 0x72, 0x64, 0x31, 0x20, 0x2b, 0x20, 0x30, 0x20, 0x5d, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x09, 0x25, 0x66, 0x31, 0x2c, 0x20, 0x25, 0x72, 0x32, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x75, 0x33, 0x32, 0x20, 0x25, 0x72, 0x33, 0x2c, 0x20, 0x30, 0x78, 0x30, 0x3b, 0x00, 0x40, 0x25, 0x70, 0x32, 0x20, 0x6c, 0x64, 0x2e, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x7b, 0x20, 0x25, 0x72, 0x33, 0x20, 0x7d, 0x2c, 0x20, 0x5b, 0x20, 0x25, 0x72, 0x64, 0x32, 0x20, 0x2b, 0x20, 0x30, 0x20, 0x5d, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x09, 0x25, 0x66, 0x32, 0x2c, 0x20, 0x25, 0x72, 0x33, 0x3b, 0x00, 0x00, 0x61, 0x64, 0x64, 0x2e, 0x73, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x33, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x38, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x31, 0x30, 0x3b, 0x00, 0x61, 0x64, 0x64, 0x2e, 0x73, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x34, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x33, 0x2c, 0x20, 0x31, 0x32, 0x38, 0x3b, 0x00, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x75, 0x33, 0x32, 0x20, 0x25, 0x72, 0x34, 0x2c, 0x20, 0x30, 0x78, 0x30, 0x3b, 0x00, 0x40, 0x25, 0x70, 0x31, 0x20, 0x6c, 0x64, 0x2e, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x7b, 0x20, 0x25, 0x72, 0x34, 0x20, 0x7d, 0x2c, 0x20, 0x5b, 0x20, 0x25, 0x72, 0x64, 0x33, 0x20, 0x2b, 0x20, 0x30, 0x20, 0x5d, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x09, 0x25, 0x66, 0x33, 0x2c, 0x20, 0x25, 0x72, 0x34, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x75, 0x33, 0x32, 0x20, 0x25, 0x72, 0x35, 0x2c, 0x20, 0x30, 0x78, 0x30, 0x3b, 0x00, 0x40, 0x25, 0x70, 0x32, 0x20, 0x6c, 0x64, 0x2e, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x7b, 0x20, 0x25, 0x72, 0x35, 0x20, 0x7d, 0x2c, 0x20, 0x5b, 0x20, 0x25, 0x72, 0x64, 0x34, 0x20, 0x2b, 0x20, 0x30, 0x20, 0x5d, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x09, 0x25, 0x66, 0x34, 0x2c, 0x20, 0x25, 0x72, 0x35, 0x3b, 0x00, 0x00, 0x61, 0x64, 0x64, 0x2e, 0x66, 0x33, 0x32, 0x20, 0x09, 0x25, 0x66, 0x35, 0x2c, 0x20, 0x25, 0x66, 0x31, 0x2c, 0x20, 0x25, 0x66, 0x33, 0x3b, 0x00, 0x61, 0x64, 0x64, 0x2e, 0x66, 0x33, 0x32, 0x20, 0x09, 0x25, 0x66, 0x36, 0x2c, 0x20, 0x25, 0x66, 0x32, 0x2c, 0x20, 0x25, 0x66, 0x34, 0x3b, 0x00, 0x00, 0x61, 0x64, 0x64, 0x2e, 0x73, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x35, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x39, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x31, 0x30, 0x3b, 0x00, 0x61, 0x64, 0x64, 0x2e, 0x73, 0x36, 0x34, 0x20, 0x09, 0x25, 0x72, 0x64, 0x36, 0x2c, 0x20, 0x25, 0x72, 0x64, 0x35, 0x2c, 0x20, 0x31, 0x32, 0x38, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x09, 0x25, 0x72, 0x36, 0x2c, 0x20, 0x25, 0x66, 0x35, 0x3b, 0x00, 0x00, 0x40, 0x25, 0x70, 0x31, 0x20, 0x73, 0x74, 0x2e, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x5b, 0x20, 0x25, 0x72, 0x64, 0x35, 0x20, 0x2b, 0x20, 0x30, 0x20, 0x5d, 0x2c, 0x20, 0x7b, 0x20, 0x25, 0x72, 0x36, 0x20, 0x7d, 0x3b, 0x00, 0x00, 0x6d, 0x6f, 0x76, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x09, 0x25, 0x72, 0x37, 0x2c, 0x20, 0x25, 0x66, 0x36, 0x3b, 0x00, 0x00, 0x40, 0x25, 0x70, 0x32, 0x20, 0x73, 0x74, 0x2e, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x2e, 0x62, 0x33, 0x32, 0x20, 0x5b, 0x20, 0x25, 0x72, 0x64, 0x36, 0x20, 0x2b, 0x20, 0x30, 0x20, 0x5d, 0x2c, 0x20, 0x7b, 0x20, 0x25, 0x72, 0x37, 0x20, 0x7d, 0x3b, 0x00, 0x00, 0x00, 0x72, 0x65, 0x74, 0x3b, 0x00, 0x24, 0x4c, 0x5f, 0x5f, 0x74, 0x6d, 0x70, 0x31, 0x3a, 0x00, 0x24, 0x4c, 0x5f, 0x5f, 0x66, 0x75, 0x6e, 0x63, 0x5f, 0x65, 0x6e, 0x64, 0x30, 0x3a, 0x00, 0x00, 0x7d, 0x00, 0x00, 0x2e, 0x73, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x09, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x61, 0x62, 0x62, 0x72, 0x65, 0x76, 0x00, 0x7b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7d, 0x00, 0x2e, 0x73, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x09, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x69, 0x6e, 0x66, 0x6f, 0x00, 0x7b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7d, 0x00, 0x2e, 0x73, 0x65, 0x63, 0x74, 0x69, 0x6f, 0x6e, 0x09, 0x2e, 0x64, 0x65, 0x62, 0x75, 0x67, 0x5f, 0x6c, 0x6f, 0x63, 0x09, 0x7b, 0x09, 0x7d, 0x00, 0x00, 0x00, 0x00, 0x04, 0x2f, 0x08, 0x00, 0x09, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x04, 0x12, 0x08, 0x00, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x11, 0x08, 0x00, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x12, 0x08, 0x00, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x37, 0x04, 0x00, 0x7c, 0x00, 0x00, 0x00, 0x04, 0x0a, 0x08, 0x00, 0x02, 0x00, 0x00, 0x00, 0x60, 0x01, 0x1c, 0x00, 0x03, 0x19, 0x1c, 0x00, 0x04, 0x17, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x18, 0x00, 0x00, 0xf0, 0x11, 0x00, 0x04, 0x17, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x10, 0x00, 0x00, 0xf0, 0x21, 0x00, 0x04, 0x17, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x08, 0x00, 0x00, 0xf0, 0x21, 0x00, 0x04, 0x17, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x21, 0x00, 0x03, 0x1b, 0xff, 0x00, 0x04, 0x1c, 0x08, 0x00, 0x60, 0x01, 0x00, 0x00, 0x80, 0x01, 0x00, 0x00, 0x04, 0x10, 0x0c, 0x00, 0x20, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0xfe, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0xfd, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0xfc, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0x73, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0x25, 0x00, 0x05, 0x36, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x1d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x7a, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x00, 0x00, 0xe4, 0x0f, 0x00, 0x19, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x21, 0x00, 0x00, 0x00, 0x22, 0x0e, 0x00, 0x02, 0x78, 0x0b, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x00, 0x00, 0xe2, 0x0f, 0x00, 0x05, 0x78, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x01, 0x00, 0x00, 0xe2, 0x0f, 0x00, 0xb9, 0x7a, 0x04, 0x00, 0x00, 0x46, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x00, 0xe2, 0x0f, 0x00, 0x19, 0x79, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x25, 0x00, 0x00, 0x00, 0x62, 0x0e, 0x00, 0x05, 0x78, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x01, 0x00, 0x00, 0xe2, 0x0f, 0x00, 0x12, 0x78, 0x00, 0x00, 0x1f, 0x00, 0x00, 0x00, 0xff, 0xc0, 0x8e, 0x07, 0x00, 0xc8, 0x1f, 0x00, 0x11, 0x72, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0xff, 0x30, 0x8e, 0x07, 0x00, 0xc8, 0x2f, 0x00, 0x0c, 0x7a, 0x00, 0x00, 0x00, 0x5e, 0x00, 0x00, 0x70, 0x62, 0xf0, 0x03, 0x00, 0xe2, 0x0f, 0x04, 0x25, 0x76, 0x04, 0x00, 0x00, 0x5a, 0x00, 0x00, 0x0b, 0x02, 0x8e, 0x07, 0x00, 0xe2, 0x0f, 0x04, 0x10, 0x78, 0x02, 0x00, 0x20, 0x00, 0x00, 0x00, 0xff, 0xe0, 0xff, 0x07, 0x00, 0xc8, 0x0f, 0x00, 0x0c, 0x7a, 0x00, 0x02, 0x00, 0x5e, 0x00, 0x00, 0x70, 0x62, 0xf2, 0x03, 0x00, 0xe2, 0x0f, 0x00, 0x25, 0x76, 0x02, 0x00, 0x00, 0x58, 0x00, 0x00, 0x0b, 0x02, 0x8e, 0x07, 0x00, 0xcc, 0x0f, 0x00, 0x81, 0x89, 0x06, 0x02, 0x04, 0x00, 0x00, 0x00, 0x00, 0x19, 0x1e, 0x0c, 0x00, 0xa8, 0x0e, 0x00, 0x81, 0x89, 0x07, 0x04, 0x04, 0x00, 0x00, 0x00, 0x00, 0x19, 0x1e, 0x0c, 0x00, 0xa8, 0x0e, 0x00, 0x81, 0x99, 0x08, 0x02, 0x04, 0x80, 0x00, 0x00, 0x00, 0x19, 0x1e, 0x0c, 0x00, 0xe8, 0x0e, 0x00, 0x81, 0x99, 0x09, 0x04, 0x04, 0x80, 0x00, 0x00, 0x00, 0x19, 0x1e, 0x0c, 0x00, 0xe2, 0x0e, 0x00, 0x21, 0x72, 0x0d, 0x07, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe2, 0x4f, 0x00, 0x25, 0x76, 0x06, 0x00, 0x00, 0x5c, 0x00, 0x00, 0x0b, 0x02, 0x8e, 0x07, 0x00, 0xca, 0x0f, 0x00, 0x86, 0x89, 0x00, 0x06, 0x0d, 0x00, 0x00, 0x00, 0x04, 0x19, 0x10, 0x0c, 0x00, 0xe2, 0x01, 0x00, 0x21, 0x72, 0x09, 0x09, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe2, 0x8f, 0x00, 0x4d, 0x19, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x03, 0x00, 0xec, 0x0f, 0x00, 0x86, 0x79, 0x00, 0x06, 0x09, 0x80, 0x00, 0x00, 0x04, 0x19, 0x10, 0x0c, 0x00, 0xe2, 0x0f, 0x00, 0x4d, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x03, 0x00, 0xea, 0x0f, 0x00, 0x47, 0x79, 0x00, 0x00, 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0x83, 0x03, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x18, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xc0, 0x0f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x58, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0b, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x98, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x63, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x13, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x18, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xa3, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb0, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x60, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xb9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xcc, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x19, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf8, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x98, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x69, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x29, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x30, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x43, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x34, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x78, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x01, 0x00, 0x00, 0x01, 0x00, 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xac, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x20, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x49, 0x01, 0x00, 0x00, 0x0b, 0x00, 0x00, 0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd0, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xbc, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe0, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x01, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x6d, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x42, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x0d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7c, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x32, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x10, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0xc0, 0x15, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x10, 0x0d, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0xc0, 0x15, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };


void unload_add_kernel_9969bdda_0123(void) {
    CUDA_CHECK(cuModuleUnload(add_kernel_9969bdda_0123_mod));
}

// TODO: some code duplication with `runtime/backend/cuda.c`
void load_add_kernel_9969bdda_0123() {
    int dev = 0;
    void *bin = (void *)&CUBIN_NAME;
    int shared = 0;
    CUDA_CHECK(cuModuleLoadData(&add_kernel_9969bdda_0123_mod, bin));
    CUDA_CHECK(cuModuleGetFunction(&add_kernel_9969bdda_0123_func, add_kernel_9969bdda_0123_mod, "add_kernel"));
    // set dynamic shared memory if necessary
    int shared_optin;
    CUDA_CHECK(cuDeviceGetAttribute(&shared_optin, CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN, dev));
    if (shared > 49152 && shared_optin > 49152) {
      CUDA_CHECK(cuFuncSetCacheConfig(add_kernel_9969bdda_0123_func, CU_FUNC_CACHE_PREFER_SHARED));
      CUDA_CHECK(cuFuncSetAttribute(add_kernel_9969bdda_0123_func, CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, shared_optin))
    }
}

/*
['BLOCK_SIZE=64', 'num_warps=1', 'num_stages=3']
*/
CUresult add_kernel_9969bdda_0123(CUstream stream, CUdeviceptr x_ptr, CUdeviceptr y_ptr, CUdeviceptr output_ptr, int32_t n_elements) {
    if (add_kernel_9969bdda_0123_func == NULL)
       load_add_kernel_9969bdda_0123();
    unsigned int gX = 1024;
    unsigned int gY = 1024;
    unsigned int gZ = 1024;
    void *args[4] = { &x_ptr, &y_ptr, &output_ptr, &n_elements };
    // TODO: shared memory
    if(gX * gY * gZ > 0)
      return cuLaunchKernel(add_kernel_9969bdda_0123_func, gX, gY, gZ, 1 * 32, 1, 1, 0, stream, args, NULL);
}

We can get compiled kernel pretty easily:

1
2
3
4
5
6
7
8
9
10
11
size = 1024
x = torch.rand(size, device='cuda')
y = torch.rand(size, device='cuda')
grid = lambda meta: (triton.cdiv(size, meta['BLOCK_SIZE']),)

output = torch.empty_like(x)
compiled_kernel = add_kernel[grid](x, y, output, size, BLOCK_SIZE=1024)

# to get all the codegen keys
print(compiled_kernel.asm.keys())
# dict_keys(['llir', 'ttgir', 'ttir', 'ptx', 'cubin'])

From here it easy to see the progressive IR generation as part of the triton compilation process:

TTIR (Triton IR)

1
print(compiled_kernel.asm["ttir"])
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#loc = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0)
module {
  tt.func public @add_kernel_0d1d2d3de(%arg0: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32} loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0), %arg1: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32} loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0), %arg2: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32} loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0), %arg3: i32 {tt.divisibility = 16 : i32, tt.max_divisibility = 16 : i32} loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0)) attributes {noinline = false} {
    %c1024_i32 = arith.constant 1024 : i32 loc(#loc1)
    %0 = tt.get_program_id x : i32 loc(#loc2)
    %1 = arith.muli %0, %c1024_i32 : i32 loc(#loc1)
    %2 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32> loc(#loc3)
    %3 = tt.splat %1 : (i32) -> tensor<1024xi32> loc(#loc4)
    %4 = arith.addi %3, %2 : tensor<1024xi32> loc(#loc4)
    %5 = tt.splat %arg3 : (i32) -> tensor<1024xi32> loc(#loc5)
    %6 = arith.cmpi slt, %4, %5 : tensor<1024xi32> loc(#loc5)
    %7 = tt.splat %arg0 : (!tt.ptr<f32, 1>) -> tensor<1024x!tt.ptr<f32, 1>> loc(#loc6)
    %8 = tt.addptr %7, %4 : tensor<1024x!tt.ptr<f32, 1>>, tensor<1024xi32> loc(#loc6)
    %9 = tt.load %8, %6 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32> loc(#loc7)
    %10 = tt.splat %arg1 : (!tt.ptr<f32, 1>) -> tensor<1024x!tt.ptr<f32, 1>> loc(#loc8)
    %11 = tt.addptr %10, %4 : tensor<1024x!tt.ptr<f32, 1>>, tensor<1024xi32> loc(#loc8)
    %12 = tt.load %11, %6 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32> loc(#loc9)
    %13 = arith.addf %9, %12 : tensor<1024xf32> loc(#loc10)
    %14 = tt.splat %arg2 : (!tt.ptr<f32, 1>) -> tensor<1024x!tt.ptr<f32, 1>> loc(#loc11)
    %15 = tt.addptr %14, %4 : tensor<1024x!tt.ptr<f32, 1>>, tensor<1024xi32> loc(#loc11)
    tt.store %15, %13, %6 {cache = 1 : i32, evict = 1 : i32} : tensor<1024xf32> loc(#loc12)
    tt.return loc(#loc13)
  } loc(#loc)
} loc(#loc)
#loc1 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":23:24)
#loc2 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":18:24)
#loc3 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":24:41)
#loc4 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":24:28)
#loc5 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":26:21)
#loc6 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":29:24)
#loc7 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":29:16)
#loc8 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":30:24)
#loc9 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":30:16)
#loc10 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":31:17)
#loc11 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":33:26)
#loc12 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":33:35)
#loc13 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":33:4)

We can see that IR is more generic and does not have any hardware specific details.

Triton GPU IR (TTGIR)

1
print(compiled_kernel.asm["ttgir"])
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#blocked = #triton_gpu.blocked<{sizePerThread = [4], threadsPerWarp = [32], warpsPerCTA = [4], order = [0], CTAsPerCGA = [1], CTASplitNum = [1], CTAOrder = [0]}>
#loc = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0)
module attributes {"triton_gpu.compute-capability" = 89 : i32, "triton_gpu.num-ctas" = 1 : i32, "triton_gpu.num-warps" = 4 : i32, "triton_gpu.threads-per-warp" = 32 : i32} {
  tt.func public @add_kernel_0d1d2d3de(%arg0: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32} loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0), %arg1: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32} loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0), %arg2: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32} loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0), %arg3: i32 {tt.divisibility = 16 : i32, tt.max_divisibility = 16 : i32} loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":9:0)) attributes {noinline = false} {
    %c1024_i32 = arith.constant 1024 : i32 loc(#loc1)
    %0 = tt.get_program_id x : i32 loc(#loc2)
    %1 = arith.muli %0, %c1024_i32 : i32 loc(#loc1)
    %2 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32, #blocked> loc(#loc3)
    %3 = tt.splat %1 : (i32) -> tensor<1024xi32, #blocked> loc(#loc4)
    %4 = arith.addi %3, %2 : tensor<1024xi32, #blocked> loc(#loc4)
    %5 = tt.splat %arg3 : (i32) -> tensor<1024xi32, #blocked> loc(#loc5)
    %6 = arith.cmpi slt, %4, %5 : tensor<1024xi32, #blocked> loc(#loc5)
    %7 = tt.splat %arg0 : (!tt.ptr<f32, 1>) -> tensor<1024x!tt.ptr<f32, 1>, #blocked> loc(#loc6)
    %8 = tt.addptr %7, %4 : tensor<1024x!tt.ptr<f32, 1>, #blocked>, tensor<1024xi32, #blocked> loc(#loc6)
    %9 = tt.load %8, %6 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked> loc(#loc7)
    %10 = tt.splat %arg1 : (!tt.ptr<f32, 1>) -> tensor<1024x!tt.ptr<f32, 1>, #blocked> loc(#loc8)
    %11 = tt.addptr %10, %4 : tensor<1024x!tt.ptr<f32, 1>, #blocked>, tensor<1024xi32, #blocked> loc(#loc8)
    %12 = tt.load %11, %6 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked> loc(#loc9)
    %13 = arith.addf %9, %12 : tensor<1024xf32, #blocked> loc(#loc10)
    %14 = tt.splat %arg2 : (!tt.ptr<f32, 1>) -> tensor<1024x!tt.ptr<f32, 1>, #blocked> loc(#loc11)
    %15 = tt.addptr %14, %4 : tensor<1024x!tt.ptr<f32, 1>, #blocked>, tensor<1024xi32, #blocked> loc(#loc11)
    tt.store %15, %13, %6 {cache = 1 : i32, evict = 1 : i32} : tensor<1024xf32, #blocked> loc(#loc12)
    tt.return loc(#loc13)
  } loc(#loc)
} loc(#loc)
#loc1 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":23:24)
#loc2 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":18:24)
#loc3 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":24:41)
#loc4 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":24:28)
#loc5 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":26:21)
#loc6 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":29:24)
#loc7 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":29:16)
#loc8 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":30:24)
#loc9 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":30:16)
#loc10 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":31:17)
#loc11 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":33:26)
#loc12 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":33:35)
#loc13 = loc("/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py":33:4)

The IR is more specific now and has details like compute capability, number of CTAs, warps, threads per warp, etc. Looking at this IR, it does seem to be NVIDIA agnostic mostly but seems to have some NVIDIA specific details like triton_gpu.compute-capability.

LLVM IR

1
print(compiled_kernel.asm["llir"])
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"

define void @add_kernel_0d1d2d3de(ptr addrspace(1) %0, ptr addrspace(1) %1, ptr addrspace(1) %2, i32 %3) local_unnamed_addr !dbg !5 {
  %5 = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x(), !dbg !8
  %6 = shl i32 %5, 2, !dbg !8
  %7 = and i32 %6, 508, !dbg !8
  %8 = tail call i32 asm "mov.u32 $0, %ctaid.x;", "=r"() #1, !dbg !9
  %9 = shl i32 %8, 10, !dbg !10
  %10 = or i32 %9, %7, !dbg !11
  %11 = or i32 %10, 512, !dbg !11
  %12 = icmp slt i32 %10, %3, !dbg !12
  %13 = icmp slt i32 %11, %3, !dbg !12
  %14 = sext i32 %10 to i64, !dbg !13
  %15 = getelementptr float, ptr addrspace(1) %0, i64 %14, !dbg !13
  %16 = sext i32 %11 to i64, !dbg !13
  %17 = getelementptr float, ptr addrspace(1) %0, i64 %16, !dbg !13
  %18 = tail call { i32, i32, i32, i32 } asm sideeffect "mov.u32 $0, 0x0;\0A\09mov.u32 $1, 0x0;\0A\09mov.u32 $2, 0x0;\0A\09mov.u32 $3, 0x0;\0A\09@$5 ld.global.v4.b32 { $0, $1, $2, $3 }, [ $4 + 0 ];", "=r,=r,=r,=r,l,b"(ptr addrspace(1) %15, i1 %12) #1, !dbg !14
  %19 = extractvalue { i32, i32, i32, i32 } %18, 0, !dbg !14
  %20 = extractvalue { i32, i32, i32, i32 } %18, 1, !dbg !14
  %21 = extractvalue { i32, i32, i32, i32 } %18, 2, !dbg !14
  %22 = extractvalue { i32, i32, i32, i32 } %18, 3, !dbg !14
  %23 = bitcast i32 %19 to float, !dbg !14
  %24 = bitcast i32 %20 to float, !dbg !14
  %25 = bitcast i32 %21 to float, !dbg !14
  %26 = bitcast i32 %22 to float, !dbg !14
  %27 = tail call { i32, i32, i32, i32 } asm sideeffect "mov.u32 $0, 0x0;\0A\09mov.u32 $1, 0x0;\0A\09mov.u32 $2, 0x0;\0A\09mov.u32 $3, 0x0;\0A\09@$5 ld.global.v4.b32 { $0, $1, $2, $3 }, [ $4 + 0 ];", "=r,=r,=r,=r,l,b"(ptr addrspace(1) %17, i1 %13) #1, !dbg !14
  %28 = extractvalue { i32, i32, i32, i32 } %27, 0, !dbg !14
  %29 = extractvalue { i32, i32, i32, i32 } %27, 1, !dbg !14
  %30 = extractvalue { i32, i32, i32, i32 } %27, 2, !dbg !14
  %31 = extractvalue { i32, i32, i32, i32 } %27, 3, !dbg !14
  %32 = bitcast i32 %28 to float, !dbg !14
  %33 = bitcast i32 %29 to float, !dbg !14
  %34 = bitcast i32 %30 to float, !dbg !14
  %35 = bitcast i32 %31 to float, !dbg !14
  %36 = getelementptr float, ptr addrspace(1) %1, i64 %14, !dbg !15
  %37 = getelementptr float, ptr addrspace(1) %1, i64 %16, !dbg !15
  %38 = tail call { i32, i32, i32, i32 } asm sideeffect "mov.u32 $0, 0x0;\0A\09mov.u32 $1, 0x0;\0A\09mov.u32 $2, 0x0;\0A\09mov.u32 $3, 0x0;\0A\09@$5 ld.global.v4.b32 { $0, $1, $2, $3 }, [ $4 + 0 ];", "=r,=r,=r,=r,l,b"(ptr addrspace(1) %36, i1 %12) #1, !dbg !16
  %39 = extractvalue { i32, i32, i32, i32 } %38, 0, !dbg !16
  %40 = extractvalue { i32, i32, i32, i32 } %38, 1, !dbg !16
  %41 = extractvalue { i32, i32, i32, i32 } %38, 2, !dbg !16
  %42 = extractvalue { i32, i32, i32, i32 } %38, 3, !dbg !16
  %43 = bitcast i32 %39 to float, !dbg !16
  %44 = bitcast i32 %40 to float, !dbg !16
  %45 = bitcast i32 %41 to float, !dbg !16
  %46 = bitcast i32 %42 to float, !dbg !16
  %47 = tail call { i32, i32, i32, i32 } asm sideeffect "mov.u32 $0, 0x0;\0A\09mov.u32 $1, 0x0;\0A\09mov.u32 $2, 0x0;\0A\09mov.u32 $3, 0x0;\0A\09@$5 ld.global.v4.b32 { $0, $1, $2, $3 }, [ $4 + 0 ];", "=r,=r,=r,=r,l,b"(ptr addrspace(1) %37, i1 %13) #1, !dbg !16
  %48 = extractvalue { i32, i32, i32, i32 } %47, 0, !dbg !16
  %49 = extractvalue { i32, i32, i32, i32 } %47, 1, !dbg !16
  %50 = extractvalue { i32, i32, i32, i32 } %47, 2, !dbg !16
  %51 = extractvalue { i32, i32, i32, i32 } %47, 3, !dbg !16
  %52 = bitcast i32 %48 to float, !dbg !16
  %53 = bitcast i32 %49 to float, !dbg !16
  %54 = bitcast i32 %50 to float, !dbg !16
  %55 = bitcast i32 %51 to float, !dbg !16
  %56 = fadd float %23, %43, !dbg !17
  %57 = fadd float %24, %44, !dbg !17
  %58 = fadd float %25, %45, !dbg !17
  %59 = fadd float %26, %46, !dbg !17
  %60 = fadd float %32, %52, !dbg !17
  %61 = fadd float %33, %53, !dbg !17
  %62 = fadd float %34, %54, !dbg !17
  %63 = fadd float %35, %55, !dbg !17
  %64 = getelementptr float, ptr addrspace(1) %2, i64 %14, !dbg !18
  %65 = getelementptr float, ptr addrspace(1) %2, i64 %16, !dbg !18
  %66 = bitcast float %56 to i32, !dbg !19
  %67 = bitcast float %57 to i32, !dbg !19
  %68 = bitcast float %58 to i32, !dbg !19
  %69 = bitcast float %59 to i32, !dbg !19
  tail call void asm sideeffect "@$5 st.global.v4.b32 [ $4 + 0 ], { $0, $1, $2, $3 };", "r,r,r,r,l,b"(i32 %66, i32 %67, i32 %68, i32 %69, ptr addrspace(1) %64, i1 %12) #1, !dbg !19
  %70 = bitcast float %60 to i32, !dbg !19
  %71 = bitcast float %61 to i32, !dbg !19
  %72 = bitcast float %62 to i32, !dbg !19
  %73 = bitcast float %63 to i32, !dbg !19
  tail call void asm sideeffect "@$5 st.global.v4.b32 [ $4 + 0 ], { $0, $1, $2, $3 };", "r,r,r,r,l,b"(i32 %70, i32 %71, i32 %72, i32 %73, ptr addrspace(1) %65, i1 %13) #1, !dbg !19
  ret void, !dbg !20
}

; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare noundef i32 @llvm.nvvm.read.ptx.sreg.tid.x() #0

attributes #0 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #1 = { nounwind }

!llvm.module.flags = !{!0}
!llvm.dbg.cu = !{!1}
!nvvm.annotations = !{!3, !4, !4, !3}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = distinct !DICompileUnit(language: DW_LANG_C, file: !2, producer: "triton", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!2 = !DIFile(filename: "vector_add.py", directory: "/home/ksharma/dev/git/ml-projects/triton_internals")
!3 = !{ptr @add_kernel_0d1d2d3de, !"kernel", i32 1}
!4 = !{ptr @add_kernel_0d1d2d3de, !"maxntidx", i32 128}
!5 = distinct !DISubprogram(name: "add_kernel_0d1d2d3de", linkageName: "add_kernel_0d1d2d3de", scope: !2, file: !2, line: 9, type: !6, scopeLine: 9, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !1)
!6 = !DISubroutineType(cc: DW_CC_normal, types: !7)
!7 = !{}
!8 = !DILocation(line: 24, column: 41, scope: !5)
!9 = !DILocation(line: 18, column: 24, scope: !5)
!10 = !DILocation(line: 23, column: 24, scope: !5)
!11 = !DILocation(line: 24, column: 28, scope: !5)
!12 = !DILocation(line: 26, column: 21, scope: !5)
!13 = !DILocation(line: 29, column: 24, scope: !5)
!14 = !DILocation(line: 29, column: 16, scope: !5)
!15 = !DILocation(line: 30, column: 24, scope: !5)
!16 = !DILocation(line: 30, column: 16, scope: !5)
!17 = !DILocation(line: 31, column: 17, scope: !5)
!18 = !DILocation(line: 33, column: 26, scope: !5)
!19 = !DILocation(line: 33, column: 35, scope: !5)
!20 = !DILocation(line: 33, column: 4, scope: !5)

LLVM IR has properties like nvvm.annotations, llvm.nvvm.read.ptx.sreg.tid.x(), etc. which are NVIDIA specific. From here, the next step is PTX.

PTX

1
print(compiled_kernel.asm["ptx"])
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
//
// Generated by LLVM NVPTX Back-End
//

.version 8.2
.target sm_89
.address_size 64

	// .globl	add_kernel_0d1d2d3de

.visible .entry add_kernel_0d1d2d3de(
	.param .u64 add_kernel_0d1d2d3de_param_0,
	.param .u64 add_kernel_0d1d2d3de_param_1,
	.param .u64 add_kernel_0d1d2d3de_param_2,
	.param .u32 add_kernel_0d1d2d3de_param_3
)
.maxntid 128, 1, 1
{
	.reg .pred 	%p<7>;
	.reg .b32 	%r<33>;
	.reg .f32 	%f<25>;
	.reg .b64 	%rd<11>;
	.loc	1 9 0
$L__func_begin0:
	.loc	1 9 0

	ld.param.u64 	%rd7, [add_kernel_0d1d2d3de_param_0];
	ld.param.u64 	%rd8, [add_kernel_0d1d2d3de_param_1];
$L__tmp0:
	.loc	1 24 41
	mov.u32 	%r26, %tid.x;
	shl.b32 	%r27, %r26, 2;
	ld.param.u64 	%rd9, [add_kernel_0d1d2d3de_param_2];
	and.b32  	%r28, %r27, 508;
...
...truncated...
...
	.loc	1 33 26
	add.s64 	%rd5, %rd9, %rd10;
	add.s64 	%rd6, %rd5, 2048;
	.loc	1 33 35
	mov.b32 	%r18, %f17;
	mov.b32 	%r19, %f18;
	mov.b32 	%r20, %f19;
	mov.b32 	%r21, %f20;
	@%p1 st.global.v4.b32 [ %rd5 + 0 ], { %r18, %r19, %r20, %r21 };
	mov.b32 	%r22, %f21;
	mov.b32 	%r23, %f22;
	mov.b32 	%r24, %f23;
	mov.b32 	%r25, %f24;
	@%p2 st.global.v4.b32 [ %rd6 + 0 ], { %r22, %r23, %r24, %r25 };
	.loc	1 33 4
	ret;
$L__tmp1:
$L__func_end0:

}
	.file	1 "/home/ksharma/dev/git/ml-projects/triton_internals/vector_add.py"
	.section	.debug_abbrev
	{
.b8 1
.b8 17
.b8 1
.b8 37
.b8 8
...
...truncated...
...
$L__pubTypes_end0:
	}
	.section	.debug_loc	{	}

I don’t know if non-compiler engineers need to get into PTX details or know how to read it. But it’s good to know that Triton can generate PTX code if needed for performance tuning.

CUBIN

Finally, if you want to get the CUBIN, you can check compiled_kernel.asm["cubin"]. This will give you the binary code that can be run on the GPU.

CUBIN - ELF

If you feel the need to read the cubin in a more human-readable format, you can use readelf linux utility to read the cubin file.

1
2
with open("/tmp/cubin_data.o", "wb") as f:
    f.write(compiled_kernel.asm["cubin"])
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
$ readelf -a /tmp/cubin_data.o
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 33 07 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            <unknown: 33>
  ABI Version:                       7
  Type:                              EXEC (Executable file)
  Machine:                           NVIDIA CUDA architecture
  Version:                           0x7b
  Entry point address:               0x0
  Start of program headers:          6784 (bytes into file)
  Start of section headers:          5760 (bytes into file)
  Flags:                             0x590559
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           64 (bytes)
  Number of section headers:         16
  Section header string table index: 1

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .shstrtab         STRTAB           0000000000000000  00000040
       0000000000000155  0000000000000000           0     0     1
...
...truncated

We can see that the ELF header has details about type of binary, machine, entry point, etc. However, I am not sure how useful it is to be able to read CUBIN.

SASS

Cubin by itself might not be super useful. You can convert it to SASS using get_sass method in triton tools.

1
print(get_sass(compiled_kernel.asm["cubin"]))
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
Function:add_kernel_0d1d2d3de
--:-:-:Y:2	MOV R1, c[0x0][0x28];
--:-:0:-:1	S2R R0, SR_TID.X;
--:-:-:-:1	MOV R15, 0x4;
--:-:-:-:1	CS2R R16, SRZ;
--:-:-:-:1	CS2R R18, SRZ;
--:-:1:-:1	S2R R3, SR_CTAID.X;
--:-:-:-:1	CS2R R20, SRZ;
--:-:-:-:1	CS2R R22, SRZ;
--:-:-:-:1	ULDC.64 UR4, c[0x0][0x118];
--:-:-:-:1	CS2R R4, SRZ;
--:-:-:-:1	CS2R R6, SRZ;
01:-:-:Y:4	SHF.L.U32 R0, R0, 0x2, RZ;
--:-:-:Y:4	LOP3.LUT R0, R0, 0x1fc, RZ, 0xc0, !PT;
02:-:-:Y:4	LEA R0, R3, R0, 0xa;
--:-:-:-:1	ISETP.GE.AND P0, PT, R0.reuse, c[0x0][0x178], PT;
--:-:-:-:1	IMAD.WIDE R12, R0.reuse, R15, c[0x0][0x168];
--:-:-:Y:4	IADD3 R2, R0, 0x200, RZ;
--:-:-:-:1	ISETP.GE.AND P1, PT, R2, c[0x0][0x178], PT;
--:-:-:-:1	IMAD.WIDE R2, R0, R15, c[0x0][0x160];
--:-:-:-:1	CS2R R8, SRZ;
--:-:-:Y:4	CS2R R10, SRZ;
--:-:2:-:4	@!P0 LDG.E.128 R16, [R2.64];
--:-:2:-:4	@!P0 LDG.E.128 R20, [R12.64];
--:-:3:-:4	@!P1 LDG.E.128 R4, [R2.64+0x800];
--:-:3:-:1	@!P1 LDG.E.128 R8, [R12.64+0x800];
--:-:-:-:1	IMAD.WIDE R14, R0, R15, c[0x0][0x170];
04:-:-:-:1	FADD R16, R20, R16;
--:-:-:-:1	FADD R17, R21, R17;
--:-:-:-:1	FADD R18, R22, R18;
--:-:-:Y:5	FADD R19, R23, R19;
--:0:-:-:1	@!P0 STG.E.128 [R14.64], R16;
08:-:-:-:1	FADD R4, R8, R4;
--:-:-:-:1	FADD R5, R9, R5;
--:-:-:-:1	FADD R6, R10, R6;
--:-:-:-:1	FADD R7, R11, R7;
--:-:-:-:6	@P1 EXIT;
--:-:-:-:1	STG.E.128 [R14.64+0x800], R4;
--:-:-:-:5	EXIT;
LBB0:
--:-:-:Y:0	BRA LBB0;
--:-:-:Y:0	NOP;
--:-:-:Y:0	NOP;
--:-:-:Y:0	NOP;
--:-:-:Y:0	NOP;
--:-:-:Y:0	NOP;
--:-:-:Y:0	NOP;
--:-:-:Y:0	NOP;
--:-:-:Y:0	NOP;
--:-:-:Y:0	NOP;

Conclusion

This was a deep dive into Triton internals. We saw how Triton can be used to compile a simple kernel and then extract the PTX, CUBIN, and SASS from it. My general motivation was to understand how triton works under the hood and learnt a lot.

Sources

Here are some of the sources I referred to while writing this post:

This post is licensed under CC BY 4.0 by the author.

Comments powered by Disqus.