This is the posts listing page. Below you’ll find published posts I’ve written.
Mamba Notes
Note: This is a rough, work-in-progress post — things may change or be incomplete. This post describes the triton implementation of the backward pass of the Mamba-2 Chunking sequence layer. We use adjoint notation, i.e \(\bar{A} \) means \(\frac{\partial L} {\partial {A}} \).The only important result to keep in mind here is how adjoints flow over matmuls, and that broadcasting calls for sum over the partial gradients over the dimension that was broadcasted on in the first place ...