Publications
You can also find my articles on Google Scholar.
- Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU
- SIMT-Step Execution: A Flexible Operational Semantics For GPU Subgroup Behavior
- BetterTogether: An Interference-Aware Framework for Fine-grained Software Pipelining on Heterogeneous SoCs
- sqlelf: a SQL-centric Approach to ELF Analysis