Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU
Reese Levine, Rithik Sharma, Nitisha Jain, Abhijit Ramesh, Zheyuan Chen, N. Abbas, James Contini, Tyler Sorensen
May 2026