Tool
vAttention
vAttention is a memory manager for the KV cache in LLM serving systems. It decouples virtual memory allocation from physical memory allocation using the CUDA virtual memory APIs. This approach enables allocating physical memory on demand…
Publication
MGit: A Model Versioning and Management System
Group
Networking Infrastructure Group
The Networking Infrastructure Group at Microsoft Research Asia engages in fundamental research on all aspects of computer networking and infrastructure.