Event
OSDI 2024
Microsoft is proud to sponsor the 18th USENIX Symposium on Operating Systems Design and Implementation. OSDI brings together professionals from academic and industrial backgrounds in what has become a premier forum…
Publication
A Call for Research on Storage Emissions
Project
MInference: Million-Tokens Prompt Inference for Long-context LLMs
MInference 1.0 leverages the dynamic sparse nature of LLMs’ attention, which exhibits some static patterns, to speed up the pre-filling for long-context LLMs. It first determines offline which sparse pattern…
Tool
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.