Torch
We aim to develop practical tools and techniques that can help cloud developers adequately debug, test, configure, and monitor their systems. The research spans all aspects of improving reliability and availability of large-scale cloud systems,…
Hyperscale cloud reliability and the art of organic collaboration
What does it take to build one of the most reliable hyperscale clouds on the planet? It clearly requires astronomical investments and a vast organization that operates at global scale in near seamless coordination. Yet…
Rethinking Networking for “Five Computers”
SeeDot: compiler for low-precision machine learning
The emergence of IoT and Machine Learning (ML) has seen an increase in systems that deploy sensors to collect data and analyze the data using ML algorithms in the cloud. However, running the ML classifiers directly…
TLA+ Specifications of the Consistency Guarantees Provided by Cosmos DB
Microsoft Azure Cosmos DB provides 5 well defined operation consistency properties to the clients: strong consistency, bounded staleness, session consistency, consistent prefix, and eventual consistency. Here we provide client-centric TLA+ specifications of these properties to…