Open data using cloud infrastructure: An initiative to host, share and use open data using cloud services
In this paper, we discuss Microsoft Research Open Data, a new cloud-hosted data repository dedicated to facilitating collaboration across the global research community. The repository provides a single, convenient location for research datasets. The datasets span many domains, including computer science, social science, biology, and genomics, and represent many years of data curation by researchers. Each dataset is accompanied by related research assets such as metadata and publications. Users can copy the data directly into their own cloud subscription and onto data science virtual machines, which supports research reproducibility and helps advance research that uses the data.