Managing Messes in Computational Notebooks
Scaling Write-Intensive Key-Value Stores
In recent years, the log-structured merge-tree (LSM-tree) has become the mainstream core data structure used by key-value stores to ingest and persist data quickly. LSM-tree enables fast writes by buffering incoming data in memory and…
Analytics Scripts for Azure Open Datasets
A set of Databricks and Azure notebook scripts that compute urban heat island effects for all major cities in the United States. Daily indices for each city are computed from January 1, 2008 to present…
Froid and the relational database query quandary with Dr. Karthik Ramachandra
Episode 73, April 24, 2019 – In the world of relational databases, structured query language, or SQL, has long been King of the Queries, primarily because of its ubiquity and unparalleled performance. But many users…
A Client-centric Approach to Transactional Datastores
Modern applications must collect and store massive amounts of data. Cloud storage offers these applications simplicity: the abstraction of a failure-free, perfectly scalable black-box. While appealing, offloading data to the cloud is not without challenges.…
Froid
Froid is an extensible, language-agnostic framework for optimizing imperative functions in databases. The purpose of Froid is to enable developers to use the abstraction of UDFs without compromising on performance.
MLlib*: Fast Training of GLMs using Spark MLlib
Visualization for People + Systems
Making sense of large and complex data requires methods that integrate human judgment and domain expertise with modern data processing systems. To meet this challenge, my work combines methods from visualization, data management, human-computer interaction,…