Research Tools: code, datasets, & models

Tool

GauXC

Skala is a neural network-based exchange-correlation functional for density functional theory (DFT), developed by Microsoft Research AI for Science. It leverages deep learning to predict exchange-correlation energies from electron density features, achieving chemical accuracy for…

GitHub

Tool

cppvrf

A Verifiable Random Function (VRF) is a cryptographic public-key primitive that, from a secret key and a given input, produces a unique pseudorandom output, along with a proof that the output was correctly computed. Only the…

Access
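A VRF combines three properties: a unique, deterministic output per input; pseudorandomness; and a proof anyone can check with the public key. The toy sketch below illustrates them with a construction loosely modeled on the RSA-FDH-VRF idea from RFC 9381. It is written in Python, not C++, and does not reflect cppvrf's API or scheme. The names `prove` and `verify`, the domain-separation tags, and the tiny hard-coded primes are illustrative assumptions. The hashing is simplified and does not follow the RFC's encodings, so this is not secure.

```python
import hashlib

# Toy RSA key (hypothetical values): well-known primes, far too small to be secure.
P, Q = 1_000_000_007, 998_244_353
N = P * Q                          # public modulus (fits in 8 bytes)
E = 65_537                         # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # secret exponent (modular inverse, Python 3.8+)


def _hash_to_zn(alpha: bytes) -> int:
    """Map the input alpha to an integer mod N (simplified full-domain hash)."""
    digest = hashlib.sha256(b"toy-vrf-fdh|" + alpha).digest()
    return int.from_bytes(digest, "big") % N


def _proof_to_output(pi: int) -> bytes:
    """Derive the pseudorandom VRF output beta from the proof pi."""
    return hashlib.sha256(b"toy-vrf-out|" + pi.to_bytes(8, "big")).digest()


def prove(alpha: bytes) -> tuple[bytes, int]:
    """Secret-key holder: return (beta, pi) for input alpha."""
    pi = pow(_hash_to_zn(alpha), D, N)  # deterministic RSA signature on H(alpha)
    return _proof_to_output(pi), pi


def verify(alpha: bytes, beta: bytes, pi: int) -> bool:
    """Anyone holding the public key (N, E) can check beta against alpha."""
    if not 0 <= pi < N:
        return False
    return pow(pi, E, N) == _hash_to_zn(alpha) and beta == _proof_to_output(pi)


beta, pi = prove(b"block-42")
print(verify(b"block-42", beta, pi))  # True
```

Because x ↦ x^E mod N is a permutation of the integers mod N, exactly one proof value passes `verify` for a given input. That is what makes the output unique rather than merely signed.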

Tool

SafeAgents

A unified framework for building and evaluating safe multi-agent systems. SafeAgents provides a simple, framework-agnostic API for creating multi-agent systems with built-in safety evaluation, attack detection, and support for multiple agentic frameworks (AutoGen, LangGraph, OpenAI…

GitHub

Tool

BusyBox

BusyBox is a physical 3D-printable device for benchmarking affordance generalization in robot foundation models. Please check out our website for more details. For fully building an instrumented BusyBox capable…

GitHub Publication

Tool

interwhen

interwhen is a test-time verification framework for language models that enforces correctness with respect to a set of verifiers. It is designed to improve instance-level reliability in reasoning systems, particularly in high-stakes domains where occasional…

GitHub
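The core contract is to accept an answer only when every verifier in the set agrees, and otherwise to abstain. A generic generate-and-verify loop can illustrate it. This is a hypothetical sketch, not interwhen's API or algorithm, which may intervene differently. `generate_until_verified`, `toy_model`, and `checks_sum` are illustrative names, and the "model" is a stand-in function rather than a language model.

```python
from typing import Callable, Optional, Sequence

# A verifier inspects (problem, candidate) and reports whether the candidate passes.
Verifier = Callable[[str, str], bool]


def generate_until_verified(
    problem: str,
    generate: Callable[[str, int], str],
    verifiers: Sequence[Verifier],
    max_attempts: int = 8,
) -> Optional[str]:
    """Return the first candidate accepted by every verifier, or None to abstain."""
    for attempt in range(max_attempts):
        candidate = generate(problem, attempt)  # e.g. one sampled model response
        if all(check(problem, candidate) for check in verifiers):
            return candidate
    return None  # no verified answer within budget: abstain instead of guessing


def toy_model(problem: str, attempt: int) -> str:
    """Stand-in generator that proposes 40, 41, 42, ... on successive attempts."""
    return str(40 + attempt)


def checks_sum(problem: str, candidate: str) -> bool:
    """Verifier for problems of the form 'a+b'."""
    a, b = map(int, problem.split("+"))
    return candidate.isdigit() and int(candidate) == a + b


print(generate_until_verified("19+23", toy_model, [checks_sum]))  # 42
```

Returning `None` instead of the last unverified candidate trades coverage for instance-level reliability. That trade-off matches the high-stakes setting described above.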

Tool

Chartifact

Declarative, interactive data documents. Chartifact is a low-code document format for creating interactive, data-driven pages such as reports, dashboards, and presentations. It travels like a document and works like a mini app. Designed for use…

GitHub

Tool

LiteBox

A security-focused library OS. Note: this project is actively evolving and improving. While we are working toward a stable release, some APIs and interfaces may change as the design continues to mature. You are welcome to explore…

GitHub

Tool

TestExplora

This repository is the official implementation of the paper “TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation.” It can be used for baseline evaluation with the prompts described in the paper. TestExplora…

GitHub Publication

Tool

SABER: Scaling-Aware Best-of-N Estimation of Risk

A Python package for predicting large-scale adversarial risk in Large Language Models under Best-of-N sampling (paper: https://arxiv.org/pdf/2601.22636). Standard LLM safety evaluations use single-shot (ASR@1) metrics,…

GitHub Publication
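The quantity at stake can be made concrete. Suppose prompt i succeeds on a single sampled response with probability p_i, and its N samples are independent. Then the chance that at least one of them succeeds is 1 - (1 - p_i)^N, and ASR@N averages this over prompts. The sketch below computes that plug-in value from per-prompt rates and compares it with naively extrapolating from the pooled ASR@1. It is not SABER's estimator. The function names are illustrative and the rates are made-up numbers.

```python
def asr_at_n(per_prompt_rates: list[float], n: int) -> float:
    """Fraction of prompts broken by at least one of n independent samples."""
    return sum(1 - (1 - p) ** n for p in per_prompt_rates) / len(per_prompt_rates)


def pooled_asr_at_n(per_prompt_rates: list[float], n: int) -> float:
    """Naive alternative: extrapolate from the pooled single-shot rate (ASR@1)."""
    asr1 = sum(per_prompt_rates) / len(per_prompt_rates)
    return 1 - (1 - asr1) ** n


rates = [0.0, 0.01, 0.5]  # hypothetical per-prompt single-sample success rates
print(round(asr_at_n(rates, 1), 3))           # 0.17
print(round(asr_at_n(rates, 100), 3))         # 0.545
print(round(pooled_asr_at_n(rates, 100), 3))  # 1.0
```

Because 1 - (1 - p)^N is concave in p, averaging the rates first and then extrapolating can only overstate ASR@N (Jensen's inequality). How risk is spread across prompts therefore matters when reasoning from single-shot numbers.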

Tool

SigmaCollab

SigmaCollab is a dataset that enables research on human-AI physically situated collaboration. The dataset consists of a set of 85 sessions in which untrained participants were guided by a mixed-reality assistive AI agent in performing…

GitHub