Multi-Object Advertisement Creative Generation
Applied Scientist II
Are you interested in solving large‑scale search problems and building next‑generation of Query Understanding solutions powered by LLMs and SLMs? The Bing Orca team is a research‑driven applied science and engineering group building the next…
SafeAgents
A unified framework for building and evaluating safe multi-agent systems SafeAgents provides a simple, framework-agnostic API for creating multi-agent systems with built-in safety evaluation, attack detection, and support for multiple agentic frameworks (Autogen, LangGraph, OpenAI…
BusyBox
BusyBox is a physical 3D-printable device for benchmarking affordance generalization in robot foundation models. It features Please check out our website (opens in new tab) for more details. For fully building a instrumented BusyBox capable…
TestExplora
This repository is the official implementation of the paper “TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation” It can be used for baseline evaluation using the prompts mentioned in the paper. TestExplora…
Systematic debugging for AI agents: Introducing the AgentRx framework
As AI agents transition from simple chatbots to autonomous systems capable of managing cloud incidents, navigating complex web interfaces, and executing multi-step API workflows, a new challenge has emerged: transparency. When a human makes a…