Reducing the Costs of Proof Synthesis on Rust Systems by Scaling Up a Seed Training Set

Nongyu Di; Tianyu Chen; Shan Lu; Shuai Lu; Yeyun Gong; Peng Cheng; Jay Lorch; Yuan Yao; Xiaoxing Ma

February 2026

Large Language Models (LLMs) are widely used for code generation. However, the correctness of LLM-generated code remains a concern. One potential remedy is to have LLMs generate formal correctness proofs along with the code. Compared with code generation alone, however, code-proof generation requires much stronger reasoning capability and has far less existing data to learn from. In this paper, we present VeruSyn, a data synthesis pipeline for Verus, a state-of-the-art verification tool for system software written in Rust. Through self-synthesis and tutorial-based synthesis, VeruSyn achieves a much larger scale and broader coverage of Verus features than previous data-synthesis techniques designed for Verus; VeruSyn also supplements its dataset with long-chain-of-thought (CoT) data through agent trajectory synthesis. With VeruSyn, we synthesize the largest set of Verus-verified programs to date: 6.9 million Rust programs, each with a formal specification and a proof that it meets that specification. This dataset lets us create a fine-tuned Qwen2.5-Coder-32B-Instruct model with an appealing cost-proof tradeoff compared with state-of-the-art commercial models like Claude Sonnet 4.5. The fine-tuned model also significantly outperforms models like o4-mini and previously proposed research models.

The training dataset produced by this work is available on Hugging Face Datasets at microsoft/Verus_Training_Data.