Publication | Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces | Karan Gupta, Pranav Vajreshwari, Yash Pandya, Raghav Magazine, Akshay Nambi, Ahmed Awadallah | ICLR Agents in the Wild | March 2026
Publication | Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use | Aradhye Agarwal, Gurdit Siyan, Yash Pandya, Joykirat Singh, Akshay Nambi, Ahmed Awadallah | ICLR Agents in the Wild: Safety, Security, and Beyond | March 2026
Microsoft Research Blog | Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model | March 4, 2026 | Jyoti Aneja, Michael Harrison, Neel Joshi, Tyler LaBonte, John Langford, Eduardo Salinas
Publication | Phi-4-reasoning-vision-15B Technical Report | Jyoti Aneja, Michael Harrison, Neel Joshi, Tyler LaBonte, John Langford, Eduardo Salinas | MSR-TR-2026-10 | March 2026 | Author: Microsoft Research
Publication | Wavelet Predictive Representations for Non-Stationary Reinforcement Learning | Min Wang, Xin Li, Ye He, Yao-Hui Li, Hasnaa Bennis, Riashat Islam, Mingzhong Wang | ICLR 2026 | October 2025
Job | Research Intern – Foundations of GenAI | Posted: January 12, 2026 | Location: New York, NY, US | Research area: Artificial intelligence