Technical Papers
Towards Automated Monitor Configuration for Cloud Services: A Data-Driven AIOps Framework Informed by Industrial Practice
Authors: Anson Bastos, Anjaly Parayil (Microsoft); Ayush Choure (Independent); Chetan Bansal, Rujia Wang (Microsoft)
CAPES: Causal Analysis of Power Effect under Score-based Scheduling
Authors: Jiali Xing, William Meng, Ziqi Meng (University of Pennsylvania); Liangcheng Yu (Microsoft Research); Vincent Liu, Benjamin Lee (University of Pennsylvania)
ActionNex: A Virtual Outage Manager for Cloud Computing
Authors: Zhenfeng Lin, Haoji Hu, Ming Hao (Microsoft); Xuchao Zhang (Microsoft Research); Ryan Zhang, Junhao Li, Oleg Kulygin, Ze Li, Sheila Jiang, Chetan Bansal, Hatay Tuna, Salman Zafar (Microsoft)
Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
Authors: Taeyoon Kim, Woohyeok Park (Hanyang University); Hoyeong Yun (OKESTRO Co., Ltd.); Kyungyong Lee (Hanyang University)
Can LLMs Heal Themselves? An Empirical Study of Automation Gaps in LLM Serving Systems
Authors: Bhala Ranganathan, Minghua Ma, Mickey Zhang, Klein Hu, Rakesh Kelkar, Chetan Bansal (Microsoft)
Abstracts
Speculative Load Fusion for Cloud Efficiency
Authors: Deepanjali Mishra (Carnegie Mellon University); Tanvir Ahmed Khan (Columbia University in the City of New York); Gilles Pokam (Intel); Heiner Litz (UC Santa Cruz); Akshitha Sriraman (Carnegie Mellon University)
Multi-Modal Outage Detection via Graph-Enhanced Retrieval
Authors: Udaivir Yadav, Francisco Mandujano-Reyes, Youjiang Wu (Microsoft)
Markov Models for Improved Outage Candidate Generation in Supervised Outage Prediction Models
Authors: Aditya Mate, Youjiang Wu, Joe Hu, Udaivir Yadav (Microsoft); Yingnong Dang (Microsoft Azure)
Design Principles for Agentic RCA
Authors: Sayan Sinha (Georgia Tech/Conviva); Vipul Harsh (Conviva); B. Aditya Prakash (Georgia Tech); Vyas Sekar, Hui Zhang (Carnegie Mellon University/Conviva)