Accepted Papers

Technical Papers

Towards Automated Monitor Configuration for Cloud Services: A Data-Driven AIOps Framework Informed by Industrial Practice

Authors: Anson Bastos, Anjaly Parayil (Microsoft); Ayush Choure (Independent); Chetan Bansal, Rujia Wang (Microsoft)


CAPES: Causal Analysis of Power Effect under Score-based Scheduling

Authors: Jiali Xing, William Meng, Ziqi Meng (University of Pennsylvania); Liangcheng Yu (Microsoft Research); Vincent Liu, Benjamin Lee (University of Pennsylvania)


ActionNex: A Virtual Outage Manager for Cloud Computing

Authors: Zhenfeng Lin, Haoji Hu, Ming Hao (Microsoft); Xuchao Zhang (Microsoft Research); Ryan Zhang, Junhao Li, Oleg Kulygin, Ze Li, Sheila Jiang, Chetan Bansal, Hatay Tuna, Salman Zafar (Microsoft)


Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?

Authors: Taeyoon Kim, Woohyeok Park (Hanyang University); Hoyeong Yun (OKESTRO Co., Ltd.); Kyungyong Lee (Hanyang University)


Can LLMs Heal Themselves? An Empirical Study of Automation Gaps in LLM Serving Systems

Authors: Bhala Ranganathan, Minghua Ma, Mickey Zhang, Klein Hu, Rakesh Kelkar, Chetan Bansal (Microsoft)


Abstracts

Speculative Load Fusion for Cloud Efficiency

Authors: Deepanjali Mishra (Carnegie Mellon University); Tanvir Ahmed Khan (Columbia University in the City of New York); Gilles Pokam (Intel); Heiner Litz (UC Santa Cruz); Akshitha Sriraman (Carnegie Mellon University)


Multi-Modal Outage Detection via Graph-Enhanced Retrieval

Authors: Udaivir Yadav, Francisco Mandujano-Reyes, Youjiang Wu (Microsoft)


Markov Models for Improved Outage Candidate Generation in Supervised Outage Prediction Models

Authors: Aditya Mate, Youjiang Wu, Joe Hu, Udaivir Yadav (Microsoft); Yingnong Dang (Microsoft Azure)


Design Principles for Agentic RCA

Authors: Sayan Sinha (Georgia Tech/Conviva); Vipul Harsh (Conviva); B. Aditya Prakash (Georgia Tech); Vyas Sekar, Hui Zhang (Carnegie Mellon University/Conviva)

