Search Results for author: Saravan Rajmohan

Found 27 papers, 8 papers with code

Exploring LLM-based Agents for Root Cause Analysis

no code implementations7 Mar 2024 Devjeet Roy, Xuchao Zhang, Rashi Bhave, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, Saravan Rajmohan

Lastly, we conduct a case study with a team at Microsoft to equip the ReAct agent with tools that give it access to external diagnostic services that are used by the team for manual RCA.

Management Retrieval

Intelligent Monitoring Framework for Cloud Services: A Data-Driven Approach

no code implementations29 Feb 2024 Pooja Srinivas, Fiza Husain, Anjaly Parayil, Ayush Choure, Chetan Bansal, Saravan Rajmohan

We conduct an extensive empirical study and derive key insights on the major classes of monitors employed by cloud services at Microsoft, their associated dimensions, and the interrelationship between service properties and this ontology.

UFO: A UI-Focused Agent for Windows OS Interaction

1 code implementation8 Feb 2024 Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, QIngwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective

1 code implementation5 Feb 2024 Zexin Wang, Changhua Pei, Minghua Ma, Xin Wang, Zhihan Li, Dan Pei, Saravan Rajmohan, Dongmei Zhang, QIngwei Lin, Haiming Zhang, Jianhui Li, Gaogang Xie

To ensure an accurate AD, FCVAE exploits an innovative approach to concurrently integrate both the global and local frequency features into the condition of Conditional Variational Autoencoder (CVAE) to significantly increase the accuracy of reconstructing the normal data.

Anomaly Detection Time Series +1

Dependency Aware Incident Linking in Large Cloud Systems

no code implementations5 Feb 2024 Supriyo Ghosh, Karish Grover, Jimmy Wong, Chetan Bansal, Rakesh Namineni, Mohit Verma, Saravan Rajmohan

In this paper, we propose the dependency-aware incident linking (DiLink) framework which leverages both textual and service dependency graph information to improve the accuracy and coverage of incident links not only coming from same service, but also from different services and workloads.

Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4

no code implementations24 Jan 2024 Xuchao Zhang, Supriyo Ghosh, Chetan Bansal, Rujia Wang, Minghua Ma, Yu Kang, Saravan Rajmohan

The results reveal that our in-context learning approach outperforms the previous fine-tuned large language models such as GPT-3 by an average of 24. 8\% across all metrics, with an impressive 49. 7\% improvement over the zero-shot model.

In-Context Learning

Contrastive Learning with Negative Sampling Correction

no code implementations13 Jan 2024 Lu Wang, Chao Du, Pu Zhao, Chuan Luo, Zhangchi Zhu, Bo Qiao, Wei zhang, QIngwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

To correct the negative sampling bias, we propose a novel contrastive learning method named Positive-Unlabeled Contrastive Learning (PUCL).

Contrastive Learning Data Augmentation +2

COIN: Chance-Constrained Imitation Learning for Uncertainty-aware Adaptive Resource Oversubscription Policy

no code implementations13 Jan 2024 Lu Wang, Mayukh Das, Fangkai Yang, Chao Duo, Bo Qiao, Hang Dong, Si Qin, Chetan Bansal, QIngwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

We address the challenge of learning safe and robust decision policies in presence of uncertainty in context of the real scientific problem of adaptive resource oversubscription to enhance resource efficiency while ensuring safety against resource congestion risk.

Imitation Learning Management

Why does Prediction Accuracy Decrease over Time? Uncertain Positive Learning for Cloud Failure Prediction

no code implementations8 Jan 2024 Haozhe Li, Minghua Ma, Yudong Liu, Pu Zhao, Lingling Zheng, Ze Li, Yingnong Dang, Murali Chintalapati, Saravan Rajmohan, QIngwei Lin, Dongmei Zhang

Using two real-world datasets of disk failure prediction and conducting node prediction experiments in Microsoft Azure, which is a top-tier cloud provider that serves millions of users, we demonstrate Uptake can significantly improve the failure prediction accuracy by 5% on average.

Cloud Computing

Xpert: Empowering Incident Management with Query Recommendations via Large Language Models

no code implementations19 Dec 2023 YuXuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, QIngwei Lin, Dongmei Zhang

This paper presents a thorough empirical study on the utilization of queries of KQL, a DSL employed for incident management in a large-scale cloud management system at Microsoft.

Management

TaskWeaver: A Code-First Agent Framework

1 code implementation29 Nov 2023 Bo Qiao, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, Hang Dong, Jue Zhang, Lu Wang, Minghua Ma, Pu Zhao, Si Qin, Xiaoting Qin, Chao Du, Yong Xu, QIngwei Lin, Saravan Rajmohan, Dongmei Zhang

TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic.

Natural Language Understanding

Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

no code implementations27 Nov 2023 Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, Victor Rühle

In this paper, we take an information flow control perspective to describe machine learning systems, which allows us to leverage metadata such as access control policies and define clear-cut privacy and confidentiality guarantees with interpretable information flows.

Retrieval

Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation

1 code implementation7 Nov 2023 Ruomeng Ding, Chaoyun Zhang, Lu Wang, Yong Xu, Minghua Ma, Wei zhang, Si Qin, Saravan Rajmohan, QIngwei Lin, Dongmei Zhang

To address these limitations, we introduce a novel thought prompting approach called "Everything of Thoughts" (XoT) to defy the law of "Penrose triangle of existing thought paradigms.

Decision Making

PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis

no code implementations11 Sep 2023 Dylan Zhang, Xuchao Zhang, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, Saravan Rajmohan

Major cloud providers have employed advanced AI-based solutions like large language models to aid humans in identifying the root causes of cloud incidents.

Decision Making

Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

1 code implementation1 Aug 2023 Zhangchi Zhu, Lu Wang, Pu Zhao, Chao Du, Wei zhang, Hang Dong, Bo Qiao, QIngwei Lin, Saravan Rajmohan, Dongmei Zhang

To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first.

ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection

1 code implementation3 Jul 2023 Yuhang Chen, Chaoyun Zhang, Minghua Ma, Yudong Liu, Ruomeng Ding, Bowen Li, Shilin He, Saravan Rajmohan, QIngwei Lin, Dongmei Zhang

To the best of our knowledge, ImDiffusion represents a pioneering approach that combines imputation-based techniques with time series anomaly detection, while introducing the novel use of diffusion models to the field.

Anomaly Detection Imputation +2

Introspective Tips: Large Language Model for In-Context Decision Making

no code implementations19 May 2023 Liting Chen, Lu Wang, Hang Dong, Yali Du, Jie Yan, Fangkai Yang, Shuang Li, Pu Zhao, Si Qin, Saravan Rajmohan, QIngwei Lin, Dongmei Zhang

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks.

Decision Making Language Modelling +2

Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering

1 code implementation19 May 2023 Fangkai Yang, Pu Zhao, Zezhong Wang, Lu Wang, Jue Zhang, Mohit Garg, QIngwei Lin, Saravan Rajmohan, Dongmei Zhang

Large Language Model (LLM) has gained popularity and achieved remarkable results in open-domain tasks, but its performance in real industrial domain-specific scenarios is average due to its lack of specific domain knowledge.

Language Modelling Large Language Model +2

Conservative State Value Estimation for Offline Reinforcement Learning

1 code implementation NeurIPS 2023 Liting Chen, Jie Yan, Zhengdao Shao, Lu Wang, QIngwei Lin, Saravan Rajmohan, Thomas Moscibroda, Dongmei Zhang

In this paper, we propose Conservative State Value Estimation (CSVE), a new approach that learns conservative V-function via directly imposing penalty on OOD states.

D4RL reinforcement-learning

Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models

no code implementations10 Jan 2023 Toufique Ahmed, Supriyo Ghosh, Chetan Bansal, Thomas Zimmermann, Xuchao Zhang, Saravan Rajmohan

In this work, we do the first large-scale study to evaluate the effectiveness of these models for helping engineers root cause and mitigate production incidents.

Management Question Answering +1

Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning

no code implementations21 Nov 2022 Junjie Sheng, Lu Wang, Fangkai Yang, Bo Qiao, Hang Dong, Xiangfeng Wang, Bo Jin, Jun Wang, Si Qin, Saravan Rajmohan, QIngwei Lin, Dongmei Zhang

To address these two limitations, this paper formulates the oversubscription for cloud as a chance-constrained optimization problem and propose an effective Chance Constrained Multi-Agent Reinforcement Learning (C2MARL) method to solve this problem.

Multi-agent Reinforcement Learning reinforcement-learning +1

Cannot find the paper you are looking for? You can Submit a new open access paper.