Search Results for author: Erik Miehling

Found 9 papers, 2 papers with code

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI

no code implementations • 23 Sep 2024 • Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, Kieran Fraser, Erik Miehling, Beat Buesser, Elizabeth M. Daly, Mark Purcell, Prasanna Sattigeri, Pin-Yu Chen, Kush R. Varshney

As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal systems.

Programming Refusal with Conditional Activation Steering

1 code implementation • 6 Sep 2024 • Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar

In this paper, we propose Conditional Activation Steering (CAST), which analyzes LLM activation patterns during inference to selectively apply or withhold activation steering based on the input context.

CELL your Model: Contrastive Explanation Methods for Large Language Models

no code implementations • 17 Jun 2024 • Ronny Luss, Erik Miehling, Amit Dhurandhar

However, in the case of generative AI such as large language models (LLMs), there is no class prediction to explain.

Text Generation

Language Models in Dialogue: Conversational Maxims for Human-AI Interactions

no code implementations • 22 Mar 2024 • Erik Miehling, Manish Nagireddy, Prasanna Sattigeri, Elizabeth M. Daly, David Piorkowski, John T. Richards

Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings.

Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning

1 code implementation • 2 Apr 2020 • Weichao Mao, Kaiqing Zhang, Erik Miehling, Tamer Başar

To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories.

Multi-agent Reinforcement Learning, Reinforcement Learning +2

Non-Cooperative Inverse Reinforcement Learning

no code implementations • NeurIPS 2019 • Xiangyuan Zhang, Kaiqing Zhang, Erik Miehling, Tamer Başar

Through interacting with the more informed player, the less informed player attempts to both infer, and act according to, the true objective function.

Reinforcement Learning +1

Online Planning for Decentralized Stochastic Control with Partial History Sharing

no code implementations • 6 Aug 2019 • Kaiqing Zhang, Erik Miehling, Tamer Başar

To demonstrate the applicability of the model, we propose a novel collaborative intrusion response model, where multiple agents (defenders) possessing asymmetric information aim to collaboratively defend a computer network.

Decision Making
