no code implementations • 23 Sep 2024 • Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, Kieran Fraser, Erik Miehling, Beat Buesser, Elizabeth M. Daly, Mark Purcell, Prasanna Sattigeri, Pin-Yu Chen, Kush R. Varshney
As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge, placing renewed focus on adversarial threats in natural language and multi-modal systems.
1 code implementation • 6 Sep 2024 • Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar
In this paper, we propose Conditional Activation Steering (CAST), which analyzes LLM activation patterns during inference to selectively apply or withhold activation steering based on the input context.
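To make the idea concrete, here is a minimal sketch of conditional activation steering in the spirit described above, not the paper's exact implementation: a forward hook on one transformer layer checks whether the current activation aligns with a condition direction and, only when it does, adds a behavior steering vector. The layer index, `condition_vector`, `behavior_vector`, `threshold`, and `strength` are illustrative assumptions.

```python
import torch

def make_conditional_steering_hook(condition_vector, behavior_vector,
                                   threshold=0.5, strength=4.0):
    """Steer hidden states only when they align with the condition direction."""
    cond = condition_vector / condition_vector.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output  # (batch, seq, dim)
        # Condition check: cosine similarity between the last token's activation
        # and the condition direction decides whether steering is applied at all.
        score = torch.nn.functional.cosine_similarity(
            hidden[:, -1, :], cond.unsqueeze(0), dim=-1)
        gate = (score > threshold).to(hidden.dtype).view(-1, 1, 1)
        # Add the behavior steering vector only for inputs that fired the condition.
        steered = hidden + gate * strength * behavior_vector
        return (steered,) + output[1:] if isinstance(output, tuple) else steered

    return hook

# Hypothetical usage on a HuggingFace-style decoder (layer index is arbitrary):
# handle = model.model.layers[15].register_forward_hook(
#     make_conditional_steering_hook(condition_vec, behavior_vec))
# output = model.generate(**inputs)
# handle.remove()
```

The gate is the key design point: unconditional activation steering would add the behavior vector on every forward pass, whereas the condition check restricts the intervention to inputs that trigger it.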
no code implementations • 17 Jun 2024 • Ronny Luss, Erik Miehling, Amit Dhurandhar
However, in the case of generative AI such as large language models (LLMs), there is no class prediction to explain.
no code implementations • 22 Mar 2024 • Erik Miehling, Manish Nagireddy, Prasanna Sattigeri, Elizabeth M. Daly, David Piorkowski, John T. Richards
Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings.
no code implementations • 9 Mar 2024 • Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Kirushikesh DB, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Nishtha Madaan, Sameep Mehta, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy, Inkit Padhi, David Piorkowski, Ambrish Rawat, Orna Raz, Prasanna Sattigeri, Hendrik Strobelt, Sarathkrishna Swaminathan, Christoph Tillmann, Aashka Trivedi, Kush R. Varshney, Dennis Wei, Shalisha Witherspoon, Marcel Zalmanovici
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
no code implementations • 9 Sep 2020 • Muhammad Aneeq uz Zaman, Kaiqing Zhang, Erik Miehling, Tamer Başar
We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the linear-quadratic mean-field game (LQ-MFG); a toy sketch of the underlying fixed-point computation appears below.
Multi-agent Reinforcement Learning • Reinforcement Learning +1
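For context, below is a toy sketch of the fixed-point structure behind such a computation: fix the mean-field trajectory, approximately compute a best response to it, then regenerate the mean field from the induced policy and repeat until consistency. The paper's method is an actor-critic algorithm; the zeroth-order policy-gradient updates and Monte Carlo rollouts here merely stand in for it, and every parameter of the scalar LQ-MFG below is invented for illustration.

```python
import numpy as np

# Illustrative scalar LQ-MFG; none of these parameters come from the paper.
A, B, Abar = 0.9, 1.0, 0.1            # dynamics: x' = A x + B u + Abar z + noise
Q, Qbar, R = 1.0, 0.5, 1.0            # stage cost: Q x^2 + Qbar (x - z)^2 + R u^2
T, gamma = 30, 0.95

def rollout_cost(K, L, z_traj, seed, n=50):
    """Monte Carlo cost of the linear policy u = -K x - L z against a fixed mean field."""
    r = np.random.default_rng(seed)   # common random numbers across evaluations
    total = 0.0
    for _ in range(n):
        x, cost = r.normal(), 0.0
        for t in range(T):
            z = z_traj[t]
            u = -K * x - L * z
            cost += gamma**t * (Q * x**2 + Qbar * (x - z)**2 + R * u**2)
            x = A * x + B * u + Abar * z + 0.1 * r.normal()
        total += cost
    return total / n

def mean_field_from_policy(K, L, z_traj, n=500):
    """Propagate the population mean induced by the current policy."""
    r = np.random.default_rng(123)
    xs = r.normal(size=n)
    new_z = np.empty(T)
    for t in range(T):
        new_z[t] = xs.mean()
        xs = A * xs + B * (-K * xs - L * z_traj[t]) + Abar * z_traj[t] + 0.1 * r.normal(size=n)
    return new_z

rng = np.random.default_rng(0)
K, L, z_traj, sigma, lr = 0.0, 0.0, np.zeros(T), 0.05, 0.005
for outer in range(10):                          # mean-field fixed-point iteration
    for i in range(30):                          # approximate best response to z_traj
        dK, dL = sigma * rng.standard_normal(2)
        seed = 1000 * outer + i
        diff = rollout_cost(K + dK, L + dL, z_traj, seed) - rollout_cost(K, L, z_traj, seed)
        K = float(np.clip(K - lr * diff * dK / sigma**2, -1.5, 1.5))  # clip to a stable region
        L = float(np.clip(L - lr * diff * dL / sigma**2, -1.5, 1.5))
    z_traj = mean_field_from_policy(K, L, z_traj)  # update the mean field, then repeat

print("approximate equilibrium gains K, L:", round(K, 3), round(L, 3))
```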
1 code implementation • 2 Apr 2020 • Weichao Mao, Kaiqing Zhang, Erik Miehling, Tamer Başar
To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories; a minimal sketch of the idea appears below.
Multi-agent Reinforcement Learning • Reinforcement Learning +2
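The sketch below assumes architecture choices (a GRU encoder, the dimensions, a simple policy head) made for illustration rather than taken from the paper: each agent compresses its growing action-observation history into a fixed-dimensional embedding and conditions its policy only on that embedding.

```python
import torch
import torch.nn as nn

class InformationStateAgent(nn.Module):
    """Agent that acts on a compressed embedding of its history, not the raw history."""

    def __init__(self, obs_dim, n_actions, embed_dim=32):
        super().__init__()
        self.encoder = nn.GRU(obs_dim + n_actions, embed_dim, batch_first=True)
        self.policy_head = nn.Linear(embed_dim, n_actions)
        self.n_actions = n_actions

    def embed_history(self, obs_seq, act_seq):
        """Compress a sequence of (observation, previous action) pairs to a fixed-size vector."""
        acts_onehot = nn.functional.one_hot(act_seq, self.n_actions).float()
        history = torch.cat([obs_seq, acts_onehot], dim=-1)   # (B, T, obs_dim + n_actions)
        _, h_T = self.encoder(history)                        # final hidden state: (1, B, embed_dim)
        return h_T.squeeze(0)                                 # the information state embedding

    def act(self, obs_seq, act_seq):
        z = self.embed_history(obs_seq, act_seq)
        logits = self.policy_head(z)
        return torch.distributions.Categorical(logits=logits).sample()

# Usage with dummy data: 4 agents, 10-step histories, 6-dimensional observations.
agents = [InformationStateAgent(obs_dim=6, n_actions=3) for _ in range(4)]
obs_seq = torch.randn(1, 10, 6)
act_seq = torch.randint(0, 3, (1, 10))
actions = [agent.act(obs_seq, act_seq) for agent in agents]
```

Because the embedding has a fixed size regardless of how long the history grows, downstream learning algorithms can treat it as an approximate state, which is what makes the decentralized problem tractable.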
no code implementations • NeurIPS 2019 • Xiangyuan Zhang, Kaiqing Zhang, Erik Miehling, Tamer Başar
Through interacting with the more informed player, the less informed player attempts to both infer, and act according to, the true objective function.
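As a hedged illustration of the inference half of this interaction (not the paper's construction), the snippet below maintains a belief over a finite set of candidate objectives and updates it from the informed player's observed actions, assuming that player acts softmax-rationally with respect to the true objective. The candidate action values, rationality constant `beta`, and action model are all assumptions made for the example.

```python
import numpy as np

def update_belief(belief, candidate_q_values, observed_action, beta=5.0):
    """One Bayesian update: P(theta | a) is proportional to P(a | theta) * P(theta)."""
    likelihoods = np.array([
        np.exp(beta * q[observed_action]) / np.exp(beta * q).sum()
        for q in candidate_q_values          # q: action values under candidate objective theta
    ])
    posterior = likelihoods * belief
    return posterior / posterior.sum()

# Usage: two candidate objectives, three actions.
belief = np.array([0.5, 0.5])
q_under_theta = [np.array([1.0, 0.2, 0.1]),   # action values if objective 0 is true
                 np.array([0.1, 0.2, 1.0])]   # action values if objective 1 is true
for a in [0, 0, 1]:                           # actions observed from the informed player
    belief = update_belief(belief, q_under_theta, a)
print(belief)  # mass shifts toward the objective most consistent with the observed actions
```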
no code implementations • 6 Aug 2019 • Kaiqing Zhang, Erik Miehling, Tamer Başar
To demonstrate the applicability of the model, we propose a novel collaborative intrusion response model, where multiple agents (defenders) possessing asymmetric information aim to collaboratively defend a computer network.
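Purely for flavor, here is a toy illustration, entirely invented and not the paper's formulation, of what asymmetric information means in such a setting: each defender observes noisy alerts only on its own subset of hosts while an attacker spreads through the network, so no single defender sees the full picture.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hosts = 6
compromised = np.zeros(n_hosts, dtype=bool)
compromised[0] = True                                     # attacker's initial foothold
defender_views = [list(range(0, 3)), list(range(3, 6))]   # asymmetric observations

for step in range(10):
    # Attacker spreads from a random compromised host to a neighboring one.
    frontier = np.flatnonzero(compromised)
    if frontier.size:
        compromised[(rng.choice(frontier) + 1) % n_hosts] = True

    # Each defender sees noisy alerts only for its own hosts and remediates
    # the first host on which it observes an alert.
    for view in defender_views:
        alerts = compromised[view] & (rng.random(len(view)) < 0.7)
        if alerts.any():
            compromised[view[int(np.argmax(alerts))]] = False

print("hosts still compromised:", np.flatnonzero(compromised))
```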