1 code implementation • 6 Sep 2024 • Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar
In this paper, we propose Conditional Activation Steering (CAST), which analyzes LLM activation patterns during inference to selectively apply or withhold activation steering based on the input context.
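A minimal sketch of the conditional steering idea, assuming activations from one transformer layer are available; the condition vector, steering vector, threshold, and strength below are illustrative placeholders, not the paper's actual values (CAST derives its vectors from contrasting prompts).

```python
import torch
import torch.nn.functional as F

# Hypothetical vectors for illustration only.
hidden_dim = 64
condition_vec = F.normalize(torch.randn(hidden_dim), dim=0)  # detects the input context
steering_vec = torch.randn(hidden_dim)                       # shifts model behavior

def conditionally_steer(hidden, threshold=0.3, alpha=4.0):
    """Apply the steering vector only where the hidden state's projection
    onto the condition vector exceeds a threshold."""
    # hidden: (batch, seq_len, hidden_dim) activations from a chosen layer
    sim = F.cosine_similarity(hidden, condition_vec.expand_as(hidden), dim=-1)
    gate = (sim > threshold).unsqueeze(-1).float()  # 1 where the condition fires
    return hidden + alpha * gate * steering_vec

# In practice this would run inside a forward hook on the chosen layer:
# layer.register_forward_hook(lambda mod, inp, out: conditionally_steer(out))
activations = torch.randn(2, 10, hidden_dim)
steered = conditionally_steer(activations)
```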
no code implementations • 19 Aug 2024 • Inkit Padhi, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Manish Nagireddy, Pierre Dognin, Kush R. Varshney
Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP.
no code implementations • 8 Jul 2024 • Manish Nagireddy, Inkit Padhi, Soumya Ghosh, Prasanna Sattigeri
Motivated by findings from developing a detector for social bias, we adopt the notion of a use-mention distinction, which we identified as the primary source of under-performance in preliminary versions of our detector (for example, a sentence that quotes a slur in order to condemn it mentions the term rather than uses it).
1 code implementation • 3 Apr 2024 • Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag
Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests.
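As a toy illustration of the unit-test-based scoring that static benchmarks like HumanEval perform, the snippet below executes a stand-in model completion against a small test suite and reports pass/fail; real harnesses sandbox this execution and aggregate results over many problems and samples (e.g., pass@k).

```python
# Toy HumanEval-style check: does generated code pass the unit tests?
# The "completion" is a hard-coded stand-in for model output.
completion = """
def add(a, b):
    return a + b
"""

tests = [
    ("add(2, 3)", 5),
    ("add(-1, 1)", 0),
]

namespace = {}
exec(completion, namespace)  # real harnesses sandbox this step

passed = all(eval(expr, namespace) == expected for expr, expected in tests)
print("pass" if passed else "fail")
```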
no code implementations • 22 Mar 2024 • Erik Miehling, Manish Nagireddy, Prasanna Sattigeri, Elizabeth M. Daly, David Piorkowski, John T. Richards
Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings.
no code implementations • 21 Mar 2024 • Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh
To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms.
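A rough sketch of one perturbation-style attribution algorithm such a framework could be instantiated with, under the assumption that long inputs are segmented into spans and the model's text output is scalarized (here via token overlap with the original output); the function names and the scalarizer are illustrative, not MExGen's actual API.

```python
# Sketch: span-level attribution for a text-to-text model.
def model(text):
    return text.lower()  # trivial placeholder for any generative model

def scalarize(original, perturbed):
    """Score similarity of two outputs via token overlap (illustrative)."""
    a, b = set(original.split()), set(perturbed.split())
    return len(a & b) / max(len(a | b), 1)

def attribute(text, spans):
    """Attribution of each span = output drift when that span is removed."""
    base = model(text)
    scores = []
    for span in spans:
        perturbed_output = model(text.replace(span, ""))
        scores.append(1.0 - scalarize(base, perturbed_output))
    return scores

spans = ["The quick brown fox", "jumps over", "the lazy dog"]
print(attribute(" ".join(spans), spans))
```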
no code implementations • 19 Mar 2024 • Pierre Dognin, Jesus Rios, Ronny Luss, Inkit Padhi, Matthew D. Riemer, Miao Liu, Prasanna Sattigeri, Manish Nagireddy, Kush R. Varshney, Djallel Bouneffouf
Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI.
no code implementations • 9 Mar 2024 • Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Kirushikesh DB, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Nishtha Madaan, Sameep Mehta, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy, Inkit Padhi, David Piorkowski, Ambrish Rawat, Orna Raz, Prasanna Sattigeri, Hendrik Strobelt, Sarathkrishna Swaminathan, Christoph Tillmann, Aashka Trivedi, Kush R. Varshney, Dennis Wei, Shalisha Witherspoon, Marcel Zalmanovici
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
no code implementations • 12 Dec 2023 • Manish Nagireddy, Lamogha Chiazor, Moninder Singh, Ioana Baldini
Current datasets for unwanted social bias auditing are limited to studying protected demographic features such as race and gender.
no code implementations • 17 Feb 2023 • Manish Nagireddy, Moninder Singh, Samuel C. Hoffman, Evaline Ju, Karthikeyan Natesan Ramamurthy, Kush R. Varshney
In this paper, focusing specifically on compositions of functions arising from the different pillars of trustworthy ML, we aim to reduce this gap, develop new insights, and answer questions that arise when these functions are composed.
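To make the composition question concrete, here is a small hypothetical example composing mitigations from two pillars, a privacy-style noising step and a fairness-style group-threshold step; the data, functions, and thresholds are invented for illustration, and the point is only that composing the steps can shift the very metric one of them was tuned for.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, size=1000)   # model scores
group = rng.integers(0, 2, size=1000)   # protected attribute

def add_noise(s, scale=0.1):
    """Privacy-pillar step: perturb scores with noise (illustrative)."""
    return s + rng.normal(0.0, scale, size=s.shape)

def fair_threshold(s, g):
    """Fairness-pillar step: group-specific thresholds chosen to roughly
    equalize positive rates on the clean scores (illustrative values)."""
    return (s > np.where(g == 0, 0.50, 0.45)).astype(int)

# Applying the fairness step after the privacy step changes outcomes:
clean = fair_threshold(scores, group)
composed = fair_threshold(add_noise(scores), group)
for g in (0, 1):
    print(f"group {g}: positive rate "
          f"{clean[group == g].mean():.3f} -> {composed[group == g].mean():.3f}")
```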
no code implementations • 13 May 2022 • Wesley Hanwen Deng, Manish Nagireddy, Michelle Seng Ah Lee, Jatinder Singh, Zhiwei Steven Wu, Kenneth Holstein, Haiyi Zhu
Recent years have seen the development of many open-source ML fairness toolkits aimed at helping ML practitioners assess and address unfairness in their systems.
no code implementations • 21 Apr 2022 • Nil-Jana Akpinar, Manish Nagireddy, Logan Stapleton, Hao-Fei Cheng, Haiyi Zhu, Steven Wu, Hoda Heidari
This stylized setup offers the distinct capability of testing fairness interventions beyond observational data and against an unbiased benchmark.