Search Results for author: Manish Nagireddy

Found 12 papers, 2 papers with code

Programming Refusal with Conditional Activation Steering

1 code implementation · 6 Sep 2024 · Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar

In this paper, we propose Conditional Activation Steering (CAST), which analyzes LLM activation patterns during inference to selectively apply or withhold activation steering based on the input context.

Value Alignment from Unstructured Text

no code implementations · 19 Aug 2024 · Inkit Padhi, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Manish Nagireddy, Pierre Dognin, Kush R. Varshney

Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP.

Synthetic Data Generation

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails

no code implementations · 8 Jul 2024 · Manish Nagireddy, Inkit Padhi, Soumya Ghosh, Prasanna Sattigeri

Motivated by findings from developing a detector for social bias, we adopt the notion of a use-mention distinction, which we identified as the primary source of under-performance in the preliminary versions of our social bias detector.

Synthetic Data Generation

The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers

1 code implementation · 3 Apr 2024 · Hussein Mozannar, Valerie Chen, Mohammed Alsobay, Subhro Das, Sebastian Zhao, Dennis Wei, Manish Nagireddy, Prasanna Sattigeri, Ameet Talwalkar, David Sontag

Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests.

Language Models in Dialogue: Conversational Maxims for Human-AI Interactions

no code implementations · 22 Mar 2024 · Erik Miehling, Manish Nagireddy, Prasanna Sattigeri, Elizabeth M. Daly, David Piorkowski, John T. Richards

Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings.

Multi-Level Explanations for Generative Language Models

no code implementations · 21 Mar 2024 · Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh

To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms.

Question Answering, text-classification, +1

SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models

no code implementations · 12 Dec 2023 · Manish Nagireddy, Lamogha Chiazor, Moninder Singh, Ioana Baldini

Current datasets for unwanted social bias auditing are limited to studying protected demographic features such as race and gender.

Question Answering

Function Composition in Trustworthy Machine Learning: Implementation Choices, Insights, and Questions

no code implementations · 17 Feb 2023 · Manish Nagireddy, Moninder Singh, Samuel C. Hoffman, Evaline Ju, Karthikeyan Natesan Ramamurthy, Kush R. Varshney

In this paper, focusing specifically on compositions of functions arising from the different pillars, we aim to reduce this gap, develop new insights for trustworthy ML, and answer questions such as the following.

Adversarial Robustness, Fairness, +1

Exploring How Machine Learning Practitioners (Try To) Use Fairness Toolkits

no code implementations · 13 May 2022 · Wesley Hanwen Deng, Manish Nagireddy, Michelle Seng Ah Lee, Jatinder Singh, Zhiwei Steven Wu, Kenneth Holstein, Haiyi Zhu

Recent years have seen the development of many open-source ML fairness toolkits aimed at helping ML practitioners assess and address unfairness in their systems.

BIG-bench Machine Learning, Fairness

A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms

no code implementations · 21 Apr 2022 · Nil-Jana Akpinar, Manish Nagireddy, Logan Stapleton, Hao-Fei Cheng, Haiyi Zhu, Steven Wu, Hoda Heidari

This stylized setup offers the distinct capability of testing fairness interventions beyond observational data and against an unbiased benchmark.

Fairness
