1 code implementation • 29 Apr 2024 • Aaron J. Li, Satyapriya Krishna, Himabindu Lakkaraju
The surge in the development of Large Language Models (LLMs) has led to improved performance on cognitive tasks, as well as an urgent need to align these models with human values in order to safely harness their power.
4 code implementations • 8 Apr 2024 • Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture.
no code implementations • 9 Feb 2024 • Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities.
no code implementations • 25 Jan 2024 • Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell
The effectiveness of an audit, however, depends on the degree of system access granted to auditors.
no code implementations • 6 Nov 2023 • Satyapriya Krishna
Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex cognitive tasks.
1 code implementation • 9 Oct 2023 • Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
To this end, several approaches have been proposed in recent literature to explain the behavior of complex predictive models in a post hoc fashion.
no code implementations • 28 Sep 2023 • Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders.
no code implementations • NeurIPS 2023 • Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju
Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks.
no code implementations • 8 Feb 2023 • Satyapriya Krishna, Jiaqi Ma, Himabindu Lakkaraju
The Right to Explanation and the Right to be Forgotten are two important principles outlined to regulate algorithmic decision making and data usage in real-world applications.
1 code implementation • 8 Jul 2022 • Dylan Slack, Satyapriya Krishna, Himabindu Lakkaraju, Sameer Singh
In real-world evaluations with humans, 73% of healthcare workers (e.g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems for explainability in a disease prediction task, and 85% of ML professionals agreed TalkToModel was easier to use for computing explanations.
2 code implementations • 22 Jun 2022 • Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju
OpenXAI comprises the following key components: (i) a flexible synthetic data generator and a collection of diverse real-world datasets, pre-trained models, and state-of-the-art feature attribution methods, and (ii) open-source implementations of eleven quantitative metrics for evaluating the faithfulness, stability (robustness), and fairness of explanation methods, in turn providing comparisons of several explanation methods across a wide variety of metrics, models, and datasets.
no code implementations • Findings (ACL) 2022 • Umang Gupta, Jwala Dhamala, Varun Kumar, Apurv Verma, Yada Pruksachatkun, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Greg Ver Steeg, Aram Galstyan
Language models excel at generating coherent text, and model compression techniques such as knowledge distillation have enabled their use in resource-constrained settings.
no code implementations • ACL 2022 • Satyapriya Krishna, Rahul Gupta, Apurv Verma, Jwala Dhamala, Yada Pruksachatkun, Kai-Wei Chang
With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions.
no code implementations • 14 Mar 2022 • Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju
As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input.
no code implementations • 3 Feb 2022 • Satyapriya Krishna, Tessa Han, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju
To this end, we first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding.
1 code implementation • Findings (EMNLP) 2021 • Justin Payan, Yuval Merhav, He Xie, Satyapriya Krishna, Anil Ramakrishna, Mukund Sridhar, Rahul Gupta
There is an increasing interest in continuous learning (CL), as data privacy is becoming a priority for real-world machine learning applications.
no code implementations • Findings (ACL) 2021 • Yada Pruksachatkun, Satyapriya Krishna, Jwala Dhamala, Rahul Gupta, Kai-Wei Chang
Existing bias mitigation methods to reduce disparities in model outcomes across cohorts have focused on data augmentation, debiasing model embeddings, or adding fairness-based optimization objectives during training.
no code implementations • 3 Jun 2021 • Michiel de Jong, Satyapriya Krishna, Anuva Agarwal
Training a reinforcement learning agent to carry out natural language instructions is limited by the available supervision, i.e., knowing when the instruction has been carried out.
2 code implementations • EACL 2021 • Satyapriya Krishna, Rahul Gupta, Christophe Dupuy
We prove the theoretical privacy guarantee of our algorithm and assess its privacy leakage under Membership Inference Attacks (MIA) (Shokri et al., 2017) on models trained with transformed data.
1 code implementation • 27 Jan 2021 • Jwala Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, Rahul Gupta
To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology.
no code implementations • 16 May 2020 • Aarsh Patel, Rahul Gupta, Mukund Harakere, Satyapriya Krishna, Aman Alok, Peng Liu
In this research work, we aim to achieve classification parity across explicit as well as implicit sensitive features.
no code implementations • 25 Oct 2019 • Yunzhe Tao, Saurabh Gupta, Satyapriya Krishna, Xiong Zhou, Orchid Majumder, Vineet Khare
Training deep neural networks from scratch on natural language processing (NLP) tasks requires a significant amount of manually labeled text corpus and substantial time to converge, which usually cannot be satisfied by customers.