Search Results for author: Mahima Pushkarna

Found 8 papers, 5 papers with code

LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

no code implementations • 16 Feb 2024 • Minsuk Kahng, Ian Tenney, Mahima Pushkarna, Michael Xieyang Liu, James Wexler, Emily Reif, Krystal Kallarackal, Minsuk Chang, Michael Terry, Lucas Dixon

Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs).
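As a rough illustration of what automatic side-by-side evaluation means in practice, here is a minimal sketch of an LLM-as-judge pairwise rating. The `call_llm` callable and the prompt wording are assumptions for illustration only; LLM Comparator is a visual analytics tool for inspecting the output of such raters, not this code.

```python
# Minimal sketch of automatic side-by-side (pairwise) evaluation.
# `call_llm` is a hypothetical stand-in for any text-completion API.
from typing import Callable

def side_by_side_rating(prompt: str,
                        response_a: str,
                        response_b: str,
                        call_llm: Callable[[str], str]) -> str:
    """Ask a judge model which of two responses better answers `prompt`."""
    judge_prompt = (
        "You are comparing two candidate responses to the same prompt.\n"
        f"Prompt: {prompt}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Answer with exactly one of: A, B, TIE."
    )
    verdict = call_llm(judge_prompt).strip().upper()
    # Fall back to TIE if the judge answers outside the expected labels.
    return verdict if verdict in {"A", "B", "TIE"} else "TIE"
```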

ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles

no code implementations • 24 Oct 2023 • Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry

Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots.

Tasks: Chatbot, Language Modelling, +2 more
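A hedged sketch of the general idea described above, steering a chatbot with principles distilled from user feedback. The function names and prompt text are illustrative assumptions, not ConstitutionMaker's actual interface.

```python
# Sketch of constitution-style steering: turn feedback into rules,
# then prepend the accumulated rules to the chatbot's system prompt.
from typing import Callable, List

def feedback_to_principle(feedback: str, call_llm: Callable[[str], str]) -> str:
    """Rewrite free-form user feedback as one short imperative rule."""
    return call_llm(
        "Rewrite the following chatbot feedback as a single imperative "
        f"principle the chatbot should follow:\n{feedback}"
    ).strip()

def steer_with_principles(base_prompt: str, principles: List[str]) -> str:
    """Prepend the accumulated principles to the chatbot's system prompt."""
    rules = "\n".join(f"- {p}" for p in principles)
    return f"{base_prompt}\n\nFollow these principles:\n{rules}"
```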

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Tasks: Benchmarking, Text Generation
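A minimal sketch of what loading a GEM task can look like, assuming the GEMv2 datasets are published under the GEM namespace on the Hugging Face Hub; the dataset ID and split shown are illustrative and should be checked against the GEM documentation.

```python
# Sketch: load one GEM task via the Hugging Face datasets library.
from datasets import load_dataset

# Dataset ID and split are illustrative; see the GEMv2 docs for exact names.
data = load_dataset("GEM/xsum", split="validation")
print(data[0])  # inspect one example record
```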

Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI

1 code implementation • 3 Apr 2022 • Mahima Pushkarna, Andrew Zaldivar, Oddur Kjartansson

In this paper, we propose Data Cards for fostering transparent, purposeful and human-centered documentation of datasets within the practical contexts of industry and research.
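A hedged sketch of recording data-card-style fields as structured metadata. The field names below are illustrative assumptions, not the Data Cards template defined in the paper.

```python
# Sketch: a small structured record in the spirit of dataset documentation.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataCard:
    dataset_name: str
    summary: str
    intended_uses: List[str] = field(default_factory=list)
    known_limitations: List[str] = field(default_factory=list)
    collection_process: str = ""

card = DataCard(
    dataset_name="example-corpus",
    summary="Placeholder one-paragraph description of the dataset.",
    intended_uses=["benchmarking text classifiers"],
    known_limitations=["English-only", "small sample size"],
)
print(card)
```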

Healthsheet: Development of a Transparency Artifact for Health Datasets

1 code implementation • 26 Feb 2022 • Negar Rostamzadeh, Diana Mincu, Subhrajit Roy, Andrew Smart, Lauren Wilcox, Mahima Pushkarna, Jessica Schrouff, Razvan Amironesei, Nyalleng Moorosi, Katherine Heller

Our findings from the interviewee study and case studies show 1) that datasheets should be contextualized for healthcare, 2) that despite incentives to adopt accountability practices such as datasheets, there is a lack of consistency in the broader use of these practices, 3) how the ML for health community views datasheets, and particularly Healthsheets, as a diagnostic tool for surfacing the limitations and strengths of datasets, and 4) the relative importance of different fields in the datasheet to healthcare concerns.

The What-If Tool: Interactive Probing of Machine Learning Models

1 code implementation • 9 Jul 2019 • James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viegas, Jimbo Wilson

A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs.

Tasks: BIG-bench Machine Learning, Fairness
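A hedged sketch of launching the What-If Tool's notebook widget from the witwidget package; the example format and prediction-function contract shown here are assumptions and should be checked against the tool's documentation.

```python
# Sketch: open the What-If Tool widget in a Jupyter notebook
# (requires: pip install witwidget).
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

def predict_fn(examples):
    # Placeholder model: return per-class scores for each example.
    return [[0.5, 0.5] for _ in examples]

examples = []  # e.g., a list of tf.Example protos drawn from your dataset
config_builder = WitConfigBuilder(examples).set_custom_predict_fn(predict_fn)
WitWidget(config_builder, height=600)
```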

ClinicalVis: Supporting Clinical Task-Focused Design Evaluation

1 code implementation • 13 Oct 2018 • Marzyeh Ghassemi, Mahima Pushkarna, James Wexler, Jesse Johnson, Paul Varghese

Making decisions about what clinical tasks to prepare for is multi-factored, and especially challenging in intensive care environments where resources must be balanced with patient needs.

Tasks: Human-Computer Interaction
