2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
1 code implementation • 2 Feb 2024 • Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
Kahneman & Tversky's prospect theory tells us that humans perceive random variables in a biased but well-defined manner; for example, humans are famously loss-averse.
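The loss-aversion claim can be made concrete with the standard Kahneman-Tversky value function. The sketch below is illustrative only; the `alpha`, `beta`, and `lam` defaults are the commonly cited Tversky & Kahneman (1992) median estimates, not parameters taken from this paper.

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value function over gains/losses relative to a
    reference point: gains are valued as x**alpha, losses as
    -lam * (-x)**beta, where lam > 1 encodes loss aversion."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta

# Loss aversion: a loss of 100 is felt more strongly than a gain of 100.
gain = prospect_value(100)    # ≈ 57.5
loss = prospect_value(-100)   # ≈ -129.5
```

Because `lam > 1`, the magnitude of the loss's value always exceeds that of an equal-sized gain, which is exactly the biased-but-well-defined perception the snippet refers to.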
1 code implementation • 16 Oct 2021 • Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta
However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what attributes make the dataset difficult for a given model.
1 code implementation • EMNLP 2021 • John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning
Probing experiments investigate the extent to which neural representations make properties -- like part-of-speech -- predictable.
1 code implementation • ACL 2022 • Kaitlyn Zhou, Kawin Ethayarajh, Dallas Card, Dan Jurafsky
Cosine similarity of contextual embeddings is used in many NLP tasks (e.g., QA, IR, MT) and metrics (e.g., BERTScore).
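As a reminder of the metric in question, cosine similarity between two embedding vectors is a one-line computation (an illustrative sketch, not the paper's code):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors: their dot
    product divided by the product of their Euclidean norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Vectors pointing in the same direction score 1.0 regardless of magnitude;
# orthogonal vectors score 0.0.
same_direction = cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
```

Note that cosine similarity is scale-invariant, which is one reason its behavior on contextual embeddings (whose norms vary systematically) can be surprising.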
1 code implementation • 14 Sep 2023 • Rajan Vivek, Kawin Ethayarajh, Diyi Yang, Douwe Kiela
Moreover, just a few anchor points suffice to estimate a model's per-class predictions on all other points in a dataset with low mean absolute error, which is enough to gauge where the model is likely to fail.
no code implementations • ACL 2019 • Kawin Ethayarajh, David Duvenaud, Graeme Hirst
A surprising property of word vectors is that word analogies can often be solved with vector arithmetic.
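The vector-arithmetic recipe (solve a : b :: c : ? by finding the word nearest to b − a + c) can be sketched as follows; the embeddings here are tiny hypothetical values chosen purely for illustration:

```python
import numpy as np

# Toy 2-d "word vectors" (hypothetical values, for illustration only).
emb = {
    "king":  np.array([0.8, 0.9]),
    "man":   np.array([0.7, 0.1]),
    "woman": np.array([0.2, 0.15]),
    "queen": np.array([0.3, 0.95]),
    "apple": np.array([0.9, 0.4]),
}

def solve_analogy(a, b, c, emb):
    """Solve a : b :: c : ? by returning the word whose vector is closest
    (by cosine similarity) to b - a + c, excluding the three inputs."""
    target = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))
```

Excluding the input words from the candidate set matters: in real embedding spaces the nearest neighbor of b − a + c is very often b or c itself, a detail analyses of this "surprising property" frequently turn on.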
no code implementations • WS 2018 • Kawin Ethayarajh
We propose a random walk model that is robust to this confound, where the probability of word generation is inversely related to the angular distance between the word and sentence embeddings.
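A minimal sketch of that emission step, assuming the probability of generating a word is proportional to the reciprocal of the angular distance between its vector and the sentence embedding (the normalization here is illustrative, not the paper's exact model):

```python
import numpy as np

def generation_probs(sentence_vec, word_vecs):
    """P(word | sentence) proportional to the inverse of the angular
    distance between each word vector and the sentence embedding."""
    angles = []
    for v in word_vecs:
        cos = np.dot(v, sentence_vec) / (np.linalg.norm(v) * np.linalg.norm(sentence_vec))
        # Clip to guard against floating-point drift outside [-1, 1].
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    inv = 1.0 / (np.array(angles) + 1e-8)  # small epsilon avoids division by zero
    return inv / inv.sum()
```

Words nearly aligned with the sentence embedding (small angle) receive most of the probability mass, which is the intuition behind tying generation probability to angular rather than Euclidean distance.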
no code implementations • ACL 2019 • Kawin Ethayarajh, David Duvenaud, Graeme Hirst
Word embeddings are often criticized for capturing undesirable word associations such as gender stereotypes.
no code implementations • IJCNLP 2019 • Kawin Ethayarajh
A notable property of word embeddings is that word relationships can exist as linear substructures in the embedding space.
1 code implementation • IJCNLP 2019 • Kawin Ethayarajh
In all layers of ELMo, BERT, and GPT-2, on average, less than 5% of the variance in a word's contextualized representations can be explained by a static embedding for that word, providing some justification for the success of contextualized representations.
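One common way to formalize "variance explained by a static embedding" is the share of total variation in a word's occurrence matrix captured by its single best static direction, i.e. the first principal component. The sketch below uses that formalization, which may differ from the paper's exact normalization:

```python
import numpy as np

def static_variance_explained(X):
    """Share of the total variation in the rows of X (one contextualized
    vector per occurrence of a word) captured by the single best static
    direction, computed from the singular values of X."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return float(s[0] ** 2 / (s ** 2).sum())

# If every occurrence had an identical vector, one static embedding would
# explain 100% of the variation; highly context-specific vectors push the
# ratio toward 1/rank.
```

Under this measure, the snippet's finding says the ratio is below 0.05 on average for ELMo, BERT, and GPT-2, i.e. a word's representations are overwhelmingly context-specific.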
no code implementations • EMNLP (Eval4NLP) 2020 • Kawin Ethayarajh, Dorsa Sadigh
To this end, we propose BLEU Neighbors, a nearest neighbors model for estimating language quality by using the BLEU score as a kernel function.
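A sketch of the idea: pair a simplified sentence-level BLEU (geometric mean of unigram and bigram precisions with a brevity penalty, no smoothing) with a k-nearest-neighbors quality estimate. The `bleu_neighbors_score` helper and its plain averaging scheme are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of n-gram
    precisions up to max_n, times a brevity penalty (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        r_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((c_ngrams & r_ngrams).values())  # clipped n-gram matches
        if overlap == 0:
            return 0.0
        log_prec += math.log(overlap / sum(c_ngrams.values())) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / max(len(cand), 1)))
    return bp * math.exp(log_prec)

def bleu_neighbors_score(candidate, labeled_pool, k=2):
    """Estimate the candidate's quality as the mean quality label of the
    k pool sentences with the highest BLEU overlap with the candidate."""
    ranked = sorted(labeled_pool, key=lambda pair: bleu(candidate, pair[0]), reverse=True)
    return sum(label for _, label in ranked[:k]) / k
```

The appeal of BLEU as the kernel is that it needs no trained model: nearness in n-gram space stands in for nearness in quality space.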
no code implementations • ACL 2020 • Kawin Ethayarajh
However, manually annotating a large dataset with a protected attribute is slow and expensive.
no code implementations • EMNLP 2020 • Kawin Ethayarajh, Dan Jurafsky
Benchmarks such as GLUE have helped drive advances in NLP by incentivizing the creation of more accurate models.
no code implementations • 17 Apr 2021 • Kaitlyn Zhou, Kawin Ethayarajh, Dan Jurafsky
How does word frequency in pre-training data affect the behavior of similarity metrics in contextualized BERT embeddings?
no code implementations • ACL 2021 • Kawin Ethayarajh, Dan Jurafsky
Shapley Values, a solution to the credit assignment problem in cooperative game theory, are a popular type of explanation in machine learning, having been used to explain the importance of features, embeddings, and even neurons.
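For a small player set, Shapley values can be computed exactly by averaging each player's marginal contribution over all orderings of the players; the toy game below is purely illustrative:

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: for every ordering of the players, record
    each player's marginal contribution when it joins the coalition,
    then average across all orderings."""
    orderings = list(permutations(players))
    phi = {p: 0.0 for p in players}
    for order in orderings:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    return {p: total / len(orderings) for p, total in phi.items()}

# Toy game: a coalition is worth 1 only if it contains player "a",
# so "a" deserves all the credit and the others none.
phi = shapley_values(["a", "b", "c"], lambda s: 1.0 if "a" in s else 0.0)
```

The same marginal-contribution logic is what gets applied to features, embeddings, or neurons in ML explanations; in practice the factorial number of orderings forces sampling or other approximations.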
no code implementations • NeurIPS 2021 • Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela
We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform.
1 code implementation • Findings (ACL) 2022 • Kaitlyn Zhou, Kawin Ethayarajh, Dan Jurafsky
We examine whether some countries are more richly represented in embedding space than others.
no code implementations • 24 May 2022 • Kawin Ethayarajh, Dan Jurafsky
Human ratings are the gold standard in NLG evaluation.