no code implementations • 24 May 2022 • Kawin Ethayarajh, Dan Jurafsky
Human ratings are treated as the gold standard in NLG evaluation.
1 code implementation • ACL 2022 • Kaitlyn Zhou, Kawin Ethayarajh, Dallas Card, Dan Jurafsky
Cosine similarity of contextual embeddings is used in many NLP tasks (e.g., QA, IR, MT) and metrics (e.g., BERTScore).
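As an illustrative sketch only (not code from the paper), the following computes cosine similarity between contextualized token embeddings from a BERT model via Hugging Face transformers; the sentences and the token-lookup helper are hypothetical.

```python
# Illustrative sketch: cosine similarity between contextualized embeddings of
# the same word ("bank") in two different sentences, using BERT's last layer.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_embedding(sentence: str, word: str) -> torch.Tensor:
    """Return the last-layer embedding of the first subtoken of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]           # (seq_len, hidden_dim)
    word_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word)[0])
    idx = (enc["input_ids"][0] == word_id).nonzero()[0].item()
    return hidden[idx]

a = token_embedding("She sat on the river bank.", "bank")
b = token_embedding("He robbed the bank downtown.", "bank")
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```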
1 code implementation • Findings (ACL) 2022 • Kaitlyn Zhou, Kawin Ethayarajh, Dan Jurafsky
We examine whether some countries are more richly represented in embedding space than others.
no code implementations • 16 Oct 2021 • Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta
Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be.
1 code implementation • EMNLP 2021 • John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning
Probing experiments investigate the extent to which neural representations make properties -- like part-of-speech -- predictable.
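A minimal linear-probe sketch of that idea, assuming a precomputed matrix of token representations and POS labels (synthetic stand-ins below); this illustrates ordinary probing, not the paper's conditional probing method.

```python
# Minimal linear-probe sketch: fit a classifier that predicts part-of-speech
# tags from fixed representations; test accuracy measures predictability.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))        # stand-in for per-token representations
y = rng.integers(0, 5, size=1000)       # stand-in for POS tag labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```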
no code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
no code implementations • ACL 2021 • Kawin Ethayarajh, Dan Jurafsky
Shapley Values, a solution to the credit assignment problem in cooperative game theory, are a popular type of explanation in machine learning, having been used to explain the importance of features, embeddings, and even neurons.
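A small self-contained sketch of the underlying definition: each player's Shapley value is its marginal contribution averaged over all orderings of the players. The 3-player game and its worth function below are hypothetical, for illustration only.

```python
# Toy sketch: exact Shapley values for a 3-player cooperative game, computed
# by averaging each player's marginal contribution over all player orderings.
from itertools import permutations

def shapley_values(players, value):
    """value: maps a frozenset of players to that coalition's worth."""
    shapley = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            shapley[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: s / len(orderings) for p, s in shapley.items()}

# Hypothetical worth function for illustration only.
worths = {frozenset(): 0, frozenset("A"): 1, frozenset("B"): 2, frozenset("C"): 2,
          frozenset("AB"): 4, frozenset("AC"): 3, frozenset("BC"): 5, frozenset("ABC"): 7}
print(shapley_values(["A", "B", "C"], lambda s: worths[frozenset(s)]))
```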
no code implementations • NeurIPS 2021 • Zhiyi Ma, Kawin Ethayarajh, Tristan Thrush, Somya Jain, Ledell Wu, Robin Jia, Christopher Potts, Adina Williams, Douwe Kiela
We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench platform.
no code implementations • 17 Apr 2021 • Kaitlyn Zhou, Kawin Ethayarajh, Dan Jurafsky
How does word frequency in pre-training data affect the behavior of similarity metrics in contextualized BERT embeddings?
no code implementations • EMNLP 2020 • Kawin Ethayarajh, Dan Jurafsky
Benchmarks such as GLUE have helped drive advances in NLP by incentivizing the creation of more accurate models.
no code implementations • EMNLP (Eval4NLP) 2020 • Kawin Ethayarajh, Dorsa Sadigh
We propose BLEU Neighbors, a nearest neighbors model for estimating language quality that uses the BLEU score as a kernel function.
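A hedged sketch of the general idea, not the authors' implementation: given a small hypothetical pool of sentences with known quality scores, score a new sentence by a BLEU-weighted average over its nearest neighbors, using NLTK's sentence_bleu as the kernel.

```python
# Illustrative nearest-neighbors quality estimator with BLEU as the kernel.
# The pool of scored sentences and the k value are made up for this sketch.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

scored_pool = [
    ("the cat sat on the mat".split(), 0.9),
    ("cat the on sat mat the".split(), 0.3),
    ("a dog slept on the rug".split(), 0.8),
]

def estimate_quality(candidate, pool, k=2):
    smooth = SmoothingFunction().method1
    # BLEU of the candidate against each pooled sentence acts as the kernel weight.
    sims = [(sentence_bleu([ref], candidate, smoothing_function=smooth), score)
            for ref, score in pool]
    top = sorted(sims, reverse=True)[:k]
    total = sum(s for s, _ in top)
    return sum(s * q for s, q in top) / total if total > 0 else 0.0

print(estimate_quality("the cat sat on a mat".split(), scored_pool))
```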
no code implementations • ACL 2020 • Kawin Ethayarajh
Manually annotating a large dataset with a protected attribute, however, is slow and expensive.
no code implementations • IJCNLP 2019 • Kawin Ethayarajh
A notable property of word embeddings is that word relationships can exist as linear substructures in the embedding space.
1 code implementation • IJCNLP 2019 • Kawin Ethayarajh
In all layers of ELMo, BERT, and GPT-2, on average, less than 5% of the variance in a word's contextualized representations can be explained by a static embedding for that word, providing some justification for the success of contextualized representations.
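One way to make "variance explained by a static embedding" concrete: collect a word's contextualized vectors across many contexts and measure how much of their variance lies along their first principal component, which can itself serve as a static embedding for that word. The sketch below uses synthetic vectors and is not the paper's exact pipeline.

```python
# Sketch (synthetic data): fraction of variance in a word's contextualized
# vectors explained by their first principal component.
import numpy as np

rng = np.random.default_rng(0)
contextual_vecs = rng.normal(size=(200, 768))   # stand-in: one word in 200 contexts

centered = contextual_vecs - contextual_vecs.mean(axis=0)
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
explained = singular_values[0] ** 2 / np.sum(singular_values ** 2)
print(f"variance explained by first principal component: {explained:.3f}")
```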
no code implementations • ACL 2019 • Kawin Ethayarajh, David Duvenaud, Graeme Hirst
Word embeddings are often criticized for capturing undesirable word associations such as gender stereotypes.
no code implementations • ACL 2019 • Kawin Ethayarajh, David Duvenaud, Graeme Hirst
A surprising property of word vectors is that word analogies can often be solved with vector arithmetic.
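The classic example is "man is to king as woman is to ?", answered by vector arithmetic plus a nearest-neighbor lookup. The sketch below uses made-up 3-dimensional vectors purely for illustration; real experiments use trained embeddings such as word2vec or GloVe.

```python
# Toy sketch of solving an analogy with vector arithmetic and cosine similarity.
import numpy as np

emb = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.8, 0.1, 0.7]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

target = emb["king"] - emb["man"] + emb["woman"]   # "man : king :: woman : ?"

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Exclude the query words, then pick the nearest remaining vector.
candidates = {w: v for w, v in emb.items() if w not in {"king", "man", "woman"}}
print(max(candidates, key=lambda w: cos(target, candidates[w])))  # -> "queen"
```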
no code implementations • WS 2018 • Kawin Ethayarajh
We propose a random walk model that is robust to this confound, where the probability of word generation is inversely related to the angular distance between the word and sentence embeddings.
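The quantity the generation probability is tied to is the angular distance between a word vector and a sentence vector, i.e., the arccosine of their cosine similarity. The sketch below only computes that distance with made-up vectors; the exact generative form is in the paper and is not reproduced here.

```python
# Sketch: angular distance between a word vector and a sentence vector.
import numpy as np

def angular_distance(u, v):
    cos_sim = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos_sim, -1.0, 1.0)))   # in [0, pi]

word_vec = np.array([0.3, 0.8, 0.5])   # hypothetical word embedding
sent_vec = np.array([0.4, 0.7, 0.6])   # hypothetical sentence embedding
print(angular_distance(word_vec, sent_vec))
```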