no code implementations • 3 Oct 2024 • Wei Cheng, Tianlu Wang, Yanmin Ji, Fan Yang, Keren Tan, Yiyu Zheng
While in-context learning with large language models (LLMs) has shown impressive performance, we have discovered a unique miscalibration behavior where both correct and incorrect predictions are assigned the same level of confidence.
no code implementations • 5 Aug 2024 • Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li
Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation.
no code implementations • 29 May 2024 • Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar
Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token.
no code implementations • 30 Jan 2024 • Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Ellen Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang
LLM agents trained with our method also show more efficient tool use, with inference speed being on average ~1. 4x faster than baseline tool-augmented LLMs.
no code implementations • 8 Dec 2023 • Olga Golovneva, Sean O'Brien, Ramakanth Pasunuru, Tianlu Wang, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
Using constrained reasoning, PathFinder integrates novel quality constraints, pruning, and exploration methods to enhance the efficiency and the quality of generation.
no code implementations • 14 Nov 2023 • Kumar Shridhar, Koustuv Sinha, Andrew Cohen, Tianlu Wang, Ping Yu, Ram Pasunuru, Mrinmaya Sachan, Jason Weston, Asli Celikyilmaz
In recent years, Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their own generations?
Ranked #53 on Arithmetic Reasoning on GSM8K
1 code implementation • 5 Sep 2023 • Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan
It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs.
Ranked #2 on Text-to-Image Generation on COCO
1 code implementation • 8 Aug 2023 • Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O'Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs.
no code implementations • 26 Jun 2023 • Xiaochuang Han, Daniel Simig, Todor Mihaylov, Yulia Tsvetkov, Asli Celikyilmaz, Tianlu Wang
We observe that a continued pretraining on this small subset significantly improves the model's ICL ability, by up to 18%.
1 code implementation • 20 Jun 2023 • Sidi Lu, Hongyi Liu, Asli Celikyilmaz, Tianlu Wang, Nanyun Peng
We investigate CDM for open-domain text generation evaluation under two paradigms: 1) _Generative_ CDM, which harnesses the contrast of two language models' distributions to generate synthetic examples for training discriminator-based metrics; 2) _Discriminative_ CDM, which directly uses distribution disparities between two language models for evaluation.
1 code implementation • 24 May 2023 • Haoyi Qiu, Zi-Yi Dou, Tianlu Wang, Asli Celikyilmaz, Nanyun Peng
Model-based evaluation metrics (e. g., CLIPScore and GPTScore) have demonstrated decent correlations with human judgments in various language generation tasks.
no code implementations • 14 Mar 2023 • Jaspreet Ranjit, Tianlu Wang, Baishakhi Ray, Vicente Ordonez
We also find that (2) models finetuned on larger scale datasets are more likely to introduce new biased associations.
1 code implementation • 22 Dec 2022 • Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Ves Stoyanov
To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks.
Ranked #26 on Natural Language Inference on RTE
no code implementations • 16 Dec 2022 • Ping Yu, Tianlu Wang, Olga Golovneva, Badr Alkhamissy, Gargi Ghosh, Mona Diab, Asli Celikyilmaz
Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning.
no code implementations • 4 Oct 2022 • Daniel Simig, Tianlu Wang, Verna Dankers, Peter Henderson, Khuyagbaatar Batsuren, Dieuwke Hupkes, Mona Diab
In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis.
1 code implementation • 5 Sep 2022 • Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time.
10 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs
2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li
Large-scale generative language models such as GPT-3 are competitive few-shot learners.
1 code implementation • Findings (NAACL) 2022 • Tianlu Wang, Rohit Sridhar, Diyi Yang, Xuezhi Wang
Recently, NLP models have achieved remarkable progress across a variety of tasks; however, they have also been criticized for being not robust.
2 code implementations • CVPR 2021 • Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi
Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image.
1 code implementation • EMNLP 2021 • Fuxiao Liu, Yinghan Wang, Tianlu Wang, Vicente Ordonez
We propose Visual News Captioner, an entity-aware model for the task of news image captioning.
no code implementations • EMNLP 2020 • Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Li, Jilin Chen, Alex Beutel, Ed Chi
Experiments on real-world NLP datasets demonstrate that our method can generate more diverse and fluent adversarial texts, compared to many existing adversarial text generation approaches.
1 code implementation • ACL 2020 • Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Rajani, Bryan McCann, Vicente Ordonez, Caiming Xiong
Word embeddings derived from human-generated corpora inherit strong gender bias which can be further amplified by downstream models.
2 code implementations • NAACL 2019 • Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang
In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo's contextualized word vectors.
2 code implementations • ICCV 2019 • Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez
In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables --such as gender-- in visual recognition tasks.
3 code implementations • NAACL 2018 • Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang
We introduce a new benchmark, WinoBias, for coreference resolution focused on gender bias.
1 code implementation • CVPR 2018 • Tianlu Wang, Kota Yamaguchi, Vicente Ordonez
We propose an inference procedure for deep convolutional neural networks (CNNs) when partial evidence is available.
3 code implementations • EMNLP 2017 • Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang
Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web.