Search Results for author: Jack Hessel

Found 37 papers, 25 papers with code

Tailoring Self-Rationalizers with Multi-Reward Distillation

1 code implementation · 6 Nov 2023 · Sahana Ramnath, Brihi Joshi, Skyler Hallinan, Ximing Lu, Liunian Harold Li, Aaron Chan, Jack Hessel, Yejin Choi, Xiang Ren

Results on five difficult question-answering datasets (StrategyQA, QuaRel, OpenBookQA, NumerSense, and QASC) show that not only does MaRio improve task accuracy, but it also improves the self-rationalization quality of small LMs across the aforementioned axes better than a supervised fine-tuning (SFT) baseline.

Question Answering · StrategyQA

What's "up" with vision-language models? Investigating their struggle with spatial reasoning

no code implementations · 30 Oct 2023 · Amita Kamath, Jack Hessel, Kai-Wei Chang

Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"?

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

1 code implementation · 17 Oct 2023 · Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu

In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.

Language Modelling · Large Language Model · +2

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms

1 code implementation · 16 Oct 2023 · Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu

NORMLENS consists of 10K human judgments accompanied by free-form explanations covering 2K multimodal situations, and serves as a probe to address two questions: (1) to what extent can models align with average human judgment?

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

1 code implementation · 12 Aug 2023 · Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt

These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment.

Instruction Following

FunQA: Towards Surprising Video Comprehension

1 code implementation · 26 Jun 2023 · Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu

Surprising videos, e.g., funny clips, creative performances, or visual illusions, attract significant attention.

Question Answering · Text Generation · +3

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

1 code implementation · NeurIPS 2023 · Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap.

Instruction Following

Text encoders bottleneck compositionality in contrastive vision-language models

1 code implementation · 24 May 2023 · Amita Kamath, Jack Hessel, Kai-Wei Chang

We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e.g., from single object, to object+property, to multiple interacting objects).

Image Captioning

Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning

1 code implementation · CVPR 2023 · Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, Yejin Choi

Language models are capable of commonsense reasoning: domain-specific models can learn from explicit knowledge (e.g., commonsense graphs [6], ethical norms [25]), and larger models like GPT-3 manifest broad commonsense reasoning capacity.

Language Modelling · reinforcement-learning · +2

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

no code implementations · 10 Feb 2022 · Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi

We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents.

Visual Abductive Reasoning · Visual Reasoning

Reframing Human-AI Collaboration for Generating Free-Text Explanations

1 code implementation · NAACL 2022 · Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, Yejin Choi

We create a pipeline that combines GPT-3 with a supervised filter that incorporates binary acceptability judgments from humans in the loop.

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

1 code implementation · NAACL 2022 · Yanpeng Zhao, Jack Hessel, Youngjae Yu, Ximing Lu, Rowan Zellers, Yejin Choi

In a difficult zero-shot setting with no paired audio-text data, our model demonstrates state-of-the-art zero-shot performance on the ESC50 and US8K audio classification tasks, and even surpasses the supervised state of the art for Clotho caption retrieval (with audio queries) by 2.2% R@1.

Audio Classification · Audio Tagging · +3

MERLOT: Multimodal Neural Script Knowledge Models

1 code implementation · NeurIPS 2021 · Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future.

Visual Commonsense Reasoning

Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents

1 code implementation · EMNLP 2020 · Gregory Yauney, Jack Hessel, David Mimno

Images can give us insights into the contextual meanings of words, but current image-text grounding approaches require detailed annotations.

Clustering · object-detection · +1

A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions

no code implementations · CoNLL 2019 · Jack Hessel, Bo Pang, Zhenhai Zhu, Radu Soricut

Instructional videos get high traffic on video sharing platforms, and prior work suggests that providing time-stamped, subtask annotations (e.g., "heat the oil in the pan") improves user experiences.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1

Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents

2 code implementations · IJCNLP 2019 · Jack Hessel, Lillian Lee, David Mimno

Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present.


Something's Brewing! Early Prediction of Controversy-causing Posts from Discussion Features

no code implementations · NAACL 2019 · Jack Hessel, Lillian Lee

Controversial posts are those that split the preferences of a community, receiving both significant positive and significant negative feedback.

Cats and Captions vs. Creators and the Clock: Comparing Multimodal Content to Context in Predicting Relative Popularity

1 code implementation · 6 Mar 2017 · Jack Hessel, Lillian Lee, David Mimno

The content of today's social media is becoming richer and richer, increasingly mixing text, images, videos, and audio.

Image Representations and New Domains in Neural Image Captioning

no code implementations · WS 2015 · Jack Hessel, Nicolas Savva, Michael J. Wilber

We examine the possibility that recent promising results in automatic caption generation are due primarily to language models.

Image Captioning
