no code implementations • 30 Aug 2024 • Spencer Whitehead, Jacob Phillips, Sean Hendryx
Multimodal language models can exhibit hallucinations in their outputs, which limits their reliability.
1 code implementation • 18 Jul 2024 • Vaskar Nath, Dylan Slack, Jeff Da, Yuntao Ma, Hugh Zhang, Spencer Whitehead, Sean Hendryx
Beyond improving reward model performance, we show this way of training RM representations enables improved $\textit{steerability}$ because it allows us to evaluate the likelihood of an action achieving a particular goal-state (e.g., whether a solution is correct or helpful).
1 code implementation • CVPR 2023 • Corentin Dancette, Spencer Whitehead, Rishabh Maheshwary, Ramakrishna Vedantam, Stefan Scherer, Xinlei Chen, Matthieu Cord, Marcus Rohrbach
In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data.
no code implementations • 11 May 2023 • Suzanne Petryk, Spencer Whitehead, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach
The ability to judge whether a caption correctly describes an image is a critical part of vision-language understanding.
Ranked #62 on Visual Reasoning on Winoground
28 code implementations • ICCV 2023 • Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.
Ranked #2 on Zero-Shot Instance Segmentation on LVIS v1.0 val
1 code implementation • 28 Apr 2022 • Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach
We first enable abstention capabilities for several VQA models and analyze both their coverage (the portion of questions answered) and their risk (the error on that portion).
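The two quantities named above can be illustrated with a minimal sketch. It assumes the model abstains whenever a per-question confidence score falls below a threshold; that abstention rule, the function name `coverage_and_risk`, and its parameters are hypothetical choices made for illustration and are not taken from the paper. Only the definitions of coverage and risk come from the text.

```python
from typing import Sequence, Tuple


def coverage_and_risk(
    confidences: Sequence[float],
    accuracies: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, risk) for a model that may abstain.

    Assumption (not from the source): the model answers a question only
    when its confidence is at least `threshold`, and abstains otherwise.
    `accuracies` holds a per-question score in [0, 1].
    """
    if len(confidences) != len(accuracies):
        raise ValueError("confidences and accuracies must have equal length")
    if not confidences:
        raise ValueError("at least one question is required")

    # Keep the accuracy of every question the model chose to answer.
    answered = [acc for conf, acc in zip(confidences, accuracies) if conf >= threshold]

    # Coverage: the portion of questions answered.
    coverage = len(answered) / len(confidences)

    # Risk: the error on the answered portion (defined as 0.0 if nothing is answered).
    risk = 1.0 - sum(answered) / len(answered) if answered else 0.0
    return coverage, risk
```

For example, with confidences `[0.9, 0.4, 0.8, 0.2]`, accuracies `[1.0, 0.0, 0.0, 1.0]`, and a threshold of `0.5`, the model answers two of the four questions and gets one of them right, so both coverage and risk are 0.5. Raising the threshold lowers coverage and, when confidence tracks correctness, also lowers risk.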
2 code implementations • CVPR 2021 • Spencer Whitehead, Hui Wu, Heng Ji, Rogerio Feris, Kate Saenko
Generalization to out-of-distribution data has been a problem for Visual Question Answering (VQA) models.
1 code implementation • 26 Nov 2020 • Spencer Whitehead, Hui Wu, Yi Ren Fung, Heng Ji, Rogerio Feris, Kate Saenko
Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations.
no code implementations • CoNLL 2018 • Boliang Zhang, Spencer Whitehead, Lifu Huang, Heng Ji
Many name tagging approaches use local contextual information with much success, but fail when the local context is ambiguous or limited.
no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare Voss, Daniel Napierski, Marjorie Freedman
We present the first comprehensive, open source multimedia knowledge extraction system that takes a massive stream of unstructured, heterogeneous multimedia data from various sources and languages as input, and creates a coherent, structured knowledge base, indexing entities, relations, and events, following a rich, fine-grained ontology.
no code implementations • ACL 2020 • Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang
We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents.
1 code implementation • 5 Nov 2019 • Maxwell Crouse, Ibrahim Abdelaziz, Bassem Makni, Spencer Whitehead, Cristina Cornelio, Pavan Kapanipathi, Kavitha Srinivas, Veronika Thost, Michael Witbrock, Achille Fokoue
Automated theorem provers have traditionally relied on manually tuned heuristics to guide how they perform proof search.
no code implementations • 5 Nov 2019 • Pavan Kapanipathi, Veronika Thost, Siva Sankalp Patel, Spencer Whitehead, Ibrahim Abdelaziz, Avinash Balakrishnan, Maria Chang, Kshitij Fadnis, Chulaka Gunasekara, Bassem Makni, Nicholas Mattei, Kartik Talamadupula, Achille Fokoue
A few approaches have shown that information from external knowledge sources like knowledge graphs (KGs) can add value, in addition to the textual content, by providing background knowledge that may be critical for a task.
no code implementations • 16 Aug 2019 • Benoit Charbonneau, Spencer Whitehead
We describe a Maple package that serves at least four purposes.
Computational Geometry
no code implementations • NAACL 2019 • Manling Li, Ying Lin, Joseph Hoover, Spencer Whitehead, Clare Voss, Morteza Dehghani, Heng Ji
This paper demonstrates a state-of-the-art end-to-end multilingual (English, Russian, and Ukrainian) knowledge extraction system that can perform entity discovery and linking, relation extraction, event extraction, and coreference.
no code implementations • EMNLP 2018 • Spencer Whitehead, Heng Ji, Mohit Bansal, Shih-Fu Chang, Clare Voss
We develop an approach that uses video meta-data to retrieve topically related news documents for a video and extracts the events and named entities from these documents.
2 code implementations • ACL 2018 • Qingyun Wang, Zhi-Hao Zhou, Lifu Huang, Spencer Whitehead, Boliang Zhang, Heng Ji, Kevin Knight
We present a paper abstract writing system based on an attentive neural sequence-to-sequence model that can take a title as input and automatically generate an abstract.
Ranked #1 on Paper generation on ACL Title and Abstract Dataset
no code implementations • EMNLP 2018 • Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang
Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images.