1 code implementation • NAACL 2022 • Jae Sung Park, Sheng Shen, Ali Farhadi, Trevor Darrell, Yejin Choi, Anna Rohrbach
We test the robustness of recent methods on the proposed automatic contrast sets, and compare them to additionally collected human-generated counterparts, to assess their effectiveness.
no code implementations • 12 Nov 2024 • Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park, Etash Guha, Silvio Savarese, Ludwig Schmidt, Yejin Choi, Caiming Xiong, Ran Xu
We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that bridges the gap between descriptive synthetic captions and factual web-scale alt-text.
no code implementations • 8 Oct 2024 • Mohammadreza Salehi, Jae Sung Park, Tanush Yadav, Aditya Kusupati, Ranjay Krishna, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi
To evaluate the effectiveness of multimodal foundation models in helping us recognize such actions, we present ActionAtlas v1.0, a multiple-choice video question answering benchmark featuring short videos across various sports.
1 code implementation • 25 Sep 2024 • Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi
Today's most advanced vision-language models (VLMs) remain proprietary.
no code implementations • 2 Jul 2024 • Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi
The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable.
1 code implementation • 28 May 2024 • Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati
We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model.
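A minimal sketch of that superposition step, assuming a Hugging Face-style decoder that accepts `inputs_embeds` and exposes an `embed_tokens` lookup; the function name, shapes, and mixture weights here are illustrative, not the authors' implementation:

```python
import torch

def superposed_step(model, prefix_embeds, draft_last_tokens, weights):
    """One decoding step on a superposition of the k drafts' latest tokens.

    prefix_embeds:     (1, T, d) embeddings of the shared prefix
    draft_last_tokens: (k,)      id of the most recent token in each draft
    weights:           (k,)      mixture coefficients, summing to 1
    """
    # Embed the latest token of each draft and mix them into one vector: (1, d)
    last_embeds = model.embed_tokens(draft_last_tokens)              # (k, d)
    mixed = (weights.unsqueeze(-1) * last_embeds).sum(0, keepdim=True)
    # Feed the superposed embedding as the next input position
    inputs_embeds = torch.cat([prefix_embeds, mixed.unsqueeze(0)], dim=1)
    logits = model(inputs_embeds=inputs_embeds).logits[:, -1, :]
    # Each draft is then extended from these shared next-token logits (e.g., via top-k)
    return logits
```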
1 code implementation • 7 Jan 2024 • Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, Jianfeng Gao
To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions.
2 code implementations • NeurIPS 2023 • Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi
Empirical results and human evaluations in a zero-shot setup demonstrate that our distillation method yields more precise visual reasoning in VL models compared to a baseline that passes a generated referring expression to an LLM.
no code implementations • 1 May 2023 • Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao
In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world.
1 code implementation • CVPR 2023 • Youngjae Yu, Jiwan Chung, Heeseung Yun, Jack Hessel, Jae Sung Park, Ximing Lu, Rowan Zellers, Prithviraj Ammanabrolu, Ronan Le Bras, Gunhee Kim, Yejin Choi
Language models are capable of commonsense reasoning: domain-specific models can learn from explicit knowledge (e.g., commonsense graphs [6], ethical norms [25]), while larger models like GPT-3 manifest broad commonsense reasoning capacity.
1 code implementation • 10 Feb 2022 • Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi
We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents.
1 code implementation • NeurIPS 2021 • Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi
As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future.
1 code implementation • NeurIPS 2021 • Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi
We further quantitatively measure the quality of our codes by applying them to efficient image retrieval as well as out-of-distribution (OOD) detection problems.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi
Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights.
1 code implementation • ECCV 2020 • Jae Sung Park, Trevor Darrell, Anna Rohrbach
This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.
no code implementations • 5 Jun 2020 • Yue Tan, Chunjing Hu, Kuan Zhang, Kan Zheng, Ethan A. Davis, Jae Sung Park
Anomaly detection for non-linear dynamical systems plays an important role in ensuring system stability.
no code implementations • ECCV 2020 • Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi
In addition, we provide person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text.
1 code implementation • 2 May 2019 • Jae Woong Soh, Jae Sung Park, Nam Ik Cho
This paper presents a new framework for jointly enhancing the resolution and the dynamic range of an image, i.e., simultaneous super-resolution (SR) and high dynamic range imaging (HDRI), based on a convolutional neural network (CNN).
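A minimal sketch of a joint SR+HDR network in the spirit described above; this is not the paper's architecture, and the layer counts, channel widths, and PixelShuffle upsampling are placeholder assumptions:

```python
import torch
import torch.nn as nn

class JointSRHDR(nn.Module):
    """Single CNN mapping a low-resolution LDR image to a 2x-upscaled HDR estimate."""
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Upsample for super-resolution, then predict the HDR image
        self.up = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.head = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, ldr_lowres):
        feats = self.body(ldr_lowres)
        return self.head(self.up(feats))  # high-resolution HDR estimate

# Usage: JointSRHDR()(torch.randn(1, 3, 64, 64)).shape -> (1, 3, 128, 128)
```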
1 code implementation • CVPR 2019 • Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach
Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.
1 code implementation • 2 Aug 2017 • Jae Sung Park, Nam Ik Cho
This paper presents an algorithm that enhances undesirably illuminated images by generating and fusing multi-level illuminations from a single image. The input image is first decomposed into illumination and reflectance components by using an edge-preserving smoothing filter.
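A minimal sketch of that pipeline, assuming a bilateral filter as the edge-preserving smoother, gamma curves to generate the multi-level illuminations, and a simple mean as the fusion rule; the paper's actual filter, adjustment curves, and fusion weights differ:

```python
import cv2
import numpy as np

def enhance(image_bgr, gammas=(0.4, 0.6, 0.8)):
    img = image_bgr.astype(np.float32) / 255.0
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    v = hsv[..., 2]
    # Decompose luminance: illumination = edge-preserving smoothing of V
    illum = cv2.bilateralFilter(v, d=9, sigmaColor=0.1, sigmaSpace=15)
    reflect = v / np.maximum(illum, 1e-3)                 # reflectance component
    # Generate multi-level illuminations and fuse them
    levels = [np.power(illum, g) for g in gammas]
    fused_illum = np.mean(levels, axis=0)
    hsv[..., 2] = np.clip(fused_illum * reflect, 0.0, 1.0)
    return (cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR) * 255.0).astype(np.uint8)
```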
no code implementations • 8 Jul 2017 • Jae Sung Park, Biao Jia, Mohit Bansal, Dinesh Manocha
From natural language instructions, we generate a factor graph, called the Dynamic Grounding Graph (DGG), which takes latent parameters into account.
Robotics