1 code implementation • CVPR 2019 • Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.
1 code implementation • 12 Nov 2020 • Adithyavairavan Murali, Weiyu Liu, Kenneth Marino, Sonia Chernova, Abhinav Gupta
This is largely due to the scale of the datasets, both in the number of objects and in the number of tasks studied.
1 code implementation • 3 Jun 2022 • Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi
In contrast to the existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the scene depicted in the image.
1 code implementation • ICLR 2021 • Valerie Chen, Abhinav Gupta, Kenneth Marino
We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting and to learn quickly from a few demonstrations.
1 code implementation • ICCV 2017 • Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert
First, we explicitly model the high-level structure of the active objects in the scene (humans) and use a VAE to model their possible future movements in pose space.
Ranked #2 on Human Pose Forecasting on Human3.6M (CMD metric)
no code implementations • CVPR 2017 • Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta
One characteristic that sets humans apart from modern learning-based computer vision algorithms is the ability to acquire knowledge about the world and use that knowledge to reason about the visual world.
no code implementations • ICLR 2019 • Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam
The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time.
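The two-level arrangement described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name, the epsilon-greedy selection rule, and the tabular value update are all assumptions introduced here to show how a high-level policy might choose among fixed low-level policies using only a sparse reward.

```python
import random

class HierarchicalAgent:
    """Hypothetical sketch of a two-level policy: the high-level
    policy picks which pretrained low-level policy to run at each
    decision point, and only the high level sees the sparse,
    task-dependent reward."""

    def __init__(self, low_level_policies):
        self.low_level_policies = low_level_policies  # callables: obs -> action
        # Toy tabular estimate of each low-level policy's value.
        self.values = [0.0] * len(low_level_policies)
        self.counts = [0] * len(low_level_policies)

    def select(self, epsilon=0.1):
        # Epsilon-greedy choice over the low-level policies.
        if random.random() < epsilon:
            return random.randrange(len(self.low_level_policies))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, choice, sparse_reward):
        # Incremental mean of the sparse reward for the chosen policy.
        self.counts[choice] += 1
        self.values[choice] += (sparse_reward - self.values[choice]) / self.counts[choice]
```

In use, the high-level `select` call would run the chosen low-level policy until it terminates, then `update` would be called with the task reward observed over that segment.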
no code implementations • 29 Jun 2020 • Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta
This paper formulates hypothesis verification as an RL problem.
no code implementations • CVPR 2021 • Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
Ranked #6 on Visual Question Answering (VQA) on A-OKVQA
no code implementations • 25 Sep 2019 • Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta
In order to train the agents, we exploit the underlying structure in the majority of hypotheses -- they can be formulated as triplets (pre-condition, action sequence, post-condition).
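The triplet structure described above lends itself to a direct encoding. The following is an illustrative sketch under assumed names (`Hypothesis`, `verify`, and the `execute` helper are all hypothetical, not from the paper): a hypothesis is a (pre-condition, action sequence, post-condition) triplet, and verification checks the pre-condition, runs the actions, and checks the post-condition.

```python
from collections import namedtuple

# Hypothetical encoding of a hypothesis as the triplet
# (pre-condition, action sequence, post-condition).
Hypothesis = namedtuple(
    "Hypothesis", ["pre_condition", "action_sequence", "post_condition"]
)

def verify(hypothesis, env_state, execute):
    """Sketch of verification: the hypothesis holds if, whenever its
    pre-condition is met, executing its action sequence yields a state
    satisfying its post-condition."""
    if not hypothesis.pre_condition(env_state):
        return False  # hypothesis is not applicable in this state
    for action in hypothesis.action_sequence:
        env_state = execute(env_state, action)
    return hypothesis.post_condition(env_state)
```

Framed this way, an RL agent's job is to drive the environment into a state satisfying the pre-condition and then test whether the post-condition follows from the action sequence.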
no code implementations • 31 Oct 2022 • Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus
A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information.
no code implementations • 29 Jan 2023 • Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, Ishita Dasgupta
Instruction-following agents must ground language into their observation and action spaces.
no code implementations • 1 Feb 2023 • Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, Rob Fergus
On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning.