1 code implementation • CVPR 2019 • Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.
1 code implementation • 12 Nov 2020 • Adithyavairavan Murali, Weiyu Liu, Kenneth Marino, Sonia Chernova, Abhinav Gupta
This is largely due to the scale of the datasets, both in the number of objects and in the number of tasks studied.
1 code implementation • 3 Jun 2022 • Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi
In contrast to the existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the scene depicted in the image.
1 code implementation • ICLR 2021 • Valerie Chen, Abhinav Gupta, Kenneth Marino
We also find that incorporating natural language allows the model to generalize to unseen tasks in a zero-shot setting and to learn quickly from a few demonstrations.
1 code implementation • ICCV 2017 • Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert
First, we explicitly model the high-level structure of the active objects in the scene (humans) and use a VAE to model their possible future movements in pose space.
Ranked #2 on Human Pose Forecasting on Human3.6M (CMD metric)
no code implementations • CVPR 2017 • Kenneth Marino, Ruslan Salakhutdinov, Abhinav Gupta
One characteristic that sets humans apart from modern learning-based computer vision algorithms is the ability to acquire knowledge about the world and use that knowledge to reason about the visual world.
no code implementations • ICLR 2019 • Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam
The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time.
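The two-level arrangement described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name, the epsilon-greedy selection rule, and the tabular value update are all assumptions introduced here to show how a high-level policy might choose among fixed low-level policies using only a sparse reward.

```python
import random

class HierarchicalAgent:
    """Hypothetical sketch of a two-level policy: the high-level
    policy picks which pretrained low-level policy to run at each
    decision point, and only the high level sees the sparse,
    task-dependent reward."""

    def __init__(self, low_level_policies):
        self.low_level_policies = low_level_policies  # callables: obs -> action
        # Toy tabular estimate of each low-level policy's value.
        self.values = [0.0] * len(low_level_policies)
        self.counts = [0] * len(low_level_policies)

    def select(self, epsilon=0.1):
        # Epsilon-greedy choice over the low-level policies.
        if random.random() < epsilon:
            return random.randrange(len(self.low_level_policies))
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, choice, sparse_reward):
        # Incremental mean of the sparse reward for the chosen policy.
        self.counts[choice] += 1
        self.values[choice] += (sparse_reward - self.values[choice]) / self.counts[choice]
```

In use, the high-level `select` call would run the chosen low-level policy until it terminates, then `update` would be called with the task reward observed over that segment.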
no code implementations • 29 Jun 2020 • Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta
This paper formulates hypothesis verification as an RL problem.
no code implementations • CVPR 2021 • Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
Ranked #6 on Visual Question Answering (VQA) on A-OKVQA
no code implementations • 25 Sep 2019 • Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta
In order to train the agents, we exploit the underlying structure in the majority of hypotheses -- they can be formulated as triplets (pre-condition, action sequence, post-condition).
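The triplet structure described above lends itself to a direct encoding. The following is an illustrative sketch under assumed names (`Hypothesis`, `verify`, and the `execute` helper are all hypothetical, not from the paper): a hypothesis is a (pre-condition, action sequence, post-condition) triplet, and verification checks the pre-condition, runs the actions, and checks the post-condition.

```python
from collections import namedtuple

# Hypothetical encoding of a hypothesis as the triplet
# (pre-condition, action sequence, post-condition).
Hypothesis = namedtuple(
    "Hypothesis", ["pre_condition", "action_sequence", "post_condition"]
)

def verify(hypothesis, env_state, execute):
    """Sketch of verification: the hypothesis holds if, whenever its
    pre-condition is met, executing its action sequence yields a state
    satisfying its post-condition."""
    if not hypothesis.pre_condition(env_state):
        return False  # hypothesis is not applicable in this state
    for action in hypothesis.action_sequence:
        env_state = execute(env_state, action)
    return hypothesis.post_condition(env_state)
```

Framed this way, an RL agent's job is to drive the environment into a state satisfying the pre-condition and then test whether the post-condition follows from the action sequence.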
no code implementations • 31 Oct 2022 • Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus
A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information.
no code implementations • 29 Jan 2023 • Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, Ishita Dasgupta
Instruction-following agents must ground language into their observation and action spaces.
no code implementations • 1 Feb 2023 • Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, Rob Fergus
On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning.