Search Results for author: Peter Anderson

Found 33 papers, 19 papers with code

Prompt Expansion for Adaptive Text-to-Image Generation

no code implementations • 27 Dec 2023 • Siddhartha Datta, Alexander Ku, Deepak Ramachandran, Peter Anderson

Text-to-image generation models are powerful but difficult to use.

Paper
Add Code

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations • 27 Oct 2023 • Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.

Question Answering Question Generation +3

Paper
Add Code

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

no code implementations • CVPR 2023 • Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.

Image Inpainting Object +1

Paper
Add Code

Iterative Vision-and-Language Navigation

no code implementations • CVPR 2023 • Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason

We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time.

Instruction Following Vision and Language Navigation

Paper
Add Code

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

no code implementations • CVPR 2023 • Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh

Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.

Ranked #1 on Vision and Language Navigation on RxR (using extra training data)

Imitation Learning Instruction Following +1

Paper
Add Code

Simple and Effective Synthesis of Indoor 3D Scenes

1 code implementation • 6 Apr 2022 • Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

We study the problem of synthesizing immersive 3D indoor scenes from one or more images.

Data Augmentation Vision and Language Navigation

Paper
Code

Less is More: Generating Grounded Navigation Instructions from Landmarks

no code implementations • CVPR 2022 • Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson

We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.

Instruction Following Visual Grounding

Paper
Add Code

Pathdreamer: A World Model for Indoor Navigation

1 code implementation • ICCV 2021 • Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.

Semantic Segmentation Vision and Language Navigation

139

Paper
Code

PanGEA: The Panoramic Graph Environment Annotation Toolkit

no code implementations • NAACL (ALVR) 2021 • Alexander Ku, Peter Anderson, Jordi Pont-Tuset, Jason Baldridge

PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight toolkit for collecting speech and text annotations in photo-realistic 3D environments.

Instruction Following

Paper
Add Code

On the Evaluation of Vision-and-Language Navigation Instructions

no code implementations • EACL 2021 • Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alexander Ku, Jason Baldridge, Eugene Ie

Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.

Vision and Language Navigation

Paper
Add Code

Where Are You? Localization from Embodied Dialog

2 code implementations • EMNLP 2020 • Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson

In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.

Navigate Visual Dialog

Paper
Code

Sim-to-Real Transfer for Vision-and-Language Navigation

1 code implementation • 7 Nov 2020 • Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.

Vision and Language Navigation

Paper
Code

Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding

3 code implementations • EMNLP 2020 • Alexander Ku, Peter Anderson, Roma Patel, Eugene Ie, Jason Baldridge

We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset.

Ranked #5 on Vision and Language Navigation on RxR

Vision and Language Navigation

215

Paper
Code

Spatially Aware Multimodal Transformers for TextVQA

1 code implementation • ECCV 2020 • Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

Further, each head in our multi-head self-attention layer focuses on a different subset of relations.

Optical Character Recognition (OCR) Visual Grounding +1

Paper
Code

Extended Abstract: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

no code implementations • ICML Workshop LaReL 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e. g.'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation

Paper
Add Code

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

1 code implementation • ECCV 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e. g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Ranked #6 on Vision and Language Navigation on VLN Challenge

Vision and Language Navigation

Paper
Code

Chasing Ghosts: Instruction Following as Bayesian State Tracking

1 code implementation • NeurIPS 2019 • Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee

Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map.

Instruction Following Vision and Language Navigation

Paper
Code

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation • CVPR 2020 • Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Referring Expression Vision and Language Navigation

105

Paper
Code

Audio-Visual Scene-Aware Dialog

2 code implementations • 25 Jan 2019 • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh

We introduce the task of scene-aware dialog.

Scene-Aware Dialogue

Paper
Code

nocaps: novel object captioning at scale

2 code implementations • ICCV 2019 • Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.

Image Captioning Object +2

Paper
Code

Disfluency Detection using Auto-Correlational Neural Networks

4 code implementations • EMNLP 2018 • Paria Jamshid Lou, Peter Anderson, Mark Johnson

In recent years, the natural language processing community has moved away from task-specific feature engineering, i. e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves.

Feature Engineering

Paper
Code

On Evaluation of Embodied Navigation Agents

9 code implementations • 18 Jul 2018 • Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir

Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.

Benchmarking

1,697

Paper
Code

Face-Cap: Image Captioning using Facial Expression Analysis

1 code implementation • 6 Jul 2018 • Omid Mohamad Nezami, Mark Dras, Peter Anderson, Len Hamey

In this work, we present two variants of our Face-Cap model, which embed facial expression features in different ways, to generate image captions.

Descriptive Image Captioning

Paper
Code

Connecting Language and Vision to Actions

no code implementations • ACL 2018 • Peter Anderson, Abhishek Das, Qi Wu

A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment.

Image Captioning Language Modelling +3

Paper
Add Code

Predicting accuracy on large datasets from smaller pilot data

no code implementations • ACL 2018 • Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman

Because obtaining training data is often the most difficult part of an NLP or ML project, we develop methods for predicting how much data is required to achieve a desired test accuracy by extrapolating results from models trained on a small pilot training dataset.

Document Classification

Paper
Add Code

Partially-Supervised Image Captioning

no code implementations • NeurIPS 2018 • Peter Anderson, Stephen Gould, Mark Johnson

To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets.

Image Captioning Object +3

Paper
Add Code

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

8 code implementations • CVPR 2018 • Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton Van Den Hengel

This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.

Ranked #10 on Visual Navigation on R2R

Translation Vision and Language Navigation +2

452

Paper
Code

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

10 code implementations • CVPR 2018 • Damien Teney, Peter Anderson, Xiaodong He, Anton Van Den Hengel

This paper presents a state-of-the-art model for visual question answering (VQA), which won the first place in the 2017 VQA Challenge.

Ranked #30 on Visual Question Answering (VQA) on VQA v2 test-std

Visual Question Answering

1,402

Paper
Code

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

65 code implementations • CVPR 2018 • Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.

Ranked #29 on Visual Question Answering (VQA) on VQA v2 test-std

Image Captioning Visual Question Answering

5,413

Paper
Code

Guided Open Vocabulary Image Captioning with Constrained Beam Search

1 code implementation • EMNLP 2017 • Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects.

Image Captioning TAG +1

Paper
Code

SPICE: Semantic Propositional Image Caption Evaluation

11 code implementations • 29 Jul 2016 • Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

There is considerable interest in the task of automatically generating image captions.

Image Captioning

1,060

Paper
Code

On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

no code implementations • 19 Jul 2016 • Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, Edison Guo

Some recent works in machine learning and computer vision involve the solution of a bi-level optimization problem.

BIG-bench Machine Learning

Paper
Add Code

Discriminative Hierarchical Rank Pooling for Activity Recognition

no code implementations • CVPR 2016 • Basura Fernando, Peter Anderson, Marcus Hutter, Stephen Gould

We present hierarchical rank pooling, a video sequence encoding method for activity recognition.

Action Recognition Temporal Action Localization

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.