With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.
no code implementations • Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan
Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time.
Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.
Ranked #1 on Vision and Language Navigation on RxR (using extra training data)
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.
PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight toolkit for collecting speech and text annotations in photo-realistic 3D environments.
Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.
In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.
We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN) dataset.
Ranked #5 on Vision and Language Navigation on RxR
Further, each head in our multi-head self-attention layer focuses on a different subset of relations.
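To make that mechanism concrete, below is a minimal multi-head self-attention sketch in PyTorch. It is an illustrative stand-in, not the paper's exact architecture; the class name, dimensions, and toy input are assumptions. Each head applies its own learned projections and computes its own attention distribution, which is what lets different heads specialize on different subsets of relations.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal multi-head self-attention: each head attends with its own
    learned projections, so heads can specialize on different relations."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)   # joint Q, K, V projection
        self.out = nn.Linear(dim, dim)       # re-mix the concatenated heads

    def forward(self, x):                    # x: (batch, seq_len, dim)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split into heads: (batch, heads, seq_len, head_dim)
        split = lambda t: t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = attn.softmax(dim=-1)          # per-head attention over all positions
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.out(out)

# toy usage: 5 input tokens of width 256
tokens = torch.randn(1, 5, 256)
print(MultiHeadSelfAttention()(tokens).shape)  # torch.Size([1, 5, 256])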
Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs').
Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g. 'stairs') to visual content in the environment (pixels corresponding to 'stairs').
Ranked #6 on Vision and Language Navigation on VLN Challenge
Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map.
One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.
2 code implementations • 25 Jan 2019 • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh
We introduce the task of scene-aware dialog.
To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.
In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves.
9 code implementations • 18 Jul 2018 • Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir
Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence.
In this work, we present two variants of our Face-Cap model, which embed facial expression features in different ways, to generate image captions.
Because obtaining training data is often the most difficult part of an NLP or ML project, we develop methods for predicting how much data is required to achieve a desired test accuracy by extrapolating results from models trained on a small pilot training dataset.
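A minimal sketch of this general idea follows: fit a simple learning curve to pilot-scale results and invert it to estimate the data needed for a target error. The power-law form and the numbers are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

# Pilot results: (training-set size, test error) -- illustrative numbers only.
sizes = np.array([500, 1000, 2000, 4000, 8000], dtype=float)
errors = np.array([0.42, 0.36, 0.31, 0.27, 0.24])

# Common learning-curve assumption: error(n) = a * n^(-b) + c
def curve(n, a, b, c):
    return a * n ** (-b) + c

(a, b, c), _ = curve_fit(curve, sizes, errors, p0=(1.0, 0.3, 0.1), maxfev=10000)

# Invert the fitted curve to estimate the data needed for a target error.
target_error = 0.15
if target_error > c:
    needed = (a / (target_error - c)) ** (1.0 / b)
    print(f"Estimated examples needed for {target_error:.0%} error: ~{int(needed):,}")
else:
    print("Target error is below the fitted asymptote; more data alone may not suffice.")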
A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment.
This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.
Ranked #10 on Visual Navigation on R2R
This paper presents a state-of-the-art model for visual question answering (VQA), which won the first place in the 2017 VQA Challenge.
Ranked #30 on Visual Question Answering (VQA) on VQA v2 test-std
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.
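As a rough illustration of top-down attention (a sketch under assumed names and dimensions, not the exact model from the paper), a task-derived query vector, such as an encoded question or partial caption state, scores a set of image-region features, and the softmax-weighted sum of regions is passed on to the answerer or caption decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    """Score each image-region feature against a task vector (question or
    partial-caption state) and pool the regions by the resulting weights."""
    def __init__(self, region_dim=2048, query_dim=512, hidden=512):
        super().__init__()
        self.proj_region = nn.Linear(region_dim, hidden)
        self.proj_query = nn.Linear(query_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, regions, query):
        # regions: (batch, num_regions, region_dim); query: (batch, query_dim)
        joint = torch.tanh(self.proj_region(regions) + self.proj_query(query).unsqueeze(1))
        weights = F.softmax(self.score(joint).squeeze(-1), dim=-1)   # (batch, num_regions)
        attended = (weights.unsqueeze(-1) * regions).sum(dim=1)      # (batch, region_dim)
        return attended, weights

# toy usage: 36 region features and one question vector per example
regions = torch.randn(2, 36, 2048)
question = torch.randn(2, 512)
pooled, alphas = TopDownAttention()(regions, question)
print(pooled.shape, alphas.shape)  # torch.Size([2, 2048]) torch.Size([2, 36])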
Ranked #29 on Visual Question Answering (VQA) on VQA v2 test-std
Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects.
Some recent works in machine learning and computer vision involve the solution of a bi-level optimization problem.
We present hierarchical rank pooling, a video sequence encoding method for activity recognition.