Greenback Bears and Fiscal Hawks: Finance is a Jungle and Text Embeddings Must Adapt

no code implementations 11 Nov 2024 Peter Anderson, Mano Vikash Janardhanan, Jason He, Wei Cheng, Charlie Flanagan

Financial documents are filled with specialized terminology, arcane jargon, and curious acronyms that pose challenges for general-purpose text embeddings.

Question Answering

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations27 Oct 2023 Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.

Question Answering, Question Generation

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

no code implementations CVPR 2023 Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.

Image Inpainting, Object

A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

no code implementations CVPR 2023 Aishwarya Kamath, Peter Anderson, Su Wang, Jing Yu Koh, Alexander Ku, Austin Waters, Yinfei Yang, Jason Baldridge, Zarana Parekh

Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions.

Ranked #1 on Vision and Language Navigation on RxR (using extra training data)

Imitation Learning, Instruction Following

Iterative Vision-and-Language Navigation

no code implementations CVPR 2023 Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason

We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time.

Instruction Following, Vision and Language Navigation

Pathdreamer: A World Model for Indoor Navigation

1 code implementation ICCV 2021 Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

People navigating in unfamiliar buildings take advantage of myriad visual, spatial and semantic cues to efficiently achieve their navigation goals.

Semantic Segmentation, Vision and Language Navigation

PanGEA: The Panoramic Graph Environment Annotation Toolkit

no code implementations NAACL (ALVR) 2021 Alexander Ku, Peter Anderson, Jordi Pont-Tuset, Jason Baldridge

PanGEA, the Panoramic Graph Environment Annotation toolkit, is a lightweight toolkit for collecting speech and text annotations in photo-realistic 3D environments.

Instruction Following

On the Evaluation of Vision-and-Language Navigation Instructions

no code implementations EACL 2021 Ming Zhao, Peter Anderson, Vihan Jain, Su Wang, Alexander Ku, Jason Baldridge, Eugene Ie

Vision-and-Language Navigation wayfinding agents can be enhanced by exploiting automatically generated navigation instructions.

Vision and Language Navigation

Where Are You? Localization from Embodied Dialog

2 code implementations EMNLP 2020 Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson

In this paper, we focus on the LED (Localization from Embodied Dialog) task, providing a strong baseline model with detailed ablations that characterize both dataset biases and the importance of various modeling choices.

Navigate, Visual Dialog

Sim-to-Real Transfer for Vision-and-Language Navigation

1 code implementation 7 Nov 2020 Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.

Vision and Language Navigation

Extended Abstract: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

no code implementations ICML Workshop LaReL 2020 Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

1 code implementation ECCV 2020 Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation CVPR 2020 Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Referring Expression, Vision and Language Navigation

nocaps: novel object captioning at scale

2 code implementations ICCV 2019 Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.

Image Captioning, Object

Disfluency Detection using Auto-Correlational Neural Networks

4 code implementations EMNLP 2018 Paria Jamshid Lou, Peter Anderson, Mark Johnson

In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves.

Feature Engineering

Face-Cap: Image Captioning using Facial Expression Analysis

1 code implementation 6 Jul 2018 Omid Mohamad Nezami, Mark Dras, Peter Anderson, Len Hamey

In this work, we present two variants of our Face-Cap model, which embed facial expression features in different ways, to generate image captions.

Descriptive Image Captioning

Predicting accuracy on large datasets from smaller pilot data

no code implementations ACL 2018 Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman

Because obtaining training data is often the most difficult part of an NLP or ML project, we develop methods for predicting how much data is required to achieve a desired test accuracy by extrapolating results from models trained on a small pilot training dataset.

Document Classification
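The pilot-data idea above can be illustrated with a minimal sketch. It assumes test error follows a simple power law in training-set size n, err(n) = a * n^(-b); that functional form, the function names, and the synthetic pilot numbers are illustrative assumptions, not the paper's method or results. Fitting the curve on a few small pilot runs and inverting it gives an estimate of how much data a target error would require.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], errors: List[float]) -> Tuple[float, float]:
    """Fit err(n) = a * n**(-b) by ordinary least squares on log(err) vs log(n).

    Assumption (not from the paper): error decays as a pure power law in
    training-set size. Returns (a, b).
    """
    if len(sizes) < 2 or len(sizes) != len(errors):
        raise ValueError("need at least two (size, error) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("pilot runs need at least two distinct training-set sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # log-log slope equals -b
    intercept = mean_y - slope * mean_x  # log-log intercept equals log(a)
    return math.exp(intercept), -slope


def predict_error(a: float, b: float, n: float) -> float:
    """Extrapolated test error for a training set of size n."""
    return a * n ** (-b)


def data_needed(a: float, b: float, target_error: float) -> float:
    """Invert err(n) = a * n**(-b): n = (a / target_error) ** (1 / b)."""
    if b <= 0:
        raise ValueError("error does not decrease with more data; cannot extrapolate")
    return (a / target_error) ** (1.0 / b)


if __name__ == "__main__":
    # Synthetic pilot runs: made-up errors generated from err = 2 * n**-0.5.
    pilot_sizes = [1000, 2000, 4000, 8000]
    pilot_errors = [2.0 * n ** -0.5 for n in pilot_sizes]
    a, b = fit_power_law(pilot_sizes, pilot_errors)
    print(f"a={a:.3f} b={b:.3f}")
    print(f"examples for 99% accuracy: {data_needed(a, b, 0.01):,.0f}")
```

With pilot errors generated from err(n) = 2 * n^(-0.5), the fit recovers a ≈ 2 and b ≈ 0.5, so a target error of 0.01 (99% accuracy) maps to roughly 40,000 examples. Extrapolating far beyond the pilot range is only as reliable as the assumed curve shape.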

Connecting Language and Vision to Actions

no code implementations ACL 2018 Peter Anderson, Abhishek Das, Qi Wu

A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment.

Image Captioning, Language Modeling

Partially-Supervised Image Captioning

no code implementations NeurIPS 2018 Peter Anderson, Stephen Gould, Mark Johnson

To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets.

Image Captioning, Object

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

8 code implementations CVPR 2018 Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton Van Den Hengel

This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.

Reinforcement Learning, Translation

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

65 code implementations CVPR 2018 Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.

Image Captioning, Visual Question Answering
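The top-down attention idea described in the snippet above can be sketched roughly as follows: given a set of image-region feature vectors (assumed here to come from some upstream detector and to be plain Python lists) and a task-specific query vector, such as an encoding of a question, score each region, normalize the scores with a softmax, and return the weighted average. The dot-product scoring, the function names, and the toy vectors are illustrative assumptions, not the paper's architecture.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_attend(regions: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Pool region features, weighting each by its relevance to a task query.

    Assumption: relevance is a plain dot product between each region vector
    and the query; a learned scoring function could replace this step.
    """
    if not regions:
        raise ValueError("need at least one region feature")
    dim = len(query)
    if any(len(r) != dim for r in regions):
        raise ValueError("region and query dimensions must match")
    scores = [sum(r_j * q_j for r_j, q_j in zip(r, query)) for r in regions]
    weights = softmax(scores)
    attended = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return attended, weights


if __name__ == "__main__":
    # Two toy 2-d "region" features; the query points toward the first one.
    regions = [[1.0, 0.0], [0.0, 1.0]]
    pooled, weights = top_down_attend(regions, query=[3.0, 0.0])
    print(weights, pooled)
```

Because the softmax weights sum to 1, the pooled vector is a convex combination of the region features, and a zero query gives every region equal weight.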

SPICE: Semantic Propositional Image Caption Evaluation

11 code implementations 29 Jul 2016 Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

There is considerable interest in the task of automatically generating image captions.

Image Captioning

