Search Results for author: Max Bain

Found 11 papers, 7 papers with code

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

no code implementations · 10 Oct 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling · Text Generation

OxfordVGG Submission to the EGO4D AV Transcription Challenge

1 code implementation · 18 Jul 2023 · Jaesung Huh, Max Bain, Andrew Zisserman

This report presents the technical details of our submission on the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the OxfordVGG team.

Automatic Speech Recognition · Speech Recognition +1

Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets

1 code implementation · 24 May 2023 · Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

To address this issue, we propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets, where only the gender of the subject is edited and the background is fixed.

AutoAD: Movie Description in Context

1 code implementation · CVPR 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.

Image Captioning · Text Generation

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description

no code implementations · ICCV 2023 · Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling · Text Generation

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

1 code implementation · 22 Mar 2022 · Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

Vision-language models can encode societal biases and stereotypes, but there are challenges to measuring and mitigating these multimodal harms due to lacking measurement robustness and feature degradation.
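The kind of multimodal bias measurement this line of work targets can be sketched in a few lines: compare how strongly image embeddings associate with attribute text prompts (e.g. gendered phrases) in a shared vision-language space. This is an illustrative sketch only — all embeddings below are synthetic stand-ins, not outputs of the paper's model, and the gap statistic is a simplified hypothetical measure, not the paper's metric.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(x):
    # Project rows onto the unit sphere so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Synthetic stand-ins for embeddings from a vision-language model:
image_embs = l2_normalize(rng.normal(size=(5, 32)))  # 5 image embeddings
attr_embs = l2_normalize(rng.normal(size=(2, 32)))   # 2 attribute prompts, e.g. "a man" / "a woman"

# Association scores: cosine similarity of each image to each attribute prompt.
scores = image_embs @ attr_embs.T  # shape (5, 2)

# A simple hypothetical bias statistic: the mean gap between the two
# attribute associations. On neutral images, an unbiased embedding
# space would show a gap near zero.
bias_gap = float(np.mean(scores[:, 0] - scores[:, 1]))
print(round(bias_gap, 4))
```

Debiasing methods in this vein then train (for instance, adversarially, or via learnable prompt tokens) to shrink such gaps without degrading the embedding's usefulness for downstream tasks.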

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

5 code implementations · ICCV 2021 · Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman

Our objective in this work is video-text retrieval - in particular a joint embedding that enables efficient text-to-video retrieval.

Ranked #4 on Video Retrieval on QuerYD (using extra training data)

Retrieval · Text Retrieval +4
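The joint-embedding retrieval setup described above can be sketched minimally: map text and video into a shared space, then rank videos by cosine similarity to the query. The "encoders" here are hypothetical stand-ins (random linear projections), not the paper's transformer encoders; only the retrieval mechanics are illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # shared embedding dimension (illustrative choice)

def l2_normalize(x):
    # Unit-normalize rows so dot products equal cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Pretend we already have raw features for 3 videos and 1 text query.
video_feats = rng.normal(size=(3, 128))  # e.g. pooled frame features
text_feats = rng.normal(size=(1, 300))   # e.g. pooled token features

# Hypothetical learned projections into the joint space (random here;
# the real model trains these end-to-end with a contrastive loss).
W_video = rng.normal(size=(128, DIM))
W_text = rng.normal(size=(300, DIM))

video_emb = l2_normalize(video_feats @ W_video)
text_emb = l2_normalize(text_feats @ W_text)

# Text-to-video retrieval: rank all videos by similarity to the query.
sims = text_emb @ video_emb.T       # (1, 3) similarity matrix
ranking = np.argsort(-sims[0])      # video indices, best match first
print(ranking)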

Count, Crop and Recognise: Fine-Grained Recognition in the Wild

no code implementations · 19 Sep 2019 · Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman

The goal of this paper is to label all the animal individuals present in every frame of a video.
