no code implementations • 18 Apr 2024 • Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie
On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT-4-0613 on human evaluation.
no code implementations • 10 Oct 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • 18 Jul 2023 • Jaesung Huh, Max Bain, Andrew Zisserman
This report presents the technical details of the OxfordVGG team's submission to the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023.
1 code implementation • 24 May 2023 • Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain
To address this issue, we propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets, where only the gender of the subject is edited and the background is fixed.
1 code implementation • CVPR 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.
no code implementations • ICCV 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • 17 May 2022 • Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Our goal in this paper is the adaptation of image-text models for long video retrieval.
Ranked #4 on Zero-Shot Action Recognition on Charades
1 code implementation • 22 Mar 2022 • Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain
Vision-language models can encode societal biases and stereotypes, but measuring and mitigating these multimodal harms is challenging due to a lack of measurement robustness and to feature degradation.
5 code implementations • ICCV 2021 • Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Our objective in this work is video-text retrieval: in particular, a joint embedding that enables efficient text-to-video retrieval.
Ranked #4 on Video Retrieval on QuerYD (using extra training data)
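The joint-embedding objective above can be illustrated with a minimal retrieval sketch: assuming a dual-encoder model has already projected text queries and videos into a shared space, retrieval reduces to ranking videos by cosine similarity to the query. The function name, shapes, and toy embeddings below are hypothetical, not from the paper.

```python
import numpy as np

def retrieve_videos(text_emb, video_embs, top_k=1):
    """Rank videos by cosine similarity to a text query embedding.

    Illustrative sketch only: both modalities are assumed to be
    already embedded in a shared space by a dual-encoder model.
    """
    # L2-normalize so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = v @ t                # one similarity score per video
    order = np.argsort(-sims)   # indices sorted by descending similarity
    return order[:top_k], sims[order[:top_k]]

# Toy example: 3 "video" embeddings in a 4-d shared space.
videos = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0, 0.0]])
query = np.array([1.0, 0.0, 0.0, 0.0])
idx, scores = retrieve_videos(query, videos, top_k=2)
```

Because both sides are pre-normalized, ranking a new query against a large video corpus is a single matrix-vector product, which is what makes this formulation efficient at retrieval time.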
1 code implementation • 8 May 2020 • Max Bain, Arsha Nagrani, Andrew Brown, Andrew Zisserman
Our objective in this work is long-range understanding of the narrative structure of movies.
no code implementations • 19 Sep 2019 • Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman
The goal of this paper is to label all the animal individuals present in every frame of a video.