no code implementations • 18 Apr 2024 • Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie
On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT-4-0613 on human evaluation.
no code implementations • 10 Oct 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • 18 Jul 2023 • Jaesung Huh, Max Bain, Andrew Zisserman
This report presents the technical details of the OxfordVGG team's submission to the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023.
1 code implementation • 24 May 2023 • Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain
To address this issue, we propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets, where only the gender of the subject is edited and the background is fixed.
1 code implementation • CVPR 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.
no code implementations • ICCV 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • 17 May 2022 • Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Our goal in this paper is the adaptation of image-text models for long video retrieval.
Ranked #4 on Zero-Shot Action Recognition on Charades
1 code implementation • 22 Mar 2022 • Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain
Vision-language models can encode societal biases and stereotypes, but measuring and mitigating these multimodal harms is challenging due to a lack of measurement robustness and to feature degradation.
5 code implementations • ICCV 2021 • Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
Our objective in this work is video-text retrieval: in particular, a joint embedding that enables efficient text-to-video retrieval.
Ranked #4 on Video Retrieval on QuerYD (using extra training data)
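The joint-embedding objective above can be illustrated with a minimal retrieval sketch: assuming a dual-encoder model has already projected text queries and videos into a shared space, retrieval reduces to ranking videos by cosine similarity to the query. The function name, shapes, and toy embeddings below are hypothetical, not from the paper.

```python
import numpy as np

def retrieve_videos(text_emb, video_embs, top_k=1):
    """Rank videos by cosine similarity to a text query embedding.

    Illustrative sketch only: both modalities are assumed to be
    already embedded in a shared space by a dual-encoder model.
    """
    # L2-normalize so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = v @ t                # one similarity score per video
    order = np.argsort(-sims)   # indices sorted by descending similarity
    return order[:top_k], sims[order[:top_k]]

# Toy example: 3 "video" embeddings in a 4-d shared space.
videos = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0, 0.0]])
query = np.array([1.0, 0.0, 0.0, 0.0])
idx, scores = retrieve_videos(query, videos, top_k=2)
```

Because both sides are pre-normalized, ranking a new query against a large video corpus is a single matrix-vector product, which is what makes this formulation efficient at retrieval time.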
1 code implementation • 8 May 2020 • Max Bain, Arsha Nagrani, Andrew Brown, Andrew Zisserman
Our objective in this work is long-range understanding of the narrative structure of movies.
no code implementations • 19 Sep 2019 • Max Bain, Arsha Nagrani, Daniel Schofield, Andrew Zisserman
The goal of this paper is to label all the animal individuals present in every frame of a video.