no code implementations • 2 Apr 2025 • Tengda Han, Dilara Gokay, Joseph Heyward, Chuhan Zhang, Daniel Zoran, Viorica Pătrăucean, João Carreira, Dima Damen, Andrew Zisserman
We demonstrate the drop in performance when moving from shuffled to sequential learning on three tasks: the one-video representation learning method DoRA, standard VideoMAE on multi-video datasets, and the task of future video prediction.
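The shuffled-versus-sequential distinction can be sketched with a minimal batching routine (a generic illustration in plain Python; `make_batches` is a hypothetical helper, not the training code from the paper):

```python
import random

def make_batches(clip_indices, batch_size, shuffled):
    """Yield batches of clip indices either in i.i.d. shuffled order
    (standard practice) or in original temporal order (the sequential
    streaming setting studied in the paper)."""
    order = list(clip_indices)
    if shuffled:
        random.shuffle(order)  # breaks temporal correlation between batches
    for i in range(0, len(order), batch_size):
        yield order[i:i + batch_size]

# A toy "video" of 8 consecutive clips.
clips = list(range(8))
sequential = list(make_batches(clips, batch_size=2, shuffled=False))
# Sequential batches preserve temporal order: [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Sequential learning feeds highly correlated consecutive clips, which is what makes the streaming setting harder than the shuffled i.i.d. one.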
no code implementations • 15 Oct 2024 • Toby Perrett, Tengda Han, Dima Damen, Andrew Zisserman
We introduce two benchmarks for unique captioning, based on egocentric footage and timeloop movies, where repeating actions are common.
1 code implementation • 22 Jul 2024 • Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Our objective is to generate Audio Descriptions (ADs) for both movies and TV series in a training-free manner.
2 code implementations • 5 Jul 2024 • Niki Amini-Naieni, Tengda Han, Andrew Zisserman
The goal of this paper is to improve the generality and accuracy of open-vocabulary object counting in images.
Ranked #1 on Object Counting on FSC147
no code implementations • CVPR 2024 (arXiv 22 Apr 2024) • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names.
no code implementations • 1 Apr 2024 • João F. Henriques, Dylan Campbell, Tengda Han
As the horses have long left the barn, our proposal may be seen as antiquated and irrelevant.
no code implementations • 21 Dec 2023 • Zeqian Li, Qirui Chen, Tengda Han, Ya Zhang, Yanfeng Wang, Weidi Xie
In this paper, we aim to establish an automatic, scalable pipeline for denoising a large-scale instructional dataset, and construct a high-quality video-text dataset with supervision from multiple descriptive steps, named HowToStep.
no code implementations • ICCV 2023 (arXiv 10 Oct 2023) • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.
1 code implementation • CVPR 2024 • Lukas Knobel, Tengda Han, Yuki M. Asano
While recent supervised methods for reference-based object counting continue to improve the performance on benchmark datasets, they have to rely on small datasets due to the cost associated with manually annotating dozens of objects in images.
1 code implementation • 2 Jun 2023 • Niki Amini-Naieni, Kiana Amini-Naieni, Tengda Han, Andrew Zisserman
Our objective is open-world object counting in images, where the target object class is specified by a text description.
Ranked #5 on Zero-Shot Counting on FSC147
1 code implementation • CVPR 2023 • Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.
1 code implementation • 12 Oct 2022 • Jochem Loedeman, Maarten C. Stol, Tengda Han, Yuki M. Asano
With the introduction of the transformer architecture in computer vision, increasing model scale has been demonstrated as a clear path to achieving performance and robustness gains.
no code implementations • 10 Oct 2022 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is an efficient training method for video tasks.
6 code implementations • NeurIPS 2022 • Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.
Ranked #1 on Action Recognition on RareAct
1 code implementation • CVPR 2022 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is a temporal alignment network that ingests long-term video sequences and associated text sentences in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, determine its alignment.
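The two alignment questions can be illustrated with a toy similarity-and-threshold routine (a hypothetical sketch using cosine similarity; `align_sentence` and its threshold are invented for illustration and are not the alignment network from the paper):

```python
import numpy as np

def align_sentence(text_emb, segment_embs, threshold=0.5):
    """Score a sentence embedding against per-segment video embeddings.
    Returns (alignable, best_segment): alignable is False when no segment
    clears the similarity threshold; otherwise best_segment is the index
    of the highest-scoring segment."""
    # Cosine similarity between the sentence and each video segment.
    t = text_emb / np.linalg.norm(text_emb)
    s = segment_embs / np.linalg.norm(segment_embs, axis=1, keepdims=True)
    scores = s @ t
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return False, None          # (1) not alignable with this video
    return True, best               # (2) aligned to segment `best`

# Toy example: three one-hot segments, sentence matches the second.
segments = np.eye(3)
alignable, idx = align_sentence(np.array([0.0, 1.0, 0.0]), segments)
# alignable == True, idx == 1
```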
1 code implementation • 8 Dec 2021 • Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie
Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for zero-shot generalisation.
Ranked #5 on Zero-Shot Action Detection on ActivityNet-1.3
1 code implementation • NeurIPS 2020 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is visual-only self-supervised video representation learning.
Ranked #8 on Self-Supervised Action Recognition Linear on HMDB51
1 code implementation • ECCV 2020 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is self-supervised learning from video, in particular for representations for action recognition.
1 code implementation • 10 Sep 2019 • Tengda Han, Weidi Xie, Andrew Zisserman
The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.
Ranked #10 on Self-Supervised Action Recognition Linear on UCF101
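The self-supervised video representation papers above learn embeddings without labels; contrastive objectives such as InfoNCE are a common choice in this line of work. A minimal generic sketch (illustrative only, not the specific losses used in these papers):

```python
import numpy as np

def info_nce(pred, targets, temperature=0.07):
    """Generic InfoNCE loss: each predicted embedding should match its
    own target (the positive) against all other targets (negatives)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = pred @ targets.T / temperature       # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: prediction i matches target i.
    return -np.mean(np.diag(log_probs))
```

When predictions line up with their own targets the loss is near zero; mismatched pairs drive it up, which is what pushes temporally or semantically related clips together in embedding space.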
no code implementations • 19 Sep 2017 • Tengda Han, Jue Wang, Anoop Cherian, Stephen Gould
For effective human-robot interaction, it is important that a robotic assistant can forecast the next action a human will consider in a given task.
no code implementations • 24 Jul 2017 • Sam Toyer, Anoop Cherian, Tengda Han, Stephen Gould
Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving.