2 code implementations • 8 Apr 2024 • Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen
We address the interplay between the two modalities in long videos by explicitly modelling the temporal extents of audio and visual events.
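One way to make temporal extents explicit, as the entry above describes, is to encode each event's (start, end) interval as a query to a transformer over audio and visual features. Below is a minimal PyTorch sketch of that idea; the module name, dimensions, and encoder configuration are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

class IntervalQuery(nn.Module):
    """Map an event's (start, end) temporal extent to a query token.

    Illustrative sketch only: times are assumed normalised to [0, 1]
    within the input window.
    """
    def __init__(self, d_model=256):
        super().__init__()
        self.time_mlp = nn.Sequential(
            nn.Linear(2, d_model), nn.ReLU(), nn.Linear(d_model, d_model))

    def forward(self, start, end):  # each: (B,)
        return self.time_mlp(torch.stack([start, end], dim=-1))

# Prepend the interval query to audio+visual tokens, let a plain
# transformer encoder attend across both modalities, then classify
# from the query token's output.
d = 256
query = IntervalQuery(d)(torch.rand(4), torch.rand(4)).unsqueeze(1)  # (4, 1, d)
feats = torch.randn(4, 20, d)  # e.g. 10 visual + 10 audio feature tokens
layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
out = nn.TransformerEncoder(layer, num_layers=2)(torch.cat([query, feats], 1))[:, 0]
```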
no code implementations • 24 Jan 2024 • Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez
The key technological enabler is a novel mechanism for automatic question-answer generation from procedural text which can ingest large amounts of textual instructions and produce exhaustive in-domain QA training data.
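As a rough illustration of question-answer generation from procedural text, here is a toy, template-based sketch; the paper's actual mechanism is learned and far more exhaustive, so the function name and templates below are purely hypothetical.

```python
def generate_qa(steps):
    """Toy template-based QA generation from ordered procedural steps.

    Purely illustrative: the paper's generator is learned, not a
    fixed set of templates.
    """
    qa_pairs = []
    for i, step in enumerate(steps):
        qa_pairs.append((f"What is step {i + 1}?", step))
        if i + 1 < len(steps):
            qa_pairs.append((f"What should I do after '{step}'?", steps[i + 1]))
    return qa_pairs

recipe = ["preheat the oven to 180C", "whisk the eggs", "fold in the flour"]
for q, a in generate_qa(recipe):
    print(q, "->", a)
```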
1 code implementation • 1 Feb 2023 • Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman
We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos.
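A record in such a dataset pairs a temporal extent with a class label. The small Python sketch below shows that structure; the field names are assumptions for illustration, not EPIC-SOUNDS' actual annotation schema.

```python
from dataclasses import dataclass

@dataclass
class AudioAnnotation:
    """One labelled sound event in an egocentric video.

    Field names are illustrative, not the dataset's exact schema.
    """
    video_id: str
    start_sec: float  # temporal extent: segment start
    stop_sec: float   # temporal extent: segment end
    label: str        # audio class label

    def overlaps(self, t0: float, t1: float) -> bool:
        # True if the annotation intersects the window [t0, t1).
        return self.start_sec < t1 and t0 < self.stop_sec

ann = AudioAnnotation("P01_101", 12.4, 13.1, "water")
print(ann.overlaps(12.0, 13.0))  # True
```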
1 code implementation • 1 Nov 2021 • Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen
We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance.
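Attending to surrounding actions can be sketched as self-attention over a window of per-action features, reading out the centre token for classification. The PyTorch snippet below is a minimal illustration under assumed dimensions and class count, not the paper's exact model.

```python
import torch
import torch.nn as nn

class TemporalContextClassifier(nn.Module):
    """Classify the centre action by attending to its neighbours.

    Minimal sketch of the idea: features of surrounding actions pass
    through self-attention; the centre token is read out.
    """
    def __init__(self, d_model=256, num_classes=97, window=5):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.pos = nn.Parameter(torch.zeros(1, window, d_model))
        self.head = nn.Linear(d_model, num_classes)  # illustrative class count

    def forward(self, action_feats):  # (B, window, d_model)
        h = self.encoder(action_feats + self.pos)
        return self.head(h[:, h.size(1) // 2])  # logits for the centre action

logits = TemporalContextClassifier()(torch.randn(2, 5, 256))
print(logits.shape)  # torch.Size([2, 97])
```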
2 code implementations • 5 Mar 2021 • Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen
We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs (a sketch follows this entry).
Ranked #1 on Human Interaction Recognition on EPIC-SOUNDS
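As noted in the entry above, a minimal sketch of the two-stream idea: a "slow" stream sees a temporally subsampled spectrogram with more channels, while a "fast" stream keeps the full temporal rate with fewer; pooled features are concatenated for classification. Layer sizes and the class count here are assumptions, not the released model.

```python
import torch
import torch.nn as nn

class TwoStreamAudioNet(nn.Module):
    """Two conv streams over a time-frequency spectrogram.

    Illustrative sketch: the slow stream trades temporal rate for
    channel capacity, the fast stream does the opposite; pooled
    features are concatenated before the classifier.
    """
    def __init__(self, num_classes=44):  # illustrative class count
        super().__init__()
        self.slow = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1))
        self.fast = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64 + 8, num_classes)

    def forward(self, spec):            # (B, 1, time, freq)
        s = self.slow(spec[:, :, ::4])  # temporally subsampled input
        f = self.fast(spec)             # full temporal rate
        return self.head(torch.cat([s, f], 1).flatten(1))

print(TwoStreamAudioNet()(torch.randn(2, 1, 128, 64)).shape)  # (2, 44)
```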
7 code implementations • 23 Jun 2020 • Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS.
Ranked #6 on Action Anticipation on EPIC-KITCHENS-100
2 code implementations • 29 Apr 2020 • Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray
Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes.
1 code implementation • ICCV 2019 • Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen
We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal-binding, i.e. the combination of modalities within a range of temporal offsets (a sketch follows this entry).
Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55
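The temporal-binding idea from the entry above can be sketched as mid-level fusion of modality features sampled within one binding window, at independent temporal offsets. The snippet below is an illustration under assumed feature dimensions, not the released TBN architecture; in the paper, predictions from several such windows are aggregated across the video.

```python
import torch
import torch.nn as nn

class TemporalBinding(nn.Module):
    """Fuse modality features drawn from one temporal binding window.

    Sketch of mid-level fusion: RGB, flow and audio features sampled
    at (possibly offset) times inside the window are concatenated and
    fused before classification.
    """
    def __init__(self, d_rgb=256, d_flow=256, d_audio=256, num_classes=97):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(d_rgb + d_flow + d_audio, 512), nn.ReLU(),
            nn.Linear(512, num_classes))

    def forward(self, rgb, flow, audio):
        # Each input: (B, d) features from within the same window.
        return self.fuse(torch.cat([rgb, flow, audio], dim=-1))

logits = TemporalBinding()(torch.randn(2, 256), torch.randn(2, 256),
                           torch.randn(2, 256))
print(logits.shape)  # torch.Size([2, 97])
```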
2 code implementations • ECCV 2018 • Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray
First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention.
no code implementations • 19 Sep 2017 • Michalis Vrigkas, Evangelos Kazakos, Christophoros Nikou, Ioannis A. Kakadiaris
In this work, we propose a novel method for recognizing complex human activities, based on the learning using privileged information (LUPI) paradigm, that handles missing information during testing.
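To illustrate the LUPI setting (extra features available only at training time), here is a generic teacher-student distillation sketch; note this is a deliberately swapped-in formulation for illustration only, as the paper itself uses a probabilistic model, not distillation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic LUPI-flavoured sketch via distillation (not the paper's
# method): a teacher sees privileged features available only at
# training time; the student, which sees only regular features,
# mimics it and is what runs at test time.
d_x, d_priv, n_cls = 128, 32, 7
teacher = nn.Linear(d_x + d_priv, n_cls)
student = nn.Linear(d_x, n_cls)
opt = torch.optim.Adam(list(teacher.parameters()) + list(student.parameters()))

x, priv = torch.randn(16, d_x), torch.randn(16, d_priv)
y = torch.randint(0, n_cls, (16,))
for _ in range(10):
    t_logits = teacher(torch.cat([x, priv], dim=-1))
    s_logits = student(x)
    loss = (F.cross_entropy(t_logits, y) + F.cross_entropy(s_logits, y)
            + F.kl_div(F.log_softmax(s_logits, -1),
                       F.softmax(t_logits.detach(), -1), reduction="batchmean"))
    opt.zero_grad()
    loss.backward()
    opt.step()

test_pred = student(torch.randn(4, d_x)).argmax(-1)  # no privileged input needed
```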
no code implementations • 31 Aug 2017 • Michalis Vrigkas, Evangelos Kazakos, Christophoros Nikou, Ioannis A. Kakadiaris
Classification models may suffer from "structure imbalance" between training and testing data, which can arise from a deficient data collection process.