Search Results for author: Davide Moltisanti

Found 15 papers, 9 papers with code

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

no code implementations • CVPR 2025 • Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, Dima Damen

We present a validation dataset of newly-collected kitchen-based egocentric videos, manually annotated with highly detailed and interconnected ground-truth labels covering: recipe steps, fine-grained actions, ingredients with nutritional values, moving objects, and audio annotations.

Action Recognition • Nutrition • +5

Continual Learning Improves Zero-Shot Action Recognition

no code implementations • 14 Oct 2024 • Shreyank N Gowda, Davide Moltisanti, Laura Sevilla-Lara

In this paper, we propose a novel method based on continual learning to address zero-shot action recognition.

Action Recognition • Continual Learning • +2

Coarse or Fine? Recognising Action End States without Labels

1 code implementation • 13 May 2024 • Davide Moltisanti, Hakan Bilen, Laura Sevilla-Lara, Frank Keller

We use our synthetic data to train a model based on UNet and test it on real images showing coarsely/finely cut objects.

Action Recognition • Object

Efficient Pre-training for Localized Instruction Generation of Videos

1 code implementation • 27 Nov 2023 • Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller

To mitigate these issues, we propose a novel technique, Sieve-&-Swap, to automatically generate high-quality training data for the recipe domain: (i) Sieve filters irrelevant transcripts, and (ii) Swap acquires high-quality text by replacing transcripts with human-written instructions from a text-only recipe dataset.

BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis

1 code implementation • 20 Jul 2022 • Davide Moltisanti, Jinyi Wu, Bo Dai, Chen Change Loy

Estimating human keypoints from these videos is difficult due to the complexity of the dance, as well as the multiple-moving-camera recording setup.

Motion Synthesis • Pose Estimation

The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

2 code implementations • 29 Apr 2020 • Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes.

Object

Towards an Unequivocal Representation of Actions

no code implementations • 10 May 2018 • Michael Wray, Davide Moltisanti, Dima Damen

This work introduces verb-only representations for actions and interactions: the problem of describing similar motions (e.g. 'open door', 'open cupboard') and distinguishing differing ones (e.g. 'open door' vs 'open bottle') using verb-only labels.

Action Recognition • Retrieval • +1

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video

no code implementations • ICCV 2017 • Davide Moltisanti, Michael Wray, Walterio Mayol-Cuevas, Dima Damen

Manual annotations of temporal bounds for object interactions (i.e. start and end times) are typical training input to recognition, localisation and detection algorithms.

Object

SEMBED: Semantic Embedding of Egocentric Action Videos

no code implementations • 28 Jul 2016 • Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen

We present SEMBED, an approach for embedding an egocentric object interaction video in a semantic-visual graph to estimate the probability distribution over its potential semantic labels.

General Classification • Object
