Search Results for author: Dima Damen

Found 53 papers, 34 papers with code

Use Your Head: Improving Long-Tail Video Recognition

1 code implementation CVPR 2023 Toby Perrett, Saptarshi Sinha, Tilo Burghardt, Majid Mirmehdi, Dima Damen

We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties.

Video Recognition

Epic-Sounds: A Large-scale Dataset of Actions That Sound

1 code implementation 1 Feb 2023 Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman

We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos.

Action Recognition

Refining Action Boundaries for One-stage Detection

1 code implementation 25 Oct 2022 Hanyuan Wang, Majid Mirmehdi, Dima Damen, Toby Perrett

We obtain state-of-the-art performance on the challenging EPIC-KITCHENS-100 and standard THUMOS14 action detection benchmarks, and achieve improvement on the ActivityNet-1.3 benchmark.

Action Detection

Play It Back: Iterative Attention for Audio Recognition

1 code implementation 20 Oct 2022 Alexandros Stergiou, Dima Damen

A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time.

Audio Classification

ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval

1 code implementation 9 Oct 2022 Adriano Fragomeni, Michael Wray, Dima Damen

When the clip is short or visually ambiguous, knowledge of its local temporal context (i.e. surrounding video segments) can be used to improve the retrieval performance.

Retrieval Video Retrieval

EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

3 code implementations 26 Sep 2022 Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

VISOR annotates videos from EPIC-KITCHENS, bringing a new set of challenges not encountered in current video segmentation datasets.

Semantic Segmentation Video Object Segmentation +2

Inertial Hallucinations -- When Wearable Inertial Devices Start Seeing Things

no code implementations 14 Jul 2022 Alessandro Masullo, Toby Perrett, Tilo Burghardt, Ian Craddock, Dima Damen, Majid Mirmehdi

We propose a novel approach to multimodal sensor fusion for Ambient Assisted Living (AAL) which takes advantage of learning using privileged information (LUPI).

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

1 code implementation 4 Jul 2022 Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR).

Language Modelling Object State Change Classification

An Evaluation of OCR on Egocentric Data

1 code implementation 11 Jun 2022 Valentin Popescu, Dima Damen, Toby Perrett

In this paper, we evaluate state-of-the-art OCR methods on Egocentric data.

Optical Character Recognition (OCR)

Egocentric Video-Language Pretraining

1 code implementation 3 Jun 2022 Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention.

Action Recognition Contrastive Learning +9

The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction

1 code implementation CVPR 2023 Alexandros Stergiou, Dima Damen

We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales.

Early Action Prediction

Dual-Domain Image Synthesis using Segmentation-Guided GAN

1 code implementation 19 Apr 2022 Dena Bazazian, Andrew Calway, Dima Damen

We build on the successes of few-shot StyleGAN and single-shot semantic segmentation to minimise the amount of training required in utilising two domains.

Caricature Image Generation +1

Hand-Object Interaction Reasoning

no code implementations 13 Jan 2022 Jian Ma, Dima Damen

This paper proposes an interaction reasoning network for modelling spatio-temporal relationships between hands and objects in video.

Action Recognition

TVNet: Temporal Voting Network for Action Localization

1 code implementation 2 Jan 2022 Hanyuan Wang, Dima Damen, Majid Mirmehdi, Toby Perrett

This incorporates a novel Voting Evidence Module that locates temporal boundaries more accurately by accumulating temporal contextual evidence to predict frame-level probabilities of start and end action boundaries.

Action Localization

UnweaveNet: Unweaving Activity Stories

1 code implementation CVPR 2022 Will Price, Carl Vondrick, Dima Damen

Our lives can be seen as a complex weaving of activities; we switch from one activity to another, to maximise our achievements or in reaction to demands placed upon us.

With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

1 code implementation 1 Nov 2021 Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen

We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance.

Action Recognition Language Modelling

Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval

no code implementations 25 Oct 2021 Jonathan Munro, Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen

Given a gallery of uncaptioned video sequences, this paper considers the task of retrieving videos based on their relevance to an unseen text query.

Retrieval Unsupervised Domain Adaptation +1

Ego4D: Around the World in 3,000 Hours of Egocentric Video

3 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

On Semantic Similarity in Video Retrieval

3 code implementations CVPR 2021 Michael Wray, Hazel Doughty, Dima Damen

Current video retrieval efforts all found their evaluation on an instance-based assumption, that only a single caption is relevant to a query video and vice versa.

Retrieval Semantic Similarity +2

Slow-Fast Auditory Streams For Audio Recognition

2 code implementations 5 Mar 2021 Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen

We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs.

Audio Classification

Play Fair: Frame Attributions in Video Models

1 code implementation 24 Nov 2020 Will Price, Dima Damen

We offer detailed analysis of supporting/distracting frames, and the relationships of ESVs to the frame's position, class prediction, and sequence length.

Action Recognition Relational Reasoning

Supervision Levels Scale (SLS)

1 code implementation 22 Aug 2020 Dima Damen, Michael Wray

We propose a three-dimensional discrete and incremental scale to encode a method's level of supervision - i.e. the data and labels used when training a model to achieve a given performance.

Meta-Learning with Context-Agnostic Initialisations

1 code implementation 29 Jul 2020 Toby Perrett, Alessandro Masullo, Tilo Burghardt, Majid Mirmehdi, Dima Damen

This produces an initialisation for fine-tuning to target which is both context-agnostic and task-generalised.


The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

2 code implementations 29 Apr 2020 Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes.

Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

1 code implementation CVPR 2020 Jonathan Munro, Dima Damen

We then combine adversarial training with multi-modal self-supervision, showing that our approach outperforms other UDA methods by 3%.

Fine-grained Action Recognition Optical Flow Estimation +1

Action Modifiers: Learning from Adverbs in Instructional Videos

1 code implementation CVPR 2020 Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations.

Retrieval Weakly-supervised Learning

Weakly-Supervised Completion Moment Detection using Temporal Attention

no code implementations 22 Oct 2019 Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen

In this work, we target detecting the completion moment of actions, that is, the moment when the action's goal has been successfully accomplished.

Sit-to-Stand Analysis in the Wild using Silhouettes for Longitudinal Health Monitoring

no code implementations 3 Oct 2019 Alessandro Masullo, Tilo Burghardt, Toby Perrett, Dima Damen, Majid Mirmehdi

We present the first fully automated Sit-to-Stand or Stand-to-Sit (StS) analysis framework for long-term monitoring of patients in free-living environments using video silhouettes.


EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition

1 code implementation ICCV 2019 Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen

We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal-binding, i.e. the combination of modalities within a range of temporal offsets.

Action Recognition Egocentric Activity Recognition

Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings

no code implementations ICCV 2019 Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen

We report the first retrieval results on fine-grained actions for the large-scale EPIC dataset, in a generalised zero-shot setting.

Cross-Modal Retrieval POS +3

An Evaluation of Action Recognition Models on EPIC-Kitchens

2 code implementations 2 Aug 2019 Will Price, Dima Damen

We benchmark contemporary action recognition models (TSN, TRN, and TSM) on the recently introduced EPIC-Kitchens dataset and release pretrained models on GitHub (https://github.com/epic-kitchens/action-models) for others to build upon.

Action Classification Action Recognition

Learning Visual Actions Using Multiple Verb-Only Labels

1 code implementation 25 Jul 2019 Michael Wray, Dima Damen

We collect multi-verb annotations for three action video datasets and evaluate the verb-only labelling representations for action recognition and cross-modal retrieval (video-to-text and text-to-video).

Action Recognition Cross-Modal Retrieval +1

DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition

no code implementations CVPR 2019 Toby Perrett, Dima Damen

Domain alignment in convolutional networks aims to learn the degree of layer-specific feature alignment beneficial to the joint learning of source and target datasets.

Action Recognition Temporal Action Localization

The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos

1 code implementation CVPR 2019 Hazel Doughty, Walterio Mayol-Cuevas, Dima Damen

In addition to attending to task relevant video parts, our proposed loss jointly trains two attention modules to separately attend to video parts which are indicative of higher (pros) and lower (cons) skill.

CaloriNet: From silhouettes to calorie estimation in private environments

no code implementations 21 Jun 2018 Alessandro Masullo, Tilo Burghardt, Dima Damen, Sion Hannuna, Victor Ponce-López, Majid Mirmehdi

We propose a novel deep fusion architecture, CaloriNet, for the online estimation of energy expenditure for free living monitoring in private environments, where RGB data is discarded and replaced by silhouettes.

Semantically Selective Augmentation for Deep Compact Person Re-Identification

no code implementations 11 Jun 2018 Víctor Ponce-López, Tilo Burghardt, Sion Hannuna, Dima Damen, Alessandro Masullo, Majid Mirmehdi

We present a deep person re-identification approach that combines semantically selective, deep data augmentation with clustering-based network compression to generate high performance, light and fast inference networks.

Data Augmentation Person Re-Identification +1

Action Completion: A Temporal Model for Moment Detection

1 code implementation 17 May 2018 Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen

The paper proposes a joint classification-regression recurrent model that predicts completion from a given frame, and then integrates frame-level contributions to detect the sequence-level completion moment.

General Classification regression

Towards an Unequivocal Representation of Actions

no code implementations 10 May 2018 Michael Wray, Davide Moltisanti, Dima Damen

This work introduces verb-only representations for actions and interactions; the problem of describing similar motions (e.g. 'open door', 'open cupboard') and distinguishing differing ones (e.g. 'open door' vs 'open bottle') using verb-only labels.

Action Recognition Retrieval +1

Detecting the Moment of Completion: Temporal Models for Localising Action Completion

no code implementations 6 Oct 2017 Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen

Action completion detection is the problem of modelling the action's progression towards localising the moment of completion - when the action's goal is confidently considered achieved.

Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination

no code implementations CVPR 2018 Hazel Doughty, Dima Damen, Walterio Mayol-Cuevas

We present a method for assessing skill from video, applicable to a variety of tasks, ranging from surgery to drawing and rolling pizza dough.

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video

no code implementations ICCV 2017 Davide Moltisanti, Michael Wray, Walterio Mayol-Cuevas, Dima Damen

Manual annotations of temporal bounds for object interactions (i.e. start and end times) are typical training input to recognition, localization and detection algorithms.

SEMBED: Semantic Embedding of Egocentric Action Videos

no code implementations 28 Jul 2016 Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen

We present SEMBED, an approach for embedding an egocentric object interaction video in a semantic-visual graph to estimate the probability distribution over its potential semantic labels.

General Classification

Calorie Counter: RGB-Depth Visual Estimation of Energy Expenditure at Home

no code implementations 27 Jul 2016 Lili Tao, Tilo Burghardt, Majid Mirmehdi, Dima Damen, Ashley Cooper, Sion Hannuna, Massimo Camplani, Adeline Paiement, Ian Craddock

We present a new framework for vision-based estimation of calorific expenditure from RGB-D data - the first that is validated on physical gas exchange measurements and applied to daily living scenarios.

Multiple Human Tracking in RGB-D Data: A Survey

no code implementations 14 Jun 2016 Massimo Camplani, Adeline Paiement, Majid Mirmehdi, Dima Damen, Sion Hannuna, Tilo Burghardt, Lili Tao

Finally, we present a brief comparative evaluation of the performance of those works that have applied their methods to these datasets.

You-Do, I-Learn: Unsupervised Multi-User egocentric Approach Towards Video-Based Guidance

no code implementations 16 Oct 2015 Dima Damen, Teesid Leelasawassuk, Walterio Mayol-Cuevas

This paper presents an unsupervised approach towards automatically extracting video-based guidance on object usage, from egocentric video and wearable gaze tracking, collected from multiple users while performing tasks.