Search Results for author: Hazel Doughty

Found 20 papers, 13 papers with code

HD-EPIC: A Highly-Detailed Egocentric Video Dataset

no code implementations · 6 Feb 2025 · Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, Dima Damen

We present a validation dataset of newly-collected kitchen-based egocentric videos, manually annotated with highly detailed and interconnected ground-truth labels covering: recipe steps, fine-grained actions, ingredients with nutritional values, moving objects, and audio annotations.
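As a rough illustration of how such interconnected labels might be represented, the hypothetical record below links a recipe step to fine-grained actions, ingredients with nutritional values, and audio events; all field names are invented for this sketch and do not reflect HD-EPIC's actual annotation schema.

```python
# Hypothetical sketch of one interconnected annotation record; field names
# are illustrative only and do not reflect HD-EPIC's real annotation format.
annotation = {
    "video_id": "P01_kitchen_01",
    "recipe_step": {
        "description": "saute the onions until golden",
        "start_sec": 412.0, "end_sec": 455.5,
    },
    "fine_grained_actions": [
        {"verb": "stir", "noun": "onion", "start_sec": 420.1, "end_sec": 423.8},
        {"verb": "shake", "noun": "pan", "start_sec": 430.0, "end_sec": 431.2},
    ],
    "ingredients": [
        {"name": "onion", "grams": 150, "kcal": 60},  # nutritional values
    ],
    "audio_annotations": [
        {"label": "sizzling", "start_sec": 415.0, "end_sec": 455.0},
    ],
}
```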

Action Recognition Nutrition +5

Beyond Coarse-Grained Matching in Video-Text Retrieval

no code implementations · 16 Oct 2024 · Aozhu Chen, Hazel Doughty, Xirong Li, Cees G. M. Snoek

We perform comprehensive experiments using four state-of-the-art models across two standard benchmarks (MSR-VTT and VATEX) and two specially curated datasets enriched with detailed descriptions (VLN-UVO and VLN-OOPS), yielding several novel insights: 1) our analyses show that current evaluation benchmarks fall short in detecting a model's ability to perceive subtle single-word differences; 2) our fine-grained evaluation highlights the difficulty models face in distinguishing such subtle variations.
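A minimal sketch of the kind of single-word sensitivity test this implies, assuming a generic dual-encoder video-text model; `encode_video` and `encode_text` are hypothetical stand-ins, not the paper's code. The model should score the original caption above a version that differs by one word.

```python
import torch
import torch.nn.functional as F

def single_word_sensitivity(encode_video, encode_text, videos, captions, perturbed):
    """Fraction of videos for which the model ranks the true caption above a
    caption differing by a single word (e.g. 'red cup' -> 'blue cup').
    encode_video / encode_text are stand-ins for any dual-encoder model."""
    v = F.normalize(encode_video(videos), dim=-1)       # (N, D)
    t_pos = F.normalize(encode_text(captions), dim=-1)  # (N, D)
    t_neg = F.normalize(encode_text(perturbed), dim=-1) # (N, D)
    sim_pos = (v * t_pos).sum(-1)
    sim_neg = (v * t_neg).sum(-1)
    return (sim_pos > sim_neg).float().mean()
```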

Text Retrieval Video-Text Retrieval

LocoMotion: Learning Motion-Focused Video-Language Representations

no code implementations · 15 Oct 2024 · Hazel Doughty, Fida Mohammad Thoker, Cees G. M. Snoek

We propose verb-variation paraphrasing to increase caption variety and learn the link between primitive motions and high-level verbs.
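A toy sketch of what verb-variation paraphrasing could look like: swap a caption's high-level verb for a variant phrased in terms of more primitive motion. The verb table below is invented for illustration and is not the paper's actual lexicon.

```python
import random

# Illustrative mapping from high-level verbs to motion-flavoured variants;
# the real method learns these links, this table is just an example.
VERB_VARIATIONS = {
    "open": ["pull open", "swing open"],
    "whisk": ["beat", "stir rapidly in circles"],
    "pour": ["tip", "tilt and empty"],
}

def paraphrase(caption: str) -> str:
    """Swap any known high-level verb for one of its variations."""
    words = caption.split()
    out = [random.choice(VERB_VARIATIONS[w]) if w in VERB_VARIATIONS else w
           for w in words]
    return " ".join(out)

print(paraphrase("whisk the eggs"))  # e.g. "beat the eggs"
```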

SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery

2 code implementations · 26 Aug 2024 · Sarah Rastegar, Mohammadreza Salehi, Yuki M. Asano, Hazel Doughty, Cees G. M. Snoek

In this paper, we address Generalized Category Discovery, aiming to simultaneously uncover novel categories and accurately classify known ones.

Contrastive Learning

Low-Resource Vision Challenges for Foundation Models

no code implementations · CVPR 2024 · Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

Low-resource settings are well-established in natural language processing, where many languages lack sufficient data for deep learning at scale.

Data Augmentation Transfer Learning

Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery

2 code implementations · NeurIPS 2023 · Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek

In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set.

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization

2 code implementations · ICCV 2023 · Fida Mohammad Thoker, Hazel Doughty, Cees Snoek

By simulating different tubelet motions and applying transformations, such as scaling and rotation, we introduce motion patterns beyond what is present in the pretraining data.
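A rough sketch of the tubelet idea under simple assumptions: cut a patch from a frame and paste it along a synthetic trajectory with per-step scaling and rotation, so the resulting clip contains motion that was never in the source video. This illustrates the principle only; it is not the authors' implementation.

```python
import torch
import torchvision.transforms.functional as TF

def simulate_tubelet(clip, patch, path, scales, angles):
    """Paste `patch` into each frame of `clip` along a synthetic trajectory.
    clip:   (T, C, H, W) video tensor
    patch:  (C, h, w) crop taken from some frame
    path:   list of (y, x) top-left positions, one per frame (assumed in-bounds)
    scales/angles: per-frame scale factors and rotation degrees."""
    out = clip.clone()
    for t, ((y, x), s, a) in enumerate(zip(path, scales, angles)):
        p = TF.rotate(patch, a)                       # rotate the tubelet patch
        h, w = int(p.shape[1] * s), int(p.shape[2] * s)
        p = TF.resize(p, [h, w])                      # scale it
        out[t, :, y:y + h, x:x + w] = p               # paste at this frame's position
    return out
```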

Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight

no code implementations · 5 Dec 2022 · Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

The main causes are the limited availability of labeled dark videos to learn from, as well as the distribution shift towards lower color contrast at test time.

Activity Recognition Domain Adaptation +1

Audio-Adaptive Activity Recognition Across Video Domains

1 code implementation · CVPR 2022 · Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek

This paper strives for activity recognition under domain shift, for example, caused by a change of scenery or camera viewpoint.

Activity Recognition Domain Adaptation +1

How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?

1 code implementation · 27 Mar 2022 · Fida Mohammad Thoker, Hazel Doughty, Piyush Bagad, Cees Snoek

Despite the recent success of video self-supervised learning models, there is much still to be understood about their generalization capability.

Self-Supervised Learning Video Understanding

How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs

1 code implementation · CVPR 2022 · Hazel Doughty, Cees G. M. Snoek

We aim to understand how actions are performed and identify subtle differences, such as 'fold firmly' vs. 'fold gently'.

Video-Adverb Retrieval (Unseen Compositions)

Skeleton-Contrastive 3D Action Representation Learning

1 code implementation · 8 Aug 2021 · Fida Mohammad Thoker, Hazel Doughty, Cees G. M. Snoek

In particular, we propose inter-skeleton contrastive learning, which learns from multiple different input skeleton representations in a cross-contrastive manner.
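A minimal sketch of cross-contrastive learning between two skeleton representations (say, a sequence encoder and a graph encoder) using a standard InfoNCE loss in both directions; the encoders producing `z_a` and `z_b` are placeholders, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def cross_contrastive_loss(z_a, z_b, temperature=0.07):
    """InfoNCE between embeddings of the same clips under two different
    skeleton representations. z_a, z_b: (N, D) batches where row i of each
    comes from the same sample (positives); other rows serve as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature           # (N, N) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric: representation A retrieves B, and B retrieves A.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```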

Action Recognition Contrastive Learning +6

On Semantic Similarity in Video Retrieval

3 code implementations · CVPR 2021 · Michael Wray, Hazel Doughty, Dima Damen

Current video retrieval efforts all base their evaluation on an instance-based assumption: that only a single caption is relevant to a query video and vice versa.
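A sketch of what moving beyond the instance-based assumption can look like in evaluation: instead of counting only the one "correct" caption, weight each retrieved caption by a graded semantic relevance, for example with nDCG. The relevance scores here are placeholders for whatever similarity proxy one adopts.

```python
import numpy as np

def ndcg_at_k(relevance, k=10):
    """nDCG for one query: `relevance` holds graded semantic similarity of the
    retrieved captions (in ranked order) to the query video, not 0/1 hits."""
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = float((np.sort(rel)[::-1] * discounts).sum())
    return dcg / ideal if ideal > 0 else 0.0

# Ranked captions with graded relevance instead of a single relevant item:
print(ndcg_at_k([0.9, 0.0, 0.6, 0.2, 0.0]))
```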

Retrieval Semantic Similarity +2

WiCV 2020: The Seventh Women In Computer Vision Workshop

no code implementations · 11 Jan 2021 · Hazel Doughty, Nour Karessli, Kathryn Leonard, Boyi Li, Carianne Martinez, Azadeh Mobasher, Arsha Nagrani, Srishti Yadav

It provides a voice to a minority (female) group in the computer vision community and focuses on increasing the visibility of these researchers, both in academia and industry.

The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

2 code implementations · 29 Apr 2020 · Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes.

Object

Action Modifiers: Learning from Adverbs in Instructional Videos

1 code implementation · CVPR 2020 · Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations.

Video-Adverb Retrieval

The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos

1 code implementation · CVPR 2019 · Hazel Doughty, Walterio Mayol-Cuevas, Dima Damen

In addition to attending to task relevant video parts, our proposed loss jointly trains two attention modules to separately attend to video parts which are indicative of higher (pros) and lower (cons) skill.
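A condensed sketch of the rank-aware idea under simple assumptions: two attention modules pool the same per-segment features, one trained so its pooled score rises with skill (pros) and one so it falls (cons), via margin ranking losses on video pairs. Module names and shapes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RankAwareAttention(nn.Module):
    """Two attention branches over per-segment features; `pros` should score
    the higher-skill video of a pair above the lower, `cons` the reverse."""
    def __init__(self, dim):
        super().__init__()
        self.pros = nn.Linear(dim, 1)    # attention logits: higher-skill evidence
        self.cons = nn.Linear(dim, 1)    # attention logits: lower-skill evidence
        self.score = nn.Linear(dim, 1)

    def pooled_score(self, feats, attn):
        w = torch.softmax(attn(feats), dim=1)        # (B, T, 1) segment weights
        return self.score((w * feats).sum(dim=1))    # (B, 1) skill score

    def loss(self, hi, lo, margin=1.0):
        """hi/lo: (B, T, D) features of higher- and lower-skill videos."""
        rank = nn.MarginRankingLoss(margin)
        ones = torch.ones(hi.size(0), 1, device=hi.device)
        l_pros = rank(self.pooled_score(hi, self.pros),
                      self.pooled_score(lo, self.pros), ones)
        l_cons = rank(self.pooled_score(lo, self.cons),
                      self.pooled_score(hi, self.cons), ones)
        return l_pros + l_cons
```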

Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination

no code implementations · CVPR 2018 · Hazel Doughty, Dima Damen, Walterio Mayol-Cuevas

We present a method for assessing skill from video, applicable to a variety of tasks, ranging from surgery to drawing and rolling pizza dough.
