Search Results for author: Cees G. M. Snoek

Found 128 papers, 58 papers with code

CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation

no code implementations 7 Nov 2024 Jie Liu, Pan Zhou, Yingjun Du, Ah-Hwee Tan, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves

To solve this issue, we propose Cooperative Plan Optimization (CaPo) to enhance the cooperation efficiency of LLM-based embodied agents.

Large Language Model

Beyond Model Adaptation at Test Time: A Survey

1 code implementation 6 Nov 2024 Zehao Xiao, Cees G. M. Snoek

Machine learning algorithms have achieved remarkable success across various disciplines, use cases and applications, under the prevailing assumption that training and test samples are drawn from the same distribution.

Domain Generalization Survey +1

Prompt Diffusion Robustifies Any-Modality Prompt Learning

no code implementations 26 Oct 2024 Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek

This paper introduces prompt diffusion, which uses a diffusion model to gradually refine the prompts to obtain a customized prompt for each sample.

Computational Efficiency Domain Generalization +1

IPO: Interpretable Prompt Optimization for Vision-Language Models

1 code implementation 20 Oct 2024 Yingjun Du, Wenfang Sun, Cees G. M. Snoek

Pre-trained vision-language models like CLIP have remarkably adapted to various downstream tasks.

Specificity

Beyond Coarse-Grained Matching in Video-Text Retrieval

no code implementations 16 Oct 2024 Aozhu Chen, Hazel Doughty, Xirong Li, Cees G. M. Snoek

We perform comprehensive experiments using four state-of-the-art models across two standard benchmarks (MSR-VTT and VATEX) and two specially curated datasets enriched with detailed descriptions (VLN-UVO and VLN-OOPS), resulting in a number of novel insights: 1) our analyses show that the current evaluation benchmarks fall short in detecting a model's ability to perceive subtle single-word differences, 2) our fine-grained evaluation highlights the difficulty models face in distinguishing such subtle variations.

Text Retrieval Video-Text Retrieval

LocoMotion: Learning Motion-Focused Video-Language Representations

no code implementations 15 Oct 2024 Hazel Doughty, Fida Mohammad Thoker, Cees G. M. Snoek

Furthermore, we propose verb-variation paraphrasing to increase the caption variety and learn the link between primitive motions and high-level verbs.

TULIP: Token-length Upgraded CLIP

no code implementations 13 Oct 2024 Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki M. Asano, Nanne van Noord, Marcel Worring, Cees G. M. Snoek

By effectively encoding captions longer than the default 77 tokens, our model outperforms baselines on cross-modal tasks such as retrieval and text-to-image generation.

Position Text-to-Image Generation

TVBench: Redesigning Video-Language Evaluation

no code implementations 10 Oct 2024 Daniel Cores, Michael Dorkenwald, Manuel Mucientes, Cees G. M. Snoek, Yuki M. Asano

Large language models have demonstrated impressive performance when integrated with vision models, even enabling video understanding.

Multiple-choice Open-Ended Question Answering +3

SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery

2 code implementations 26 Aug 2024 Sarah Rastegar, Mohammadreza Salehi, Yuki M. Asano, Hazel Doughty, Cees G. M. Snoek

In this paper, we address Generalized Category Discovery, aiming to simultaneously uncover novel categories and accurately classify known ones.

Contrastive Learning

SIGMA: Sinkhorn-Guided Masked Video Modeling

no code implementations 22 Jul 2024 Mohammadreza Salehi, Michael Dorkenwald, Fida Mohammad Thoker, Efstratios Gavves, Cees G. M. Snoek, Yuki M. Asano

To tackle this, we present Sinkhorn-guided Masked Video Modelling (SIGMA), a novel video pretraining method that jointly learns the video model in addition to a target feature space using a projection network.

Training-Free Semantic Segmentation via LLM-Supervision

no code implementations 31 Mar 2024 Wenfang Sun, Yingjun Du, Gaowen Liu, Ramana Kompella, Cees G. M. Snoek

Additionally, we propose an assembly that merges the segmentation maps from the various subclass descriptors to ensure a more comprehensive representation of the different aspects in the test images.

Language Modelling Large Language Model +4

Graph Neural Networks for Learning Equivariant Representations of Neural Networks

1 code implementation 18 Mar 2024 Miltiadis Kofinas, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, David W. Zhang

Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors.

Any-Shift Prompting for Generalization over Distributions

no code implementations CVPR 2024 Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani, Shengcai Liao, Cees G. M. Snoek

To effectively encode the distribution information and their relationships, we further introduce a transformer inference network with a pseudo-shift training mechanism.

Language Modelling

PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

no code implementations CVPR 2024 Michael Dorkenwald, Nimrod Barazani, Cees G. M. Snoek, Yuki M. Asano

Vision-Language Models (VLMs), such as Flamingo and GPT-4V, have shown immense potential by integrating large language models with vision systems.

Low-Resource Vision Challenges for Foundation Models

no code implementations CVPR 2024 Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

Low-resource settings are well-established in natural language processing, where many languages lack sufficient data for deep learning at scale.

Data Augmentation Transfer Learning

Latent Space Editing in Transformer-Based Flow Matching

no code implementations 17 Dec 2023 Vincent Tao Hu, David W Zhang, Pascal Mettes, Meng Tang, Deli Zhao, Cees G. M. Snoek

Flow Matching is an emerging generative modeling technique that offers the advantage of simple and efficient training.

Guided Diffusion from Self-Supervised Diffusion Features

no code implementations 14 Dec 2023 Vincent Tao Hu, Yunlu Chen, Mathilde Caron, Yuki M. Asano, Cees G. M. Snoek, Bjorn Ommer

However, recent studies have revealed that the feature representation derived from the diffusion model itself is discriminative for numerous downstream tasks as well, which prompts us to propose a framework to extract guidance from, and specifically for, diffusion models.

Self-Supervised Learning

Motion Flow Matching for Human Motion Synthesis and Editing

no code implementations 14 Dec 2023 Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek

In this paper, we propose Motion Flow Matching, a novel generative model designed for human motion generation featuring efficient sampling and effectiveness in motion editing applications.

Motion Generation Motion Interpolation +2

Revisiting Proposal-based Object Detection

no code implementations 30 Nov 2023 Aritra Bhowmik, Martin R. Oswald, Pascal Mettes, Cees G. M. Snoek

For proposal regression, we solve a simpler problem where we regress to the area of intersection between proposal and ground truth.

Instance Segmentation Object +4

Unlocking Spatial Comprehension in Text-to-Image Diffusion Models

no code implementations 28 Nov 2023 Mohammad Mahdi Derakhshani, Menglin Xia, Harkirat Behl, Cees G. M. Snoek, Victor Rühle

We propose CompFuser, an image generation pipeline that enhances spatial comprehension and attribute assignment in text-to-image generative models.

Attribute Image Generation +3

Query by Activity Video in the Wild

1 code implementation 23 Nov 2023 Tao Hu, William Thong, Pascal Mettes, Cees G. M. Snoek

In this paper, we propose a visual-semantic embedding network that explicitly deals with the imbalanced scenario for activity retrieval.

Retrieval

Data Augmentations in Deep Weight Spaces

no code implementations 15 Nov 2023 Aviv Shamsian, David W. Zhang, Aviv Navon, Yan Zhang, Miltiadis Kofinas, Idan Achituve, Riccardo Valperga, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, Ethan Fetaya, Gal Chechik, Haggai Maron

Learning in weight spaces, where neural networks process the weights of other deep neural networks, has emerged as a promising research direction with applications in various fields, from analyzing and editing neural fields and implicit neural representations, to network pruning and quantization.

Data Augmentation Network Pruning +1

Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery

2 code implementations NeurIPS 2023 Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek

In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set.

Self-Supervised Open-Ended Classification with Small Visual Language Models

no code implementations 30 Sep 2023 Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Cees G. M. Snoek, Marcel Worring, Yuki M. Asano

We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models.

Few-Shot Learning Image Captioning

Probabilistic Test-Time Generalization by Variational Neighbor-Labeling

no code implementations 8 Jul 2023 Sameer Ambekar, Zehao Xiao, Jiayi Shen, XianTong Zhen, Cees G. M. Snoek

We formulate the generalization at test time as a variational inference problem, by modeling pseudo labels as distributions, to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels.

Domain Generalization Variational Inference

Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation

1 code implementation 16 Jun 2023 Shuo Chen, Yingjun Du, Pascal Mettes, Cees G. M. Snoek

This paper investigates the problem of scene graph generation in videos with the aim of capturing semantic relations between subjects and objects in the form of ⟨subject, predicate, object⟩ triplets.

Graph Generation Meta-Learning +1

EMO: Episodic Memory Optimization for Few-Shot Meta-Learning

no code implementations 8 Jun 2023 Yingjun Du, Jiayi Shen, XianTong Zhen, Cees G. M. Snoek

By learning to retain and recall the learning process of past training tasks, EMO nudges parameter updates in the right direction, even when the gradients provided by a limited number of examples are uninformative.

Few-Shot Learning

Focus for Free in Density-Based Counting

1 code implementation 8 Jun 2023 Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

Where density-based counting methods typically use the point annotations only to create Gaussian-density maps, which act as the supervision signal, the starting point of this work is that point annotations have counting potential beyond density map generation.

R-MAE: Regions Meet Masked Autoencoders

1 code implementation 8 Jun 2023 Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen

In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning.

Contrastive Learning Interactive Segmentation +4

MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks

1 code implementation 17 May 2023 Wenfang Sun, Yingjun Du, XianTong Zhen, Fan Wang, Ling Wang, Cees G. M. Snoek

To account for the uncertainty caused by the limited training tasks, we propose a variational MetaModulation where the modulation parameters are treated as latent variables.

Diversity Few-Shot Learning

Self-Ordering Point Clouds

no code implementations ICCV 2023 Pengwan Yang, Cees G. M. Snoek, Yuki M. Asano

In this paper we address the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering.

Test of Time: Instilling Video-Language Models with a Sense of Time

1 code implementation CVPR 2023 Piyush Bagad, Makarand Tapaswi, Cees G. M. Snoek

Our work serves as a first step towards probing and instilling a sense of time in existing video-language models without the need for data and compute-intense training from scratch.

Ranked #3 on Video-Text Retrieval on Test-of-Time (using extra training data)

Video-Text Retrieval Video Understanding

Detecting Objects with Context-Likelihood Graphs and Graph Refinement

no code implementations ICCV 2023 Aritra Bhowmik, Yu Wang, Nora Baka, Martin R. Oswald, Cees G. M. Snoek

Contrary to existing methods, which learn objects and relations separately, our key idea is to learn the object-relation distribution jointly.

Object object-detection +2

Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight

no code implementations 5 Dec 2022 Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

The main causes are the limited availability of labeled dark videos to learn from, as well as the distribution shift towards the lower color contrast at test-time.

Activity Recognition Domain Adaptation +1

Variational Model Perturbation for Source-Free Domain Adaptation

1 code implementation 19 Oct 2022 Mengmeng Jing, XianTong Zhen, Jingjing Li, Cees G. M. Snoek

Our model perturbation provides a new probabilistic way for domain adaptation which enables efficient adaptation to target domains while maximally preserving knowledge in source models.

Bayesian Inference Source-Free Domain Adaptation

Self-Guided Diffusion Models

1 code implementation CVPR 2023 Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek

Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process.

Image Generation

Fake It Till You Make It: Towards Accurate Near-Distribution Novelty Detection

1 code implementation 28 May 2022 Hossein Mirzaei, Mohammadreza Salehi, Sajjad Shahabi, Efstratios Gavves, Cees G. M. Snoek, Mohammad Sabokrou, Mohammad Hossein Rohban

Effectiveness of our method for both the near-distribution and standard novelty detection is assessed through extensive experiments on datasets in diverse applications such as medical images, object classification, and quality control.

Ranked #3 on Anomaly Detection on One-class CIFAR-10 (using extra training data)

Anomaly Detection Novelty Detection

Less than Few: Self-Shot Video Instance Segmentation

no code implementations 19 Apr 2022 Pengwan Yang, Yuki M. Asano, Pascal Mettes, Cees G. M. Snoek

The goal of this paper is to bypass the need for labelled examples in few-shot video understanding at run time.

Few-Shot Learning Instance Segmentation +5

LifeLonger: A Benchmark for Continual Disease Classification

1 code implementation 12 Apr 2022 Mohammad Mahdi Derakhshani, Ivona Najdenkoska, Tom van Sonsbeek, XianTong Zhen, Dwarikanath Mahapatra, Marcel Worring, Cees G. M. Snoek

Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch, while cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.

Classification class-incremental learning +2

Audio-Adaptive Activity Recognition Across Video Domains

1 code implementation CVPR 2022 Yunhua Zhang, Hazel Doughty, Ling Shao, Cees G. M. Snoek

This paper strives for activity recognition under domain shift, for example caused by change of scenery or camera viewpoint.

Activity Recognition Domain Adaptation +1

How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs

1 code implementation CVPR 2022 Hazel Doughty, Cees G. M. Snoek

We aim to understand how actions are performed and identify subtle differences, such as 'fold firmly' vs. 'fold gently'.

Video-Adverb Retrieval (Unseen Compositions)

Learning to Generalize across Domains on Single Test Samples

1 code implementation ICLR 2022 Zehao Xiao, XianTong Zhen, Ling Shao, Cees G. M. Snoek

We leverage a meta-learning paradigm to train our model to acquire the ability of adaptation with single samples at training time so as to further adapt itself to each single test sample at test time.

Bayesian Inference Domain Generalization +1

Generative Kernel Continual learning

no code implementations 26 Dec 2021 Mohammad Mahdi Derakhshani, XianTong Zhen, Ling Shao, Cees G. M. Snoek

Kernel continual learning by Derakhshani et al. (2021) has recently emerged as a strong continual learner due to its non-parametric ability to tackle task interference and catastrophic forgetting.

Continual Learning

Hierarchical Variational Memory for Few-shot Learning Across Domains

1 code implementation ICLR 2022 Yingjun Du, XianTong Zhen, Ling Shao, Cees G. M. Snoek

To explore and exploit the importance of different semantic levels, we further propose to learn the weights associated with the prototype at each level in a data-driven way, which enables the model to adaptively choose the most generalizable features.

Few-Shot Learning Variational Inference

BoxeR: Box-Attention for 2D and 3D Transformers

1 code implementation CVPR 2022 Duy-Kien Nguyen, Jihong Ju, Olaf Booij, Martin R. Oswald, Cees G. M. Snoek

Specifically, we present BoxeR, short for Box Transformer, which attends to a set of boxes by predicting their transformation from a reference window on an input feature map.

3D Object Detection Instance Segmentation +2

Feature and Label Embedding Spaces Matter in Addressing Image Classifier Bias

1 code implementation 27 Oct 2021 William Thong, Cees G. M. Snoek

This paper strives to address image classifier bias, with a focus on both feature and label embedding spaces.

Attribute Fairness

Diagnosing Errors in Video Relation Detectors

1 code implementation 25 Oct 2021 Shuo Chen, Pascal Mettes, Cees G. M. Snoek

Video relation detection forms a new and challenging problem in computer vision, where subjects and objects need to be localized spatio-temporally and a predicate label needs to be assigned if and only if there is an interaction between the two.

Action Localization Object +3

Social Fabric: Tubelet Compositions for Video Relation Detection

1 code implementation ICCV 2021 Shuo Chen, Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives.

Object Relation +3

Skeleton-Contrastive 3D Action Representation Learning

1 code implementation 8 Aug 2021 Fida Mohammad Thoker, Hazel Doughty, Cees G. M. Snoek

In particular, we propose inter-skeleton contrastive learning, which learns from multiple different input skeleton representations in a cross-contrastive manner.

Action Recognition Contrastive Learning +6

Feature-Supervised Action Modality Transfer

no code implementations 6 Aug 2021 Fida Mohammad Thoker, Cees G. M. Snoek

This paper strives for action recognition and detection in video modalities like RGB, depth maps or 3D-skeleton sequences when only limited modality-specific labeled examples are available.

Action Recognition Optical Flow Estimation +1

Kernel Continual Learning

1 code implementation 12 Jul 2021 Mohammad Mahdi Derakhshani, XianTong Zhen, Ling Shao, Cees G. M. Snoek

We further introduce variational random features to learn a data-driven kernel for each task.

Continual Learning Variational Inference

On Measuring and Controlling the Spectral Bias of the Deep Image Prior

1 code implementation 2 Jul 2021 Zenglin Shi, Pascal Mettes, Subhransu Maji, Cees G. M. Snoek

The deep image prior showed that a randomly initialized network with a suitable architecture can be trained to solve inverse imaging problems by simply optimizing its parameters to reconstruct a single degraded image.

Denoising Super-Resolution

Pruning Edges and Gradients to Learn Hypergraphs from Larger Sets

1 code implementation 26 Jun 2021 David W. Zhang, Gertjan J. Burghouts, Cees G. M. Snoek

We address two common scaling problems encountered in set-to-hypergraph tasks that limit the size of the input set: the exponentially growing number of hyperedges and the run-time complexity, both leading to higher memory requirements.

Combinatorial Optimization

Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation

no code implementations ACL 2021 Yingjun Du, Nithin Holla, XianTong Zhen, Cees G. M. Snoek, Ekaterina Shutova

A critical challenge faced by supervised word sense disambiguation (WSD) is the lack of large annotated datasets with sufficient coverage of words in their diversity of senses.

Diversity Meta-Learning +2

Unsharp Mask Guided Filtering

1 code implementation 2 Jun 2021 Zenglin Shi, Yunlu Chen, Efstratios Gavves, Pascal Mettes, Cees G. M. Snoek

The state-of-the-art leverages deep networks to estimate the two core coefficients of the guided filter.

Denoising

Attentional Prototype Inference for Few-Shot Segmentation

1 code implementation 14 May 2021 Haoliang Sun, Xiankai Lu, Haochen Wang, Yilong Yin, XianTong Zhen, Cees G. M. Snoek, Ling Shao

We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.

Bayesian Inference Few-Shot Semantic Segmentation +2

A Bit More Bayesian: Domain-Invariant Learning with Uncertainty

1 code implementation 9 May 2021 Zehao Xiao, Jiayi Shen, XianTong Zhen, Ling Shao, Cees G. M. Snoek

Domain generalization is challenging due to the domain shift and the uncertainty caused by the inaccessibility of target domain data.

Bayesian Inference Domain Generalization

MetaKernel: Learning Variational Random Features with Limited Labels

1 code implementation 8 May 2021 Yingjun Du, Haoliang Sun, XianTong Zhen, Jun Xu, Yilong Yin, Ling Shao, Cees G. M. Snoek

Specifically, we propose learning variational random features in a data-driven manner to obtain task-specific kernels by leveraging the shared knowledge provided by related tasks in a meta-learning setting.

Few-Shot Image Classification Few-Shot Learning +1

Motion-Augmented Self-Training for Video Recognition at Smaller Scale

no code implementations ICCV 2021 Kirill Gavrilyuk, Mihir Jain, Ilia Karmanov, Cees G. M. Snoek

With the motion model we generate pseudo-labels for a large unlabeled video collection, which enables us to transfer knowledge by learning to predict these pseudo-labels with an appearance model.

Action Recognition Optical Flow Estimation +3

Safe Fakes: Evaluating Face Anonymizers for Face Detectors

no code implementations 23 Apr 2021 Sander R. Klomp, Matthew van Rijn, Rob G. J. Wijnhoven, Cees G. M. Snoek, Peter H. N. de With

Our experiments investigate the suitability of anonymization methods for maintaining face detector performance, the effect of detectors overtraining on anonymization artefacts, dataset size for training an anonymizer, and the effect of training time of anonymization GANs.

Face Detection Generative Adversarial Network +1

Object Priors for Classifying and Localizing Unseen Actions

1 code implementation 10 Apr 2021 Pascal Mettes, William Thong, Cees G. M. Snoek

This work strives for the classification and localization of human actions in videos, without the need for any labeled video training examples.

Action Classification Action Localization +5

LiftPool: Bidirectional ConvNet Pooling

no code implementations ICLR 2021 Jiaojiao Zhao, Cees G. M. Snoek

Pooling is a critical operation in convolutional neural networks for increasing receptive fields and improving robustness to input variations.

Image Classification Image-to-Image Translation +3

Repetitive Activity Counting by Sight and Sound

1 code implementation CVPR 2021 Yunhua Zhang, Ling Shao, Cees G. M. Snoek

We also introduce a variant of this dataset for repetition counting under challenging vision conditions.

Variational Invariant Learning for Bayesian Domain Generalization

no code implementations 1 Jan 2021 Zehao Xiao, Jiayi Shen, XianTong Zhen, Ling Shao, Cees G. M. Snoek

In the probabilistic modeling framework, we introduce a domain-invariant principle to explore invariance across domains in a unified way.

Domain Generalization

Learning to Learn Variational Semantic Memory

1 code implementation NeurIPS 2020 XianTong Zhen, Yingjun Du, Huan Xiong, Qiang Qiu, Cees G. M. Snoek, Ling Shao

The variational semantic memory accrues and stores semantic information for the probabilistic inference of class prototypes in a hierarchical Bayesian framework.

Few-Shot Learning General Knowledge +1

Bias-Awareness for Zero-Shot Learning the Seen and Unseen

1 code implementation 25 Aug 2020 William Thong, Cees G. M. Snoek

We propose a bias-aware learner to map inputs to a semantic embedding space for generalized zero-shot learning.

Generalized Zero-Shot Learning

Localizing the Common Action Among a Few Videos

1 code implementation ECCV 2020 Pengwan Yang, Vincent Tao Hu, Pascal Mettes, Cees G. M. Snoek

The start and end of an action in a long untrimmed video are determined based on just a handful of trimmed video examples containing the same action, without knowing their common class label.

Action Localization

Open Cross-Domain Visual Search

2 code implementations 19 Nov 2019 William Thong, Pascal Mettes, Cees G. M. Snoek

In this paper, we make the step towards an open setting where multiple visual domains are available.

Domain Adaptation

Go with the Flow: Perception-refined Physics Simulation

no code implementations 17 Oct 2019 Tom F. H. Runia, Kirill Gavrilyuk, Cees G. M. Snoek, Arnold W. M. Smeulders

Nevertheless, inferring specifics from visual observations is challenging due to the high number of causally underlying physical parameters -- including material properties and external forces.

Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on n-Spheres

2 code implementations CVPR 2019 Shuai Liao, Efstratios Gavves, Cees G. M. Snoek

We observe many continuous output problems in computer vision are naturally contained in closed geometrical manifolds, like the Euler angles in viewpoint estimation or the normals in surface normal estimation.

3D Rotation Estimation regression +3

Dance with Flow: Two-in-One Stream Action Detection

1 code implementation CVPR 2019 Jiaojiao Zhao, Cees G. M. Snoek

With only half the computation and parameters of the state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCFSports and J-HMDB.

Ranked #1 on Action Detection on UCF Sports (Video-mAP 0.5 metric)

Action Detection Optical Flow Estimation +1

Counting with Focus for Free

1 code implementation ICCV 2019 Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

To assist both the density estimation and the focus from segmentation, we also introduce an improved kernel size estimator for the point annotations.

Density Estimation

Anomaly Locality in Video Surveillance

no code implementations 29 Jan 2019 Federico Landi, Cees G. M. Snoek, Rita Cucchiara

This paper strives for the detection of real-world anomalies such as burglaries and assaults in surveillance videos.

Anomaly Detection

Hyperspherical Prototype Networks

1 code implementation NeurIPS 2019 Pascal Mettes, Elise van der Pol, Cees G. M. Snoek

This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces.

Classification General Classification +1

Pixelated Semantic Colorization

no code implementations 27 Jan 2019 Jiaojiao Zhao, Jungong Han, Ling Shao, Cees G. M. Snoek

We propose two ways to incorporate object semantics into the colorization model: through a pixelated semantic embedding and a pixelated semantic generator.

Colorization Image Colorization +2

Pixel-level Semantics Guided Image Colorization

no code implementations 5 Aug 2018 Jiaojiao Zhao, Li Liu, Cees G. M. Snoek, Jungong Han, Ling Shao

While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from the problems of context confusion and edge color bleeding.

Colorization Image Colorization +2

Video Time: Properties, Encoders and Evaluation

no code implementations 18 Jul 2018 Amir Ghodrati, Efstratios Gavves, Cees G. M. Snoek

Time-aware encoding of frame sequences in a video is a fundamental problem in video understanding.

Video Understanding

Spatio-Temporal Instance Learning: Action Tubes from Class Supervision

no code implementations 8 Jul 2018 Pascal Mettes, Cees G. M. Snoek

Rather than disconnecting the spatio-temporal learning from the training, we propose Spatio-Temporal Instance Learning, which enables action localization directly from box proposals in video frames.

Multiple Instance Learning Spatio-Temporal Action Localization +1

Repetition Estimation

1 code implementation 18 Jun 2018 Tom F. H. Runia, Cees G. M. Snoek, Arnold W. M. Smeulders

Estimating visual repetition from realistic video is challenging as periodic motion is rarely perfectly static and stationary.

Pointly-Supervised Action Localization

no code implementations 29 May 2018 Pascal Mettes, Cees G. M. Snoek

Experimental evaluation on three action localization datasets shows our pointly-supervised approach (i) is as effective as traditional box-supervision at a fraction of the annotation cost, (ii) is robust to sparse and noisy point annotations, (iii) benefits from pseudo-points during inference, and (iv) outperforms recent weakly-supervised alternatives.

Action Localization Multiple Instance Learning +1

Real-World Repetition Estimation by Div, Grad and Curl

no code implementations CVPR 2018 Tom F. H. Runia, Cees G. M. Snoek, Arnold W. M. Smeulders

We consider the problem of estimating repetition in video, such as performing push-ups, cutting a melon or playing violin.

The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval

no code implementations 30 Jan 2018 Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas Mensink, Cees G. M. Snoek

Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages.

Retrieval

Predicting Visual Features from Text for Image and Video Caption Retrieval

1 code implementation 5 Sep 2017 Jianfeng Dong, Xirong Li, Cees G. M. Snoek

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video.

Retrieval Sentence +1

Localizing Actions from Video Labels and Pseudo-Annotations

no code implementations 28 Jul 2017 Pascal Mettes, Cees G. M. Snoek, Shih-Fu Chang

The goal of this paper is to determine the spatio-temporal location of actions in video.

Action Localization

Searching Scenes by Abstracting Things

no code implementations 6 Oct 2016 Svetlana Kordumova, Jan C. van Gemert, Cees G. M. Snoek, Arnold W. M. Smeulders

Second, we propose translating the things syntax in linguistic abstract statements and study their descriptive effect to retrieve scenes.

Attribute Descriptive +1

Tubelets: Unsupervised action proposals from spatiotemporal super-voxels

no code implementations 7 Jul 2016 Mihir Jain, Jan van Gemert, Hervé Jégou, Patrick Bouthemy, Cees G. M. Snoek

First, inspired by selective search for object proposals, we introduce an approach to generate action proposals from spatiotemporal super-voxels in an unsupervised manner; we call them Tubelets.

Action Localization

Spot On: Action Localization from Pointly-Supervised Proposals

no code implementations 26 Apr 2016 Pascal Mettes, Jan C. van Gemert, Cees G. M. Snoek

Rather than annotating boxes, we propose to annotate actions in video with points on a sparse subset of frames only.

Action Localization Multiple Instance Learning +1

Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction

no code implementations 23 Apr 2016 Jianfeng Dong, Xirong Li, Cees G. M. Snoek

This paper strives to find the sentence best describing the content of an image or video.

Sentence

The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection

no code implementations 23 Feb 2016 Pascal Mettes, Dennis C. Koelma, Cees G. M. Snoek

To deal with the problems of over-specific classes and classes with few images, we introduce a bottom-up and top-down approach for reorganization of the ImageNet hierarchy based on all its 21,814 classes and more than 14 million images.

Event Detection Object Recognition

VideoStory Embeddings Recognize Events when Examples are Scarce

no code implementations 8 Nov 2015 Amirhossein Habibian, Thomas Mensink, Cees G. M. Snoek

In our proposed embedding, which we call VideoStory, the correlations between the terms are utilized to learn a more effective representation by optimizing a joint objective balancing descriptiveness and predictability. We show how learning the VideoStory using a multimodal predictability loss, including appearance, motion and audio features, results in a better predictable representation.

Attribute Event Detection

TagBook: A Semantic Video Representation without Supervision for Event Detection

no code implementations 10 Oct 2015 Masoud Mazloom, Xirong Li, Cees G. M. Snoek

We consider the problem of event detection in video for scenarios where only few, or even zero examples are available for training.

Event Detection Image Retrieval +2

Active Transfer Learning with Zero-Shot Priors: Reusing Past Datasets for Future Tasks

no code implementations ICCV 2015 Efstratios Gavves, Thomas Mensink, Tatiana Tommasi, Cees G. M. Snoek, Tinne Tuytelaars

How can we reuse existing knowledge, in the form of available datasets, when solving a new and apparently unrelated target task from a set of unlabeled data?

Active Learning General Classification +2

Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

1 code implementation 28 Mar 2015 Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G. M. Snoek, Alberto del Bimbo

Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image.

Content-Based Image Retrieval Retrieval +2

Fisher and VLAD with FLAIR

no code implementations CVPR 2014 Koen E. A. van de Sande, Cees G. M. Snoek, Arnold W. M. Smeulders

Finally, by multiple codeword assignments, we achieve exact and approximate Fisher vectors with FLAIR.

Action Localization with Tubelets from Motion

no code implementations CVPR 2014 Mihir Jain, Jan van Gemert, Herve Jegou, Patrick Bouthemy, Cees G. M. Snoek

Our approach significantly outperforms the state-of-the-art on both datasets, while restricting the search of actions to a fraction of possible bounding box sequences.

Action Localization

COSTA: Co-Occurrence Statistics for Zero-Shot Classification

no code implementations CVPR 2014 Thomas Mensink, Efstratios Gavves, Cees G. M. Snoek

In this paper we aim for zero-shot classification, that is, visual recognition of an unseen class by using knowledge transfer from known classes.

Classification Few-Shot Learning +3
