Search Results for author: Ivan Laptev

Found 60 papers, 31 papers with code

Learning Actionness via Long-range Temporal Order Verification

no code implementations ECCV 2020 Dimitri Zhukov, Jean-Baptiste Alayrac, Ivan Laptev, Josef Sivic

The annotation is particularly difficult for temporal action localization where large parts of the video present no action, or background.

Action Recognition

Reconstructing and grounding narrated instructional videos in 3D

no code implementations9 Sep 2021 Dimitri Zhukov, Ignacio Rocco, Ivan Laptev, Josef Sivic, Johannes L. Schönberger, Bugra Tekin, Marc Pollefeys

Contrary to the standard scenario of instance-level 3D reconstruction, where identical objects or scenes are present in all views, objects in different instructional videos may have large appearance variations given varying conditions and versions of the same product.

3D Reconstruction

Airbert: In-domain Pretraining for Vision-and-Language Navigation

1 code implementation20 Aug 2021 Pierre-Louis Guhur, Makarand Tapaswi, ShiZhe Chen, Ivan Laptev, Cordelia Schmid

Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging.

Vision and Language Navigation

Goal-Conditioned Reinforcement Learning with Imagined Subgoals

no code implementations1 Jul 2021 Elliot Chane-Sane, Cordelia Schmid, Ivan Laptev

Goal-conditioned reinforcement learning endows an agent with a large variety of skills, but it often struggles to solve tasks that require more temporally extended reasoning.

XCiT: Cross-Covariance Image Transformers

4 code implementations17 Jun 2021 Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou

We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.

Instance Segmentation Object Detection +2

Training Vision Transformers for Image Retrieval

no code implementations10 Feb 2021 Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou

Transformers have shown outstanding results for natural language understanding and, more recently, for image classification.

Image Classification Image Retrieval +2

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

1 code implementation1 Dec 2020 Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid

In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering making use of automatic cross-modal supervision.

 Ranked #1 on Visual Question Answering on MSVD-QA (using extra training data)

Question Answering Question Generation +3

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos

1 code implementation13 Nov 2020 Vladimír Petrík, Makarand Tapaswi, Ivan Laptev, Josef Sivic

We evaluate our method on simple single- and two-object actions from the Something-Something dataset.

Learning Obstacle Representations for Neural Motion Planning

1 code implementation25 Aug 2020 Robin Strudel, Ricardo Garcia, Justin Carpentier, Jean-Paul Laumond, Ivan Laptev, Cordelia Schmid

Motion planning and obstacle avoidance is a key challenge in robotics applications.

Robotics

RareAct: A video dataset of unusual interactions

1 code implementation3 Aug 2020 Antoine Miech, Jean-Baptiste Alayrac, Ivan Laptev, Josef Sivic, Andrew Zisserman

This paper introduces a manually annotated video dataset of unusual actions, namely RareAct, including actions such as "blend phone", "cut keyboard" and "microwave shoes".

Action Recognition

Occlusion resistant learning of intuitive physics from videos

no code implementations30 Apr 2020 Ronan Riochet, Josef Sivic, Ivan Laptev, Emmanuel Dupoux

In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions.

Learning visual policies for building 3D shape categories

no code implementations15 Apr 2020 Alexander Pashevich, Igor Kalevatykh, Ivan Laptev, Cordelia Schmid

We then show the success of our visual policies for building arches from different primitives.

Learning Interactions and Relationships between Movie Characters

1 code implementation CVPR 2020 Anna Kukleva, Makarand Tapaswi, Ivan Laptev

Localizing the pair of interacting characters in video is a time-consuming process, instead, we train our model to learn from clip-level weak labels.

Action Modifiers: Learning from Adverbs in Instructional Videos

1 code implementation CVPR 2020 Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations.

Synthetic Humans for Action Recognition from Unseen Viewpoints

1 code implementation9 Dec 2019 Gül Varol, Ivan Laptev, Cordelia Schmid, Andrew Zisserman

Although synthetic training data has been shown to be beneficial for tasks such as human pose estimation, its use for RGB human action recognition is relatively unexplored.

Action Classification Action Recognition +1

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

3 code implementations ICCV 2019 Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic

In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations.

Action Localization Video Retrieval

Monte-Carlo Tree Search for Efficient Visually Guided Rearrangement Planning

2 code implementations23 Apr 2019 Yann Labbé, Sergey Zagoruyko, Igor Kalevatykh, Ivan Laptev, Justin Carpentier, Mathieu Aubry, Josef Sivic

We address the problem of visually guided rearrangement planning with many movable objects, i. e., finding a sequence of actions to move a set of objects from an initial arrangement to a desired one, while relying on visual inputs coming from an RGB camera.

Deep Metric Learning Beyond Binary Supervision

1 code implementation CVPR 2019 Sungyeon Kim, Minkyo Seo, Ivan Laptev, Minsu Cho, Suha Kwak

Metric Learning for visual similarity has mostly adopted binary supervision indicating whether a pair of images are of the same class or not.

Image Captioning Image Retrieval +3

Estimating 3D Motion and Forces of Person-Object Interactions from Monocular Video

1 code implementation CVPR 2019 Zongmian Li, Jiri Sedlar, Justin Carpentier, Ivan Laptev, Nicolas Mansard, Josef Sivic

First, we introduce an approach to jointly estimate the motion and the actuation forces of the person on the manipulated object by modeling contacts and the dynamics of their interactions.

Cross-task weakly supervised learning from instructional videos

2 code implementations CVPR 2019 Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic

In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations.

Learning to Augment Synthetic Images for Sim2Real Policy Transfer

1 code implementation18 Mar 2019 Alexander Pashevich, Robin Strudel, Igor Kalevatykh, Ivan Laptev, Cordelia Schmid

Policies learned in simulators, however, do not transfer well to real scenes given the domain gap between real and synthetic data.

Object Localization

Detecting unseen visual relations using analogies

no code implementations ICCV 2019 Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic

We seek to detect visual relations in images of the form of triplets t = (subject, predicate, object), such as "person riding dog", where training examples of the individual entities are available but their combinations are unseen at training.

Tube-CNN: Modeling temporal evolution of appearance for object detection in video

no code implementations6 Dec 2018 Tuan-Hung Vu, Anton Osokin, Ivan Laptev

Our goal in this paper is to learn discriminative models for the temporal evolution of object appearance and to use such models for object detection.

Object Detection

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

no code implementations22 Sep 2018 Meera Hahn, Nataniel Ruiz, Jean-Baptiste Alayrac, Ivan Laptev, James M. Rehg

Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision.

Object Recognition

Modeling Spatio-Temporal Human Track Structure for Action Localization

no code implementations28 Jun 2018 Guilhem Chéron, Anton Osokin, Ivan Laptev, Cordelia Schmid

In order to localize actions in time, we propose a recurrent localization network (RecLNet) designed to model the temporal structure of actions on the level of person tracks.

Human Detection Optical Flow Estimation +3

Learning a Text-Video Embedding from Incomplete and Heterogeneous Data

5 code implementations7 Apr 2018 Antoine Miech, Ivan Laptev, Josef Sivic

We evaluate our method on the task of video retrieval and report results for the MPII Movie Description and MSR-VTT datasets.

Ranked #9 on Video Retrieval on LSMDC (using extra training data)

Video Retrieval

Learnable pooling with Context Gating for video classification

4 code implementations21 Jun 2017 Antoine Miech, Ivan Laptev, Josef Sivic

In particular, we evaluate our method on the large-scale multi-modal Youtube-8M v2 dataset and outperform all other methods in the Youtube 8M Large-Scale Video Understanding challenge.

Classification General Classification +2

Joint Discovery of Object States and Manipulation Actions

1 code implementation ICCV 2017 Jean-Baptiste Alayrac, Josev Sivic, Ivan Laptev, Simon Lacoste-Julien

We assume a consistent temporal order for the changes in object states and manipulation actions, and introduce new optimization techniques to learn model parameters without additional supervision.

Action Recognition

Learning from Synthetic Humans

2 code implementations CVPR 2017 Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, Cordelia Schmid

In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data.

3D Human Pose Estimation Human Part Segmentation +1

Much Ado About Time: Exhaustive Annotation of Temporal Data

no code implementations25 Jul 2016 Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta

We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments).

Thin-Slicing for Pose: Learning to Understand Pose Without Explicit Pose Estimation

no code implementations CVPR 2016 Suha Kwak, Minsu Cho, Ivan Laptev

We address the problem of learning a pose-aware, compact embedding that projects images with similar human poses to be placed close-by in the embedding space.

Action Recognition Image Retrieval +1

The THUMOS Challenge on Action Recognition for Videos "in the Wild"

no code implementations21 Apr 2016 Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah

Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.

Action Classification Action Recognition +3

Long-term Temporal Convolutions for Action Recognition

1 code implementation15 Apr 2016 Gül Varol, Ivan Laptev, Cordelia Schmid

Typical human actions last several seconds and exhibit characteristic spatio-temporal structure.

Action Recognition Optical Flow Estimation

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

no code implementations6 Apr 2016 Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta

Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.

Action Recognition

Context-aware CNNs for person head detection

1 code implementation ICCV 2015 Tuan-Hung Vu, Anton Osokin, Ivan Laptev

First, we leverage person-scene relations and propose a Global CNN model trained to predict positions and scales of heads directly from the full image.

Face Detection Head Detection +1

Unsupervised Learning from Narrated Instruction Videos

no code implementations CVPR 2016 Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien

Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.

Weakly-Supervised Alignment of Video With Text

no code implementations ICCV 2015 Piotr Bojanowski, Rémi Lajugie, Edouard Grave, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid

Given vectorial features for both video and text, we propose to cast this task as a temporal assignment problem, with an implicit linear mapping between the two feature modalities.

Unsupervised Object Discovery and Tracking in Video Collections

no code implementations ICCV 2015 Suha Kwak, Minsu Cho, Ivan Laptev, Jean Ponce, Cordelia Schmid

This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision.

Object Discovery Video Understanding

Weakly Supervised Action Labeling in Videos Under Ordering Constraints

no code implementations4 Jul 2014 Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic

We are given a set of video clips, each one annotated with an {\em ordered} list of actions, such as "walk" then "sit" then "answer phone" extracted from, for example, the associated text script.

Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

no code implementations CVPR 2014 Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic

We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on Pascal VOC 2007 and 2012 datasets.

Action Classification Action Localization +4

Learning person-object interactions for action recognition in still images

no code implementations NeurIPS 2011 Vincent Delaitre, Josef Sivic, Ivan Laptev

First, we replace the standard quantized local HOG/SIFT features with stronger discriminatively trained body part and object detectors.

Action Recognition In Still Images

Cannot find the paper you are looking for? You can Submit a new open access paper.