Search Results for author: Oswald Lanz

Found 26 papers, 10 papers with code

A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval

1 code implementation3 Aug 2022 Alex Falcon, Giuseppe Serra, Oswald Lanz

Data augmentation techniques were introduced to improve performance on unseen test examples by creating new training samples through semantics-preserving techniques, such as color-space or geometric transformations of images.
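
A minimal sketch of the kind of semantics-preserving image augmentations mentioned above, using standard torchvision transforms; this illustrates the classic input-space setting, not the feature-space multimodal technique the paper proposes:

```python
import torchvision.transforms as T

# Semantics-preserving augmentations: the depicted content is unchanged,
# only appearance (color) and geometry (crop, flip) are perturbed.
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),   # geometric transformation
    T.RandomHorizontalFlip(p=0.5),                # geometric transformation
    T.ColorJitter(brightness=0.4, contrast=0.4,   # color-space transformation
                  saturation=0.4, hue=0.1),
    T.ToTensor(),
])

# augmented = augment(pil_image)  # applied to each PIL image during training
```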

Data Augmentation Retrieval +1

Unified Recurrence Modeling for Video Action Anticipation

1 code implementation2 Jun 2022 Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Simon See, Oswald Lanz

To this end, we propose unified recurrence modeling for video action anticipation via a message passing framework.

Action Anticipation Decision Making

Relevance-based Margin for Contrastively-trained Video Retrieval Models

1 code implementation27 Apr 2022 Alex Falcon, Swathikiran Sudhakaran, Giuseppe Serra, Sergio Escalera, Oswald Lanz

We show that even when the fixed margin is carefully tuned, our technique (which does not have the margin as a hyper-parameter) still achieves better performance.
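
A hedged sketch of the underlying idea, replacing the fixed contrastive margin with one derived from the relevance of the "negative" item; the function name and the linear scaling below are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def relevance_based_margin_loss(sim_pos, sim_neg, relevance_neg, max_margin=0.2):
    """Hinge-style contrastive loss in which the margin shrinks as the
    'negative' becomes more relevant to the query (relevance_neg in [0, 1])."""
    margin = max_margin * (1.0 - relevance_neg)              # relevant negatives -> small margin
    return torch.clamp(margin + sim_neg - sim_pos, min=0.0).mean()

# Example: sim_* are cosine similarities, relevance_neg e.g. from caption overlap.
loss = relevance_based_margin_loss(
    sim_pos=torch.tensor([0.7, 0.6]),
    sim_neg=torch.tensor([0.5, 0.4]),
    relevance_neg=torch.tensor([0.9, 0.1]),
)
```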

Multi-Instance Retrieval Natural Language Queries +2

Learning video retrieval models with relevance-aware online mining

2 code implementations16 Mar 2022 Alex Falcon, Giuseppe Serra, Oswald Lanz

Due to the sheer number of videos and related captions uploaded every hour, deep learning-based solutions for cross-modal video retrieval are attracting more and more attention.

Multi-Instance Retrieval Retrieval +2

Gate-Shift-Fuse for Video Action Recognition

1 code implementation16 Mar 2022 Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs.
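
For context, a minimal sketch of one common 3D kernel factorization, splitting a 3x3x3 convolution into a spatial 1x3x3 and a temporal 3x1x1 convolution; this is a generic (2+1)D-style decomposition, not the Gate-Shift-Fuse module itself:

```python
import torch.nn as nn

class FactorizedConv3d(nn.Module):
    """Spatial conv followed by temporal conv instead of a full 3D kernel."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):                      # x: (N, C, T, H, W)
        return self.temporal(self.spatial(x))
```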

Ranked #17 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Action Recognition Temporal Action Localization +1

Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries

no code implementations16 Feb 2021 Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets.
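
A simplified sketch of the verb-noun label decomposition used in egocentric datasets: a shared clip descriptor feeding separate verb and noun classifiers. The EgoACO pooling itself is not reproduced here, and all names and dimensions are illustrative:

```python
import torch.nn as nn

class VerbNounClassifier(nn.Module):
    """Shared clip descriptor, separate heads for the verb and the noun of an action label."""
    def __init__(self, feat_dim, n_verbs, n_nouns):
        super().__init__()
        self.verb_head = nn.Linear(feat_dim, n_verbs)
        self.noun_head = nn.Linear(feat_dim, n_nouns)

    def forward(self, clip_feat):              # clip_feat: (N, feat_dim)
        return self.verb_head(clip_feat), self.noun_head(clip_feat)
```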

Action Recognition Object +1

Data augmentation techniques for the Video Question Answering task

no code implementations22 Aug 2020 Alex Falcon, Oswald Lanz, Giuseppe Serra

Video Question Answering (VideoQA) is a task that requires a model to analyze and understand both the visual content of the input video and the textual content of the question, as well as the interaction between them, in order to produce a meaningful answer.

Data Augmentation Question Answering +1

Novel-View Human Action Synthesis

1 code implementation6 Jul 2020 Mohamed Ilyes Lakhal, Davide Boscaini, Fabio Poiesi, Oswald Lanz, Andrea Cavallaro

We first estimate the 3D mesh of the target body and transfer the rough textures from the 2D images to the mesh.

Novel View Synthesis Video Generation

FBK-HUPBA Submission to the EPIC-Kitchens Action Recognition 2020 Challenge

no code implementations24 Jun 2020 Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

In this report we describe the technical details of our submission to the EPIC-Kitchens Action Recognition 2020 Challenge.

Action Recognition

Gate-Shift Networks for Video Action Recognition

2 code implementations CVPR 2020 Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space.
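
A heavily simplified sketch of the gate-then-shift idea, where a learned gate routes part of the features through a temporal shift while the rest passes through unchanged; this is illustrative only and not the actual Gate-Shift module:

```python
import torch
import torch.nn as nn

class GateAndShift(nn.Module):
    """Learned spatial gate splits features into a temporally shifted stream and a residual stream."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))

    def forward(self, x):                               # x: (N, C, T, H, W)
        g = torch.sigmoid(self.gate(x))                 # gate in [0, 1] per location
        shifted = torch.roll(g * x, shifts=1, dims=2)   # shift gated features along time
        return shifted + (1.0 - g) * x                  # fuse with the un-shifted residual
```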

Ranked #26 on Action Recognition on Something-Something V1 (using extra training data)

Action Recognition

An Analysis of Deep Neural Networks with Attention for Action Recognition from a Neurophysiological Perspective

no code implementations2 Jul 2019 Swathikiran Sudhakaran, Oswald Lanz

We review three recent deep learning-based methods for action recognition and present a brief comparative analysis of the methods from a neurophysiological point of view.

Action Recognition

FBK-HUPBA Submission to the EPIC-Kitchens 2019 Action Recognition Challenge

no code implementations21 Jun 2019 Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

In this report we describe the technical details of our submission to the EPIC-Kitchens 2019 action recognition challenge.

Action Recognition

Hierarchical Feature Aggregation Networks for Video Action Recognition

no code implementations29 May 2019 Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

Most action recognition methods are based on either a) late aggregation of frame level CNN features using average pooling, max pooling, or an RNN, among others, or b) spatio-temporal aggregation via 3D convolutions.
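
A minimal sketch of option a), late aggregation: per-frame features from a 2D CNN averaged over time before classification. This is a generic baseline, not the hierarchical aggregation proposed in the paper, and it assumes a recent torchvision with the weights= API:

```python
import torch
import torch.nn as nn
import torchvision

num_classes = 10                               # placeholder label space
backbone = torchvision.models.resnet18(weights=None)
backbone.fc = nn.Identity()                    # keep 512-d frame descriptors
classifier = nn.Linear(512, num_classes)

def classify_clip(frames):                     # frames: (T, 3, H, W)
    feats = backbone(frames)                   # (T, 512) frame level features
    clip_descriptor = feats.mean(dim=0)        # late aggregation by average pooling
    return classifier(clip_descriptor)
```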

Ranked #51 on Action Recognition on HMDB-51 (using extra training data)

Action Recognition Temporal Action Localization

Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos

no code implementations29 Aug 2018 Swathikiran Sudhakaran, Oswald Lanz

Most recent approaches for action recognition from video leverage deep architectures to encode the video clip into a fixed-length representation vector that is then used for classification.

Action Recognition In Videos General Classification +2

Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions

no code implementations19 Sep 2017 Swathikiran Sudhakaran, Oswald Lanz

The proposed approach uses a pair of convolutional neural networks, whose parameters are shared, for extracting frame level features from successive frames of the video.
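
A minimal sketch of the weight-sharing idea: a single CNN instance applied to both successive frames, so the two feature extractors share parameters. The downstream convolutional LSTM aggregation is omitted here (a ConvLSTM cell is sketched under the next entry), and the layer sizes are illustrative:

```python
import torch.nn as nn

class SharedFrameEncoder(nn.Module):
    """One CNN applied to two successive frames, i.e. shared parameters."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),
        )

    def forward(self, frame_t, frame_t_plus_1):
        # Reusing self.cnn means both frames are encoded with identical weights.
        return self.cnn(frame_t), self.cnn(frame_t_plus_1)
```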

Learning to Detect Violent Videos using Convolutional Long Short-Term Memory

no code implementations19 Sep 2017 Swathikiran Sudhakaran, Oswald Lanz

A convolutional neural network is used to extract frame level features from a video.
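
Since PyTorch has no built-in ConvLSTM, here is a minimal sketch of a convolutional LSTM cell of the kind used to aggregate such frame level features over time; the gating follows the standard ConvLSTM formulation, not necessarily the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """LSTM cell whose gates are computed with 2D convolutions instead of matrix products."""
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):               # x: (N, in_ch, H, W)
        h, c = state                           # hidden and cell maps: (N, hid_ch, H, W)
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```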

Uncovering Interactions and Interactors: Joint Estimation of Head, Body Orientation and F-Formations From Surveillance Videos

no code implementations ICCV 2015 Elisa Ricci, Jagannadan Varadarajan, Ramanathan Subramanian, Samuel Rota Bulo, Narendra Ahuja, Oswald Lanz

We present a novel approach for jointly estimating targets' head, body orientations and conversational groups called F-formations from a distant social scene (e.g., a cocktail party captured by surveillance cameras).

TAR

SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

no code implementations23 Jun 2015 Xavier Alameda-Pineda, Jacopo Staiano, Ramanathan Subramanian, Ligia Batrinca, Elisa Ricci, Bruno Lepri, Oswald Lanz, Nicu Sebe

Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels.
