no code implementations • 22 Mar 2024 • Sofia Casarin, Cynthia I. Ugwu, Sergio Escalera, Oswald Lanz
The landscape of deep learning research is moving towards innovative strategies to harness the true potential of data.
no code implementations • 17 Dec 2022 • Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Simon See, Oswald Lanz
Consequently, existing solutions based on action recognition models are only suboptimal.
1 code implementation • 3 Aug 2022 • Alex Falcon, Giuseppe Serra, Oswald Lanz
Data augmentation techniques were introduced to improve performance on unseen test examples by creating new training samples through semantics-preserving transformations, such as color-space or geometric transformations of images.
no code implementations • 22 Jun 2022 • Alex Falcon, Giuseppe Serra, Sergio Escalera, Oswald Lanz
This report presents the technical details of our submission to the EPIC-Kitchens-100 Multi-Instance Retrieval Challenge 2022.
Ranked #3 on Multi-Instance Retrieval on EPIC-KITCHENS-100
no code implementations • 22 Jun 2022 • Tsung-Ming Tai, Oswald Lanz, Giuseppe Fiameni, Yi-Kwan Wong, Sze-Sen Poon, Cheng-Kuang Lee, Ka-Chun Cheung, Simon See
In this report, we describe the technical details of our submission for the EPIC-Kitchens-100 action anticipation challenge.
1 code implementation • 2 Jun 2022 • Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Simon See, Oswald Lanz
To this end, we propose a unified recurrence modeling for video action anticipation via message passing framework.
1 code implementation • 27 Apr 2022 • Alex Falcon, Swathikiran Sudhakaran, Giuseppe Serra, Sergio Escalera, Oswald Lanz
We show that even if we carefully tuned the fixed margin, our technique (which does not have the margin as a hyper-parameter) would still achieve better performance.
Ranked #7 on Multi-Instance Retrieval on EPIC-KITCHENS-100
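The fixed-margin baseline the paper improves on can be illustrated with a standard triplet loss, where the margin is a hand-tuned hyper-parameter (this is a generic sketch of the baseline, not the relevance-based margin proposed in the paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # standard hinge formulation: the fixed margin is a hyper-parameter
    # that must be tuned; the paper replaces it with a value derived
    # from caption relevance instead.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

When the negative is already far enough away (by more than the margin), the loss is zero and the triplet contributes no gradient.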
2 code implementations • 16 Mar 2022 • Alex Falcon, Giuseppe Serra, Oswald Lanz
Due to the amount of videos and related captions uploaded every hour, deep learning-based solutions for cross-modal video retrieval are attracting more and more attention.
Ranked #5 on Multi-Instance Retrieval on EPIC-KITCHENS-100
1 code implementation • 16 Mar 2022 • Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs.
Ranked #17 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)
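The complexity reduction from kernel factorization can be seen by counting parameters. The sketch below compares a full 3D kernel against a common (2+1)D-style split into a spatial and a temporal convolution; this is one generic factorization, not necessarily the scheme proposed in the paper:

```python
def conv3d_params(c_in, c_out, t, k):
    # full 3D kernel: every output channel sees a t x k x k input volume
    return c_out * c_in * t * k * k

def factorized_params(c_in, c_out, t, k):
    # (2+1)D-style factorization: a spatial 1 x k x k convolution
    # followed by a temporal t x 1 x 1 convolution
    spatial = c_out * c_in * 1 * k * k
    temporal = c_out * c_out * t * 1 * 1
    return spatial + temporal

# e.g. 64 -> 64 channels with a 3 x 3 x 3 kernel:
full = conv3d_params(64, 64, 3, 3)        # 110592 parameters
split = factorized_params(64, 64, 3, 3)   # 49152 parameters
```

The factorized form uses fewer than half the parameters here, and the gap widens with larger temporal extents.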
no code implementations • 6 Oct 2021 • Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos
This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021.
1 code implementation • 17 Apr 2021 • Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Oswald Lanz
Endowing visual agents with predictive capability is a key step towards video intelligence at scale.
no code implementations • 16 Feb 2021 • Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets.
no code implementations • 22 Aug 2020 • Alex Falcon, Oswald Lanz, Giuseppe Serra
Video Question Answering (VideoQA) requires a model to analyze and understand both the visual content of the input video and the textual content of the question, as well as the interaction between them, in order to produce a meaningful answer.
1 code implementation • 6 Jul 2020 • Mohamed Ilyes Lakhal, Davide Boscaini, Fabio Poiesi, Oswald Lanz, Andrea Cavallaro
We first estimate the 3D mesh of the target body and transfer the rough textures from the 2D images to the mesh.
no code implementations • 24 Jun 2020 • Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
In this report we describe the technical details of our submission to the EPIC-Kitchens Action Recognition 2020 Challenge.
2 code implementations • CVPR 2020 • Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space.
Ranked #26 on Action Recognition on Something-Something V1 (using extra training data)
no code implementations • 2 Jul 2019 • Swathikiran Sudhakaran, Oswald Lanz
We review three recent deep learning based methods for action recognition and present a brief comparative analysis of the methods from a neurophysiological point of view.
no code implementations • 21 Jun 2019 • Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
In this report we describe the technical details of our submission to the EPIC-Kitchens 2019 action recognition challenge.
no code implementations • 29 May 2019 • Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
Most action recognition methods are based on a) late aggregation of frame-level CNN features using average pooling, max pooling, or an RNN, among others, or b) spatio-temporal aggregation via 3D convolutions.
Ranked #51 on Action Recognition on HMDB-51 (using extra training data)
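The late-aggregation strategies in category a) reduce a sequence of per-frame descriptors to a single clip-level vector. A minimal sketch of the average- and max-pooling variants (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def late_aggregate(frame_feats, mode="avg"):
    """Pool per-frame CNN descriptors of shape (T, D) into one clip vector."""
    if mode == "avg":
        # order-insensitive mean over the temporal axis
        return frame_feats.mean(axis=0)
    if mode == "max":
        # keeps the strongest response of each feature dimension
        return frame_feats.max(axis=0)
    raise ValueError(f"unknown mode: {mode}")
```

Both variants discard temporal ordering, which is precisely the limitation that spatio-temporal aggregation via 3D convolutions addresses.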
1 code implementation • CVPR 2019 • Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
Egocentric activity recognition is one of the most challenging tasks in video analysis.
Ranked #5 on Egocentric Activity Recognition on EGTEA
no code implementations • 29 Aug 2018 • Swathikiran Sudhakaran, Oswald Lanz
Most recent approaches for action recognition from video leverage deep architectures to encode the video clip into a fixed length representation vector that is then used for classification.
1 code implementation • 31 Jul 2018 • Swathikiran Sudhakaran, Oswald Lanz
Our model is built on the observation that egocentric activities are highly characterized by the objects and their locations in the video.
Ranked #6 on Egocentric Activity Recognition on EGTEA
no code implementations • 19 Sep 2017 • Swathikiran Sudhakaran, Oswald Lanz
The proposed approach uses a pair of convolutional neural networks, whose parameters are shared, for extracting frame level features from successive frames of the video.
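The idea of a shared-parameter pair of networks can be sketched as applying one feature extractor to two successive frames and comparing the outputs (a toy illustration with a single linear-ReLU layer standing in for the CNN; all names and shapes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))  # one weight matrix shared by both streams

def extract(frame):
    # the same parameters W process every frame (weight sharing),
    # so features of successive frames are directly comparable
    return np.maximum(0.0, frame @ W)  # toy ReLU feature extractor

frame_t = rng.standard_normal(8)
frame_t1 = rng.standard_normal(8)
motion_cue = extract(frame_t1) - extract(frame_t)  # change between frames
```

Because both frames pass through identical parameters, the difference of their features captures what changed between them rather than differences between two separately trained extractors.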
no code implementations • 19 Sep 2017 • Swathikiran Sudhakaran, Oswald Lanz
A convolutional neural network is used to extract frame level features from a video.
no code implementations • ICCV 2015 • Elisa Ricci, Jagannadan Varadarajan, Ramanathan Subramanian, Samuel Rota Bulo, Narendra Ahuja, Oswald Lanz
We present a novel approach for jointly estimating targets' head and body orientations and conversational groups, called F-formations, from a distant social scene (e.g., a cocktail party captured by surveillance cameras).
no code implementations • 23 Jun 2015 • Xavier Alameda-Pineda, Jacopo Staiano, Ramanathan Subramanian, Ligia Batrinca, Elisa Ricci, Bruno Lepri, Oswald Lanz, Nicu Sebe
Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels.