Search Results for author: Xavier Alameda-Pineda

Found 38 papers, 11 papers with code

Self-Supervised Models are Continual Learners

no code implementations 8 Dec 2021 Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal

Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale.

Continual Learning Representation Learning

Successor Feature Neural Episodic Control

no code implementations 4 Nov 2021 David Emukpere, Xavier Alameda-Pineda, Chris Reinke

A longstanding goal in reinforcement learning is to build intelligent agents that show fast learning and a flexible transfer of skills akin to humans and animals.

Transfer Learning

Xi-Learning: Successor Feature Transfer Learning for General Reward Functions

no code implementations 29 Oct 2021 Chris Reinke, Xavier Alameda-Pineda

Transfer in Reinforcement Learning aims to improve learning performance on target tasks using knowledge from experienced source tasks.

Transfer Learning

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

1 code implementation 23 Jun 2021 Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin

Dynamical variational auto-encoders (DVAEs) are a class of deep generative models with latent variables, dedicated to time series data modeling.

Representation Learning Speech Enhancement +1

Multi-Person Extreme Motion Prediction

no code implementations 18 May 2021 Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer

In this paper, we explore this problem when dealing with humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons.

Human motion prediction motion prediction

TransCenter: Transformers with Dense Queries for Multiple-Object Tracking

1 code implementation 28 Mar 2021 Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda

Methodologically, we propose the use of dense pixel-level multi-scale queries in a transformer dual-decoder network, to be able to globally and robustly infer the heatmap of targets' centers and associate them through time.

Multiple Object Tracking

SocialInteractionGAN: Multi-person Interaction Sequence Generation

no code implementations 10 Mar 2021 Louis Airale, Dominique Vaufreydaz, Xavier Alameda-Pineda

Our model builds on a recurrent encoder-decoder generator network and a dual-stream discriminator.

Variational Structured Attention Networks for Deep Visual Representation Learning

1 code implementation 5 Mar 2021 Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci

Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework, leading to Variational STructured Attention networks (VISTA-Net).

Depth Estimation Representation Learning +1

Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

no code implementations 8 Feb 2021 Mostafa Sadeghi, Xavier Alameda-Pineda

Recently, audio-visual speech enhancement has been tackled in the unsupervised settings based on variational auto-encoders (VAEs), where during training only clean data is used to train a generative model for speech, which at test time is combined with a noise model, e.g., nonnegative matrix factorization (NMF), whose parameters are learned without supervision.

Speech Enhancement

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

no code implementations 8 Jan 2021 Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.

BSDS500 Graph Attention +2

Variational Structured Attention Networks for Dense Pixel-Wise Prediction

no code implementations 1 Jan 2021 Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci

State-of-the-art performances in dense pixel-wise prediction tasks are obtained with specifically designed convolutional networks.

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation

no code implementations 11 Oct 2020 Wen Guo, Enric Corona, Francesc Moreno-Noguer, Xavier Alameda-Pineda

Our pose interacting network, or PI-Net, inputs the initial pose estimates of a variable number of interactees into a recurrent architecture used to refine the pose of the person-of-interest.

3D Pose Estimation

Dynamical Variational Autoencoders: A Comprehensive Review

2 code implementations 28 Aug 2020 Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner.


Deep Variational Generative Models for Audio-visual Speech Separation

no code implementations 17 Aug 2020 Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda

To better utilize the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) as well as the visual data.

Speech Separation

Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach

1 code implementation 10 Aug 2020 Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, Nicu Sebe, Bruno Lepri

Our proposed model disentangles the image content from the visual attributes, and it learns to modify the latter using the textual description, before generating a new image from the content and the modified attribute representation.

Image Captioning Translation +1

Unsupervised Performance Analysis of 3D Face Alignment

no code implementations 14 Apr 2020 Mostafa Sadeghi, Sylvain Guy, Adrien Raison, Xavier Alameda-Pineda, Radu Horaud

We empirically show that the proposed pipeline is neither method-biased nor data-biased, and that it can be used to assess both the performance of 3DFA algorithms and the accuracy of annotations of face datasets.

3D Face Alignment Face Alignment

Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

no code implementations 23 Dec 2019 Mostafa Sadeghi, Xavier Alameda-Pineda

Two encoder networks input, respectively, audio and visual data, and the posterior of the latent variables is modeled as a mixture of two Gaussian distributions output from each encoder network.

Speech Enhancement Variational Inference

Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders

no code implementations 10 Nov 2019 Mostafa Sadeghi, Xavier Alameda-Pineda

When visual data is clean, speech enhancement with audio-visual VAE shows a better performance than with audio-only VAE, which is trained on audio-only data.

Speech Enhancement

A Recurrent Variational Autoencoder for Speech Enhancement

no code implementations 24 Oct 2019 Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE).

Speech Enhancement

Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders

no code implementations 7 Aug 2019 Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.

Speech Enhancement

How To Train Your Deep Multi-Object Tracker

3 code implementations CVPR 2020 Yihong Xu, Aljosa Osep, Yutong Ban, Radu Horaud, Laura Leal-Taixe, Xavier Alameda-Pineda

In this paper, we bridge this gap by proposing a differentiable proxy of MOTA and MOTP, which we combine in a loss function suitable for end-to-end training of deep multi-object trackers.

Multi-Object Tracking Multiple Object Tracking

Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers

no code implementations 28 Sep 2018 Yutong Ban, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

We propose a variational inference model which amounts to approximating the joint distribution with a factorized distribution.

Bayesian Inference Variational Inference +1

A Comprehensive Analysis of Deep Regression

2 code implementations 22 Mar 2018 Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud

Deep learning revolutionized data science, and recently its popularity has grown exponentially, as did the number of papers employing deep networks.

Pose Estimation

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

no code implementations NeurIPS 2017 Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.

BSDS500 Contour Detection

How to Make an Image More Memorable? A Deep Style Transfer Approach

1 code implementation 6 Apr 2017 Aliaksandr Siarohin, Gloria Zen, Cveta Majtanovic, Xavier Alameda-Pineda, Elisa Ricci, Nicu Sebe

In this work, we show that it is possible to automatically retrieve the best style seeds for a given image, thus remarkably reducing the number of human attempts needed to find a good match.

Image Generation Style Transfer

Viraliency: Pooling Local Virality

1 code implementation CVPR 2017 Xavier Alameda-Pineda, Andrea Pilzer, Dan Xu, Nicu Sebe, Elisa Ricci

In our overly-connected world, the automatic recognition of virality - the quality of an image or video to be rapidly and widely spread in social networks - is of crucial importance, and has recently awakened the interest of the computer vision community.

Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions

no code implementations CVPR 2016 Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F. Cohn, Nicu Sebe

Recent studies in computer vision have shown that, while practically invisible to a human observer, skin color changes due to blood flow can be captured on face videos and, surprisingly, be used to estimate the heart rate (HR).

Heart rate estimation Matrix Completion

An On-line Variational Bayesian Model for Multi-Person Tracking from Cluttered Scenes

no code implementations 4 Sep 2015 Sileye . Ba, Xavier Alameda-Pineda, Alessio Xompero, Radu Horaud

In this paper, we propose an on-line variational Bayesian model for multi-person tracking from cluttered visual observations provided by person detectors.

Human robot interaction Multiple Object Tracking

EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis

no code implementations 4 Sep 2015 Israel D. Gebru, Xavier Alameda-Pineda, Florence Forbes, Radu Horaud

We propose a model selection method based on a minimum message length criterion, provide a weight initialization strategy, and validate the proposed algorithms by comparing them with several state of the art parametric and non-parametric clustering techniques.

Model Selection

SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

no code implementations 23 Jun 2015 Xavier Alameda-Pineda, Jacopo Staiano, Ramanathan Subramanian, Ligia Batrinca, Elisa Ricci, Bruno Lepri, Oswald Lanz, Nicu Sebe

Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels.

Vision-Guided Robot Hearing

no code implementations 6 Nov 2013 Xavier Alameda-Pineda, Radu Horaud

Natural human-robot interaction in complex and unpredictable environments is one of the main research lines in robotics.

Human robot interaction Speech Recognition
