Search Results for author: Xavier Alameda-Pineda

Found 60 papers, 23 papers with code

VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space

no code implementations 13 Dec 2023 Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer

Instead of predicting body model parameters or 3D vertex coordinates, our focus is on forecasting the proposed discrete latent representation, which can be decoded into a registered human mesh.

Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation

no code implementations 7 Dec 2023 Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda

In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources.

Audio Source Separation Multi-Object Tracking +1

Univariate Radial Basis Function Layers: Brain-inspired Deep Neural Layers for Low-Dimensional Inputs

1 code implementation 7 Nov 2023 Daniel Jost, Basavasagar Patil, Xavier Alameda-Pineda, Chris Reinke

Deep Neural Networks (DNNs) became the standard tool for function approximation with most of the introduced architectures being developed for high-dimensional input data.

On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers

1 code implementation 18 Aug 2023 Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier Alameda-Pineda, Elisa Ricci

State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting.

Continual Learning Transfer Learning

A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation

1 code implementation 4 Jul 2023 Louis Airale, Dominique Vaufreydaz, Xavier Alameda-Pineda

Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress.

Talking Head Generation

Unsupervised speech enhancement with deep dynamical generative speech and noise models

no code implementations 13 Jun 2023 Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.

Speech Enhancement

A multimodal dynamical variational autoencoder for audiovisual speech representation learning

no code implementations 5 May 2023 Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.

Disentanglement Image Denoising +2

Speech Modeling with a Hierarchical Transformer Dynamical VAE

no code implementations 7 Mar 2023 Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors.

Speech Enhancement

A weighted-variance variational autoencoder model for speech enhancement

no code implementations 2 Nov 2022 Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel

A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable.

Speech Enhancement

Autoregressive GAN for Semantic Unconditional Head Motion Generation

1 code implementation 2 Nov 2022 Louis Airale, Xavier Alameda-Pineda, Stéphane Lathuilière, Dominique Vaufreydaz

In this work, we address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space from a single reference pose.

Talking Head Generation

Variational Meta Reinforcement Learning for Social Robotics

no code implementations 7 Jun 2022 Anand Ballou, Xavier Alameda-Pineda, Chris Reinke

We demonstrate the interest of the RBF layer and the usage of meta-RL for social robotics on four robotic simulation tasks.

Meta Reinforcement Learning Navigate +2

Learning and controlling the source-filter representation of speech with a variational autoencoder

1 code implementation 14 Apr 2022 Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies, we show that these subspaces are orthogonal, and based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.

Expression-preserving face frontalization improves visually assisted speech processing

no code implementations 6 Apr 2022 Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda

The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily-viewed face and a face model.

Face Model Lip Reading +1

Uncertainty-aware Contrastive Distillation for Incremental Semantic Segmentation

1 code implementation 26 Mar 2022 Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Moin Nabi, Xavier Alameda-Pineda, Elisa Ricci

This problem has been widely investigated in the research community and several Incremental Learning (IL) approaches have been proposed in the past years.

Contrastive Learning Image Classification +5

Unsupervised Multiple-Object Tracking with a Dynamical Variational Autoencoder

no code implementations 18 Feb 2022 Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda

In this paper, we present an unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE), called DVAE-UMOT.

Multi-Object Tracking Multiple Object Tracking +3

Continual Attentive Fusion for Incremental Learning in Semantic Segmentation

1 code implementation 1 Feb 2022 Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci

To fill this gap, in this paper we introduce a novel attentive feature distillation approach to mitigate catastrophic forgetting while accounting for semantic spatial- and channel-level dependencies.

Incremental Learning Semantic Segmentation

A Proposal-Based Paradigm for Self-Supervised Sound Source Localization in Videos

no code implementations CVPR 2022 Hanyu Xuan, Zhiliang Wu, Jian Yang, Yan Yan, Xavier Alameda-Pineda

Humans can easily recognize where and how the sound is produced via watching a scene and listening to corresponding audio cues.

Multiple Instance Learning

Self-Supervised Models are Continual Learners

1 code implementation CVPR 2022 Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal

Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale.

Continual Learning Representation Learning

Successor Feature Neural Episodic Control

no code implementations 4 Nov 2021 David Emukpere, Xavier Alameda-Pineda, Chris Reinke

A longstanding goal in reinforcement learning is to build intelligent agents that show fast learning and a flexible transfer of skills akin to humans and animals.

reinforcement-learning Reinforcement Learning (RL) +1

Successor Feature Representations

no code implementations 29 Oct 2021 Chris Reinke, Xavier Alameda-Pineda

Successor Representations (SR) and their extension Successor Features (SF) are prominent transfer mechanisms in domains where reward functions change between tasks.

Transfer Learning

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

1 code implementation 23 Jun 2021 Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin

We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement.

Representation Learning Speech Enhancement +2

Multi-Person Extreme Motion Prediction

1 code implementation CVPR 2022 Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer

In this paper, we explore this problem when dealing with humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons.

Human motion prediction motion prediction +2

TransCenter: Transformers with Dense Representations for Multiple-Object Tracking

2 code implementations 28 Mar 2021 Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda

Methodologically, we propose the use of image-related dense detection queries and efficient sparse tracking queries produced by our carefully designed query learning networks (QLN).

Ranked #11 on Multi-Object Tracking on MOT20 (using extra training data)

Image Classification Multi-Object Tracking +4

SocialInteractionGAN: Multi-person Interaction Sequence Generation

no code implementations 10 Mar 2021 Louis Airale, Dominique Vaufreydaz, Xavier Alameda-Pineda

In this paper, we focus on a unimodal representation of interactions and propose to tackle interaction generation in a data-driven fashion.

Variational Structured Attention Networks for Deep Visual Representation Learning

1 code implementation 5 Mar 2021 Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci

Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework, leading to Variational STructured Attention networks (VISTA-Net).

Depth Estimation Representation Learning +1

Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

no code implementations 8 Feb 2021 Mostafa Sadeghi, Xavier Alameda-Pineda

Recently, audio-visual speech enhancement has been tackled in the unsupervised settings based on variational auto-encoders (VAEs), where during training only clean data is used to train a generative model for speech, which at test time is combined with a noise model, e.g., nonnegative matrix factorization (NMF), whose parameters are learned without supervision.

Speech Enhancement

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

no code implementations 8 Jan 2021 Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.

Graph Attention Monocular Depth Estimation +1

Variational Structured Attention Networks for Dense Pixel-Wise Prediction

1 code implementation 1 Jan 2021 Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci

State-of-the-art performances in dense pixel-wise prediction tasks are obtained with specifically designed convolutional networks.

PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation

no code implementations 11 Oct 2020 Wen Guo, Enric Corona, Francesc Moreno-Noguer, Xavier Alameda-Pineda

Our pose interacting network, or PI-Net, inputs the initial pose estimates of a variable number of interactees into a recurrent architecture used to refine the pose of the person-of-interest.

3D Multi-Person Pose Estimation (root-relative) 3D Pose Estimation

Dynamical Variational Autoencoders: A Comprehensive Review

1 code implementation 28 Aug 2020 Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda

Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models.

3D Human Dynamics Resynthesis +2

Deep Variational Generative Models for Audio-visual Speech Separation

no code implementations 17 Aug 2020 Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda

To better utilize the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) as well as the visual data.

Speech Separation

Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach

1 code implementation 10 Aug 2020 Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, Nicu Sebe, Bruno Lepri

Our proposed model disentangles the image content from the visual attributes, and it learns to modify the latter using the textual description, before generating a new image from the content and the modified attribute representation.

Attribute Image Captioning +3

Unsupervised Performance Analysis of 3D Face Alignment with a Statistically Robust Confidence Test

no code implementations 14 Apr 2020 Mostafa Sadeghi, Xavier Alameda-Pineda, Radu Horaud

The results show that the proposed analysis is consistent with supervised metrics and that it can be used to measure the accuracy of both predicted landmarks and of automatically annotated 3DFA datasets, to detect errors and to eliminate them.

3D Face Alignment Face Alignment

GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling

1 code implementation 15 Mar 2020 Yahui Liu, Marco De Nadai, Jian Yao, Nicu Sebe, Bruno Lepri, Xavier Alameda-Pineda

Unsupervised image-to-image translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training images.

Attribute Translation +1

Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

no code implementations 23 Dec 2019 Mostafa Sadeghi, Xavier Alameda-Pineda

Two encoder networks input, respectively, audio and visual data, and the posterior of the latent variables is modeled as a mixture of two Gaussian distributions output from each encoder network.

Speech Enhancement Variational Inference

Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders

no code implementations 10 Nov 2019 Mostafa Sadeghi, Xavier Alameda-Pineda

When visual data is clean, speech enhancement with audio-visual VAE shows a better performance than with audio-only VAE, which is trained on audio-only data.

Speech Enhancement

A Recurrent Variational Autoencoder for Speech Enhancement

no code implementations 24 Oct 2019 Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE).

Speech Enhancement

Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders

no code implementations 7 Aug 2019 Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.

Speech Enhancement

How To Train Your Deep Multi-Object Tracker

2 code implementations CVPR 2020 Yihong Xu, Aljosa Osep, Yutong Ban, Radu Horaud, Laura Leal-Taixe, Xavier Alameda-Pineda

In this paper, we bridge this gap by proposing a differentiable proxy of MOTA and MOTP, which we combine in a loss function suitable for end-to-end training of deep multi-object trackers.

Multi-Object Tracking Multiple Object Tracking +1

CANU-ReID: A Conditional Adversarial Network for Unsupervised person Re-IDentification

no code implementations 2 Apr 2019 Guillaume Delorme, Yihong Xu, Stephane Lathuilière, Radu Horaud, Xavier Alameda-Pineda

Unsupervised person re-ID is the task of identifying people on a target data set for which the ID labels are unavailable during training.

Clustering Domain Adaptation +1

Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers

no code implementations 28 Sep 2018 Yutong Ban, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud

We propose a variational inference model which amounts to approximating the joint distribution with a factorized distribution.

Bayesian Inference Variational Inference +1

A Comprehensive Analysis of Deep Regression

2 code implementations 22 Mar 2018 Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud

Deep learning revolutionized data science, and recently its popularity has grown exponentially, as did the number of papers employing deep networks.

Pose Estimation regression

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

no code implementations NeurIPS 2017 Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.

Contour Detection

How to Make an Image More Memorable? A Deep Style Transfer Approach

1 code implementation 6 Apr 2017 Aliaksandr Siarohin, Gloria Zen, Cveta Majtanovic, Xavier Alameda-Pineda, Elisa Ricci, Nicu Sebe

In this work, we show that it is possible to automatically retrieve the best style seeds for a given image, thus remarkably reducing the number of human attempts needed to find a good match.

Image Generation Style Transfer

Viraliency: Pooling Local Virality

1 code implementation CVPR 2017 Xavier Alameda-Pineda, Andrea Pilzer, Dan Xu, Nicu Sebe, Elisa Ricci

In our overly-connected world, the automatic recognition of virality - the quality of an image or video to be rapidly and widely spread in social networks - is of crucial importance, and has recently awakened the interest of the computer vision community.

Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions

no code implementations CVPR 2016 Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F. Cohn, Nicu Sebe

Recent studies in computer vision have shown that, while practically invisible to a human observer, skin color changes due to blood flow can be captured on face videos and, surprisingly, be used to estimate the heart rate (HR).

Heart rate estimation Matrix Completion

An On-line Variational Bayesian Model for Multi-Person Tracking from Cluttered Scenes

no code implementations 4 Sep 2015 Sileye . Ba, Xavier Alameda-Pineda, Alessio Xompero, Radu Horaud

In this paper, we propose an on-line variational Bayesian model for multi-person tracking from cluttered visual observations provided by person detectors.

Multiple Object Tracking Object

EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis

no code implementations 4 Sep 2015 Israel D. Gebru, Xavier Alameda-Pineda, Florence Forbes, Radu Horaud

We propose a model selection method based on a minimum message length criterion, provide a weight initialization strategy, and validate the proposed algorithms by comparing them with several state of the art parametric and non-parametric clustering techniques.

Clustering Model Selection

SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

no code implementations 23 Jun 2015 Xavier Alameda-Pineda, Jacopo Staiano, Ramanathan Subramanian, Ligia Batrinca, Elisa Ricci, Bruno Lepri, Oswald Lanz, Nicu Sebe

Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels.

Vision-Guided Robot Hearing

no code implementations 6 Nov 2013 Xavier Alameda-Pineda, Radu Horaud

Natural human-robot interaction in complex and unpredictable environments is one of the main research lines in robotics.

speech-recognition Speech Recognition
