We test our proposed method by finetuning on multiple natural language understanding tasks, employing BERT-Large as an instantiation of the Transformer and GLUE as the evaluation benchmark.
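For context, a minimal sketch of such a finetuning setup with the Hugging Face libraries (the SST-2 task and all hyperparameters here are illustrative, not the paper's exact configuration):

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative GLUE task; the paper evaluates multiple GLUE tasks.
dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-large-uncased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length")

encoded = dataset.map(tokenize, batched=True)  # ready for a standard Trainer loop
```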
We present a locality-aware method for interpreting the latent space of wavelet-based Generative Adversarial Networks (GANs) that captures the large spatial and spectral variability characteristic of satellite imagery.
In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting.
The implicit assumption of this task is that the sound signal is either missing or contains a high amount of noise/corruption such that it is not useful for processing.
Most established approaches to date involve a two-step process, whereby an intermediate representation, such as a spectrogram, is first extracted from the video and then passed to a vocoder to produce the raw audio.
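As a hedged illustration of this two-step pipeline, here Griffin-Lim stands in for a learned neural vocoder, and a random waveform stands in for the model-predicted intermediate representation:

```python
import torch
import torchaudio

n_fft = 1024
# Stand-in for the spectrogram a video-to-audio model would predict;
# here we simply compute one from a random waveform.
spec = torchaudio.transforms.Spectrogram(n_fft=n_fft)(torch.randn(16000))
# Griffin-Lim plays the role of the vocoder mapping the spectrogram
# back to raw audio.
vocoder = torchaudio.transforms.GriffinLim(n_fft=n_fft)
waveform = vocoder(spec)
```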
In this paper, we propose to separate representations of the different visual modalities in CLIP's joint vision-language space by leveraging the association between parts of speech and specific visual modes of variation (e.g., nouns relate to objects, adjectives describe appearance).
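A minimal sketch of the underlying idea, assuming the Hugging Face CLIP implementation (the prompts and the difference-of-embeddings heuristic are illustrative, not the paper's exact procedure):

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def embed(texts):
    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Nouns anchor object identity; a pair of adjective prompts spans an
# appearance-related direction in the joint space.
object_anchor = embed(["a photo of a car"])
appearance_dir = embed(["a photo of a red car"]) - embed(["a photo of a blue car"])
```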
Finally, the validity of our theory is assessed, and numerical comparisons to a state-of-the-art unfolding network are made on synthetic and real-world datasets.
In recent years, considerable advancements have been made in the area of Generative Adversarial Networks (GANs), particularly with the advent of style-based architectures that address many key shortcomings, both in terms of modeling capabilities and network interpretability.
Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs.
Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and datasets.
The second task required transferring models trained on the subjects of one or more source motor imagery datasets to perform inference on two target datasets, given a small set of personalized calibration data for multiple test subjects.
The results highlight the ability of our approach to condition image generation on attributes like gender, pose, and hairstyle for faces, as well as on a variety of features for different object classes.
This paper addresses the problem of finding interpretable directions in the latent space of pre-trained Generative Adversarial Networks (GANs) to facilitate controllable image synthesis.
We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network.
Patterns of brain activity are associated with different brain processes and can be used to identify different brain states and make behavioral predictions.
Tensors, or multidimensional arrays, are data structures that can naturally represent visual data of multiple dimensions.
The efficacy of the proposed models is evaluated on standard image and audio classification benchmarks.
We show how CoPE can be trivially augmented to accept an arbitrary number of input variables.
The conditional variable can be discrete (e.g., a class label) or continuous (e.g., an input image), resulting in class-conditional (image) generation and image-to-image translation models, respectively.
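For the discrete case, a minimal class-conditional generator might look as follows (a generic sketch, not the paper's architecture; all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Embeds a discrete class label and concatenates it with the noise."""

    def __init__(self, z_dim=64, n_classes=10, out_dim=784):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))

g = ConditionalGenerator()
fake = g(torch.randn(8, 64), torch.randint(0, 10, (8,)))  # 8 class-conditional samples
```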
Deep generative models rely on their inductive bias to facilitate generalization, especially for problems with high dimensional data, like images.
We introduce three tensor decompositions that significantly reduce the number of parameters and show how they can be efficiently implemented by hierarchical neural networks.
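As a hedged sketch of the parameter savings such decompositions buy, using TensorLy's CP decomposition (the three decompositions introduced in the paper may differ):

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# A dense 4-way weight tensor, e.g. a 3x3 convolution with 64 channels.
W = tl.tensor(np.random.rand(64, 64, 3, 3))
rank = 8
weights, factors = parafac(W, rank=rank)

dense_params = int(np.prod(W.shape))   # 36864 parameters in the dense tensor
cp_params = rank * sum(W.shape)        # 8 * (64 + 64 + 3 + 3) = 1072 in the factors
```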
By evaluating on several age-annotated datasets in both single- and cross-database experiments, we show that the proposed method outperforms state-of-the-art algorithms for age transfer, especially in the case of age groups that lie in the tails of the label distribution.
In this work, we investigate the demographic bias of deep learning models in face recognition, age estimation, gender recognition and kinship verification.
Deep Convolutional Neural Networks (DCNNs) are currently the method of choice for both generative and discriminative learning in computer vision and machine learning.
Speech-driven facial animation involves using a speech signal to generate realistic videos of talking faces.
As deep neural networks become widely adopted for solving most problems in computer vision and audio understanding, there are rising concerns about their potential vulnerability.
Generative Adversarial Networks (GANs) have become the gold standard when it comes to learning generative models for high-dimensional distributions.
To alleviate this, one approach is to apply low-rank tensor decompositions to convolution kernels in order to compress the network and reduce its number of parameters.
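A minimal sketch of this idea, assuming a CP-style factorization of a k x k convolution into a sequence of cheaper convolutions (in the spirit of Lebedev et al.; the layer sizes are illustrative):

```python
import torch.nn as nn

def cp_conv(cin, cout, k, rank):
    # pointwise -> vertical depthwise -> horizontal depthwise -> pointwise,
    # replacing one cin x cout x k x k kernel with four small factors.
    return nn.Sequential(
        nn.Conv2d(cin, rank, 1, bias=False),
        nn.Conv2d(rank, rank, (k, 1), padding=(k // 2, 0), groups=rank, bias=False),
        nn.Conv2d(rank, rank, (1, k), padding=(0, k // 2), groups=rank, bias=False),
        nn.Conv2d(rank, cout, 1, bias=False),
    )

compressed = cp_conv(64, 64, 3, rank=8)  # far fewer parameters than a dense 3x3 conv
```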
Recently, a multitude of methods for image-to-image translation have demonstrated impressive results on problems such as multi-domain or multi-attribute transfer.
CNNs achieve remarkable performance by leveraging deep, over-parametrized architectures trained on large datasets.
Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices are increasingly becoming an indispensable part of our lives.
Computational facial models that capture properties of facial cues related to aging and kinship increasingly attract the attention of the research community, enabling the development of reliable methods for age progression, age estimation, age-invariant facial characterization, and kinship verification from visual data.
In addition, state-of-the-art data-driven methods demand vast amounts of data; hence, a standard engineering trick is artificial data augmentation, for instance by enriching the data with cropped and (affinely) transformed images.
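For instance, this augmentation trick is a few lines with torchvision (parameter values are illustrative):

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomCrop(224, padding=8),                      # cropped copies
    T.RandomAffine(degrees=10, translate=(0.1, 0.1),   # affinely transformed copies
                   scale=(0.9, 1.1)),
])
```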
Robust principal component analysis (RPCA) is a powerful method for learning low-rank feature representation of various visual data.
Dictionary learning and component analysis models are fundamental for learning compact representations that are relevant to a given task (feature extraction, dimensionality reduction, denoising, etc.).
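A small illustration with scikit-learn's dictionary learning (the data and hyperparameters are placeholders):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.rand(100, 64)                      # 100 signals of dimension 64
dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=50)
codes = dico.fit_transform(X)                    # sparse codes, shape (100, 32)
atoms = dico.components_                         # learned dictionary, shape (32, 64)
```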
In this paper, we propose a novel component analysis technique that is suitable for facial UV maps containing a considerable amount of missing information and outliers, while additionally incorporating knowledge from various attributes (such as age and identity).
Deep generative models learned through adversarial training have become increasingly popular for their ability to generate naturalistic image textures.
Several factors contribute to the appearance of an object in a visual scene, including pose, illumination, and deformation, among others.
We revisit the problem of robust principal component analysis with features acting as prior side information.
Even though CCA is a powerful tool, it has several drawbacks that render its application to computer vision challenging.
To extract these modes of variation from visual data, several supervised methods that rely on multilinear (tensor) decompositions (e.g., the Higher-Order SVD), such as TensorFaces, have been developed.
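For instance, a Tucker/HOSVD factorization of a hypothetical faces tensor can be computed with TensorLy (axes and ranks here are illustrative):

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# people x pose x illumination x pixels, as in TensorFaces-style setups
X = tl.tensor(np.random.rand(20, 5, 7, 1024))
core, factors = tucker(X, rank=[10, 5, 7, 64])
# Each factor matrix captures one labeled mode of variation.
```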
In this paper, we introduce a new robust decomposition of images by combining ideas from sparse dictionary learning and PCP.
Robust Principal Component Analysis (RPCA) aims at recovering a low-rank subspace from grossly corrupted high-dimensional (often visual) data and is a cornerstone in many machine learning and computer vision applications.
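A minimal sketch of RPCA via Principal Component Pursuit, solved with the standard ADMM iterations (step sizes follow the usual defaults; this is the textbook baseline, not any single paper's algorithm):

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca_pcp(M, n_iter=200, tol=1e-7):
    m, n = M.shape
    mu = m * n / (4.0 * np.abs(M).sum())
    lam = 1.0 / np.sqrt(max(m, n))
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)       # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)    # sparse update
        Y += mu * (M - L - S)                   # dual update
        if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
            break
    return L, S
```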
In this paper, we propose the first, to the best of our knowledge, "in-the-wild" 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an "in-the-wild" texture model.
Networks have been a general tool for representing, analyzing, and modeling relational data arising in several domains.
The proposed method is assessed in frontal face reconstruction, face landmark localization, pose-invariant face recognition, and face verification in unconstrained conditions.
In this paper, we propose a method to automatically recover a class-specific, low-dimensional spherical harmonic basis from a set of in-the-wild facial images.
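For reference, the low-order spherical harmonic basis used in such illumination models can be evaluated with SciPy (a generic sketch; angles follow scipy.special.sph_harm's conventions, and the real parts here are proportional to the real basis only up to normalization):

```python
import numpy as np
from scipy.special import sph_harm

def sh_basis(theta, phi, max_degree=2):
    """First (max_degree + 1)**2 spherical harmonics (real parts) at the
    directions given by azimuth theta and polar angle phi."""
    return np.stack(
        [sph_harm(m, l, theta, phi).real
         for l in range(max_degree + 1)
         for m in range(-l, l + 1)],
        axis=-1,
    )

B = sh_basis(np.random.rand(100) * 2 * np.pi, np.random.rand(100) * np.pi)
print(B.shape)  # (100, 9): the 9-D basis commonly used for Lambertian lighting
```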
The proposed method is assessed in frontal face reconstruction (pose correction), face landmark localization, and pose-invariant face recognition and verification by conducting experiments on six facial image databases.
Next, to correct the fittings of a generic model, image congealing (i.e., batch image alignment) is performed by employing only the learnt orthonormal subspace.
The superiority of the proposed method over state-of-the-art time alignment methods, namely canonical time warping and generalized time warping, is indicated by experimental results on both synthetic and real datasets.
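For context, the dynamic-programming alignment at the core of these warping methods is classic DTW, which canonical and generalized time warping extend to multiple, differently-dimensional sequences via learned projections; a minimal sketch:

```python
import numpy as np

def dtw(x, y):
    """Dynamic time warping cost between two 1-D sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw(np.sin(np.linspace(0, 6, 50)), np.sin(np.linspace(0.5, 6.5, 60))))
```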