Search Results for author: Adrian Bulat

Found 45 papers, 19 papers with code

Human pose estimation via Convolutional Part Heatmap Regression

1 code implementation 6 Sep 2016 Adrian Bulat, Georgios Tzimiropoulos

Our main contribution is a CNN cascaded architecture specifically designed for learning part relationships and spatial context, and robustly inferring pose even for the case of severe part occlusions.

Pose Estimation regression

Convolutional aggregation of local evidence for large pose face alignment

no code implementations British Machine Vision Conference 2016 Adrian Bulat, Georgios Tzimiropoulos

Besides playing the role of a graphical model, CNN regression is a key feature of our system, guiding the network to rely on context for predicting the location of occluded landmarks, typically encountered in very large poses.

Face Alignment Face Detection +1

Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

3 code implementations ICCV 2017 Adrian Bulat, Georgios Tzimiropoulos

(d) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance.

Binarization Face Alignment +1

How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)

8 code implementations ICCV 2017 Adrian Bulat, Georgios Tzimiropoulos

To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very large yet synthetically expanded 2D facial landmark dataset and finally evaluate it on all other 2D facial landmark datasets.

3D Face Alignment Face Alignment +1

Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

1 code implementation ICCV 2017 Aaron S. Jackson, Adrian Bulat, Vasileios Argyriou, Georgios Tzimiropoulos

Our CNN works with just a single 2D facial image, requires neither accurate alignment nor dense correspondence between images, works for arbitrary facial poses and expressions, and can be used to reconstruct the whole 3D facial geometry (including the non-visible parts of the face), bypassing the construction (during training) and fitting (during testing) of a 3D Morphable Model.

3D Face Reconstruction Face Alignment +1

Hierarchical binary CNNs for landmark localization with limited resources

1 code implementation 14 Aug 2018 Adrian Bulat, Georgios Tzimiropoulos

To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment.

3D Face Alignment Binarization +2

Tensor Dropout for Robust Learning

no code implementations 27 Feb 2019 Arinbjörn Kolbeinsson, Jean Kossaifi, Yannis Panagakis, Adrian Bulat, Anima Anandkumar, Ioanna Tzoulaki, Paul Matthews

CNNs achieve remarkable performance by leveraging deep, over-parametrized architectures, trained on large datasets.

Image Classification Inductive Bias

Improved training of binary networks for human pose estimation and image recognition

1 code implementation 11 Apr 2019 Adrian Bulat, Georgios Tzimiropoulos, Jean Kossaifi, Maja Pantic

Big neural networks trained on large datasets have advanced the state-of-the-art for a large variety of challenging problems, improving performance by a large margin.

Binarization Classification with Binary Neural Network +4

Incremental multi-domain learning with network latent tensor factorization

no code implementations 12 Apr 2019 Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

Adapting the learned classification to new domains is a hard problem for at least three reasons: (1) the new domains and the tasks might be drastically different; (2) there might be a very limited amount of annotated data on the new domain; and (3) full training of a new model for each new task is prohibitive in terms of computation and memory, due to the sheer number of parameters of deep CNNs.

General Classification Image Classification +2

Factorized Higher-Order CNNs with an Application to Spatio-Temporal Emotion Estimation

no code implementations CVPR 2020 Jean Kossaifi, Antoine Toisoul, Adrian Bulat, Yannis Panagakis, Timothy Hospedales, Maja Pantic

To alleviate this, one approach is to apply low-rank tensor decompositions to convolution kernels in order to compress the network and reduce its number of parameters.

Emotion Recognition Image Classification

Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks

no code implementations 25 Sep 2019 Adrian Bulat, Jean Kossaifi, Sourav Bhattacharya, Yannis Panagakis, Georgios Tzimiropoulos, Nicholas D. Lane, Maja Pantic

As deep neural networks become widely adopted for solving most problems in computer vision and audio-understanding, there are rising concerns about their potential vulnerability.

Adversarial Defense Audio Classification +1

XNOR-Net++: Improved Binary Neural Networks

1 code implementation 30 Sep 2019 Adrian Bulat, Georgios Tzimiropoulos

This paper proposes an improved training algorithm for binary neural networks in which both weights and activations are binary numbers.

Binarization Classification with Binary Neural Network +3

Towards Pose-invariant Lip-Reading

no code implementations 14 Nov 2019 Shiyang Cheng, Pingchuan Ma, Georgios Tzimiropoulos, Stavros Petridis, Adrian Bulat, Jie Shen, Maja Pantic

The proposed model significantly outperforms previous approaches on non-frontal views while retaining the superior performance on frontal and near frontal mouth views.

Lip Reading

Toward fast and accurate human pose estimation via soft-gated skip connections

3 code implementations 25 Feb 2020 Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic

In addition, with a reduction of 3x in model size and complexity, we show no decrease in performance when compared to the original HourGlass network.

Pose Estimation

Knowledge distillation via adaptive instance normalization

no code implementations 9 Mar 2020 Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

To this end, we propose a new knowledge distillation method based on transferring feature statistics, specifically the channel-wise mean and variance, from the teacher to the student.

Knowledge Distillation Model Compression

A Transfer Learning approach to Heatmap Regression for Action Unit intensity estimation

no code implementations 14 Apr 2020 Ioanna Ntinou, Enrique Sanchez, Adrian Bulat, Michel Valstar, Georgios Tzimiropoulos

Action Units (AUs) are geometrically-based atomic facial muscle movements known to produce appearance changes at specific facial locations.

Face Alignment regression +1

Semi-supervised Facial Action Unit Intensity Estimation with Contrastive Learning

no code implementations 3 Nov 2020 Enrique Sanchez, Adrian Bulat, Anestis Zaganidis, Georgios Tzimiropoulos

The second stage uses another dataset of randomly chosen labeled frames to train a regressor on top of our spatio-temporal model for estimating the AU intensity.

Contrastive Learning Unsupervised Pre-training

Knowledge distillation via softmax regression representation learning

no code implementations ICLR 2021 Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

We advocate for a method that optimizes the output feature of the penultimate layer of the student network and hence is directly related to representation learning.

Knowledge Distillation Model Compression +2

Improving memory banks for unsupervised learning with large mini-batch, consistency and hard negative mining

no code implementations 8 Feb 2021 Adrian Bulat, Enrique Sánchez-Lozano, Georgios Tzimiropoulos

An important component of unsupervised learning by instance-based discrimination is a memory bank for storing a feature representation for each training sample in the dataset.

Pre-training strategies and datasets for facial representation learning

2 code implementations 30 Mar 2021 Adrian Bulat, Shiyang Cheng, Jing Yang, Andrew Garbett, Enrique Sanchez, Georgios Tzimiropoulos

Recent work on Deep Learning in the area of face analysis has focused on supervised learning for specific tasks of interest (e.g. face recognition, facial landmark localization etc.)

3D Face Reconstruction 3D Facial Landmark Localization +11

Bit-Mixer: Mixed-precision networks with runtime bit-width selection

no code implementations ICCV 2021 Adrian Bulat, Georgios Tzimiropoulos

In this work, we propose Bit-Mixer, the very first method to train a meta-quantized network where during test time any layer can change its bit-width without affecting at all the overall network's ability for highly accurate inference.

AutoML Binarization +1

Space-time Mixing Attention for Video Transformer

1 code implementation NeurIPS 2021 Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos

In this work, we propose a Video Transformer model the complexity of which scales linearly with the number of frames in the video sequence and hence induces no overhead compared to an image-based Transformer model.

Action Classification Action Recognition In Videos +1

Defensive Tensorization

no code implementations 26 Oct 2021 Adrian Bulat, Jean Kossaifi, Sourav Bhattacharya, Yannis Panagakis, Timothy Hospedales, Georgios Tzimiropoulos, Nicholas D Lane, Maja Pantic

We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network.

Audio Classification Image Classification

Subpixel Heatmap Regression for Facial Landmark Localization

no code implementations 3 Nov 2021 Adrian Bulat, Enrique Sanchez, Georgios Tzimiropoulos

Deep Learning models based on heatmap regression have revolutionized the task of facial landmark localization with existing models working robustly under large poses, non-uniform illumination and shadows, occlusions and self-occlusions, low resolution and blur.

Ranked #1 on Face Alignment on WFW (Extra Data) (using extra training data)

Face Alignment regression

EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers

1 code implementation 6 May 2022 Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez

In this work, pushing further along this under-studied direction we introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs in the tradeoff between accuracy and on-device efficiency.

Knowledge Distillation Meets Open-Set Semi-Supervised Learning

1 code implementation 13 May 2022 Jing Yang, Xiatian Zhu, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

The key idea is that we leverage the teacher's classifier as a semantic critic for evaluating the representations of both teacher and student and distilling the semantic knowledge with high-order structured information over all feature dimensions.

Face Recognition Knowledge Distillation

iBoot: Image-bootstrapped Self-Supervised Video Representation Learning

no code implementations 16 Jun 2022 Fatemeh Saleh, Fuwen Tan, Adrian Bulat, Georgios Tzimiropoulos, Brais Martinez

Video self-supervised learning (SSL) suffers from added challenges: video datasets are typically not as large as image datasets, compute is an order of magnitude larger, and the amount of spurious patterns the optimizer has to sieve through is multiplied several fold.

Data Augmentation Representation Learning +1

Efficient Attention-free Video Shift Transformers

no code implementations 23 Aug 2022 Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

To address this gap, in this paper, we make the following contributions: (a) we construct a highly efficient & accurate attention-free block based on the shift operator, coined Affine-Shift block, specifically designed to approximate as closely as possible the operations in the MHSA block of a Transformer layer.

Action Recognition Video Recognition

REST: REtrieve & Self-Train for generative action recognition

no code implementations 29 Sep 2022 Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos

We evaluate REST on the problem of zero-shot action recognition where we show that our approach is very competitive when compared to contrastive learning-based methods.

Action Recognition Contrastive Learning +4

LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models

1 code implementation CVPR 2023 Adrian Bulat, Georgios Tzimiropoulos

Through evaluations on 11 datasets, we show that our approach (a) significantly outperforms all prior works on soft prompting, and (b) matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.

Few-Shot Learning Language Modelling +3

FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training

no code implementations ICCV 2023 Adrian Bulat, Ricardo Guerrero, Brais Martinez, Georgios Tzimiropoulos

Importantly, we show that our system is not only more flexible than existing methods, but also, it makes a step towards satisfying desideratum (c).

Few-Shot Object Detection object-detection +1

ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded

no code implementations ICCV 2023 Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos

Specifically, we propose ReGen, a novel reinforcement learning based framework with a three-fold objective and reward functions: (1) a class-level discrimination reward that enforces the generated caption to be correctly classified into the corresponding action class, (2) a CLIP reward that encourages the generated caption to continue to be descriptive of the input video (i.e. video-specific), and (3) a grammar reward that preserves the grammatical correctness of the caption.

Action Classification Action Recognition +4

Black Box Few-Shot Adaptation for Vision-Language models

1 code implementation ICCV 2023 Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners.

Contrastive Learning Re-Ranking

SimDETR: Simplifying self-supervised pretraining for DETR

no code implementations 28 Jul 2023 Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos

DETR-based object detectors have achieved remarkable performance but are sample-inefficient and exhibit slow convergence.

Few-Shot Object Detection Object +2

You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

no code implementations 30 Jan 2024 Mehdi Noroozi, Isma Hadji, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

We show that the combination of spatially distilled U-Net and fine-tuned decoder outperforms state-of-the-art methods requiring 200 steps with only one single step.

Image Super-Resolution
