no code implementations • 30 Jan 2024 • Mehdi Noroozi, Isma Hadji, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos
We show that the combination of a spatially distilled U-Net and a fine-tuned decoder outperforms state-of-the-art methods that require 200 steps, using only a single step.
no code implementations • 24 Jan 2024 • Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez
The key technological enabler is a novel mechanism for automatic question-answer generation from procedural text which can ingest large amounts of textual instructions and produce exhaustive in-domain QA training data.
no code implementations • 28 Jul 2023 • Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos
DETR-based object detectors have achieved remarkable performance but are sample-inefficient and exhibit slow convergence.
1 code implementation • ICCV 2023 • Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners.
no code implementations • ICCV 2023 • Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos
Specifically, we propose ReGen, a novel reinforcement learning based framework with a three-fold objective and reward functions: (1) a class-level discrimination reward that enforces the generated caption to be correctly classified into the corresponding action class, (2) a CLIP reward that encourages the generated caption to continue to be descriptive of the input video (i.e. video-specific), and (3) a grammar reward that preserves the grammatical correctness of the caption.
1 code implementation • 10 Oct 2022 • Nikita Dvornik, Isma Hadji, Hai Pham, Dhaivat Bhatt, Brais Martinez, Afsaneh Fazly, Allan D. Jepson
In this setup, we seek the optimal step ordering consistent with the procedure flow graph and a given video.
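The step-ordering objective above can be sketched as a search over the topological orders of a procedure flow graph, scored against the video. The graph, scores, and brute-force search below are a hypothetical toy illustration, not the paper's (dynamic-programming) method:

```python
from itertools import permutations

def topological_orders(steps, edges):
    """Yield all orderings of `steps` consistent with precedence `edges` (a -> b: a before b)."""
    for order in permutations(steps):
        pos = {s: i for i, s in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in edges):
            yield order

def best_order(steps, edges, score):
    """Pick the feasible ordering maximizing the summed step-vs-temporal-slot affinity."""
    return max(topological_orders(steps, edges),
               key=lambda order: sum(score[s][i] for i, s in enumerate(order)))

# Toy flow graph: A must precede B and C; D is unconstrained.
steps = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C")]
# Hypothetical affinity scores between each step and each temporal slot of the video.
score = {"A": [5, 0, 0, 0], "B": [0, 1, 4, 0], "C": [0, 4, 1, 0], "D": [0, 0, 0, 5]}
order = best_order(steps, edges, score)  # -> ("A", "C", "B", "D")
```

Enumerating permutations is exponential; it only serves to make the "optimal ordering consistent with the flow graph" objective concrete on a four-step example.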
no code implementations • ICCV 2023 • Adrian Bulat, Ricardo Guerrero, Brais Martinez, Georgios Tzimiropoulos
Importantly, we show that our system is not only more flexible than existing methods, but also makes a step towards satisfying desideratum (c).
1 code implementation • 6 Oct 2022 • Fuwen Tan, Fatemeh Saleh, Brais Martinez
This hints at view sampling being one of the performance bottlenecks for SSL on low-capacity networks.
1 code implementation • ICCV 2023 • Mohammad Mahdi Derakhshani, Enrique Sanchez, Adrian Bulat, Victor Guilherme Turrisi da Costa, Cees G. M. Snoek, Georgios Tzimiropoulos, Brais Martinez
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
Ranked #1 on Few-Shot Learning on food101
no code implementations • 29 Sep 2022 • Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos
We evaluate REST on the problem of zero-shot action recognition where we show that our approach is very competitive when compared to contrastive learning-based methods.
no code implementations • 23 Aug 2022 • Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
To address this gap, in this paper, we make the following contributions: (a) we construct a highly efficient and accurate attention-free block based on the shift operator, coined Affine-Shift block, specifically designed to approximate as closely as possible the operations in the MHSA block of a Transformer layer.
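The shift operator underlying such attention-free blocks can be sketched as follows. This is the generic zero-FLOP spatial shift, not the paper's full Affine-Shift block (which adds further affine and pointwise operations):

```python
import numpy as np

def spatial_shift(x):
    """Shift four equal channel groups one pixel left/right/up/down (zero padding).

    x: feature map of shape (C, H, W) with C divisible by 4. Mixing information
    across neighbouring positions this way is the spatial interaction that
    shift-based blocks use in place of self-attention.
    """
    C, H, W = x.shape
    g = C // 4
    out = np.zeros_like(x)
    out[:g, :, :-1] = x[:g, :, 1:]            # group 0: shift left
    out[g:2*g, :, 1:] = x[g:2*g, :, :-1]      # group 1: shift right
    out[2*g:3*g, :-1, :] = x[2*g:3*g, 1:, :]  # group 2: shift up
    out[3*g:, 1:, :] = x[3*g:, :-1, :]        # group 3: shift down
    return out
```

The shift itself has no parameters and no multiplications; learnable mixing comes from the surrounding pointwise layers.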
no code implementations • 16 Jun 2022 • Fatemeh Saleh, Fuwen Tan, Adrian Bulat, Georgios Tzimiropoulos, Brais Martinez
Video self-supervised learning (SSL) suffers from added challenges: video datasets are typically not as large as image datasets, compute is an order of magnitude larger, and the amount of spurious patterns the optimizer has to sieve through is multiplied several fold.
1 code implementation • 13 May 2022 • Jing Yang, Xiatian Zhu, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
The key idea is that we leverage the teacher's classifier as a semantic critic for evaluating the representations of both teacher and student and distilling the semantic knowledge with high-order structured information over all feature dimensions.
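The classifier-as-critic idea can be sketched as a KL divergence between the class distributions obtained by pushing both teacher and student features through the teacher's frozen classifier. This is a minimal sketch of that idea only; the paper's formulation (high-order structured information over all feature dimensions) is richer:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def semantic_critic_loss(f_student, f_teacher, W_cls, tau=4.0):
    """KL(teacher || student) over class distributions produced by the teacher's
    (frozen) classifier W_cls applied to BOTH feature sets.

    f_*: (batch, dim) features; W_cls: (dim, classes); tau: softening temperature.
    """
    p_t = softmax(f_teacher @ W_cls / tau)
    p_s = softmax(f_student @ W_cls / tau)
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)))
```

The loss is zero exactly when the student's features are indistinguishable from the teacher's as seen by the teacher's own classifier.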
1 code implementation • 6 May 2022 • Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez
In this work, pushing further along this under-studied direction, we introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs in the tradeoff between accuracy and on-device efficiency.
no code implementations • 10 Apr 2022 • Victor Escorcia, Ricardo Guerrero, Xiatian Zhu, Brais Martinez
To overcome both limitations, we introduce Self-Supervised Learning Over Sets (SOS), an approach to pre-train a generic Objects In Contact (OIC) representation model from video object regions detected by an off-the-shelf hand-object contact detector.
no code implementations • NeurIPS 2021 • Mengmeng Xu, Juan Manuel Perez Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez
This results in a task discrepancy problem for the video encoder – trained for action classification, but used for TAL.
Ranked #9 on Temporal Action Localization on HACS
no code implementations • 6 Oct 2021 • Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos
This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021.
1 code implementation • NeurIPS 2021 • Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos
In this work, we propose a Video Transformer model whose complexity scales linearly with the number of frames in the video sequence and hence induces no overhead compared to an image-based Transformer model.
Ranked #32 on Action Classification on Kinetics-600
1 code implementation • 20 Jan 2021 • Xiatian Zhu, Antoine Toisoul, Juan-Manuel Perez-Rua, Li Zhang, Brais Martinez, Tao Xiang
Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.
no code implementations • ICLR 2021 • Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos
We advocate for a method that optimizes the output feature of the penultimate layer of the student network and hence is directly related to representation learning.
1 code implementation • ICCV 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Victor Escorcia, Brais Martinez, Xiatian Zhu, Li Zhang, Bernard Ghanem, Tao Xiang
However, most existing models developed for these tasks are pre-trained on general video action classification tasks.
Ranked #23 on Temporal Action Localization on ActivityNet-1.3
1 code implementation • ICLR 2021 • Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Network binarization is a promising hardware-aware direction for creating efficient deep models.
1 code implementation • 13 Jul 2020 • Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic
However, our most promising lightweight models are on par with the current state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.
Ranked #4 on Lipreading on Lip Reading in the Wild
no code implementations • 3 Jul 2020 • Juan-Manuel Perez-Rua, Antoine Toisoul, Brais Martinez, Victor Escorcia, Li Zhang, Xiatian Zhu, Tao Xiang
In this challenge, action recognition is posed as the problem of simultaneously predicting a single 'verb' and 'noun' class label given an input trimmed video clip.
no code implementations • 2 Apr 2020 • Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine Toisoul, Victor Escorcia, Tao Xiang
Departing from existing alternatives, our W3 module models all three facets of video attention jointly.
Ranked #1 on Action Recognition on EgoGesture
1 code implementation • ICLR 2020 • Brais Martinez, Jing Yang, Adrian Bulat, Georgios Tzimiropoulos
This paper shows how to train binary networks to within a few percentage points (~3-5%) of their full-precision counterpart.
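The basic binarization step shared by work in this area can be sketched as follows. This is the standard XNOR-Net-style scaled-sign binarization, shown only for context; the paper's contribution is the training recipe around it, not this step itself:

```python
import numpy as np

def binarize(W):
    """Scaled-sign weight binarization: replace W (out_channels, fan_in) by
    alpha * sign(W), with per-output-channel scale alpha = mean |W|, which
    minimizes the L2 error ||W - alpha * sign(W)||."""
    alpha = np.abs(W).mean(axis=1, keepdims=True)
    return alpha * np.sign(W)
```

At inference, the sign tensor supports bitwise (XNOR/popcount) arithmetic while alpha restores the dynamic range.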
no code implementations • 9 Mar 2020 • Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos
To this end, we propose a new knowledge distillation method based on transferring feature statistics, specifically the channel-wise mean and variance, from the teacher to the student.
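The statistics-transfer objective can be sketched directly from the description above. Tensor shapes and the simple squared-error weighting below are illustrative assumptions; where in the network the loss is applied is a design choice of the full method:

```python
import numpy as np

def stat_matching_loss(f_student, f_teacher):
    """Match the channel-wise mean and variance of student features to the
    teacher's, rather than matching the feature maps themselves.

    f_*: feature maps of shape (batch, channels, h, w).
    """
    axes = (0, 2, 3)  # pool over batch and spatial dims; keep per-channel stats
    mu_s, mu_t = f_student.mean(axis=axes), f_teacher.mean(axis=axes)
    var_s, var_t = f_student.var(axis=axes), f_teacher.var(axis=axes)
    return float(((mu_s - mu_t) ** 2).mean() + ((var_s - var_t) ** 2).mean())
```

Because only first- and second-order channel statistics are matched, the student is not forced to reproduce the teacher's feature maps point by point.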
no code implementations • ECCV 2020 • Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
We show that directly applying NAS to the binary domain provides very poor results.
2 code implementations • 23 Jan 2020 • Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic
We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.
Ranked #7 on Lipreading on CAS-VSR-W1k (LRW-1000)
no code implementations • ICCV 2019 • Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe
In this work we aim to improve the representation capacity of the network; rather than altering the backbone, we focus on the last layers of the network, where changes have low impact in terms of computational cost.
Ranked #36 on Action Recognition on Something-Something V1 (using extra training data)
no code implementations • 17 Jan 2017 • Joy Egede, Michel Valstar, Brais Martinez
Automatic continuous-time, continuous-value assessment of a patient's pain from face video is highly sought after by the medical profession.
no code implementations • 7 Dec 2016 • Enrique Sánchez-Lozano, Georgios Tzimiropoulos, Brais Martinez, Fernando de la Torre, Michel Valstar
This paper presents a Functional Regression solution to the least squares problem, which we coin Continuous Regression, resulting in the first real-time incremental face tracker.
no code implementations • 3 Aug 2016 • Enrique Sánchez-Lozano, Brais Martinez, Georgios Tzimiropoulos, Michel Valstar
We then derive the incremental learning updates for CCR (iCCR) and show that it is an order of magnitude faster than standard incremental learning for cascaded regression, bringing the time required for the update from seconds down to a fraction of a second, thus enabling real-time tracking.
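The kind of closed-form incremental update that makes this possible can be sketched with rank-one recursive least squares. This is a generic O(d^2) Sherman-Morrison update for ridge regression, illustrating the principle rather than the paper's functional-regression (iCCR) derivation:

```python
import numpy as np

class IncrementalLinearRegressor:
    """Absorb one new (x, y) training pair in O(d^2) instead of refitting
    the regressor from scratch on all data seen so far."""

    def __init__(self, dim, reg=1e-3):
        self.P = np.eye(dim) / reg  # inverse of the regularized Gram matrix X^T X + reg*I
        self.b = np.zeros(dim)      # accumulated X^T y
        self.w = np.zeros(dim)

    def update(self, x, y):
        # Sherman-Morrison rank-one update of the inverse Gram matrix.
        Px = self.P @ x
        self.P -= np.outer(Px, Px) / (1.0 + x @ Px)
        self.b += y * x
        self.w = self.P @ self.b
        return self.w
```

After any number of updates, `w` equals the batch ridge solution on the same points, which is what makes per-frame updates cheap enough for real-time tracking.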
no code implementations • ICCV 2015 • Timur Almaev, Brais Martinez, Michel Valstar
We thus consider a novel problem: all AU models for the target subject are to be learnt using person-specific annotated data for a reference AU (AU12 in our case), and little or no data regarding the target AU.
no code implementations • ICCV 2015 • Xiaomeng Wang, Michel Valstar, Brais Martinez, Muhammad Haris Khan, Tony Pridmore
This paper proposes a novel approach to part-based tracking by replacing local matching of an appearance model by direct prediction of the displacement between local image patches and part locations.