no code implementations • 30 Jan 2024 • Mehdi Noroozi, Isma Hadji, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos
We show that the combination of a spatially distilled U-Net and a fine-tuned decoder outperforms state-of-the-art methods that require 200 steps, using only a single step.
no code implementations • 24 Jan 2024 • Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez
The key technological enabler is a novel mechanism for automatic question-answer generation from procedural text which can ingest large amounts of textual instructions and produce exhaustive in-domain QA training data.
no code implementations • 28 Jul 2023 • Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos
DETR-based object detectors have achieved remarkable performance but are sample-inefficient and exhibit slow convergence.
1 code implementation • ICCV 2023 • Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Vision-Language (V-L) models trained with contrastive learning to align the visual and language modalities have been shown to be strong few-shot learners.
no code implementations • ICCV 2023 • Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos
Specifically, we propose ReGen, a novel reinforcement learning based framework with a three-fold objective and reward functions: (1) a class-level discrimination reward that enforces the generated caption to be correctly classified into the corresponding action class, (2) a CLIP reward that encourages the generated caption to continue to be descriptive of the input video (i.e. video-specific), and (3) a grammar reward that preserves the grammatical correctness of the caption.
1 code implementation • 10 Oct 2022 • Nikita Dvornik, Isma Hadji, Hai Pham, Dhaivat Bhatt, Brais Martinez, Afsaneh Fazly, Allan D. Jepson
In this setup, we seek the optimal step ordering consistent with the procedure flow graph and a given video.
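The step-ordering objective above can be sketched as a search over the topological orders of a procedure flow graph, scored against the video. The graph, scores, and brute-force search below are a hypothetical toy illustration, not the paper's (dynamic-programming) method:

```python
from itertools import permutations

def topological_orders(steps, edges):
    """Yield all orderings of `steps` consistent with precedence `edges` (a -> b: a before b)."""
    for order in permutations(steps):
        pos = {s: i for i, s in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in edges):
            yield order

def best_order(steps, edges, score):
    """Pick the feasible ordering maximizing the summed step-vs-temporal-slot affinity."""
    return max(topological_orders(steps, edges),
               key=lambda order: sum(score[s][i] for i, s in enumerate(order)))

# Toy flow graph: A must precede B and C; D is unconstrained.
steps = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C")]
# Hypothetical affinity scores between each step and each temporal slot of the video.
score = {"A": [5, 0, 0, 0], "B": [0, 1, 4, 0], "C": [0, 4, 1, 0], "D": [0, 0, 0, 5]}
order = best_order(steps, edges, score)  # -> ("A", "C", "B", "D")
```

Enumerating permutations is exponential; it only serves to make the "optimal ordering consistent with the flow graph" objective concrete on a four-step example.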
no code implementations • ICCV 2023 • Adrian Bulat, Ricardo Guerrero, Brais Martinez, Georgios Tzimiropoulos
Importantly, we show that our system is not only more flexible than existing methods, but also makes a step towards satisfying desideratum (c).
1 code implementation • 6 Oct 2022 • Fuwen Tan, Fatemeh Saleh, Brais Martinez
This hints at view sampling being one of the performance bottlenecks for SSL on low-capacity networks.
1 code implementation • ICCV 2023 • Mohammad Mahdi Derakhshani, Enrique Sanchez, Adrian Bulat, Victor Guilherme Turrisi da Costa, Cees G. M. Snoek, Georgios Tzimiropoulos, Brais Martinez
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
Ranked #1 on Few-Shot Learning on food101
no code implementations • 29 Sep 2022 • Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos
We evaluate REST on the problem of zero-shot action recognition where we show that our approach is very competitive when compared to contrastive learning-based methods.
no code implementations • 23 Aug 2022 • Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
To address this gap, in this paper, we make the following contributions: (a) we construct a highly efficient and accurate attention-free block based on the shift operator, coined Affine-Shift block, specifically designed to approximate as closely as possible the operations in the MHSA block of a Transformer layer.
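The shift operator underlying such attention-free blocks can be sketched as follows. This is the generic zero-FLOP spatial shift, not the paper's full Affine-Shift block (which adds further affine and pointwise operations):

```python
import numpy as np

def spatial_shift(x):
    """Shift four equal channel groups one pixel left/right/up/down (zero padding).

    x: feature map of shape (C, H, W) with C divisible by 4. Mixing information
    across neighbouring positions this way is the spatial interaction that
    shift-based blocks use in place of self-attention.
    """
    C, H, W = x.shape
    g = C // 4
    out = np.zeros_like(x)
    out[:g, :, :-1] = x[:g, :, 1:]            # group 0: shift left
    out[g:2*g, :, 1:] = x[g:2*g, :, :-1]      # group 1: shift right
    out[2*g:3*g, :-1, :] = x[2*g:3*g, 1:, :]  # group 2: shift up
    out[3*g:, 1:, :] = x[3*g:, :-1, :]        # group 3: shift down
    return out
```

The shift itself has no parameters and no multiplications; learnable mixing comes from the surrounding pointwise layers.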
no code implementations • 16 Jun 2022 • Fatemeh Saleh, Fuwen Tan, Adrian Bulat, Georgios Tzimiropoulos, Brais Martinez
Video self-supervised learning (SSL) suffers from added challenges: video datasets are typically not as large as image datasets, compute is an order of magnitude larger, and the amount of spurious patterns the optimizer has to sieve through is multiplied several fold.
1 code implementation • 13 May 2022 • Jing Yang, Xiatian Zhu, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
The key idea is that we leverage the teacher's classifier as a semantic critic for evaluating the representations of both teacher and student and distilling the semantic knowledge with high-order structured information over all feature dimensions.
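The classifier-as-critic idea can be sketched as a KL divergence between the class distributions obtained by pushing both teacher and student features through the teacher's frozen classifier. This is a minimal sketch of that idea only; the paper's formulation (high-order structured information over all feature dimensions) is richer:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def semantic_critic_loss(f_student, f_teacher, W_cls, tau=4.0):
    """KL(teacher || student) over class distributions produced by the teacher's
    (frozen) classifier W_cls applied to BOTH feature sets.

    f_*: (batch, dim) features; W_cls: (dim, classes); tau: softening temperature.
    """
    p_t = softmax(f_teacher @ W_cls / tau)
    p_s = softmax(f_student @ W_cls / tau)
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)))
```

The loss is zero exactly when the student's features are indistinguishable from the teacher's as seen by the teacher's own classifier.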
1 code implementation • 6 May 2022 • Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez
In this work, pushing further along this under-studied direction, we introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs in the tradeoff between accuracy and on-device efficiency.
no code implementations • 10 Apr 2022 • Victor Escorcia, Ricardo Guerrero, Xiatian Zhu, Brais Martinez
To overcome both limitations, we introduce Self-Supervised Learning Over Sets (SOS), an approach to pre-train a generic Objects In Contact (OIC) representation model from video object regions detected by an off-the-shelf hand-object contact detector.
no code implementations • NeurIPS 2021 • Mengmeng Xu, Juan Manuel Perez Rua, Xiatian Zhu, Bernard Ghanem, Brais Martinez
This results in a task discrepancy problem for the video encoder – trained for action classification, but used for TAL.
Ranked #9 on Temporal Action Localization on HACS
no code implementations • 6 Oct 2021 • Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos
This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021.
1 code implementation • NeurIPS 2021 • Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos
In this work, we propose a Video Transformer model whose complexity scales linearly with the number of frames in the video sequence and hence induces no overhead compared to an image-based Transformer model.
Ranked #32 on Action Classification on Kinetics-600
1 code implementation • 20 Jan 2021 • Xiatian Zhu, Antoine Toisoul, Juan-Manuel Perez-Rua, Li Zhang, Brais Martinez, Tao Xiang
Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.
no code implementations • ICLR 2021 • Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos
We advocate for a method that optimizes the output feature of the penultimate layer of the student network and hence is directly related to representation learning.
1 code implementation • ICCV 2021 • Mengmeng Xu, Juan-Manuel Perez-Rua, Victor Escorcia, Brais Martinez, Xiatian Zhu, Li Zhang, Bernard Ghanem, Tao Xiang
However, most existing models developed for these tasks are pre-trained on general video action classification tasks.
Ranked #23 on Temporal Action Localization on ActivityNet-1.3
1 code implementation • ICLR 2021 • Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Network binarization is a promising hardware-aware direction for creating efficient deep models.
1 code implementation • 13 Jul 2020 • Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic
However, our most promising lightweight models are on par with the current state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.
Ranked #4 on Lipreading on Lip Reading in the Wild
no code implementations • 3 Jul 2020 • Juan-Manuel Perez-Rua, Antoine Toisoul, Brais Martinez, Victor Escorcia, Li Zhang, Xiatian Zhu, Tao Xiang
In this challenge, action recognition is posed as the problem of simultaneously predicting a single 'verb' and 'noun' class label given an input trimmed video clip.
no code implementations • 2 Apr 2020 • Juan-Manuel Perez-Rua, Brais Martinez, Xiatian Zhu, Antoine Toisoul, Victor Escorcia, Tao Xiang
Departing from existing alternatives, our W3 module models all three facets of video attention jointly.
Ranked #1 on Action Recognition on EgoGesture
1 code implementation • ICLR 2020 • Brais Martinez, Jing Yang, Adrian Bulat, Georgios Tzimiropoulos
This paper shows how to train binary networks to within a few percentage points (~3-5%) of their full-precision counterpart.
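The basic binarization step shared by work in this area can be sketched as follows. This is the standard XNOR-Net-style scaled-sign binarization, shown only for context; the paper's contribution is the training recipe around it, not this step itself:

```python
import numpy as np

def binarize(W):
    """Scaled-sign weight binarization: replace W (out_channels, fan_in) by
    alpha * sign(W), with per-output-channel scale alpha = mean |W|, which
    minimizes the L2 error ||W - alpha * sign(W)||."""
    alpha = np.abs(W).mean(axis=1, keepdims=True)
    return alpha * np.sign(W)
```

At inference, the sign tensor supports bitwise (XNOR/popcount) arithmetic while alpha restores the dynamic range.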
no code implementations • 9 Mar 2020 • Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos
To this end, we propose a new knowledge distillation method based on transferring feature statistics, specifically the channel-wise mean and variance, from the teacher to the student.
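The statistics-transfer objective can be sketched directly from the description above. Tensor shapes and the simple squared-error weighting below are illustrative assumptions; where in the network the loss is applied is a design choice of the full method:

```python
import numpy as np

def stat_matching_loss(f_student, f_teacher):
    """Match the channel-wise mean and variance of student features to the
    teacher's, rather than matching the feature maps themselves.

    f_*: feature maps of shape (batch, channels, h, w).
    """
    axes = (0, 2, 3)  # pool over batch and spatial dims; keep per-channel stats
    mu_s, mu_t = f_student.mean(axis=axes), f_teacher.mean(axis=axes)
    var_s, var_t = f_student.var(axis=axes), f_teacher.var(axis=axes)
    return float(((mu_s - mu_t) ** 2).mean() + ((var_s - var_t) ** 2).mean())
```

Because only first- and second-order channel statistics are matched, the student is not forced to reproduce the teacher's feature maps point by point.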
no code implementations • ECCV 2020 • Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
We show that directly applying NAS to the binary domain provides very poor results.
2 code implementations • 23 Jan 2020 • Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic
We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.
Ranked #7 on Lipreading on CAS-VSR-W1k (LRW-1000)
no code implementations • ICCV 2019 • Brais Martinez, Davide Modolo, Yuanjun Xiong, Joseph Tighe
In this work we aim to improve the representation capacity of the network; rather than altering the backbone, we focus on the last layers of the network, where changes have low impact in terms of computational cost.
Ranked #36 on Action Recognition on Something-Something V1 (using extra training data)
no code implementations • 17 Jan 2017 • Joy Egede, Michel Valstar, Brais Martinez
Automatic continuous-time, continuous-value assessment of a patient's pain from face video is highly sought after by the medical profession.
no code implementations • 7 Dec 2016 • Enrique Sánchez-Lozano, Georgios Tzimiropoulos, Brais Martinez, Fernando de la Torre, Michel Valstar
This paper presents a Functional Regression solution to the least squares problem, which we coin Continuous Regression, resulting in the first real-time incremental face tracker.
no code implementations • 3 Aug 2016 • Enrique Sánchez-Lozano, Brais Martinez, Georgios Tzimiropoulos, Michel Valstar
We then derive the incremental learning updates for CCR (iCCR) and show that it is an order of magnitude faster than standard incremental learning for cascaded regression, bringing the time required for the update from seconds down to a fraction of a second, thus enabling real-time tracking.
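The kind of closed-form incremental update that makes this possible can be sketched with rank-one recursive least squares. This is a generic O(d^2) Sherman-Morrison update for ridge regression, illustrating the principle rather than the paper's functional-regression (iCCR) derivation:

```python
import numpy as np

class IncrementalLinearRegressor:
    """Absorb one new (x, y) training pair in O(d^2) instead of refitting
    the regressor from scratch on all data seen so far."""

    def __init__(self, dim, reg=1e-3):
        self.P = np.eye(dim) / reg  # inverse of the regularized Gram matrix X^T X + reg*I
        self.b = np.zeros(dim)      # accumulated X^T y
        self.w = np.zeros(dim)

    def update(self, x, y):
        # Sherman-Morrison rank-one update of the inverse Gram matrix.
        Px = self.P @ x
        self.P -= np.outer(Px, Px) / (1.0 + x @ Px)
        self.b += y * x
        self.w = self.P @ self.b
        return self.w
```

After any number of updates, `w` equals the batch ridge solution on the same points, which is what makes per-frame updates cheap enough for real-time tracking.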
no code implementations • ICCV 2015 • Timur Almaev, Brais Martinez, Michel Valstar
We thus consider a novel problem: all AU models for the target subject are to be learnt using person-specific annotated data for a reference AU (AU12 in our case), and little or no data regarding the target AU.
no code implementations • ICCV 2015 • Xiaomeng Wang, Michel Valstar, Brais Martinez, Muhammad Haris Khan, Tony Pridmore
This paper proposes a novel approach to part-based tracking by replacing local matching of an appearance model by direct prediction of the displacement between local image patches and part locations.