no code implementations • 20 May 2025 • Zahraa Al Sahili, Ioannis Patras, Matthew Purver
Multilingual vision-language models promise universal image-text retrieval, yet their social biases remain under-explored.
no code implementations • CVPR 2025 • Yu Cao, Zengqun Zhao, Ioannis Patras, Shaogang Gong
Visual artifacts remain a persistent challenge in diffusion models, even with training on massive datasets.
1 code implementation • CVPR 2025 • Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, Ioannis Patras
Two key challenges are identified in this fine-tuning paradigm: 1) the low quality of synthetic data, which can occur even with advanced generative models, and 2) the domain and bias gap between real and synthetic data.
no code implementations • 29 Jan 2025 • Mariano V. Ntrougkas, Vasileios Mezaris, Ioannis Patras
The adoption of Deep Neural Networks (DNNs) in critical fields where predictions need to be accompanied by justifications is hindered by their inherent black-box nature.
no code implementations • 22 Jan 2025 • Zahraa Al Sahili, Ioannis Patras, Matthew Purver
In the domain of text-to-image generative models, biases inherent in training datasets often propagate into generated content, posing significant ethical challenges, particularly in socially sensitive contexts.
1 code implementation • 23 Dec 2024 • Andreas Goulas, Vasileios Mezaris, Ioannis Patras
To address these shortcomings, in this paper we introduce VidCtx, a novel training-free VideoQA framework that integrates both modalities, i.e., visual information from the input frames and textual descriptions of other frames that provide the appropriate context.
Ranked #2 on Zero-Shot Video Question Answering on STAR Benchmark
no code implementations • 26 Nov 2024 • Omnia Alwazzan, Amaya Gallagher-Syed, Thomas O. Millner, Sebastian Brandner, Ioannis Patras, Silvia Marino, Gregory Slabaugh
In this paper, we propose the use of omic embeddings during early and late fusion to capture complementary information from local (patch-level) to global (slide-level) interactions, boosting performance through multimodal integration.
no code implementations • CVPR 2025 • Anxhelo Diko, Tinghuai Wang, Wassim Swaileh, Shiyan Sun, Ioannis Patras
We empirically demonstrate ReWind's superior performance in visual question answering (VQA) and temporal grounding tasks, surpassing previous methods on long video benchmarks.
2 code implementations • 27 Sep 2024 • Zhonglin Sun, Siyang Song, Ioannis Patras, Georgios Tzimiropoulos
Privacy is a major concern in developing face recognition techniques.
no code implementations • 26 Sep 2024 • Dimitrios Kollias, Chunchang Shao, Odysseus Kaloidas, Ioannis Patras
In this paper, we introduce Behavior4All, a comprehensive, open-source toolkit for in-the-wild facial behavior analysis, integrating Face Localization, Valence-Arousal Estimation, Basic Expression Recognition and Action Unit Detection, all within a single framework.
1 code implementation • 17 Sep 2024 • Debin Meng, Christos Tzelepis, Ioannis Patras, Georgios Tzimiropoulos
In this paper, we propose a practical framework - MM2Latent - for multimodal image generation and editing.
1 code implementation • 19 Aug 2024 • Chen Feng, Georgios Tzimiropoulos, Ioannis Patras
This has the advantage that the sample selection is decoupled from the in-training model and that the sample selection is aware of the semantic and visual similarities between the classes due to the way that CLIP is trained.
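The CLIP-based selection idea above can be illustrated as ranking each class's samples by the similarity between their image embeddings and the class-name text embedding, then keeping the top-scoring fraction. A minimal sketch in plain Python; the function names, toy embeddings, and the `keep_ratio` parameter are illustrative assumptions, not the paper's code:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_clean_samples(image_embs, labels, text_embs, keep_ratio=0.5):
    """For each class, keep the samples whose (CLIP-style) image embedding
    is most similar to that class's text embedding.
    image_embs: list of vectors; labels: list of class ids;
    text_embs: dict mapping class id -> text-embedding vector."""
    scores = [cosine(e, text_embs[y]) for e, y in zip(image_embs, labels)]
    selected = []
    for c in set(labels):
        idx = [i for i, y in enumerate(labels) if y == c]
        idx.sort(key=lambda i: scores[i], reverse=True)
        selected.extend(idx[: max(1, int(len(idx) * keep_ratio))])
    return sorted(selected)
```

Because the scores come from a frozen CLIP model rather than the in-training network, the selection stays decoupled from the model being trained, as the abstract notes.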
1 code implementation • 17 Aug 2024 • Dario Cioni, Christos Tzelepis, Lorenzo Seidenari, Ioannis Patras
The steady improvement of Diffusion Models for visual synthesis has given rise to many new and interesting use cases of synthetic images but also has raised concerns about their potential abuse, which poses significant societal threats.
no code implementations • 9 Aug 2024 • Zhaohan Zhang, Ziquan Liu, Ioannis Patras
To achieve a better trade-off between the effectiveness of TSM erasure and model utility in LLMs, our paper proposes a new framework based on Entropy Maximization with Selective Optimization (EMSO), where the updated weights are chosen with a novel contrastive gradient metric, without involving any additional model or data.
no code implementations • 23 Jul 2024 • Zahraa Al Sahili, Ioannis Patras, Matthew Purver
Multimodal machine learning (MML) is rapidly reshaping the way mental-health disorders are detected, characterized, and longitudinally monitored.
1 code implementation • 15 Jul 2024 • Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, Ioannis Patras
We conduct extensive experiments to evaluate our approach and demonstrate that ExCB: a) achieves state-of-the-art results with significantly reduced resource requirements compared to previous works, b) is fully online, and therefore scalable to large datasets, and c) is stable and effective even with very small batch sizes.
no code implementations • 13 Jun 2024 • Zahraa Al Sahili, Ioannis Patras, Matthew Purver
In the domain of text-to-image generative models, biases inherent in training datasets often propagate into generated content, posing significant ethical challenges, particularly in socially sensitive contexts.
1 code implementation • 29 May 2024 • Zengqun Zhao, Yu Cao, Shaogang Gong, Ioannis Patras
Current facial expression recognition (FER) models are often designed in a supervised learning manner and thus are constrained by the lack of large-scale facial expression images with high-quality annotations.
no code implementations • 26 Apr 2024 • Abhishek Kumar Singh, Ioannis Patras
The rapid evolution of the fashion industry increasingly intersects with technological advancements, particularly through the integration of generative AI.
1 code implementation • 10 Apr 2024 • Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos
In the first stage, we propose prompting VLLMs to generate descriptions in natural language of the subject's apparent emotion relative to the visual context.
Ranked #1 on Emotion Recognition in Context on EMOTIC
no code implementations • 25 Mar 2024 • Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
To this end, in this paper we present DiffusionAct, a novel method that leverages the photo-realistic image generation of diffusion models to perform neural face reenactment.
2 code implementations • CVPR 2024 • Zhonglin Sun, Chen Feng, Ioannis Patras, Georgios Tzimiropoulos
This enables our method, namely LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition.
1 code implementation • 11 Mar 2024 • Omnia Alwazzan, Abbas Khan, Ioannis Patras, Gregory Slabaugh
We propose a novel Multi-modal Outer Arithmetic Block (MOAB) based on arithmetic operations to combine latent representations of the different modalities for predicting the tumor grade (Grade II, III and IV).
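The outer-arithmetic idea can be sketched as pairing every element of one modality's embedding with every element of the other's under elementwise arithmetic, producing fusion maps rather than a single concatenated vector. The helper below is an illustrative assumption (the function name, the epsilon guard, and the exact set of operations are ours, not necessarily MOAB's):

```python
def outer_fuse(a, b):
    """Sketch of outer-arithmetic fusion: combine two modality embeddings
    a (length m) and b (length n) into four m x n maps whose (i, j) entries
    are the sum, difference, product, and quotient of (a_i, b_j)."""
    eps = 1e-8  # guard against division by zero in the quotient map
    add = [[ai + bj for bj in b] for ai in a]
    sub = [[ai - bj for bj in b] for ai in a]
    mul = [[ai * bj for bj in b] for ai in a]
    div = [[ai / (bj + eps) for bj in b] for ai in a]
    return add, sub, mul, div
```

Each map captures a different pairwise interaction between the modalities; a downstream network can then consume these maps like multi-channel features.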
1 code implementation • 10 Mar 2024 • Omnia Alwazzan, Ioannis Patras, Gregory Slabaugh
Fusion of multimodal healthcare data holds great promise to provide a holistic view of a patient's health, taking advantage of the complementarity of different modalities while leveraging their correlation.
no code implementations • CVPR 2024 • Zheng Gao, Ioannis Patras
Recent efforts toward this goal are limited to treating each face image as a whole, i.e., learning consistent facial representations at the image level, which overlooks the consistency of local facial representations (i.e., facial regions like the eyes, nose, etc.).
2 code implementations • 19 Feb 2024 • James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Jiankang Deng, Ioannis Patras
The Mixture of Experts (MoE) paradigm provides a powerful way to decompose dense layers into smaller, modular computations often more amenable to human interpretation, debugging, and editability.
no code implementations • 5 Feb 2024 • Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
Moreover, we show that by embedding real images in the GAN latent space, our method can be successfully used for the reenactment of real-world faces.
1 code implementation • 2 Nov 2023 • Moreno D'Incà, Christos Tzelepis, Ioannis Patras, Nicu Sebe
These paths are then applied to augment images to improve the fairness of a given dataset.
1 code implementation • 25 Oct 2023 • Niki Maria Foteinopoulou, Ioannis Patras
To test this, we evaluate using zero-shot classification of the model trained on sample-level descriptions on four popular dynamic FER datasets.
Ranked #1 on Zero-Shot Facial Expression Recognition on MAFW
no code implementations • 20 Oct 2023 • Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos
This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA).
Ranked #5 on Visual Question Answering (VQA) on A-OKVQA (DA VQA Score metric)
1 code implementation • 25 Aug 2023 • Zengqun Zhao, Ioannis Patras
For the visual part, based on the CLIP image encoder, a temporal model consisting of several Transformer encoders is introduced for extracting temporal facial expression features, and the final feature embedding is obtained as a learnable "class" token.
Dynamic Facial Expression Recognition • Facial Expression Recognition
no code implementations • 25 Aug 2023 • Zheng Gao, Chen Feng, Ioannis Patras
Inspired by cross-modality learning, we extend this existing framework that only learns from global features by encouraging the global features and intermediate layer features to learn from each other.
no code implementations • 28 Jul 2023 • Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos
This results in complex pipelines and a task gap between the pretraining and the downstream task.
1 code implementation • ICCV 2023 • Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose.
2 code implementations • 23 May 2023 • James Oldfield, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras
Latent image representations arising from vision-language models have proved immensely useful for a variety of downstream tasks.
1 code implementation • 6 Apr 2023 • Giorgos Kordopatis-Zilos, Giorgos Tolias, Christos Tzelepis, Ioannis Kompatsiaris, Ioannis Patras, Symeon Papadopoulos
We introduce S$^2$VS, a video similarity learning approach with self-supervision.
Ranked #1 on Video Retrieval on FIVR-200K
1 code implementation • CVPR 2023 • Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, Ioannis Patras
Clustering has been a major research topic in the field of machine learning, one to which Deep Learning has recently been applied with significant success.
1 code implementation • CVPR 2023 • Chen Feng, Ioannis Patras
More specifically, within the contrastive learning framework, for each sample our method generates soft-labels with the aid of coarse labels against other samples and another augmented view of the sample in question.
Ranked #1 on Learning with coarse labels on CIFAR-100
1 code implementation • CVPR 2023 • Simone Barattin, Christos Tzelepis, Ioannis Patras, Nicu Sebe
By optimizing the latent codes directly, we ensure both that the identity is at a desired distance from the original (with an identity obfuscation loss) and that the facial attributes are preserved (using a novel feature-matching loss in FaRL's deep feature space).
no code implementations • 1 Mar 2023 • Dimitrios Kollias, Andreas Psaroudakis, Anastasios Arsenos, Paraskevi Theofilou, Chunchang Shao, Guanyu Hu, Ioannis Patras
This paper presents MMA-MRNNet, a novel deep learning architecture for dynamic multi-output Facial Expression Intensity Estimation (FEIE) from video data.
1 code implementation • 21 Nov 2022 • Georgios Zoumpourlis, Ioannis Patras
The first loss applies curriculum learning, forcing each feature extractor to specialize to a subset of the training subjects and promoting feature diversity.
1 code implementation • 27 Sep 2022 • Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
In this paper we address the problem of neural face reenactment: given a pair of source and target facial images, we need to transfer the target's pose (defined as the head pose and facial expressions) to the source image, while preserving the source's identity characteristics (e.g., facial shape, hairstyle, etc.), even in the challenging case where the source and target faces belong to different identities.
1 code implementation • 22 Sep 2022 • Harsh Panwar, Ioannis Patras
Capsule Networks have shown tremendous advancement in the past decade, outperforming traditional CNNs in various tasks due to their equivariant properties.
1 code implementation • 22 Jul 2022 • Chen Feng, Ioannis Patras
Self-supervised learning has recently achieved great success in representation learning without human annotations.
1 code implementation • 12 Jul 2022 • Niki Maria Foteinopoulou, Ioannis Patras
In the case of affect recognition, we outperform previous vision-based methods in terms of CCC on both the OMG and the AMIGOS datasets.
Ranked #1 on Continuous Affect Estimation on AMIGOS
1 code implementation • ACM ICMR 2022 • Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris, Ioannis Patras
Instead of simply modeling the frames' dependencies based on global attention, our method integrates a concentrated attention mechanism that is able to focus on non-overlapping blocks in the main diagonal of the attention matrix, and to enrich the existing information by extracting and exploiting knowledge about the uniqueness and diversity of the associated frames of the video.
Ranked #5 on Unsupervised Video Summarization on SumMe
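The "concentrated attention" described above restricts each frame to attend only within a non-overlapping block on the main diagonal of the attention matrix. A minimal sketch of such a mask in plain Python (the helper name and block parameter are illustrative; the actual mechanism also exploits frame uniqueness and diversity, which is not modeled here):

```python
def block_diagonal_mask(n, block):
    """Boolean n x n attention mask that permits attention only inside
    non-overlapping blocks of size `block` on the main diagonal
    (True = position (i, j) may attend)."""
    return [[(i // block) == (j // block) for j in range(n)] for i in range(n)]
```

Applied before the softmax, such a mask concentrates each frame's attention on its local temporal neighborhood instead of the full sequence.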
1 code implementation • 5 Jun 2022 • Christos Tzelepis, James Oldfield, Georgios Tzimiropoulos, Ioannis Patras
This work addresses the problem of discovering non-linear interpretable paths in the latent space of pre-trained GANs in a model-agnostic manner.
1 code implementation • 31 May 2022 • James Oldfield, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras
Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs.
1 code implementation • IEEE International Symposium on Multimedia (ISM) 2021 • Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris, Ioannis Patras
This paper presents a new method for supervised video summarization.
Ranked #1 on Video Summarization on SumMe
no code implementations • 23 Nov 2021 • James Oldfield, Markos Georgopoulos, Yannis Panagakis, Mihalis A. Nicolaou, Ioannis Patras
This paper addresses the problem of finding interpretable directions in the latent space of pre-trained Generative Adversarial Networks (GANs) to facilitate controllable image synthesis.
1 code implementation • 22 Nov 2021 • Chen Feng, Georgios Tzimiropoulos, Ioannis Patras
Under this setting, unlike previous methods that often introduce multiple assumptions and lead to complex solutions, we propose a simple, efficient and robust framework named Sample Selection and Relabelling (SSR), that with a minimal number of hyperparameters achieves SOTA results in various conditions.
Ranked #1 on Image Classification on CIFAR-10 (with noisy labels)
1 code implementation • ICCV 2021 • Christos Tzelepis, Georgios Tzimiropoulos, Ioannis Patras
This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs, so as to provide an intuitive and easy way of controlling the underlying generative factors.
1 code implementation • 24 Jun 2021 • Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, Ioannis Kompatsiaris, Ioannis Patras
In this work, we propose a Knowledge Distillation framework, called Distill-and-Select (DnS), that starting from a well-performing fine-grained Teacher Network learns: a) Student Networks at different retrieval performance and computational efficiency trade-offs and b) a Selector Network that at test time rapidly directs samples to the appropriate student to maintain both high retrieval performance and high computational efficiency.
Ranked #2 on Video Retrieval on FIVR-200K
1 code implementation • 8 Jun 2021 • Ting-Ting Xie, Christos Tzelepis, Fan Fu, Ioannis Patras
Learning to localize actions in long, cluttered, and untrimmed videos is a hard task, that in the literature has typically been addressed assuming the availability of large amounts of annotated training samples for each class -- either in a fully-supervised setting, where action boundaries are known, or in a weakly-supervised setting, where only class labels are known for each video.
no code implementations • 8 Mar 2021 • Fan Fu, TingTing Xie, Ioannis Patras, Sepehr Jalali
Understanding interactions between objects in an image is an important element for generating captions.
2 code implementations • 11 Feb 2021 • Christos Tzelepis, Ioannis Patras
In this technical report we study the problem of propagation of uncertainty (in terms of variances of given uni-variate normal random variables) through typical building blocks of a Convolutional Neural Network (CNN).
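As a minimal illustration of the variance-propagation setting above, consider the fully-connected (affine) case: for y = Wx + b with independent normal inputs, each output variance is the sum of the input variances weighted by the squared weights. A sketch under that independence assumption (the function name is ours; the report covers further CNN building blocks not shown here):

```python
def propagate_variance_linear(W, var_x):
    """Propagate per-input variances through y = W x + b, assuming the
    inputs are independent univariate normals:
        Var(y_i) = sum_j W_ij^2 * Var(x_j).
    The bias b is deterministic and contributes no variance."""
    return [sum(w * w * v for w, v in zip(row, var_x)) for row in W]
```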
no code implementations • 15 Jan 2021 • Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, Vasileios Mezaris, Ioannis Patras
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
1 code implementation • IEEE Transactions on Circuits and Systems for Video Technology 2020 • Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, Vasileios Mezaris, Ioannis Patras
This paper presents a new method for unsupervised video summarization.
Ranked #3 on Unsupervised Video Summarization on TVSum
no code implementations • 25 Aug 2020 • Ting-Ting Xie, Christos Tzelepis, Ioannis Patras
Results in the action localization problem show that the incorporation of second order statistics improves over the baseline network, and that VANp surpasses the accuracy of virtually all other two-stage networks without involving any additional parameters.
no code implementations • 25 Aug 2020 • Ting-Ting Xie, Christos Tzelepis, Ioannis Patras
We use two uncertainty-aware boundary regression losses: first, the Kullback-Leibler divergence between the ground truth location of the boundary and the Gaussian modeling the prediction of the boundary and second, the expectation of the $\ell_1$ loss under the same Gaussian.
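The second loss above, the expectation of the ℓ1 loss under the predicted Gaussian, admits a closed form: for X ~ N(μ, σ²) and ground-truth boundary g, with d = μ − g, E|X − g| = σ√(2/π)·exp(−d²/2σ²) + d·erf(d/(σ√2)). A sketch (the function name is ours):

```python
import math

def expected_l1_under_gaussian(mu, sigma, g):
    """Closed form for E|X - g| with X ~ N(mu, sigma^2), where the Gaussian
    models the predicted boundary location and g is the ground truth:
        E|X - g| = sigma*sqrt(2/pi)*exp(-d^2 / (2 sigma^2))
                   + d * erf(d / (sigma*sqrt(2))),   d = mu - g."""
    d = mu - g
    return (sigma * math.sqrt(2.0 / math.pi) * math.exp(-d * d / (2.0 * sigma * sigma))
            + d * math.erf(d / (sigma * math.sqrt(2.0))))
```

When the prediction is centered on the ground truth (d = 0) this reduces to σ√(2/π), and for |d| ≫ σ it approaches |d|, recovering the plain ℓ1 loss.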
1 code implementation • MultiMedia Modeling (MMM) 2019 • Evlampios Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, Vasileios Mezaris, Ioannis Patras
Experimental evaluation on two datasets (SumMe and TVSum) documents the contribution of the attention auto-encoder to faster and more stable training of the model, resulting in a significant performance improvement with respect to the original model and demonstrating the competitiveness of the proposed SUM-GAN-AAE against the state of the art.
Ranked #6 on Unsupervised Video Summarization on TVSum
1 code implementation • AI4TV 2019 • Evlampios Apostolidis, Alexandros I. Metsai, Eleni Adamantidou, Vasileios Mezaris, Ioannis Patras
In this paper we present our work on improving the efficiency of adversarial training for unsupervised video summarization.
Ranked #5 on Unsupervised Video Summarization on TVSum
1 code implementation • ICCV 2019 • Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
Subsequently, the similarity matrix between all video frames is fed to a four-layer CNN, and then summarized using Chamfer Similarity (CS) into a video-to-video similarity score -- this avoids feature aggregation before the similarity calculation between videos and captures the temporal similarity patterns between matching frame sequences.
Ranked #5 on Video Retrieval on FIVR-200K
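The Chamfer Similarity step described above reduces a frame-to-frame similarity matrix to a single video-to-video score by taking, for each query frame, its best-matching target frame and averaging. A minimal sketch (the function name is ours; in the paper CS is applied to the CNN-refined matrix, not the raw one):

```python
def chamfer_similarity(sim_matrix):
    """Summarize a frame-to-frame similarity matrix into one score:
    for each query frame (row) keep the best-matching target frame (max),
    then average over query frames."""
    return sum(max(row) for row in sim_matrix) / len(sim_matrix)
```

Because the max is taken per row before averaging, matching frame sequences are rewarded without requiring the two videos to be aligned or of equal length.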
no code implementations • 21 Jul 2019 • Mina Bishay, Georgios Zoumpourlis, Ioannis Patras
At the heart of our network is a meta-learning approach that learns to compare representations of variable temporal length, that is, either two videos of different lengths (in the case of few-shot action recognition) or a video and a semantic representation such as a word vector (in the case of zero-shot action recognition).
Ranked #8 on Few-Shot Action Recognition on Kinetics-100
no code implementations • 25 May 2019 • Ting-Ting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras
Temporal action localization has recently attracted significant interest in the Computer Vision community.
no code implementations • 11 Feb 2019 • Youngkyoon Jang, Hatice Gunes, Ioannis Patras
In this paper, we present a novel single shot face-related task analysis method, called Face-SSD, for detecting faces and for performing various face-related (classification/regression) tasks including smile recognition, face attribute prediction and valence-arousal estimation in the wild.
1 code implementation • 11 Sep 2018 • Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Ioannis Kompatsiaris
To create the dataset, we devise a process for the collection of YouTube videos based on major news events from recent years crawled from Wikipedia and deploy a retrieval pipeline for the automatic selection of query videos based on their estimated suitability as benchmarks.
no code implementations • 7 Aug 2018 • Mina Bishay, Petar Palasek, Stefan Priebe, Ioannis Patras
Patients with schizophrenia often display impairments in the expression of emotion and speech and those are observed in their facial behaviour.
no code implementations • 13 Jan 2018 • Petar Palasek, Ioannis Patras
In this work we explore how the architecture proposed in [8], which expresses the processing steps of the classical Fisher vector pipeline (i.e., dimensionality reduction by principal component analysis (PCA) projection, Gaussian mixture model (GMM) fitting, and Fisher vector descriptor extraction) as network layers, can be modified into a hybrid network that combines the benefits of both unsupervised and supervised training, resulting in a model that learns a semi-supervised Fisher vector descriptor of the input data.
no code implementations • ICCV 2017 • Ioannis Marras, Petar Palasek, Ioannis Patras
We overcome this by introducing a Markov Random Field (MRF)-based spatial model network between the coarse and the refinement model that introduces geometric constraints on the relative locations of the body joints.
no code implementations • 19 Jul 2017 • Petar Palasek, Ioannis Patras
In this work we propose a novel neural network architecture for the problem of human action recognition in videos.
no code implementations • 22 Jan 2016 • Aria Ahmadi, Ioannis Patras
In this paper, we propose a direct method and train a Convolutional Neural Network (CNN) that, when given a pair of images as input at test time, produces a dense motion field F at its output layer.
no code implementations • 25 Nov 2015 • Christos Tzelepis, Damianos Galanopoulos, Vasileios Mezaris, Ioannis Patras
In this work we deal with the problem of high-level event detection in video.
1 code implementation • 11 Jul 2015 • Heng Yang, Wenxuan Mou, Yichi Zhang, Ioannis Patras, Hatice Gunes, Peter Robinson
In this paper we propose a supervised initialization scheme for cascaded face alignment based on explicit head pose estimation.
1 code implementation • 15 Apr 2015 • Christos Tzelepis, Vasileios Mezaris, Ioannis Patras
In this paper, we propose a maximum margin classifier that deals with uncertainty in data input.
no code implementations • CVPR 2015 • Heng Yang, Ioannis Patras
Our experiments lead to several interesting findings: 1) surprisingly, most state-of-the-art methods struggle to preserve mirror symmetry, despite having very similar overall performance on the original and mirrored images; 2) the low mirrorability is not caused by training or testing sample bias, as all algorithms are trained on both the original images and their mirrored versions; 3) the mirror error is strongly correlated with the localization/alignment error (with correlation coefficients around 0.7).