no code implementations • 30 Oct 2024 • José-Fabian Villa-Vásquez, Marco Pedersoli
Unsupervised object discovery is commonly interpreted as the task of localizing and/or categorizing objects in visual data without the need for labeled examples.
1 code implementation • 1 Oct 2024 • Soufiane Belharbi, Marco Pedersoli, Alessandro Lameiras Koerich, Simon Bacon, Eric Granger
During training, this AU codebook is used, along with the input image expression label and facial landmarks, to construct an AU heatmap that indicates the most discriminative image regions of interest w.r.t. the facial expression.
Facial Expression Recognition (FER)
1 code implementation • 25 Sep 2024 • Simon Varailhon, Masih Aminbeidokhti, Marco Pedersoli, Eric Granger
Source-free domain adaptation (SFDA) is a challenging problem in object detection, where a pre-trained source model is adapted to a new target domain without using any source domain data for privacy and efficiency reasons.
1 code implementation • 16 Aug 2024 • Muhammad Haseeb Aslam, Marco Pedersoli, Alessandro Lameiras Koerich, Eric Granger
However, PKD methods based on structural similarity are primarily confined to learning from a single joint teacher representation, which limits their robustness, accuracy, and ability to learn from diverse multimodal sources.
2 code implementations • 17 Jul 2024 • Nicolas Richet, Soufiane Belharbi, Haseeb Aslam, Meike Emilie Schadt, Manuela González-González, Gustave Cortal, Alessandro Lameiras Koerich, Marco Pedersoli, Alain Finkel, Simon Bacon, Eric Granger
Our results indicate that multimodal textualization provides lower accuracy than feature-based models on C-EXPR-DB, where text transcripts are captured in the wild.
1 code implementation • 8 Jul 2024 • Shakeeb Murtaza, Marco Pedersoli, Aydin Sarraf, Eric Granger
Our TrCAM-V method allows training a localization network by sampling pseudo-pixels on the fly from these regions.
no code implementations • 27 May 2024 • Louis Fournier, Adel Nabli, Masih Aminbeidokhti, Marco Pedersoli, Eugene Belilovsky, Edouard Oyallon
The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models.
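The output averaging mentioned in the abstract line above can be illustrated with a minimal sketch. It assumes each model returns a per-class probability vector for the same input; the function name `ensemble_average` and the example numbers are hypothetical and are not taken from the paper.

```python
def ensemble_average(predictions):
    """Average the per-class outputs of several models for one input.

    `predictions` is a list of probability vectors, one per model
    (an illustrative assumption, not the paper's interface).
    """
    n_models = len(predictions)
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / n_models for c in range(n_classes)]


# Hypothetical outputs of three models over three classes.
model_outputs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.1, 0.3],
]

averaged = ensemble_average(model_outputs)
# The ensemble prediction is the class with the highest averaged score.
predicted_class = max(range(len(averaged)), key=averaged.__getitem__)
```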
1 code implementation • 30 Apr 2024 • Rishav Pramanik, José-Fabian Villa-Vásquez, Marco Pedersoli
Moreover, we extend the slot attention to a multi-query approach, allowing the model to learn multiple sets of slots, producing more stable masks.
1 code implementation • 29 Apr 2024 • Heitor R. Medeiros, David Latortue, Eric Granger, Marco Pedersoli
Multimodal learning is a common way to leverage these modalities, where multiple modality-specific encoders and a fusion module are used to improve performance.
Ranked #1 on Object Detection on FLIR
1 code implementation • 15 Apr 2024 • Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Eric Granger
These bboxes are also employed to estimate the threshold from LOC maps, circumventing the need for test-set bbox annotations.
1 code implementation • 1 Apr 2024 • Heitor Rapela Medeiros, Masih Aminbeidokhti, Fidel Guerrero Pena, David Latortue, Eric Granger, Marco Pedersoli
This paper focuses on adapting a large object detection model trained on RGB images to new data extracted from IR images with a substantial modality shift.
1 code implementation • 22 Mar 2024 • Shambhavi Mishra, Balamurali Murugesan, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz
State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples.
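As a rough illustration of confidence-based pseudo-labeling, the sketch below keeps only unlabeled samples whose top predicted probability reaches a threshold. The function name `select_pseudo_labels` and the threshold value of 0.95 are illustrative assumptions, not details from the paper.

```python
def select_pseudo_labels(probabilities, threshold=0.95):
    """Return (sample_index, class_index) pairs for confident predictions.

    `probabilities` holds one class-probability vector per unlabeled sample.
    The threshold is a hypothetical value chosen for illustration.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


# Hypothetical model predictions on three unlabeled samples.
unlabeled_predictions = [
    [0.97, 0.02, 0.01],  # confident: kept with class 0
    [0.40, 0.35, 0.25],  # uncertain: discarded
    [0.03, 0.01, 0.96],  # confident: kept with class 2
]

pseudo_labels = select_pseudo_labels(unlabeled_predictions)
```

Samples below the threshold contribute no training signal in this scheme, which is the limitation that confidence-based approaches have to trade off against label noise.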
1 code implementation • 15 Mar 2024 • Paul Waligora, Haseeb Aslam, Osama Zeeshan, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems by leveraging the inter- and intra-modal relationships between, e.g., visual, textual, physiological, and auditory modalities.
1 code implementation • 14 Mar 2024 • Atif Belal, Akhil Meethal, Francisco Perdigon Romero, Marco Pedersoli, Eric Granger
Domain adaptation methods for object detection (OD) strive to mitigate the impact of distribution shifts by promoting feature alignment across source and target domains.
1 code implementation • 1 Feb 2024 • Soufiane Belharbi, Marco Pedersoli, Alessandro Lameiras Koerich, Simon Bacon, Eric Granger
During training, this AU codebook is used, along with the input image expression label and facial landmarks, to construct an AU heatmap that indicates the most discriminative image regions of interest w.r.t. the facial expression.
Facial Expression Recognition (FER)
no code implementations • 27 Jan 2024 • Muhammad Haseeb Aslam, Muhammad Osama Zeeshan, Soufiane Belharbi, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Eric Granger
Results show that our proposed method can outperform state-of-the-art privileged KD methods on these problems.
1 code implementation • 17 Dec 2023 • Juan A. Rodriguez, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli
These visual tokens are prepended to the SVG token embeddings, and the sequence is modeled by the StarCoder model using next-token prediction, effectively learning to align the visual and code tokens.
1 code implementation • 9 Dec 2023 • Muhammad Osama Zeeshan, Muhammad Haseeb Aslam, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger
It efficiently leverages information from multiple source subjects (labeled source domain data) to adapt a deep FER model to a single target individual (unlabeled target domain data).
Facial Expression Recognition (FER) +2
1 code implementation • 20 Nov 2023 • David Latortue, Moetez Kdayem, Fidel A Guerrero Peña, Eric Granger, Marco Pedersoli
Object detection models are commonly used for people counting (and localization) in many applications but require a dataset with costly bounding box annotations for training.
1 code implementation • 10 Oct 2023 • Masih Aminbeidokhti, Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Eric Granger, Marco Pedersoli
However, this holds for standard in-domain settings, in which the training and test data follow the same distribution.
1 code implementation • 9 Oct 2023 • Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger
Subsequently, these proposals are used as pseudo-labels to train our new transformer-based WSOL model designed to perform classification and localization tasks.
1 code implementation • 7 Oct 2023 • Heitor Rapela Medeiros, Fidel A. Guerrero Pena, Masih Aminbeidokhti, Thomas Dubail, Eric Granger, Marco Pedersoli
This model produces a new image representation that enhances objects of interest in the scene and greatly improves detection performance.
1 code implementation • 3 Oct 2023 • Saypraseuth Mounsaveng, Florent Chiaroni, Malik Boudiaf, Marco Pedersoli, Ismail Ben Ayed
Fully Test-Time Adaptation (TTA), which aims at adapting models to data drifts, has recently attracted wide interest.
1 code implementation • 26 Sep 2023 • Atif Belal, Akhil Meethal, Francisco Perdigon Romero, Marco Pedersoli, Eric Granger
Given the use of prototypes, the number of parameters required for our PMT method does not increase significantly with the number of source domains, thus reducing memory issues and possible overfitting.
Multi-Source Unsupervised Domain Adaptation object-detection +2
1 code implementation • 9 Aug 2023 • Akhil Meethal, Eric Granger, Marco Pedersoli
One of the important bottlenecks in training modern object detectors is the need for labeled images where bounding box annotations have to be produced for each object present in the image.
Ranked #1 on Object Detection on VisDrone - 10% labeled data
1 code implementation • 21 Jun 2023 • Mehraveh Javan, Matthew Toews, Marco Pedersoli
To fully understand this problem, we analyse the performance of models independently trained with each pooling configuration on CIFAR10, using a ResNet20 network, and show that the position of the downsampling layers can highly influence the performance of a network and that predefined downsampling configurations are not optimal.
Ranked #1 on Neural Architecture Search on Food-101 (Accuracy (%) metric)
1 code implementation • 1 Jun 2023 • Juan A Rodriguez, David Vazquez, Issam Laradji, Marco Pedersoli, Pau Rodriguez
The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art.
1 code implementation • 16 Mar 2023 • Soufiane Belharbi, Shakeeb Murtaza, Marco Pedersoli, Ismail Ben Ayed, Luke McCaffrey, Eric Granger
This paper proposes a novel CAM method for WSVOL that exploits spatiotemporal information in activation maps during training without constraining an object's position.
1 code implementation • 15 Mar 2023 • Akhil Meethal, Eric Granger, Marco Pedersoli
Detecting objects in aerial images is challenging because they are typically composed of crowded small objects distributed non-uniformly over high-resolution images.
Ranked #2 on Object Detection on VisDrone-DET2019
1 code implementation • CVPR 2023 • Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli
The recent emergence of new algorithms for permuting models into functionally equivalent regions of the solution space has shed some light on the complexity of error surfaces, and some promising properties like mode connectivity.
no code implementations • 7 Nov 2022 • Djebril Mekhazni, Maximilien Dufau, Christian Desrosiers, Marco Pedersoli, Eric Granger
In this scenario, the ReID model must adapt to a complex target domain defined by a network of diverse video cameras based on tracklet information.
3 code implementations • 19 Oct 2022 • Juan A. Rodriguez, David Vazquez, Issam Laradji, Marco Pedersoli, Pau Rodriguez
To alleviate this problem, we present OCR-VQGAN, an image encoder, and decoder that leverages OCR pre-trained features to optimize a text perceptual loss, encouraging the architecture to preserve high-fidelity text and diagram structure.
no code implementations • 22 Sep 2022 • Thomas Dubail, Fidel Alejandro Guerrero Peña, Heitor Rapela Medeiros, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli
In intelligent building management, knowing the number of people and their location in a room is important for better control of its illumination, ventilation, and heating with reduced costs and improved comfort.
no code implementations • 9 Sep 2022 • Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger
In this paper, we propose a method to train deep weakly-supervised object localization (WSOL) models based only on image-class labels to locate objects with high confidence.
no code implementations • 9 Sep 2022 • Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger
Then, foreground and background pixels are sampled from these regions in order to train a WSOL model for generating activation maps that can accurately localize objects belonging to a specific class.
1 code implementation • 1 Apr 2022 • Akhil Meethal, Marco Pedersoli, Zhongwen Zhu, Francisco Perdigon Romero, Eric Granger
Semi- and weakly-supervised learning have recently attracted considerable attention in the object detection literature since they can alleviate the cost of annotation needed to successfully train deep learning models.
1 code implementation • 28 Mar 2022 • R. Gnana Praveen, Wheidima Carneiro de Melo, Nasib Ullah, Haseeb Aslam, Osama Zeeshan, Théo Denorme, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Patrick Cardinal, Eric Granger
Specifically, we propose a joint cross-attention model that relies on the complementary relationships to extract the salient features across A-V modalities, allowing for accurate prediction of continuous values of valence and arousal.
no code implementations • 4 Feb 2022 • Jizong Peng, Ping Wang, Marco Pedersoli, Christian Desrosiers
Unsupervised pre-training has proven to be an effective approach to boost various downstream tasks given limited labeled data.
1 code implementation • 7 Jan 2022 • Soufiane Belharbi, Marco Pedersoli, Ismail Ben Ayed, Luke McCaffrey, Eric Granger
The CNN is exploited to collect both positive and negative evidence at the pixel level to train the decoder.
no code implementations • 16 Nov 2021 • Jizong Peng, Christian Desrosiers, Marco Pedersoli
This work considers semi-supervised segmentation as a dense prediction problem based on prototype vector correlation and proposes a simple way to represent each segmentation class with multiple prototypes.
1 code implementation • 15 Sep 2021 • Soufiane Belharbi, Aydin Sarraf, Marco Pedersoli, Ismail Ben Ayed, Luke McCaffrey, Eric Granger
Interpolation is required to restore full-size CAMs, yet it does not consider the statistical properties of objects, such as color and texture, leading to activations with inconsistent boundaries and inaccurate localizations.
1 code implementation • NeurIPS 2021 • Jizong Peng, Ping Wang, Christian Desrosiers, Marco Pedersoli
Pre-training a recognition model with contrastive learning on a large dataset of unlabeled data has shown great potential to boost the performance of a downstream task, e.g., image classification.
no code implementations • 12 Jul 2021 • Ping Wang, Jizong Peng, Marco Pedersoli, Yuanfeng Zhou, Caiming Zhang, Christian Desrosiers
Despite their outstanding accuracy, semi-supervised segmentation methods based on deep neural networks can still yield predictions that are considered anatomically impossible by clinicians, for instance, containing holes or disconnected regions.
1 code implementation • 13 Apr 2021 • Le Thanh Nguyen-Meidine, Madhu Kiran, Marco Pedersoli, Jose Dolz, Louis-Antoine Blais-Morin, Eric Granger
Recent advances in unsupervised domain adaptation have significantly improved the recognition accuracy of CNNs by alleviating the domain shift between (labeled) source and (unlabeled) target data distributions.
1 code implementation • 8 Mar 2021 • Jizong Peng, Marco Pedersoli, Christian Desrosiers
In this method, we maximize the MI for intermediate feature embeddings that are taken from both the encoder and decoder of a segmentation network.
2 code implementations • ICCV 2021 • Jérôme Rony, Eric Granger, Marco Pedersoli, Ismail Ben Ayed
Our attack enjoys the generality of penalty methods and the computational efficiency of distance-customized algorithms, and can be readily used for a wide set of distances.
no code implementations • 10 Nov 2020 • Théo Ayral, Marco Pedersoli, Simon Bacon, Eric Granger
The proposed softmax strategy provides several advantages: a reduced computational complexity due to efficient clip sampling, and an improved accuracy since temporal weighting focuses on more relevant clips during both training and inference.
Facial Expression Recognition (FER)
1 code implementation • 31 Oct 2020 • Ping Wang, Jizong Peng, Marco Pedersoli, Yuanfeng Zhou, Caiming Zhang, Christian Desrosiers
Moreover, to encourage predictions from different networks to be both consistent and confident, we enhance this generalized JSD loss with an uncertainty regularizer based on entropy.
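One common way to write a generalized Jensen-Shannon divergence over several predictions is the entropy of the mean prediction minus the mean of the individual entropies. The sketch below uses that form and adds an entropy term on the mean prediction as the uncertainty regularizer. Both the exact form of the regularizer and the `weight` value are assumptions made for illustration; the paper's formulation may differ.

```python
import math


def entropy(dist):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mean_distribution(dists):
    """Element-wise mean of several distributions over the same classes."""
    n = len(dists)
    return [sum(d[c] for d in dists) / n for c in range(len(dists[0]))]


def generalized_jsd(dists):
    """Entropy of the mean prediction minus the mean of the entropies."""
    return entropy(mean_distribution(dists)) - sum(entropy(d) for d in dists) / len(dists)


def regularized_loss(dists, weight=0.1):
    """JSD plus an entropy penalty on the mean prediction.

    The regularizer form and `weight` are hypothetical choices.
    """
    return generalized_jsd(dists) + weight * entropy(mean_distribution(dists))


# Hypothetical predictions from several networks for one pixel.
agreeing = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagreeing = [[1.0, 0.0], [0.0, 1.0]]

jsd_agree = generalized_jsd(agreeing)        # close to zero: predictions are consistent
jsd_disagree = generalized_jsd(disagreeing)  # large: predictions conflict
```

In this sketch the JSD term rewards agreement between networks, while the entropy term pushes the shared prediction toward a confident (low-entropy) distribution.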
2 code implementations • 25 Jun 2020 • Saypraseuth Mounsaveng, Issam Laradji, Ismail Ben Ayed, David Vazquez, Marco Pedersoli
Data augmentation is a key practice in machine learning for improving generalization performance.
1 code implementation • ECCV 2020 • Malik Boudiaf, Jérôme Rony, Imtiaz Masud Ziko, Eric Granger, Marco Pedersoli, Pablo Piantanida, Ismail Ben Ayed
Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses.
Ranked #12 on Metric Learning on CARS196 (using extra training data)
no code implementations • 18 Mar 2020 • Abdur R Feyjie, Reza Azad, Marco Pedersoli, Claude Kauffman, Ismail Ben Ayed, Jose Dolz
To handle this new learning paradigm, we propose to include surrogate tasks that can leverage very powerful supervisory signals (derived from the data itself) for semantic feature learning.
1 code implementation • 9 Mar 2020 • Reza Azad, Abdur R Fayjie, Claude Kauffman, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz
Despite the initial belief that Convolutional Neural Networks (CNNs) are driven by shapes to perform visual recognition tasks, recent evidence suggests that texture bias in CNNs provides higher performing models when learning on large labeled training datasets.
Ranked #2 on Few-Shot Semantic Segmentation on Pascal5i
no code implementations • MIDL 2019 • Jizong Peng, Marco Pedersoli, Christian Desrosiers
The scarcity of labeled data often limits the application of deep learning to medical image segmentation.
1 code implementation • 3 Dec 2019 • Akhil Meethal, Marco Pedersoli, Soufiane Belharbi, Eric Granger
Weakly supervised object localization is a challenging task in which the object of interest should be localized while learning its appearance.
1 code implementation • 11 Nov 2019 • Xianda Xu, Marco Pedersoli
Deep Neural Networks have now achieved state-of-the-art results in a wide range of tasks including image classification, object detection and so on.
no code implementations • 3 Oct 2019 • Jizong Peng, Christian Desrosiers, Marco Pedersoli
The second, named Invariant Information Clustering (IIC), maximizes the mutual information between the clustering of a sample and its geometrically transformed version.
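To make the mutual-information objective concrete, the sketch below estimates the mutual information between the soft cluster assignments of a batch of samples and those of their transformed versions. It builds a joint distribution by averaging outer products over the batch and symmetrizes it. The function name, the symmetrization step, and the toy inputs are assumptions for illustration rather than a reproduction of the paper's code.

```python
import math


def mutual_information(assign_a, assign_b, eps=1e-12):
    """Mutual information between two sets of soft cluster assignments.

    `assign_a[n]` and `assign_b[n]` are cluster-probability vectors for
    sample n and its transformed version (hypothetical interface).
    """
    n = len(assign_a)
    k = len(assign_a[0])
    # Joint distribution over cluster pairs, averaged over the batch.
    joint = [[0.0] * k for _ in range(k)]
    for pa, pb in zip(assign_a, assign_b):
        for i in range(k):
            for j in range(k):
                joint[i][j] += pa[i] * pb[j] / n
    # Symmetrize, since the ordering of the pair is arbitrary (an assumption).
    joint = [[(joint[i][j] + joint[j][i]) / 2 for j in range(k)] for i in range(k)]
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    mi = 0.0
    for i in range(k):
        for j in range(k):
            p = joint[i][j]
            if p > eps:
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


# Consistent, balanced hard assignments give the maximum for two clusters.
consistent = [[1.0, 0.0], [0.0, 1.0]]
mi_consistent = mutual_information(consistent, consistent)

# Uninformative uniform assignments carry no mutual information.
uniform = [[0.5, 0.5], [0.5, 0.5]]
mi_uniform = mutual_information(uniform, uniform)
```

Maximizing this quantity favors assignments that are both consistent under the transformation and spread evenly across clusters, which discourages the degenerate solution of placing every sample in one cluster.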
no code implementations • 2 Oct 2019 • Masih Aminbeidokhti, Marco Pedersoli, Patrick Cardinal, Eric Granger
Video-based emotion recognition is a challenging task because it requires distinguishing the small deformations of the human face that represent emotions, while being invariant to stronger visual differences due to different identities.
no code implementations • ICLR Workshop LLD 2019 • Saypraseuth Mounsaveng, David Vazquez, Ismail Ben Ayed, Marco Pedersoli
Data augmentation (DA) is fundamental against overfitting in large convolutional neural networks, especially with a limited training dataset.
no code implementations • 15 Aug 2019 • Jizong Peng, Hoel Kervadec, Jose Dolz, Ismail Ben Ayed, Marco Pedersoli, Christian Desrosiers
An efficient strategy for weakly-supervised segmentation is to impose constraints or regularization priors on target regions.
no code implementations • 6 Jul 2019 • Juan D. S. Ortega, Mohammed Senoussaoui, Eric Granger, Marco Pedersoli, Patrick Cardinal, Alessandro L. Koerich
This paper presents a novel deep neural network (DNN) for multimodal fusion of audio, video and text modalities for emotion recognition.
1 code implementation • 20 Jun 2019 • Le Thanh Nguyen-Meidine, Eric Granger, Madhu Kiran, Louis-Antoine Blais-Morin, Marco Pedersoli
Although deep neural networks (NNs) have achieved state-of-the-art accuracy in many visual recognition tasks, the growing computational complexity and energy consumption of networks remains an issue, especially for applications on platforms with limited resources and requiring real-time processing.
2 code implementations • 27 Mar 2019 • Jizong Peng, Guillermo Estrada, Marco Pedersoli, Christian Desrosiers
In this paper, we aim to improve the performance of semantic image segmentation in a semi-supervised setting in which training is performed with a reduced set of annotated images and additional non-annotated images.
1 code implementation • 9 Oct 2018 • Mohammed Jabi, Marco Pedersoli, Amar Mitiche, Ismail Ben Ayed
Typically, they use multinomial logistic regression posteriors and parameter regularization, as is very common in supervised learning.
Ranked #2 on Image Clustering on YouTube Faces DB
1 code implementation • 9 Jul 2018 • Aarush Gupta, Dakshit Agrawal, Hardik Chauhan, Jose Dolz, Marco Pedersoli
In this paper we propose a new approach for classifying the global emotion of images containing groups of people.
no code implementations • 29 Nov 2017 • Ahmad Chaddad, Behnaz Naisiri, Marco Pedersoli, Eric Granger, Christian Desrosiers, Matthew Toews
This paper proposes a principled information theoretic analysis of classification for deep neural network structures, e.g., convolutional neural networks (CNNs).
no code implementations • ICCV 2017 • Marco Pedersoli, Thomas Lucas, Cordelia Schmid, Jakob Verbeek
We propose "Areas of Attention", a novel attention-based model for automatic image captioning.
1 code implementation • 15 Jun 2016 • Amir Ghodrati, Ali Diba, Marco Pedersoli, Tinne Tuytelaars, Luc van Gool
In this paper, a new method for generating object and action proposals in images and videos is proposed.
no code implementations • ICCV 2015 • Marco Pedersoli, Tinne Tuytelaars
In this paper we propose a new method for the detection and pose estimation of 3D objects that does not use any 3D CAD model or other 3D information.
no code implementations • 26 Nov 2015 • Amir Ghodrati, Xu Jia, Marco Pedersoli, Tinne Tuytelaars
Learning the distribution of images in order to generate new samples is a challenging task due to the high dimensionality of the data and the highly non-linear relations that are involved.
1 code implementation • ICCV 2015 • Amir Ghodrati, Ali Diba, Marco Pedersoli, Tinne Tuytelaars, Luc van Gool
We generate hypotheses in a sliding-window fashion over different activation layers and show that the final convolutional layers can find the object of interest with high recall but poor localization due to the coarseness of the feature maps.
no code implementations • CVPR 2015 • Hakan Bilen, Marco Pedersoli, Tinne Tuytelaars
However, as learning appearance and localization are two interconnected tasks, the optimization is not convex and the procedure can easily get stuck in a poor local minimum, in which the algorithm "misses" the object in some images.
no code implementations • CVPR 2014 • Hakan Bilen, Marco Pedersoli, Vinay P. Namboodiri, Tinne Tuytelaars, Luc van Gool
In object classification, substantial work has gone into improving the low-level representation of an image by considering various aspects, such as different features, a number of feature pooling and coding techniques, and different kernels.
no code implementations • CVPR 2014 • Marco Pedersoli, Tinne Tuytelaars, Luc van Gool
Additionally, without any facial point annotation at the level of individual training images, our method can localize facial points with an accuracy similar to fully supervised approaches.