no code implementations • 13 Sep 2024 • Minh-Duc Vu, Zuheng Ming, Fangchen Feng, Bissmella Bahaduri, Anissa Mokraoui
To address this, we propose a new interactive MIM method that can establish interactions between different tokens, which is particularly beneficial for object detection in remote sensing.
1 code implementation • 21 Oct 2023 • Bissmella Bahaduri, Zuheng Ming, Fangchen Feng, Anissa Mokraoui
Object detection in Remote Sensing Images (RSI) is a critical task for numerous applications in Earth Observation (EO).
no code implementations • 11 Sep 2023 • Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós
Visual document understanding (VDU) has rapidly advanced with the development of powerful multi-modal language models.
Ranked #20 on Document Image Classification on RVL-CDIP
no code implementations • 23 Mar 2023 • Bo Zhang, Zuheng Ming, Wei Feng, Yaqian Liu, Liang He, Kaixing Zhao
To benefit from the complementary information between heterogeneous data, we introduce a new Multimodal Transformer (MMFormer) for Remote Sensing (RS) image classification using Hyperspectral Image (HSI) data accompanied by another data source such as Light Detection and Ranging (LiDAR).
no code implementations • 22 Jun 2022 • Musab Al-Ghadi, Zuheng Ming, Petra Gomez-Krämer, Jean-Christophe Burie
In this work, these two steps are combined to achieve two objectives: (i) the extracted features should have good anti-collision (discriminative) capabilities to distinguish between a pair of identity documents belonging to different classes, and (ii) the guilloche pattern of a given identity document should be checked for conformity and similarity to the guilloche pattern of an authentic version from the same country.
no code implementations • 24 May 2022 • Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades
Multimodal learning from document data has achieved great success lately, as it allows semantically meaningful features to be pre-trained as a prior for a learnable downstream task.
Ranked #19 on Document Image Classification on RVL-CDIP
no code implementations • 3 Mar 2022 • Zuheng Ming, Zitong Yu, Musab Al-Ghadi, Muriel Visani, Muhammad Muzzamil Luqman, Jean-Christophe Burie
Instead of using coarse, single-scale image patches as in ViT, we propose the Multi-scale Multi-Head Self-Attention (MsMHSA) architecture, which assigns multi-scale patch partitions of the Q, K, V feature maps to the transformer heads in a coarse-to-fine manner, enabling the model to learn a fine-grained representation for pixel-level discrimination in face PAD.
no code implementations • 30 Aug 2021 • Tanmoy Mondal, Abhijit Das, Zuheng Ming
In this work, we explore a Multi-Task Learning (MTL) based network for document attribute classification, such as classifying the font type, font size, font emphasis, and scanning resolution of a document image.
no code implementations • 1 Jul 2021 • Konstantin Bulatov, Ekaterina Emelianova, Daniil Tropin, Natalya Skoryukina, Yulia Chernyshova, Alexander Sheshkus, Sergey Usilin, Zuheng Ming, Jean-Christophe Burie, Muhammad Muzzamil Luqman, Vladimir V. Arlazarov
Identity document recognition is an important sub-field of document analysis, which deals with robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation, given photos, scans, or video frames of a captured identity document.
no code implementations • 8 Oct 2020 • Zuheng Ming, Muriel Visani, Muhammad Muzzamil Luqman, Jean-Christophe Burie
The widespread deployment of face recognition-based biometric systems has made face Presentation Attack Detection (face anti-spoofing) an increasingly critical issue.
no code implementations • 10 Mar 2020 • Zuheng Ming, Jean-Christophe Burie, Muhammad Muzzamil Luqman
Face recognition on realistic visual images has been well studied and has made significant progress in the past decade.
no code implementations • 8 Nov 2019 • Souhail Bakkali, Zuheng Ming, Muhammad Muzzamil Luqman, Jean-Christophe Burie
Benefiting from advances in deep convolutional neural networks (CNNs), many face detection algorithms have achieved state-of-the-art accuracy and very high speed in unconstrained applications.
1 code implementation • 8 Nov 2019 • Zuheng Ming, Junshi Xia, Muhammad Muzzamil Luqman, Jean-Christophe Burie, Kaixing Zhao
This multi-task learning with dynamic weights also boosts performance on the different tasks compared to state-of-the-art methods based on single-task learning.
Ranked #1 on Facial Expression Recognition (FER) on Oulu-CASIA
1 code implementation • 8 Nov 2019 • Zuheng Ming, Jean-Christophe Burie, Muhammad Muzzamil Luqman
In contrast to visual images, face recognition on caricatures falls far short of the performance achieved on visual images.
no code implementations • 28 Feb 2019 • Zuheng Ming, Junshi Xia, Muhammad Muzzamil Luqman, Jean-Christophe Burie, Kaixing Zhao
This paper proposes a holistic multi-task Convolutional Neural Network (CNN) with dynamic task weights, namely FaceLiveNet+, for face authentication.
no code implementations • JEPTALNRECITAL 2012 • Zuheng Ming, Gang Feng, Denis Beautemps