1 code implementation • 25 Nov 2024 • Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Mihaylov, Chao Qin, Abdelrahman M Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, Sebastian Cavada, Jenny Chim, Rohit Gupta, Sanjay Manjunath, Kamila Zhumakhanova, Feno Heriniaina Rabevohitra, Azril Amirudin, Muhammad Ridzuan, Daniya Kareem, Ketan More, Kunyang Li, Pramesh Shakya, Muhammad Saad, Amirpouya Ghasemaghaei, Amirbek Djanibekov, Dilshod Azizov, Branislava Jankovic, Naman Bhatia, Alvaro Cabrera, Johan Obando-Ceron, Olympiah Otieno, Fabian Farestam, Muztoba Rabbani, Sanoojan Baliah, Santosh Sanjeev, Abduragim Shtanchaev, Maheen Fatima, Thao Nguyen, Amrin Kareem, Toluwani Aremu, Nathan Xavier, Amit Bhatkal, Hawau Toyin, Aman Chadha, Hisham Cholakkal, Rao Muhammad Anwer, Michael Felsberg, Jorma Laaksonen, Thamar Solorio, Monojit Choudhury, Ivan Laptev, Mubarak Shah, Salman Khan, Fahad Khan
In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages.
1 code implementation • 24 Oct 2024 • Sara Ghaboura, Ahmed Heakl, Omkar Thawakar, Ali Alharthi, Ines Riahi, Abduljalil Saif, Jorma Laaksonen, Fahad S. Khan, Salman Khan, Rao M. Anwer
In this work, we develop a comprehensive LMM evaluation benchmark for the Arabic language, representing a large population of over 400 million speakers.
1 code implementation • 7 Jan 2024 • Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe
It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).
Ranked #1 on Camouflaged Object Segmentation on COD (using extra training data)
Tasks: Camouflaged Object Segmentation, Dichotomous Image Segmentation, +3
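A minimal PyTorch sketch of the two-module split described above: a localization module (LM) produces features and a coarse map, and a reconstruction module (RM) refines it. The module interfaces, channel sizes, and the use of the coarse map as a stand-in for the bilateral reference are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LocalizationModule(nn.Module):
    """Backbone features plus a coarse object-location map."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.coarse_head = nn.Conv2d(feat_ch, 1, 1)

    def forward(self, x):
        feats = self.backbone(x)
        return feats, torch.sigmoid(self.coarse_head(feats))

class ReconstructionModule(nn.Module):
    """Refines the coarse map, conditioning on image features and the
    coarse prediction (a simplified stand-in for the bilateral reference)."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_ch + 1, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 1, 1),
        )

    def forward(self, feats, coarse):
        return torch.sigmoid(self.refine(torch.cat([feats, coarse], dim=1)))

lm, rm = LocalizationModule(), ReconstructionModule()
feats, coarse = lm(torch.randn(1, 3, 256, 256))
mask = rm(feats, coarse)  # (1, 1, 256, 256) refined segmentation map
```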
no code implementations • 10 Sep 2023 • Usman Muhammad, Mourad Oussalah, Jorma Laaksonen
Conventional feature extraction techniques in the face anti-spoofing domain either analyze the entire video sequence or focus on a specific segment to improve model performance.
no code implementations • 23 Aug 2023 • Usman Muhammad, Mourad Oussalah, Jorma Laaksonen
Inspired by the visual saliency theory, we present a video summarization method for face anti-spoofing detection that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency.
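A minimal sketch of saliency-driven frame selection, assuming a simple gradient-magnitude proxy for frame saliency; the paper's actual saliency model is not reproduced here.

```python
import numpy as np

def summarize_by_saliency(frames, keep_ratio=0.2):
    """Keep the most salient frames of a grayscale video of shape (T, H, W).
    Mean gradient magnitude is a placeholder saliency score."""
    gy, gx = np.gradient(frames.astype(np.float32), axis=(1, 2))
    scores = np.sqrt(gx ** 2 + gy ** 2).mean(axis=(1, 2))  # one score per frame
    k = max(1, int(len(frames) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])  # top-k frames, original order
    return frames[keep], keep

video = np.random.rand(100, 64, 64)
summary, idx = summarize_by_saliency(video)
```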
no code implementations • 14 Jul 2023 • Zixin Guo, Tzu-Jui Julius Wang, Selen Pehlivan, Abduljalil Radman, Jorma Laaksonen
To further reduce the amount of supervision, we propose Prompts-in-The-Loop (PiTL) that prompts knowledge from large language models (LLMs) to describe images.
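A hedged sketch of the prompts-in-the-loop idea: detected visual concepts are turned into an LLM prompt that yields an image description. `query_llm` and the prompt wording are hypothetical placeholders, not the paper's setup.

```python
def query_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM client here.
    return "A photo of " + prompt.split(": ", 1)[1]

def describe_images(image_tags):
    """`image_tags`: dict mapping image id -> list of detected concepts."""
    descriptions = {}
    for image_id, tags in image_tags.items():
        prompt = "Write one short caption for an image containing: " + ", ".join(tags)
        descriptions[image_id] = query_llm(prompt)
    return descriptions

print(describe_images({"img1": ["dog", "beach"]}))
```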
3 code implementations • 6 Jul 2023 • Usman Muhammad, Md Ziaul Hoque, Mourad Oussalah, Jorma Laaksonen
Face presentation attacks (PA), also known as spoofing attacks, pose a substantial threat to biometric systems that rely on facial recognition, such as access control, mobile payment, and identity verification systems.
1 code implementation • 13 Jun 2023 • Omkar Thawkar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan
The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks.
1 code implementation • 4 Apr 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan
In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues.
1 code implementation • 3 Apr 2023 • Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) it first distinguishes a set of known categories and labels any unfamiliar object as `unknown', and then (b) it incrementally learns the class of an unknown object as and when the corresponding semantic labels become available.
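Steps (a) and (b) can be illustrated with a toy labeler, purely as a sketch of the protocol rather than the paper's detector:

```python
class OpenWorldLabeler:
    def __init__(self, known_classes):
        self.known = set(known_classes)

    def label(self, predicted_class, confidence, threshold=0.5):
        # (a) keep known classes; mark everything else as `unknown'
        if predicted_class in self.known and confidence >= threshold:
            return predicted_class
        return "unknown"

    def learn_class(self, new_class):
        # (b) incrementally admit a class once its label becomes available
        self.known.add(new_class)

labeler = OpenWorldLabeler({"person", "car"})
print(labeler.label("bicycle", 0.9))  # -> "unknown"
labeler.learn_class("bicycle")
print(labeler.label("bicycle", 0.9))  # -> "bicycle"
```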
1 code implementation • 21 Mar 2023 • Omkar Thawakar, Rao Muhammad Anwer, Jorma Laaksonen, Orly Reiner, Mubarak Shah, Fahad Shahbaz Khan
Accurate 3D mitochondria instance segmentation in electron microscopy (EM) is a challenging problem and serves as a prerequisite to empirically analyze their distributions and morphology.
no code implementations • 5 Jan 2023 • Usman Muhammad, Jorma Laaksonen, Djamila Romaissa Beddiar, Mourad Oussalah
The latter combines the predictions from the base models, leveraging their complementary information to better handle unseen target domains and enhance the overall performance.
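A minimal sketch of the fusion step, assuming sklearn-style base models trained on separate source domains and a simple probability average; the paper's exact combination rule may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy source domains with shifted feature distributions.
rng = np.random.default_rng(0)
domains = [(rng.normal(d, 1, (100, 4)), rng.integers(0, 2, 100)) for d in range(3)]
base_models = [LogisticRegression().fit(X, y) for X, y in domains]

def ensemble_predict(models, x):
    """Average class probabilities across base models, then take argmax."""
    probs = np.stack([m.predict_proba(x) for m in models])
    return probs.mean(axis=0).argmax(axis=1)

target = rng.normal(1.5, 1, (10, 4))  # unseen target domain
print(ensemble_predict(base_models, target))
```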
1 code implementation • CVPR 2023 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.
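For reference, a minimal sketch of the standard denoising-diffusion training objective (noise prediction); the person-specific conditioning from the paper is omitted, and `model` is any network mapping a noisy image and timestep to predicted noise.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, T=1000):
    """One DDPM-style training step on a clean batch x0 of shape (B, C, H, W)."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, T, (x0.shape[0],))
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward noising
    return F.mse_loss(model(x_t, t), noise)

model = lambda x, t: torch.zeros_like(x)  # dummy denoiser for illustration
print(diffusion_loss(model, torch.randn(4, 3, 32, 32)))
```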
no code implementations • COLING 2022 • Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo
Our results show that the model is capable of correctly detecting whether an utterance is humorous 78% of the time and how long the audience's laughter reaction should last with a mean absolute error of 600 milliseconds.
no code implementations • 24 Oct 2022 • Tzu-Jui Julius Wang, Jorma Laaksonen, Tomas Langer, Heikki Arponen, Tom E. Bishop
Moreover, in the other V-L downstream tasks considered, our WFH models are on par with models trained with paired V-L data, revealing the utility of unpaired data.
1 code implementation • 1 Jun 2022 • Zixin Guo, Tzu-Jui Julius Wang, Jorma Laaksonen
Different from directly fine-tuning CLIP to generate sentences, we introduce an adaptation training process to adapt CLIP's visual encoder to capture and align differences in image pairs based on the textual descriptions.
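A rough sketch of encoding an image pair's difference on top of a visual encoder; the dummy encoder, the adapter, and the feature subtraction are illustrative assumptions standing in for CLIP's image tower and the paper's adaptation design.

```python
import torch
import torch.nn as nn

class PairDifferenceEncoder(nn.Module):
    """Embed the *difference* between two images via a small trainable adapter."""
    def __init__(self, visual_encoder, dim=512):
        super().__init__()
        self.visual = visual_encoder
        self.adapter = nn.Linear(dim, dim)

    def forward(self, img_a, img_b):
        fa, fb = self.visual(img_a), self.visual(img_b)
        diff = self.adapter(fa - fb)
        return diff / diff.norm(dim=-1, keepdim=True)  # unit-norm pair embedding

visual_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512))  # CLIP stand-in
pair_enc = PairDifferenceEncoder(visual_encoder)
emb = pair_enc(torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32))  # (2, 512)
```

Contrastive alignment of this pair embedding with a text embedding of the described change would then be trained on top.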
no code implementations • 6 Dec 2021 • Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen, Michael Felsberg
Creative sketch image generation is a challenging vision problem, where the task is to generate diverse yet realistic creative sketches possessing unseen compositions of visual-world objects.
no code implementations • 18 Aug 2020 • Tzu-Jui Julius Wang, Selen Pehlivan, Jorma Laaksonen
Recent scene graph generation (SGG) models have shown their capability of capturing the most frequent relations among visual entities.
no code implementations • ICCV 2019 • Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen
Our approach outperforms the state-of-the-art on all datasets.
no code implementations • 17 Sep 2019 • Aditya Surikuchi, Jorma Laaksonen
Sequential vision-to-language, or visual storytelling, has recently become an area of focus in the computer vision and language modeling domains.
no code implementations • WS 2018 • Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphael Troncy, Raúl Vázquez
Our experiments show that the effect of the visual features in our system is small.
no code implementations • 5 Jun 2017 • Rao Muhammad Anwer, Fahad Shahbaz Khan, Joost Van de Weijer, Matthieu Molinier, Jorma Laaksonen
To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification.
Ranked #13 on Aerial Scene Classification on AID (20% as trainset)
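A sketch of the encoding step only, assuming scikit-image's `local_binary_pattern`; the paper's deep network fusion architectures are not reproduced.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_encode(gray_image, points=8, radius=1):
    """Map a grayscale image to a normalized local binary pattern (LBP)
    representation, which can be fed to a CNN alongside the raw RGB stream."""
    codes = local_binary_pattern(gray_image, points, radius, method="uniform")
    return codes / codes.max()  # normalized LBP map, same H x W as input

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
lbp_map = lbp_encode(img)  # stack with RGB channels as CNN input
```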
no code implementations • CVPR 2017 • Hamed R. -Tavakoli, Fawad Ahmed, Ali Borji, Jorma Laaksonen
This paper revisits visual saliency prediction by evaluating the recent advancements in this field such as crowd-sourced mouse tracking-based databases and contextual annotations.
no code implementations • 24 Apr 2017 • Hamed R. -Tavakoli, Jorma Laaksonen
The motivation behind such a problem formulation is (1) the benefit to knowledge-representation-based vision pipelines, and (2) the potential improvement in emulating bio-inspired vision systems by solving these three problems together.
2 code implementations • ICCV 2017 • Hamed R. -Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen
To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene.
no code implementations • 7 Apr 2017 • Hamed R. -Tavakoli, Jorma Laaksonen, Esa Rahtu
To investigate the current status in regard to affective image tagging, we (1) introduce a new eye movement dataset using an affordable eye tracker, (2) study the use of deep neural networks for pleasantness recognition, and (3) investigate the gap between deep features and eye movements.
no code implementations • 14 Dec 2016 • Fahad Shahbaz Khan, Joost Van de Weijer, Rao Muhammad Anwer, Andrew D. Bagdanov, Michael Felsberg, Jorma Laaksonen
Most approaches to human attribute and action recognition in still images are based on image representations in which multi-scale local features are pooled across scales into a single, scale-invariant encoding.
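The two pooling strategies at issue can be contrasted in a few lines, with random descriptors standing in for real multi-scale local features; both pooling choices are illustrative assumptions.

```python
import numpy as np

def pool_scale_invariant(features_per_scale):
    """Collapse local features from all scales into one encoding (scales mixed)."""
    return np.concatenate(features_per_scale).mean(axis=0)

def pool_scale_aware(features_per_scale):
    """Keep one encoding per scale, preserving scale information."""
    return np.stack([f.mean(axis=0) for f in features_per_scale])

feats = [np.random.rand(50, 128) for _ in range(3)]  # 3 scales, 128-D descriptors
print(pool_scale_invariant(feats).shape)  # (128,)
print(pool_scale_aware(feats).shape)      # (3, 128)
```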
1 code implementation • 20 Oct 2016 • Hamed R. -Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu
This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and ensemble of Extreme Learning Machines (ELM).
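A minimal extreme learning machine of the kind ensembled in this framework: a random hidden projection with closed-form least-squares output weights. Hyperparameters and the saliency-specific input features are assumptions.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM regressor."""
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)   # fixed random projection
        self.beta = np.linalg.pinv(H) @ y  # least-squares output weights
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

X, y = np.random.rand(200, 16), np.random.rand(200)
pred = ELM().fit(X, y).predict(X)
```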
1 code implementation • 17 Aug 2016 • Rakshith Shetty, Jorma Laaksonen
We present our submission to the Microsoft Video to Language Challenge of generating short captions describing videos in the challenge dataset.
2 code implementations • 9 Dec 2015 • Rakshith Shetty, Jorma Laaksonen
In this paper, we describe the system for generating textual descriptions of short video clips using recurrent neural networks (RNN), which we used while participating in the Large Scale Movie Description Challenge 2015 in ICCV 2015.
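A bare-bones sketch of an RNN caption decoder conditioned on a video feature; the sizes and the conditioning scheme are illustrative assumptions, not the submitted system.

```python
import torch
import torch.nn as nn

class CaptionRNN(nn.Module):
    """Video features initialize the LSTM state; words are predicted step by step."""
    def __init__(self, vocab_size, feat_dim=512, hidden=256, embed=128):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden)
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, video_feat, tokens):
        h0 = torch.tanh(self.init_h(video_feat)).unsqueeze(0)  # (1, B, H)
        c0 = torch.zeros_like(h0)
        hs, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(hs)  # (B, T, vocab) next-word logits

model = CaptionRNN(vocab_size=1000)
logits = model(torch.randn(2, 512), torch.randint(0, 1000, (2, 7)))
```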
no code implementations • 2 Oct 2014 • Zakria Hussain, Arto Klami, Jussi Kujala, Alex P. Leung, Kitsuchart Pasupa, Peter Auer, Samuel Kaski, Jorma Laaksonen, John Shawe-Taylor
It then retrieves images with a specialized online learning algorithm that balances the tradeoff between exploring new images and exploiting the already inferred interests of the user.
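The exploration-exploitation balance can be sketched with a simple epsilon-greedy selector over unseen images; the paper's specialized algorithm and its user-interest model are not reproduced here.

```python
import numpy as np

def select_images(scores, shown, k=5, epsilon=0.3, rng=np.random.default_rng()):
    """Pick k images: with prob. epsilon explore a random unseen image,
    otherwise exploit the highest-scoring one under the current user model."""
    candidates = [i for i in range(len(scores)) if i not in shown]
    chosen = []
    for _ in range(min(k, len(candidates))):
        if rng.random() < epsilon:                       # explore
            pick = int(rng.choice(candidates))
        else:                                            # exploit
            pick = max(candidates, key=lambda i: scores[i])
        candidates.remove(pick)
        chosen.append(pick)
    return chosen

print(select_images(np.random.rand(20), shown={0, 1}))
```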
no code implementations • LREC 2014 • Ville Viitaniemi, Tommi Jantunen, Leena Savolainen, Matti Karppa, Jorma Laaksonen
In this paper we present S-pot, a benchmark setting for evaluating the performance of automatic spotting of signs in continuous sign language videos.
no code implementations • LREC 2014 • Matti Karppa, Ville Viitaniemi, Marcos Luzardo, Jorma Laaksonen, Tommi Jantunen
We present a software toolkit called SLMotion which provides a framework for automatic and semiautomatic analysis, feature extraction and annotation of individual sign language videos, and which can easily be adapted to batch processing of entire sign language corpora.
no code implementations • LREC 2012 • Matti Karppa, Tommi Jantunen, Ville Viitaniemi, Jorma Laaksonen, Birgitta Burger, Danny De Weerdt
We consider a non-intrusive computer-vision method for measuring the motion of a person performing natural signing in video recordings.