Search Results for author: Jorma Laaksonen

Found 33 papers, 12 papers with code

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

1 code implementation 7 Jan 2024 Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe

It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef).

Ranked #1 on RGB Salient Object Detection on HRSOD (using extra training data)

Camouflaged Object Segmentation Dichotomous Image Segmentation +3

Semi-Supervised learning for Face Anti-Spoofing using Apex frame

no code implementations 10 Sep 2023 Usman Muhammad, Mourad Oussalah, Jorma Laaksonen

Conventional feature extraction techniques in the face anti-spoofing domain either analyze the entire video sequence or focus on a specific segment to improve model performance.

Face Anti-Spoofing

Saliency-based Video Summarization for Face Anti-spoofing

no code implementations 23 Aug 2023 Usman Muhammad, Mourad Oussalah, Jorma Laaksonen

Inspired by the visual saliency theory, we present a video summarization method for face anti-spoofing detection that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency.

Face Anti-Spoofing Face Presentation Attack Detection +1

PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting

no code implementations 14 Jul 2023 Zixin Guo, Tzu-Jui Julius Wang, Selen Pehlivan, Abduljalil Radman, Jorma Laaksonen

To further reduce the amount of supervision, we propose Prompts-in-The-Loop (PiTL) that prompts knowledge from large language models (LLMs) to describe images.

Cross-Modal Retrieval Object +1

Deep Ensemble Learning with Frame Skipping for Face Anti-Spoofing

2 code implementations 6 Jul 2023 Usman Muhammad, Md Ziaul Hoque, Mourad Oussalah, Jorma Laaksonen

Face presentation attacks (PA), also known as spoofing attacks, pose a substantial threat to biometric systems that rely on facial recognition systems, such as access control systems, mobile payments, and identity verification systems.

Ensemble Learning Face Anti-Spoofing +1

XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models

1 code implementation 13 Jun 2023 Omkar Thawkar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan

The latest breakthroughs in large vision-language models, such as Bard and GPT-4, have showcased extraordinary abilities in performing a wide range of tasks.

Language Modelling Large Language Model

Cross-modulated Few-shot Image Generation for Colorectal Tissue Classification

1 code implementation 4 Apr 2023 Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Fahad Shahbaz Khan

In this work, we propose a few-shot colorectal tissue image generation method for addressing the scarcity of histopathological training data for rare cancer tissues.

Data Augmentation Image Classification +1

Video Instance Segmentation in an Open-World

1 code implementation 3 Apr 2023 Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

Open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories as well as labels an unknown object as 'unknown' and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available.

Instance Segmentation Semantic Segmentation +1

3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers

1 code implementation 21 Mar 2023 Omkar Thawakar, Rao Muhammad Anwer, Jorma Laaksonen, Orly Reiner, Mubarak Shah, Fahad Shahbaz Khan

Accurate 3D mitochondria instance segmentation in electron microscopy (EM) is a challenging problem and serves as a prerequisite to empirically analyze their distributions and morphology.

Instance Segmentation Semantic Segmentation

Domain Generalization via Ensemble Stacking for Face Presentation Attack Detection

no code implementations 5 Jan 2023 Usman Muhammad, Jorma Laaksonen, Djamila Romaissa Beddiar, Mourad Oussalah

The latter combines the predictions from the base models, leveraging their complementary information to better handle unseen target domains and enhance the overall performance.

Domain Generalization Ensemble Learning +3

Person Image Synthesis via Denoising Diffusion Model

1 code implementation CVPR 2023 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution.

Denoising Image Generation

When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity

no code implementations COLING 2022 Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo

Our results show that the model is capable of correctly detecting whether an utterance is humorous 78% of the time and how long the audience's laughter reaction should last with a mean absolute error of 600 milliseconds.

Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision

no code implementations 24 Oct 2022 Tzu-Jui Julius Wang, Jorma Laaksonen, Tomas Langer, Heikki Arponen, Tom E. Bishop

Moreover, in other V-L downstream tasks considered, our WFH models are on par with models trained with paired V-L data, revealing the utility of unpaired data.

Cross-Modal Retrieval Image Retrieval +3

CLIP4IDC: CLIP for Image Difference Captioning

1 code implementation 1 Jun 2022 Zixin Guo, Tzu-Jui Julius Wang, Jorma Laaksonen

Different from directly fine-tuning CLIP to generate sentences, we introduce an adaptation training process to adapt CLIP's visual encoder to capture and align differences in image pairs based on the textual descriptions.

Domain Adaptation Image Classification

DoodleFormer: Creative Sketch Drawing with Transformers

no code implementations 6 Dec 2021 Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen, Michael Felsberg

Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects.

Image Generation

Tackling the Unannotated: Scene Graph Generation with Bias-Reduced Models

no code implementations 18 Aug 2020 Tzu-Jui Julius Wang, Selen Pehlivan, Jorma Laaksonen

Recent scene graph generation (SGG) models have shown their capability of capturing the most frequent relations among visual entities.

Graph Generation Scene Graph Generation +1

Character-Centric Storytelling

no code implementations 17 Sep 2019 Aditya Surikuchi, Jorma Laaksonen

Sequential vision-to-language or visual storytelling has recently been one of the areas of focus in computer vision and language modeling domains.

Language Modelling Visual Storytelling

Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

no code implementations 5 Jun 2017 Rao Muhammad Anwer, Fahad Shahbaz Khan, Joost Van de Weijer, Matthieu Molinier, Jorma Laaksonen

To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification.

Aerial Scene Classification General Classification +2

Saliency Revisited: Analysis of Mouse Movements versus Fixations

no code implementations CVPR 2017 Hamed R. -Tavakoli, Fawad Ahmed, Ali Borji, Jorma Laaksonen

This paper revisits visual saliency prediction by evaluating the recent advancements in this field such as crowd-sourced mouse tracking-based databases and contextual annotations.

Model Selection Saliency Prediction

Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition

no code implementations 24 Apr 2017 Hamed R. -Tavakoli, Jorma Laaksonen

The motivation behind such a problem formulation is (1) the benefits to the knowledge representation-based vision pipelines, and (2) the potential improvements in emulating bio-inspired vision systems by solving these three problems together.

Instance Segmentation Object +5

Paying Attention to Descriptions Generated by Image Captioning Models

2 code implementations ICCV 2017 Hamed R. -Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen

To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene.

Image Captioning

Investigating Natural Image Pleasantness Recognition using Deep Features and Eye Tracking for Loosely Controlled Human-computer Interaction

no code implementations 7 Apr 2017 Hamed R. -Tavakoli, Jorma Laaksonen, Esa Rahtu

To investigate the current status in regard to affective image tagging, we (1) introduce a new eye movement dataset using an affordable eye tracker, (2) study the use of deep neural networks for pleasantness recognition, (3) investigate the gap between deep features and eye movements.

Scale Coding Bag of Deep Features for Human Attribute and Action Recognition

no code implementations 14 Dec 2016 Fahad Shahbaz Khan, Joost Van de Weijer, Rao Muhammad Anwer, Andrew D. Bagdanov, Michael Felsberg, Jorma Laaksonen

Most approaches to human attribute and action recognition in still images are based on image representation in which multi-scale local features are pooled across scale into a single, scale-invariant encoding.

Action Recognition In Still Images Attribute

Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features

1 code implementation 20 Oct 2016 Hamed R. -Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu

This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and ensemble of Extreme Learning Machines (ELM).

Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

1 code implementation 17 Aug 2016 Rakshith Shetty, Jorma Laaksonen

We present our submission to the Microsoft Video to Language Challenge of generating short captions describing videos in the challenge dataset.

Caption Generation Video Captioning

Video captioning with recurrent networks based on frame- and video-level features and visual content classification

2 code implementations 9 Dec 2015 Rakshith Shetty, Jorma Laaksonen

In this paper, we describe the system for generating textual descriptions of short video clips using recurrent neural networks (RNN), which we used while participating in the Large Scale Movie Description Challenge 2015 in ICCV 2015.

Caption Generation General Classification +2

PinView: Implicit Feedback in Content-Based Image Retrieval

no code implementations 2 Oct 2014 Zakria Hussain, Arto Klami, Jussi Kujala, Alex P. Leung, Kitsuchart Pasupa, Peter Auer, Samuel Kaski, Jorma Laaksonen, John Shawe-Taylor

It then retrieves images with a specialized online learning algorithm that balances the tradeoff between exploring new images and exploiting the already inferred interests of the user.

Content-Based Image Retrieval Retrieval

S-pot - a benchmark in spotting signs within continuous signing

no code implementations LREC 2014 Ville Viitaniemi, Tommi Jantunen, Leena Savolainen, Matti Karppa, Jorma Laaksonen

In this paper we present S-pot, a benchmark setting for evaluating the performance of automatic spotting of signs in continuous sign language videos.

SLMotion - An extensible sign language oriented video analysis tool

no code implementations LREC 2014 Matti Karppa, Ville Viitaniemi, Marcos Luzardo, Jorma Laaksonen, Tommi Jantunen

We present a software toolkit called SLMotion which provides a framework for automatic and semiautomatic analysis, feature extraction and annotation of individual sign language videos, and which can easily be adapted to batch processing of entire sign language corpora.

Sign Language Recognition

Comparing computer vision analysis of signed language video with motion capture recordings

no code implementations LREC 2012 Matti Karppa, Tommi Jantunen, Ville Viitaniemi, Jorma Laaksonen, Birgitta Burger, Danny De Weerdt

We consider a non-intrusive computer-vision method for measuring the motion of a person performing natural signing in video recordings.
