Search Results for author: Esa Rahtu

Found 69 papers, 36 papers with code

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation

no code implementations CVPR 2023 Mayu Otani, Riku Togashi, Yu Sawai, Ryosuke Ishigami, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh

Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images.

FinnWoodlands Dataset

1 code implementation 3 Apr 2023 Juan Lagos, Urho Lempiö, Esa Rahtu

Besides tree trunks, we also annotated "Obstacles" objects as instances as well as the semantic stuff classes "Lake", "Ground", and "Track".

Autonomous Driving Depth Completion +2

MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation

no code implementations 14 Feb 2023 Dingding Cai, Janne Heikkilä, Esa Rahtu

Though massive amounts of synthetic RGB images are easy to obtain, the models trained on them suffer from noticeable performance degradation due to the synthetic-to-real domain gap.

6D Pose Estimation using RGB Domain Adaptation +1

PanDepth: Joint Panoptic Segmentation and Depth Completion

1 code implementation 29 Dec 2022 Juan Lagos, Esa Rahtu

Understanding 3D environments semantically is pivotal in autonomous driving applications where multiple computer vision tasks are involved.

Autonomous Driving Depth Completion +2

Supervised Fine-tuning Evaluation for Long-term Visual Place Recognition

no code implementations 14 Nov 2022 Farid Alijani, Esa Rahtu

In this paper, we present a comprehensive study on the utility of deep convolutional neural networks with two state-of-the-art pooling layers, which are placed after the convolutional layers and fine-tuned in an end-to-end manner for the visual place recognition task in challenging conditions, including seasonal and illumination variations.

Visual Place Recognition

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

1 code implementation 13 Oct 2022 Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

This contrasts with the case of synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space.

Audio-Visual Synchronization

The Weighting Game: Evaluating Quality of Explainability Methods

1 code implementation 12 Aug 2022 Lassi Raatikainen, Esa Rahtu

The objective of this paper is to assess the quality of explanation heatmaps for image classification tasks.

Image Classification

HRF-Net: Holistic Radiance Fields from Sparse Inputs

no code implementations 9 Aug 2022 Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

We present HRF-Net, a novel view synthesis method based on holistic radiance fields that renders novel views using a set of sparse inputs.

Neural Rendering Novel View Synthesis

SC6D: Symmetry-agnostic and Correspondence-free 6D Object Pose Estimation

1 code implementation 3 Aug 2022 Dingding Cai, Janne Heikkilä, Esa Rahtu

The pose estimation is decomposed into three sub-tasks: a) object 3D rotation representation learning and matching; b) estimation of the 2D location of the object center; and c) scale-invariant distance estimation (the translation along the z-axis) via classification.

6D Pose Estimation using RGB Representation Learning +1

Online panoptic 3D reconstruction as a Linear Assignment Problem

1 code implementation 1 Apr 2022 Leevi Raivio, Esa Rahtu

Real-time holistic scene understanding would allow machines to interpret their surroundings in a much more detailed manner than is currently possible.

3D Reconstruction Image Segmentation +2

AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

no code implementations CVPR 2022 Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkila, Tetsuya Sakai

First, it is rank-insensitive: It ignores the rank positions of successfully localised moments in the top-$K$ ranked list by treating the list as a set.

Moment Retrieval Retrieval

OVE6D: Object Viewpoint Encoding for Depth-based 6D Object Pose Estimation

1 code implementation CVPR 2022 Dingding Cai, Janne Heikkilä, Esa Rahtu

This paper proposes a universal framework, called OVE6D, for model-based 6D object pose estimation from a single depth image and a target object mask.

6D Pose Estimation using RGB Translation

Adaptation and Attention for Neural Video Coding

no code implementations 16 Dec 2021 Nannan Zou, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention.

Image Compression Motion Estimation

Taming Visually Guided Sound Generation

3 code implementations 17 Oct 2021 Vladimir Iashin, Esa Rahtu

In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds prompted with a set of frames from open-domain videos in less time than it takes to play it on a single GPU.

Audio Generation

Towards a Real-Time Facial Analysis System

1 code implementation 21 Sep 2021 Bishwo Adhikari, Xingyang Ni, Esa Rahtu, Heikki Huttunen

Facial analysis is an active research area in computer vision, with many practical applications.

Object Detection

V-SlowFast Network for Efficient Visual Sound Separation

1 code implementation 18 Sep 2021 Lingyu Zhu, Esa Rahtu

The objective of this paper is to perform visual sound separation: i) we study visual sound separation on spectrograms of different temporal resolutions; ii) we propose a new light yet efficient three-stream framework V-SlowFast that operates on Visual frame, Slow spectrogram, and Fast spectrogram.

Lightweight Monocular Depth with a Novel Neural Architecture Search Method

no code implementations 25 Aug 2021 Lam Huynh, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila

This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models.

Monocular Depth Estimation Neural Architecture Search

Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss

no code implementations 25 Aug 2021 Lam Huynh, Matteo Pedone, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila

In addition, we introduce a normalized Hessian loss term invariant to scaling and shear along the depth direction, which is shown to substantially improve the accuracy.

Monocular Depth Estimation

Image coding for machines: an end-to-end learned approach

no code implementations 23 Aug 2021 Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Esa Rahtu

Over recent years, deep learning-based computer vision systems have been applied to images at an ever-increasing pace, oftentimes representing the only type of consumption for those images.

Instance Segmentation Object Detection +2

Learned Image Coding for Machines: A Content-Adaptive Approach

no code implementations 23 Aug 2021 Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Hamed Rezazadegan Tavakoli, Esa Rahtu

One possible solution approach consists of adapting current human-targeted image and video coding standards to the use case of machine consumption.

Data Compression Image Compression

On the Importance of Encrypting Deep Features

1 code implementation 16 Aug 2021 Xingyang Ni, Heikki Huttunen, Esa Rahtu

On the other hand, it is advisable to encrypt feature vectors, especially for a machine learning model in production.

Person Re-Identification

HybVIO: Pushing the Limits of Real-time Visual-inertial Odometry

1 code implementation 22 Jun 2021 Otto Seiskari, Pekka Rantalankila, Juho Kannala, Jerry Ylilammi, Esa Rahtu, Arno Solin

We present HybVIO, a novel hybrid approach for combining filtering-based visual-inertial odometry (VIO) with optimization-based SLAM.

FlipReID: Closing the Gap between Training and Inference in Person Re-Identification

1 code implementation 12 May 2021 Xingyang Ni, Esa Rahtu

More specifically, models using the FlipReID structure are trained on the original images and the flipped images simultaneously, and incorporating the flipping loss minimizes the mean squared error between feature vectors of corresponding image pairs.

Person Re-Identification

Sample selection for efficient image annotation

no code implementations 10 May 2021 Bishwo Adhikari, Esa Rahtu, Heikki Huttunen

Supervised object detection has been proven to be successful in many benchmark datasets achieving human-level performances.

Object Detection

Selective Probabilistic Classifier Based on Hypothesis Testing

no code implementations 9 May 2021 Saeed Bakhshi Germi, Esa Rahtu, Heikki Huttunen

In this paper, we propose a simple yet effective method to deal with the violation of the Closed-World Assumption for a classifier.

Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations

1 code implementation 17 Apr 2021 Lingyu Zhu, Esa Rahtu

The objective of this paper is to perform audio-visual sound source separation, i.e., to separate component audios from a mixture based on the videos of sound sources.

Optical Flow Estimation Visually Guided Sound Source Separation

Single Source One Shot Reenactment using Weighted motion From Paired Feature Points

no code implementations 7 Apr 2021 Soumya Tripathy, Juho Kannala, Esa Rahtu

Image reenactment is a task where the target object in the source image imitates the motion represented in the driving image.

Face Reenactment Image Animation

RGBD-Net: Predicting color and depth images for novel views synthesis

no code implementations 29 Nov 2020 Phong Nguyen, Animesh Karnewar, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network.

Novel View Synthesis regression

FACEGAN: Facial Attribute Controllable rEenactment GAN

no code implementations 9 Nov 2020 Soumya Tripathy, Juho Kannala, Esa Rahtu

However, if the identity differs, the driving facial structures leak to the output, distorting the reenactment result.

Face Reenactment

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

1 code implementation 1 Sep 2020 Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä

In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.

Moment Retrieval Retrieval

Learning to Learn to Compress

no code implementations 31 Jul 2020 Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R.-Tavakoli, Jani Lainema, Miska Hannuksela, Emre Aksu, Esa Rahtu

In a second phase, the Model-Agnostic Meta-learning approach is adapted to the specific case of image compression, where the inner-loop performs latent tensor overfitting, and the outer loop updates both encoder and decoder neural networks based on the overfitting performance.

Image Compression Meta-Learning +1

Leveraging Category Information for Single-Frame Visual Sound Source Separation

3 code implementations 15 Jul 2020 Lingyu Zhu, Esa Rahtu

Furthermore, our models are able to exploit the information of the sound source category in the separation process.

Optical Flow Estimation

Visually Guided Sound Source Separation using Cascaded Opponent Filter Network

1 code implementation 4 Jun 2020 Lingyu Zhu, Esa Rahtu

A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources.

Visually Guided Sound Source Separation

A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer

2 code implementations 17 May 2020 Vladimir Iashin, Esa Rahtu

We show the effectiveness of the proposed model with audio and visual modalities on the dense video captioning task, yet the module is capable of digesting any two modalities in a sequence-to-sequence task.

Dense Video Captioning Temporal Action Proposal Generation

End-to-End Learning for Video Frame Compression with Self-Attention

no code implementations 20 Apr 2020 Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R.-Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

One of the core components of conventional (i.e., non-learned) video codecs consists of predicting a frame from a previously-decoded frame, by leveraging temporal correlations.

MS-SSIM Optical Flow Estimation +1

Sequential View Synthesis with Transformer

no code implementations 9 Apr 2020 Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila

This paper addresses the problem of novel view synthesis by means of neural rendering, where we are interested in predicting the novel view at an arbitrary camera pose based on a given set of input images from other viewpoints.

Neural Rendering Novel View Synthesis

Guiding Monocular Depth Estimation Using Depth-Attention Volume

2 code implementations ECCV 2020 Lam Huynh, Phong Nguyen-Ha, Jiri Matas, Esa Rahtu, Janne Heikkila

Recovering the scene depth from a single image is an ill-posed problem that requires additional priors, often referred to as monocular depth cues, to disambiguate different 3D interpretations.

Monocular Depth Estimation

Multi-modal Dense Video Captioning

4 code implementations 17 Mar 2020 Vladimir Iashin, Esa Rahtu

We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside video frames and the corresponding audio track.

Automatic Speech Recognition (ASR) +2

DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction

2 code implementations 25 May 2019 Hamed R.-Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala

Our results suggest that (1) audio is a strong contributing cue for saliency prediction, (2) a salient visible sound source is the natural cause of the superiority of our Audio-Visual model, (3) richer feature representations for the input space lead to more powerful predictions even in the absence of more sophisticated saliency decoders, and (4) the Audio-Visual model improves over 53.54% of the frames predicted by the best Visual model (our baseline).

Saliency Prediction Video Saliency Prediction

Digging Deeper into Egocentric Gaze Prediction

no code implementations 12 Apr 2019 Hamed R.-Tavakoli, Esa Rahtu, Juho Kannala, Ali Borji

Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better compared to traditional features, (4) as opposed to hand regions, the manipulation point is a strong influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, manipulation point results in the best gaze prediction accuracy over egocentric videos, (6) the knowledge transfer works best for cases where the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction.

Activity Recognition Gaze Prediction +2

Predicting Novel Views Using Generative Adversarial Query Network

no code implementations 10 Apr 2019 Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila

The problem of predicting a novel view of the scene using an arbitrary number of observations is a challenging problem for computers as well as for humans.

Novel View Synthesis

ICface: Interpretable and Controllable Face Reenactment Using GANs

1 code implementation 3 Apr 2019 Soumya Tripathy, Juho Kannala, Esa Rahtu

This paper presents a generic face animator that is able to control the pose and expressions of a given face image.

Face Reenactment Video Editing

Rethinking the Evaluation of Video Summaries

2 code implementations CVPR 2019 Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä

Video summarization is a technique to create a short skim of the original video while preserving the main stories/content.

Video Segmentation Video Semantic Segmentation +1

ADVIO: An authentic dataset for visual-inertial odometry

1 code implementation ECCV 2018 Santiago Cortés, Arno Solin, Esa Rahtu, Juho Kannala

The lack of realistic and open benchmarking datasets for pedestrian visual-inertial odometry has made it hard to pinpoint differences in published methods.


Learning image-to-image translation using paired and unpaired training samples

1 code implementation 8 May 2018 Soumya Tripathy, Juho Kannala, Esa Rahtu

In this paper, we propose a new general purpose image-to-image translation model that is able to utilize both paired and unpaired training data simultaneously.

Image-to-Image Translation Translation

Image Patch Matching Using Convolutional Descriptors with Euclidean Distance

no code implementations 31 Oct 2017 Iaroslav Melekhov, Juho Kannala, Esa Rahtu

In this work we propose a neural network based image descriptor suitable for image patch matching, which is an important task in many computer vision applications.

Object Detection +1

Summarization of User-Generated Sports Video by Using Deep Action Recognition Features

no code implementations 25 Sep 2017 Antonio Tejero-de-Pablos, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya, Marko Linna, Esa Rahtu

The labels are provided by annotators possessing different experience with respect to Kendo to demonstrate how the proposed method adapts to different needs.

Action Recognition Temporal Action Localization +1

Investigating Natural Image Pleasantness Recognition using Deep Features and Eye Tracking for Loosely Controlled Human-computer Interaction

no code implementations 7 Apr 2017 Hamed R.-Tavakoli, Jorma Laaksonen, Esa Rahtu

To investigate the current status in regard to affective image tagging, we (1) introduce a new eye movement dataset using an affordable eye tracker, (2) study the use of deep neural networks for pleasantness recognition, (3) investigate the gap between deep features and eye movements.

Image-based Localization using Hourglass Networks

no code implementations 23 Mar 2017 Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, Esa Rahtu

In this paper, we propose an encoder-decoder convolutional neural network (CNN) architecture for estimating camera pose (orientation and location) from a single RGB-image.

General Classification Image-Based Localization +1

Inertial Odometry on Handheld Smartphones

1 code implementation 1 Mar 2017 Arno Solin, Santiago Cortes, Esa Rahtu, Juho Kannala

Building a complete inertial navigation system using the limited-quality data provided by current smartphones has been regarded as challenging, if not impossible.

Relative Camera Pose Estimation Using Convolutional Neural Networks

1 code implementation 5 Feb 2017 Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, Esa Rahtu

This paper presents a convolutional neural network based approach for estimating the relative pose between two cameras.

General Classification Pose Estimation +2

A novel method for automatic localization of joint area on knee plain radiographs

no code implementations 31 Jan 2017 Aleksei Tiulpin, Jérôme Thevenot, Esa Rahtu, Simo Saarakkala

The obtained results for the used datasets show the mean intersection over the union equal to: 0.84, 0.79 and 0.78.

Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features

1 code implementation 20 Oct 2016 Hamed R.-Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu

This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and ensemble of Extreme Learning Machines (ELM).

Video Summarization using Deep Semantic Features

2 code implementations 28 Sep 2016 Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

For this, we design a deep neural network that maps videos as well as descriptions to a common semantic space and jointly train it with associated pairs of videos and descriptions.

Clustering Video Summarization

Real-time Human Pose Estimation from Video with Convolutional Neural Networks

no code implementations 23 Sep 2016 Marko Linna, Juho Kannala, Esa Rahtu

In this paper, we present a method for real-time multi-person human pose estimation from video by utilizing convolutional neural networks.

Action Recognition Pose Estimation +1

Learning Joint Representations of Videos and Sentences with Web Image Search

no code implementations 8 Aug 2016 Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya

In description generation, the performance level is comparable to the current state-of-the-art, although our embeddings were trained for the retrieval tasks.

Image Retrieval Natural Language Queries +4

Understanding Objects in Detail with Fine-Grained Attributes

no code implementations CVPR 2014 Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed

We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.

Object Detection

Fine-Grained Visual Classification of Aircraft

1 code implementation 21 Jun 2013 Subhransu Maji, Esa Rahtu, Juho Kannala, Matthew Blaschko, Andrea Vedaldi

This paper introduces FGVC-Aircraft, a new dataset containing 10,000 images of aircraft spanning 100 aircraft models, organised in a three-level hierarchy.

Classification Fine-Grained Image Classification +1
