1 code implementation • 28 Nov 2024 • Xuqian Ren, Matias Turkulainen, Jiepeng Wang, Otto Seiskari, Iaroslav Melekhov, Juho Kannala, Esa Rahtu
We develop a scale-aware meshing strategy inspired by TSDF and octree-based isosurface extraction, which recovers finer details from Gaussian models compared to other commonly used open-source meshing tools.
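As a rough illustration of the isosurface-extraction side of this (not the paper's scale-aware, octree-based strategy), the following minimal Python sketch builds a toy TSDF volume and meshes its zero level set with marching cubes; the grid size and truncation distance are arbitrary assumptions.

```python
import numpy as np
from skimage import measure  # pip install scikit-image

# Toy truncated signed distance volume for a sphere of radius 0.3.
res = 64
xs = np.linspace(-0.5, 0.5, res)
x, y, z = np.meshgrid(xs, xs, xs, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.3            # signed distance to the sphere surface
tsdf = np.clip(sdf, -0.05, 0.05)                   # truncate away from the surface

# Extract the zero-level isosurface as a triangle mesh.
verts, faces, normals, _ = measure.marching_cubes(
    tsdf, level=0.0, spacing=(xs[1] - xs[0],) * 3)
print(verts.shape, faces.shape)                    # (N, 3) vertices, (M, 3) triangles
```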
no code implementations • 20 Sep 2024 • Ilpo Viertola, Vladimir Iashin, Esa Rahtu
We introduce V-AURA, the first autoregressive model to achieve high temporal alignment and relevance in video-to-audio generation.
no code implementations • 31 Aug 2024 • Mostafa Mansour, Ahmed Abdelsalam, Ari Happonen, Jari Porras, Esa Rahtu
Recent advancements in monocular neural depth estimation, particularly those achieved by the UniDepth network, have prompted the investigation of integrating UniDepth within a Gaussian splatting framework for monocular SLAM. This study presents UDGS-SLAM, a novel approach that eliminates the necessity of RGB-D sensors for depth estimation within a Gaussian splatting framework.
1 code implementation • 26 Mar 2024 • Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala
In this work, we explore the use of readily accessible geometric cues to enhance Gaussian splatting optimization in challenging, ill-posed, and textureless scenes.
1 code implementation • 20 Mar 2024 • Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin
High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras.
no code implementations • 15 Mar 2024 • Dingding Cai, Janne Heikkilä, Esa Rahtu
At inference, GS-Pose operates sequentially by locating the object in the input image, estimating its initial 6D pose using a retrieval approach, and refining the pose with a render-and-compare method.
2 code implementations • 29 Jan 2024 • Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman
Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse.
no code implementations • 19 Jan 2024 • Nam Le, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed Rezazadegan Tavakoli, Emre Aksu, Miska M. Hannuksela, Esa Rahtu
Image coding for machines (ICM) aims at reducing the bitrate required to represent an image while minimizing the drop in machine vision analysis accuracy.
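For intuition only, a minimal sketch of the kind of rate-accuracy trade-off such a system optimizes; the weighting, rate proxy, and downstream task are illustrative assumptions, not the formulation used in the paper.

```python
import torch
import torch.nn.functional as F

def icm_loss(bits_per_pixel, task_logits, task_labels, lam=0.01):
    """Rate-task objective: estimated bitrate plus a weighted machine-vision task loss."""
    task_loss = F.cross_entropy(task_logits, task_labels)   # classification as the stand-in task
    return bits_per_pixel.mean() + lam * task_loss

# Toy usage with random tensors standing in for a learned codec and an analysis network.
bpp = torch.rand(8)                       # estimated bits per pixel for a batch of 8 images
logits = torch.randn(8, 10)               # analysis-network output over 10 classes
labels = torch.randint(0, 10, (8,))
print(icm_loss(bpp, logits, labels).item())
```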
no code implementations • 19 Jan 2024 • Jukka I. Ahonen, Nam Le, Honglei Zhang, Antti Hallapuro, Francesco Cricri, Hamed Rezazadegan Tavakoli, Miska M. Hannuksela, Esa Rahtu
To the best of our knowledge, this is the first research paper showing a hybrid video codec that outperforms VVC on multiple datasets and multiple machine vision tasks.
no code implementations • 5 Nov 2023 • Xuqian Ren, Wenjia Wang, Dingding Cai, Tuuli Tuominen, Juho Kannala, Esa Rahtu
Metaverse technologies demand accurate, real-time, and immersive modeling on consumer-grade hardware for both non-human perception (e.g., drone/robot/autonomous car navigation) and immersive technologies like AR/VR, requiring both structural accuracy and photorealism.
no code implementations • CVPR 2023 • Mayu Otani, Riku Togashi, Yu Sawai, Ryosuke Ishigami, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh
Human evaluation is critical for validating the performance of text-to-image generative models, as this highly cognitive process requires deep comprehension of text and images.
1 code implementation • 3 Apr 2023 • Juan Lagos, Urho Lempiö, Esa Rahtu
Besides tree trunks, we also annotated "Obstacles" objects as instances as well as the semantic stuff classes "Lake", "Ground", and "Track".
no code implementations • 14 Feb 2023 • Dingding Cai, Janne Heikkilä, Esa Rahtu
Though massive amounts of synthetic RGB images are easy to obtain, the models trained on them suffer from noticeable performance degradation due to the synthetic-to-real domain gap.
no code implementations • 3 Jan 2023 • Janne Mustaniemi, Juho Kannala, Esa Rahtu, Li Liu, Janne Heikkilä
Various datasets have been proposed for simultaneous localization and mapping (SLAM) and related problems.
1 code implementation • 29 Dec 2022 • Juan Lagos, Esa Rahtu
Understanding 3D environments semantically is pivotal in autonomous driving applications where multiple computer vision tasks are involved.
no code implementations • 14 Nov 2022 • Farid Alijani, Esa Rahtu
In this paper, we present a comprehensive study of deep convolutional neural networks with two state-of-the-art pooling layers, placed after the convolutional layers and fine-tuned end-to-end, for visual place recognition under challenging conditions, including seasonal and illumination variations.
2 code implementations • 13 Oct 2022 • Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman
This contrasts with the case of synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space.
1 code implementation • 1 Sep 2022 • Juan Pablo Lagos, Esa Rahtu
Holistic scene understanding is pivotal for the performance of autonomous machines.
1 code implementation • 12 Aug 2022 • Lassi Raatikainen, Esa Rahtu
The objective of this paper is to assess the quality of explanation heatmaps for image classification tasks.
no code implementations • 9 Aug 2022 • Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila
Moreover, our method can leverage a denser set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations, while still maintaining the high-speed rendering of the pre-trained model.
1 code implementation • 3 Aug 2022 • Dingding Cai, Janne Heikkilä, Esa Rahtu
The pose estimation is decomposed into three sub-tasks: a) object 3D rotation representation learning and matching; b) estimation of the 2D location of the object center; and c) scale-invariant distance estimation (the translation along the z-axis) via classification.
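A minimal sketch of how sub-task (c) can be phrased as classification over discretized distance bins; the bin range, feature dimension, and head are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

N_BINS = 64
z_min, z_max = 0.2, 2.0                                    # assumed working range in metres
bin_centres = torch.linspace(z_min, z_max, N_BINS)

head = nn.Linear(256, N_BINS)                              # hypothetical distance head

def predict_z(feature):
    """Classify the object-to-camera distance and read off the winning bin centre."""
    logits = head(feature)                                  # (B, N_BINS)
    return bin_centres[logits.argmax(dim=-1)]               # (B,) predicted z in metres

feat = torch.randn(4, 256)                                  # placeholder image features
print(predict_z(feat))
```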
1 code implementation • 3 Jul 2022 • Lingyu Zhu, Esa Rahtu, Hang Zhao
This paper focuses on perceiving and navigating 3D environments using echoes and RGB images.
1 code implementation • 1 Apr 2022 • Leevi Raivio, Esa Rahtu
Real-time holistic scene understanding would allow machines to interpret their surroundings in a much more detailed manner than is currently possible.
no code implementations • CVPR 2022 • Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkila, Tetsuya Sakai
First, it is rank-insensitive: It ignores the rank positions of successfully localised moments in the top-$K$ ranked list by treating the list as a set.
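A tiny, self-contained illustration of the rank-insensitivity described above: under Recall@K, a correct moment at rank 1 and a correct moment at rank K score identically.

```python
def recall_at_k(ranked_hits, k):
    """ranked_hits[i] is True if the i-th ranked moment is a correct localisation."""
    return float(any(ranked_hits[:k]))

run_a = [True, False, False, False, False]   # correct moment at rank 1
run_b = [False, False, False, False, True]   # correct moment only at rank 5
print(recall_at_k(run_a, 5), recall_at_k(run_b, 5))   # both print 1.0
```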
1 code implementation • CVPR 2022 • Mayu Otani, Riku Togashi, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Shin'ichi Satoh
OC-cost computes the cost of correcting detections to ground truths as a measure of accuracy.
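As a simplified stand-in for the idea (not the paper's actual OC-cost, which is defined via optimal transport), the sketch below matches detections to ground truths and sums a per-pair correction cost of 1 - IoU.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def correction_cost(dets, gts):
    """Sum of (1 - IoU) over an optimal detection-to-ground-truth matching (simplified)."""
    cost = np.array([[1.0 - iou(d, g) for g in gts] for d in dets])
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum()

print(correction_cost(dets=[(10, 10, 50, 50), (60, 60, 90, 90)], gts=[(12, 11, 52, 49)]))
```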
no code implementations • 3 Mar 2022 • Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila
We present LDP, a lightweight dense prediction neural architecture search (NAS) framework.
1 code implementation • CVPR 2022 • Dingding Cai, Janne Heikkilä, Esa Rahtu
This paper proposes a universal framework, called OVE6D, for model-based 6D object pose estimation from a single depth image and a target object mask.
no code implementations • 16 Dec 2021 • Nannan Zou, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu
In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention.
3 code implementations • 17 Oct 2021 • Vladimir Iashin, Esa Rahtu
In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds prompted with a set of frames from open-domain videos, in less time than it takes to play the video, on a single GPU.
1 code implementation • 21 Sep 2021 • Bishwo Adhikari, Xingyang Ni, Esa Rahtu, Heikki Huttunen
Facial analysis is an active research area in computer vision, with many practical applications.
1 code implementation • 18 Sep 2021 • Lingyu Zhu, Esa Rahtu
The objective of this paper is to perform visual sound separation: i) we study visual sound separation on spectrograms of different temporal resolutions; ii) we propose a new lightweight yet efficient three-stream framework, V-SlowFast, that operates on a Visual frame, a Slow spectrogram, and a Fast spectrogram.
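A minimal sketch of what "slow" and "fast" spectrograms of the same clip might look like in practice: two log-mel spectrograms computed with different hop lengths; the exact hop sizes and mel settings are assumptions, not the paper's configuration.

```python
import numpy as np
import librosa  # pip install librosa

sr = 16000
y = np.random.randn(sr * 2).astype(np.float32)        # 2 s of placeholder mono audio

def log_mel(y, sr, hop):
    m = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=hop, n_mels=80)
    return librosa.power_to_db(m)

slow = log_mel(y, sr, hop=512)    # coarse temporal resolution
fast = log_mel(y, sr, hop=128)    # ~4x finer temporal resolution, same mel bins
print(slow.shape, fast.shape)
```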
no code implementations • 25 Aug 2021 • Lam Huynh, Matteo Pedone, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila
In addition, we introduce a normalized Hessian loss term invariant to scaling and shear along the depth direction, which is shown to substantially improve the accuracy.
no code implementations • 25 Aug 2021 • Lam Huynh, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila
This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models.
no code implementations • 23 Aug 2021 • Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Hamed Rezazadegan Tavakoli, Esa Rahtu
One possible solution approach consists of adapting current human-targeted image and video coding standards to the use case of machine consumption.
no code implementations • 23 Aug 2021 • Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Esa Rahtu
Over recent years, deep learning-based computer vision systems have been applied to images at an ever-increasing pace, oftentimes representing the only type of consumption for those images.
1 code implementation • 16 Aug 2021 • Xingyang Ni, Heikki Huttunen, Esa Rahtu
On the other hand, it is advisable to encrypt feature vectors, especially for a machine learning model in production.
1 code implementation • 22 Jun 2021 • Otto Seiskari, Pekka Rantalankila, Juho Kannala, Jerry Ylilammi, Esa Rahtu, Arno Solin
We present HybVIO, a novel hybrid approach for combining filtering-based visual-inertial odometry (VIO) with optimization-based SLAM.
1 code implementation • 12 May 2021 • Xingyang Ni, Esa Rahtu
More specifically, models using the FlipReID structure are trained on the original images and the flipped images simultaneously, and the flipping loss minimizes the mean squared error between the feature vectors of each original-flipped image pair.
Ranked #3 on Person Re-Identification on MSMT17
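A minimal PyTorch sketch of the flipping loss described above, assuming a generic backbone: features are extracted for the original and flipped images and their mean squared difference is penalized.

```python
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet18(weights=None)  # stand-in feature extractor
backbone.fc = torch.nn.Identity()                      # use pooled features as embeddings

images = torch.randn(4, 3, 256, 128)                   # person re-id style crops
flipped = torch.flip(images, dims=[3])                 # horizontal flip

feat, feat_flip = backbone(images), backbone(flipped)
flip_loss = F.mse_loss(feat, feat_flip)                # MSE between paired feature vectors
print(flip_loss.item())
```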
no code implementations • 10 May 2021 • Bishwo Adhikari, Esa Rahtu, Heikki Huttunen
Supervised object detection has been proven to be successful in many benchmark datasets achieving human-level performances.
no code implementations • 9 May 2021 • Saeed Bakhshi Germi, Esa Rahtu, Heikki Huttunen
In this paper, we propose a simple yet effective method to deal with the violation of the Closed-World Assumption for a classifier.
1 code implementation • 17 Apr 2021 • Lingyu Zhu, Esa Rahtu
The objective of this paper is to perform audio-visual sound source separation, i.e., to separate component audios from a mixture based on the videos of sound sources.
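For orientation, a minimal sketch of mask-based spectrogram separation, the general recipe such models build on: a (here random) mask stands in for the learned, visually conditioned mask predictor.

```python
import torch

sr, n_fft, hop = 16000, 1024, 256
mixture = torch.randn(sr * 4)                           # 4 s of mixed audio (placeholder)
window = torch.hann_window(n_fft)

spec = torch.stft(mixture, n_fft, hop_length=hop, window=window, return_complex=True)
mask = torch.rand_like(spec.real)                       # placeholder for a video-conditioned mask
separated = torch.istft(spec * mask, n_fft, hop_length=hop, window=window)
print(separated.shape)                                  # separated waveform
```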
no code implementations • 7 Apr 2021 • Soumya Tripathy, Juho Kannala, Esa Rahtu
Image reenactment is a task where the target object in the source image imitates the motion represented in the driving image.
no code implementations • ICCV 2021 • Lam Huynh, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila
In this paper, we propose enhancing monocular depth estimation by adding 3D points as depth guidance.
no code implementations • 29 Nov 2020 • Phong Nguyen, Animesh Karnewar, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila
We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network.
no code implementations • 9 Nov 2020 • Soumya Tripathy, Juho Kannala, Esa Rahtu
However, if the identity differs, the driving facial structures leak into the output, distorting the reenactment result.
1 code implementation • 1 Sep 2020 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä
In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.
no code implementations • 31 Jul 2020 • Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. -Tavakoli, Jani Lainema, Miska Hannuksela, Emre Aksu, Esa Rahtu
In a second phase, the Model-Agnostic Meta-Learning (MAML) approach is adapted to the specific case of image compression, where the inner loop performs latent tensor overfitting and the outer loop updates both the encoder and decoder neural networks based on the overfitting performance.
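A deliberately crude sketch of the two-phase idea, not the paper's actual meta-learning formulation: the inner loop overfits the latent tensor to one image with the autoencoder frozen, and the outer loop then updates the encoder and decoder; the architecture, step counts, and losses are all placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Conv2d(3, 8, 4, stride=2, padding=1)             # toy encoder
dec = nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1)     # toy decoder
outer_opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

image = torch.rand(1, 3, 64, 64)

# Inner loop: overfit the latent tensor to this single image, autoencoder weights frozen.
latent = enc(image).detach().requires_grad_(True)
inner_opt = torch.optim.Adam([latent], lr=1e-2)
for _ in range(10):
    inner_opt.zero_grad()
    F.mse_loss(dec(latent), image).backward()
    inner_opt.step()

# Outer loop: update encoder and decoder based on the post-overfitting result.
outer_opt.zero_grad()
outer_loss = F.mse_loss(dec(latent.detach()), image) + F.mse_loss(enc(image), latent.detach())
outer_loss.backward()
outer_opt.step()
print(outer_loss.item())
```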
3 code implementations • 15 Jul 2020 • Lingyu Zhu, Esa Rahtu
Furthermore, our models are able to exploit the information of the sound source category in the separation process.
1 code implementation • 4 Jun 2020 • Lingyu Zhu, Esa Rahtu
A key element in COF is a novel opponent filter module that identifies and relocates residual components between sources.
2 code implementations • 17 May 2020 • Vladimir Iashin, Esa Rahtu
We show the effectiveness of the proposed model with audio and visual modalities on the dense video captioning task, yet the module is capable of digesting any two modalities in a sequence-to-sequence task.
no code implementations • 20 Apr 2020 • Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. -Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu
One of the core components of conventional (i.e., non-learned) video codecs consists of predicting a frame from a previously-decoded frame, by leveraging temporal correlations.
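A minimal sketch of such temporal prediction in its simplest form, motion-compensated warping of the previously decoded frame; the motion field here is random and would in practice be estimated or decoded.

```python
import torch
import torch.nn.functional as F

B, C, H, W = 1, 3, 64, 64
prev_frame = torch.rand(B, C, H, W)                       # previously decoded frame
flow = torch.randn(B, 2, H, W) * 0.01                     # placeholder motion field (normalised units)

# Build a sampling grid in [-1, 1] and shift it by the motion field.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)    # (1, H, W, 2) in (x, y) order
grid = base_grid + flow.permute(0, 2, 3, 1)

predicted = F.grid_sample(prev_frame, grid, align_corners=True)
print(predicted.shape)                                     # motion-compensated prediction
```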
no code implementations • 9 Apr 2020 • Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila
This paper addresses the problem of novel view synthesis by means of neural rendering, where we are interested in predicting the novel view at an arbitrary camera pose based on a given set of input images from other viewpoints.
2 code implementations • ECCV 2020 • Lam Huynh, Phong Nguyen-Ha, Jiri Matas, Esa Rahtu, Janne Heikkila
Recovering the scene depth from a single image is an ill-posed problem that requires additional priors, often referred to as monocular depth cues, to disambiguate different 3D interpretations.
4 code implementations • 17 Mar 2020 • Vladimir Iashin, Esa Rahtu
We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside the video frames and the corresponding audio track.
Ranked #11 on Dense Video Captioning on ActivityNet Captions
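A minimal sketch of obtaining a temporally aligned transcript to use as the text modality; Whisper is used purely for illustration and is not necessarily the ASR system used in the paper.

```python
import whisper  # pip install openai-whisper; illustrative ASR choice only

model = whisper.load_model("base")
result = model.transcribe("video_audio.wav")               # placeholder path

# Each segment carries start/end timestamps, so the text can be aligned with the frames.
for seg in result["segments"]:
    print(f'{seg["start"]:7.2f}-{seg["end"]:7.2f} s  {seg["text"]}')
```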
2 code implementations • 25 May 2019 • Hamed R. -Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala
Our results suggest that (1) audio is a strong contributing cue for saliency prediction, (2) the salient visible sound source is the natural cause of the superiority of our Audio-Visual model, (3) richer feature representations for the input space lead to more powerful predictions even in the absence of more sophisticated saliency decoders, and (4) the Audio-Visual model improves over 53.54% of the frames predicted by the best Visual model (our baseline).
no code implementations • 12 Apr 2019 • Hamed R. -Tavakoli, Esa Rahtu, Juho Kannala, Ali Borji
Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) as opposed to hand regions, the manipulation point is a strong influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, the manipulation point results in the best gaze prediction accuracy over egocentric videos, (6) knowledge transfer works best for cases where the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction.
1 code implementation • 12 Apr 2019 • Aleksei Tiulpin, Stefan Klein, Sita M. A. Bierma-Zeinstra, Jérôme Thevenot, Esa Rahtu, Joyce van Meurs, Edwin H. G. Oei, Simo Saarakkala
Knee osteoarthritis (OA) is the most common musculoskeletal disease without a cure, and current treatment options are limited to symptomatic relief.
no code implementations • 10 Apr 2019 • Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila
Predicting a novel view of a scene from an arbitrary number of observations is a challenging problem for computers as well as for humans.
1 code implementation • 3 Apr 2019 • Soumya Tripathy, Juho Kannala, Esa Rahtu
This paper presents a generic face animator that is able to control the pose and expressions of a given face image.
2 code implementations • CVPR 2019 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä
Video summarization is a technique to create a short skim of the original video while preserving the main stories/content.
4 code implementations • 19 Oct 2018 • Iaroslav Melekhov, Aleksei Tiulpin, Torsten Sattler, Marc Pollefeys, Esa Rahtu, Juho Kannala
This paper addresses the challenge of dense pixel correspondence estimation between two images.
Ranked #2 on Dense Pixel Correspondence Estimation on HPatches
1 code implementation • ECCV 2018 • Santiago Cortés, Arno Solin, Esa Rahtu, Juho Kannala
The lack of realistic and open benchmarking datasets for pedestrian visual-inertial odometry has made it hard to pinpoint differences in published methods.
1 code implementation • 8 May 2018 • Soumya Tripathy, Juho Kannala, Esa Rahtu
In this paper, we propose a new general purpose image-to-image translation model that is able to utilize both paired and unpaired training data simultaneously.
no code implementations • 31 Oct 2017 • Iaroslav Melekhov, Juho Kannala, Esa Rahtu
In this work we propose a neural network based image descriptor suitable for image patch matching, which is an important task in many computer vision applications.
1 code implementation • 29 Oct 2017 • Aleksei Tiulpin, Jérôme Thevenot, Esa Rahtu, Petri Lehenkari, Simo Saarakkala
Here, we also report an area under the ROC curve of 0.93 for radiological OA diagnosis.
no code implementations • 25 Sep 2017 • Antonio Tejero-de-Pablos, Yuta Nakashima, Tomokazu Sato, Naokazu Yokoya, Marko Linna, Esa Rahtu
The labels are provided by annotators possessing different experience with respect to Kendo to demonstrate how the proposed method adapts to different needs.
no code implementations • 2 Aug 2017 • Arno Solin, Santiago Cortes, Esa Rahtu, Juho Kannala
This paper presents a novel method for visual-inertial odometry.
no code implementations • 7 Apr 2017 • Hamed R. -Tavakoli, Jorma Laaksonen, Esa Rahtu
To investigate the current status of affective image tagging, we (1) introduce a new eye movement dataset recorded with an affordable eye tracker, (2) study the use of deep neural networks for pleasantness recognition, and (3) investigate the gap between deep features and eye movements.
no code implementations • 23 Mar 2017 • Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, Esa Rahtu
In this paper, we propose an encoder-decoder convolutional neural network (CNN) architecture for estimating camera pose (orientation and location) from a single RGB-image.
1 code implementation • 1 Mar 2017 • Arno Solin, Santiago Cortes, Esa Rahtu, Juho Kannala
Building a complete inertial navigation system using the limited quality data provided by current smartphones has been regarded challenging, if not impossible.
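To see why this is hard, a minimal sketch of naive inertial dead reckoning, whose rapid drift is exactly what a usable smartphone system has to overcome; the IMU data here are synthetic placeholders.

```python
import numpy as np

dt = 0.01                                        # 100 Hz IMU
acc_world = np.zeros((500, 3))                   # gravity-free accelerations (synthetic)
acc_world[:100, 0] = 0.5                         # brief 0.5 m/s^2 forward push for 1 s

vel = np.cumsum(acc_world * dt, axis=0)          # integrate acceleration -> velocity
pos = np.cumsum(vel * dt, axis=0)                # integrate velocity -> position
print(pos[-1])                                   # any sensor bias would grow quadratically here
```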
1 code implementation • 5 Feb 2017 • Iaroslav Melekhov, Juha Ylioinas, Juho Kannala, Esa Rahtu
This paper presents a convolutional neural network based approach for estimating the relative pose between two cameras.
no code implementations • 31 Jan 2017 • Aleksei Tiulpin, Jérôme Thevenot, Esa Rahtu, Simo Saarakkala
The obtained results for the used datasets show a mean intersection over union of 0.84, 0.79, and 0.78.
1 code implementation • 20 Oct 2016 • Hamed R. -Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu
This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and ensemble of Extreme Learning Machines (ELM).
2 code implementations • 28 Sep 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
For this, we design a deep neural network that maps videos as well as descriptions to a common semantic space and jointly train it with associated pairs of videos and descriptions.
no code implementations • 23 Sep 2016 • Marko Linna, Juho Kannala, Esa Rahtu
In this paper, we present a method for real-time multi-person human pose estimation from video by utilizing convolutional neural networks.
no code implementations • 8 Aug 2016 • Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Naokazu Yokoya
In description generation, the performance level is comparable to the current state-of-the-art, although our embeddings were trained for the retrieval tasks.
no code implementations • CVPR 2014 • Andrea Vedaldi, Siddharth Mahendran, Stavros Tsogkas, Subhransu Maji, Ross Girshick, Juho Kannala, Esa Rahtu, Iasonas Kokkinos, Matthew B. Blaschko, David Weiss, Ben Taskar, Karen Simonyan, Naomi Saphra, Sammy Mohamed
We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object.
no code implementations • CVPR 2014 • Pekka Rantalankila, Juho Kannala, Esa Rahtu
The parameters of the graph cut problems are learnt in such a manner that they provide complementary sets of regions.
no code implementations • 21 Jun 2013 • Subhransu Maji, Esa Rahtu, Juho Kannala, Matthew Blaschko, Andrea Vedaldi
This paper introduces FGVC-Aircraft, a new dataset containing 10,000 images of aircraft spanning 100 aircraft models, organised in a three-level hierarchy.