no code implementations • 31 Jul 2020 • Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. -Tavakoli, Jani Lainema, Miska Hannuksela, Emre Aksu, Esa Rahtu
In a second phase, the Model-Agnostic Meta-Learning (MAML) approach is adapted to the specific case of image compression: the inner loop performs latent-tensor overfitting, and the outer loop updates both the encoder and decoder neural networks based on the overfitting performance.
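The two-level scheme described above can be sketched as a toy bi-level optimization. This is a minimal illustration under assumptions of mine (a linear "decoder", vectorized images, first-order outer updates); all names and shapes are illustrative, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(z, W):
    # Linear stand-in for a decoder network: latent vector -> image vector.
    return z @ W

def inner_overfit(image, W, steps=50, lr=0.1):
    # Inner loop: gradient descent on the latent tensor only, decoder frozen.
    z = np.zeros(W.shape[0])
    for _ in range(steps):
        z -= lr * 2 * W @ (decode(z, W) - image)  # d/dz ||zW - x||^2
    return z

def outer_step(images, W, lr=0.01):
    # Outer loop: update the decoder using the post-overfitting reconstruction
    # error (a first-order update, treating each overfitted latent as fixed).
    grad_W = np.zeros_like(W)
    for x in images:
        z = inner_overfit(x, W)
        grad_W += 2 * np.outer(z, decode(z, W) - x)  # d/dW ||zW - x||^2
    return W - lr * grad_W / len(images)

images = rng.normal(size=(4, 8))
W = rng.normal(size=(3, 8)) * 0.1
loss0 = np.mean([(decode(inner_overfit(x, W), W) - x) ** 2 for x in images])
for _ in range(20):
    W = outer_step(images, W)
loss1 = np.mean([(decode(inner_overfit(x, W), W) - x) ** 2 for x in images])
```

After the outer updates, the post-overfitting reconstruction loss should drop, which is the quantity the outer loop is meta-optimizing.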
2 code implementations • 29 Apr 2020 • Sen He, Wentong Liao, Hamed R. -Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault
Inspired by the successes in text analysis and translation, previous work has proposed the transformer architecture for image captioning.
no code implementations • 20 Apr 2020 • Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. -Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu
One of the core components of conventional (i.e., non-learned) video codecs consists of predicting a frame from a previously-decoded frame, by leveraging temporal correlations.
no code implementations • 4 Jul 2019 • Alexandre Bruckert, Hamed R. -Tavakoli, Zhi Liu, Marc Christie, Olivier Le Meur
We demonstrate that, on a fixed network architecture, modifying the loss function can significantly improve (or degrade) the results, hence emphasizing the importance of the choice of the loss function when designing a model.
2 code implementations • 25 May 2019 • Hamed R. -Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala
Our results suggest that (1) audio is a strong contributing cue for saliency prediction, (2) a salient visible sound source is the natural cause of the superiority of our Audio-Visual model, (3) richer feature representations for the input space lead to more powerful predictions even in the absence of more sophisticated saliency decoders, and (4) the Audio-Visual model improves over 53.54% of the frames predicted by the best Visual model (our baseline).
no code implementations • 15 Apr 2019 • Zakaria Laskar, Iaroslav Melekhov, Hamed R. -Tavakoli, Juha Ylioinas, Juho Kannala
The main contribution is a geometric correspondence verification approach for re-ranking a shortlist of retrieved database images based on their dense pair-wise matching with the query image at a pixel level.
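The re-ranking idea above, scoring a database image by how well its putative pixel matches with the query agree with a single geometric transformation, can be sketched with a RANSAC-style inlier count. This is a hedged illustration, not the paper's method: the affine model, thresholds, and synthetic matches are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_affine(src, dst):
    # Least-squares affine map: dst ≈ [src | 1] @ M, with M of shape (3, 2).
    X = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M

def verification_score(src, dst, trials=100, thresh=2.0):
    # RANSAC-style scoring: sample minimal sets, fit an affine transform,
    # and keep the largest inlier count as the geometric consistency score.
    best, n = 0, len(src)
    for _ in range(trials):
        idx = rng.choice(n, size=3, replace=False)
        M = fit_affine(src[idx], dst[idx])
        pred = np.hstack([src, np.ones((n, 1))]) @ M
        best = max(best, int(np.sum(np.linalg.norm(pred - dst, axis=1) < thresh)))
    return best

# Synthetic matches: 40 of 50 follow one affine map, 10 are outliers.
src = rng.uniform(0, 100, size=(50, 2))
A = np.array([[1.1, 0.05], [-0.05, 0.9]])
dst = src @ A.T + np.array([5.0, -3.0])
dst[:10] = rng.uniform(0, 100, size=(10, 2))
score = verification_score(src, dst)
```

A shortlist of retrieved images would then be re-ranked by this score, so geometrically inconsistent candidates fall behind consistent ones.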
no code implementations • 12 Apr 2019 • Hamed R. -Tavakoli, Esa Rahtu, Juho Kannala, Ali Borji
Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly at predicting gaze and are outperformed by simple spatial biases, (3) deep features perform better than traditional features, (4) as opposed to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, the manipulation point yields the best gaze prediction accuracy over egocentric videos, (6) knowledge transfer works best for cases where the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction.
1 code implementation • CVPR 2019 • Sen He, Hamed R. -Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault
Our analyses reveal that: 1) some visual regions (e.g. head, text, symbol, vehicle) are already encoded within various layers of the network pre-trained for object recognition, 2) using modern datasets, we find that fine-tuning pre-trained models for saliency prediction makes them favor some categories (e.g. head) over others (e.g. text), 3) although deep models of saliency outperform classical models on natural images, the converse is true for synthetic stimuli (e.g. pop-out search arrays), evidence of a significant difference between human and data-driven saliency models, and 4) we confirm that, after fine-tuning, the change in inner representations is mostly due to the task and not the domain shift in the data.
no code implementations • ICCV 2019 • Sen He, Hamed R. -Tavakoli, Ali Borji, Nicolas Pugeault
In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images.
no code implementations • 24 Jan 2019 • Zakaria Laskar, Hamed R. -Tavakoli, Juho Kannala
The problem is posed as finding the geometric transformation that aligns a given image pair.
no code implementations • 11 Oct 2018 • Ali Borji, Hamed R. -Tavakoli, Zoya Bylinskii
In this review, we examine the recent progress in saliency prediction and propose several avenues for future research.
no code implementations • CVPR 2017 • Hamed R. -Tavakoli, Fawad Ahmed, Ali Borji, Jorma Laaksonen
This paper revisits visual saliency prediction by evaluating the recent advancements in this field such as crowd-sourced mouse tracking-based databases and contextual annotations.
no code implementations • 24 Apr 2017 • Hamed R. -Tavakoli, Jorma Laaksonen
The motivation behind such a problem formulation is (1) the benefits to the knowledge representation-based vision pipelines, and (2) the potential improvements in emulating bio-inspired vision systems by solving these three problems together.
2 code implementations • ICCV 2017 • Hamed R. -Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen
To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene.
no code implementations • 7 Apr 2017 • Hamed R. -Tavakoli, Jorma Laaksonen, Esa Rahtu
To investigate the current status of affective image tagging, we (1) introduce a new eye movement dataset collected with an affordable eye tracker, (2) study the use of deep neural networks for pleasantness recognition, and (3) investigate the gap between deep features and eye movements.
1 code implementation • 20 Oct 2016 • Hamed R. -Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu
This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and ensemble of Extreme Learning Machines (ELM).
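The ELM component mentioned above has a simple closed-form training procedure: the hidden-layer weights are random and fixed, and only the output weights are solved by least squares. Below is a minimal, generic ELM regressor sketch; the hidden size, activation, and toy target are assumptions of mine, not the paper's saliency setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_elm(X, y, n_hidden=50):
    # Random, untrained input->hidden projection (the defining ELM trait).
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Fit a simple nonlinear target to show the mechanism.
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
W, b, beta = train_elm(X, y)
err = np.mean((predict_elm(X, W, b, beta) - y) ** 2)
```

Because training is a single least-squares solve, ELMs are cheap to fit, which is what makes an ensemble of them (one per retrieved similar image, for example) practical.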