Search Results for author: Hamed R. -Tavakoli

Found 16 papers, 5 papers with code

Learning to Learn to Compress

no code implementations · 31 Jul 2020 · Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. -Tavakoli, Jani Lainema, Miska Hannuksela, Emre Aksu, Esa Rahtu

In a second phase, the Model-Agnostic Meta-Learning (MAML) approach is adapted to the specific case of image compression: the inner loop performs latent-tensor overfitting, and the outer loop updates both the encoder and decoder neural networks based on the overfitting performance.
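The two-phase scheme described in the snippet can be sketched with a toy one-parameter "codec" (all names and the scalar model here are hypothetical illustrations, not the authors' implementation): the inner loop overfits a per-image latent, and the outer loop updates the shared decoder parameter using the post-overfitting loss.

```python
def decode(latent, weight):
    # Stand-in "decoder": reconstruction is the latent scaled by a shared weight.
    return latent * weight

def loss(recon, target):
    return (recon - target) ** 2

def inner_loop(latent, weight, target, steps=20, lr=0.1):
    """Latent-tensor overfitting: adapt the latent for a single image."""
    for _ in range(steps):
        grad = 2.0 * (decode(latent, weight) - target) * weight
        latent -= lr * grad
    return latent

def outer_step(weight, targets, lr=0.01):
    """Meta-update: adjust the shared decoder from post-overfitting losses."""
    grad = 0.0
    for target in targets:
        latent = inner_loop(1.0, weight, target)
        grad += 2.0 * (decode(latent, weight) - target) * latent
    return weight - lr * grad / len(targets)
```

In the real system both loops operate on neural-network parameters and latent tensors; the scalar version only shows the nesting of the two optimization loops.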

Image Compression · Meta-Learning · +1

Image Captioning through Image Transformer

2 code implementations · 29 Apr 2020 · Sen He, Wentong Liao, Hamed R. -Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault

Inspired by successes in text analysis and translation, previous work has proposed the transformer architecture for image captioning.

Image Captioning · object-detection · +3

End-to-End Learning for Video Frame Compression with Self-Attention

no code implementations · 20 Apr 2020 · Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. -Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

One of the core components of conventional (i.e., non-learned) video codecs consists of predicting a frame from a previously decoded frame by leveraging temporal correlations.
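A minimal illustration of this temporal-prediction idea (a generic codec sketch, not the paper's learned, attention-based model): predict the current frame from the previously decoded one and transmit only the residual.

```python
def predict_frame(prev_decoded):
    # Simplest possible predictor: copy the previous decoded frame
    # ("skip" prediction); real codecs use motion compensation instead.
    return list(prev_decoded)

def encode_residual(current, prediction):
    # Only the prediction error needs to be coded.
    return [c - p for c, p in zip(current, prediction)]

def decode_frame(prediction, residual):
    # The decoder forms the same prediction and adds the residual back.
    return [p + r for p, r in zip(prediction, residual)]
```

Because the residual is typically much smaller than the raw frame, it compresses far better, which is exactly the temporal redundancy the snippet refers to.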

MS-SSIM · Optical Flow Estimation · +1

Deep Saliency Models: The Quest For The Loss Function

no code implementations · 4 Jul 2019 · Alexandre Bruckert, Hamed R. -Tavakoli, Zhi Liu, Marc Christie, Olivier Le Meur

We demonstrate that, for a fixed network architecture, modifying the loss function can significantly improve (or degrade) the results, emphasizing the importance of the choice of loss function when designing a model.
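Two losses commonly compared in saliency prediction can be written in a few lines (standard definitions on flattened maps; the exact set of losses studied in the paper may differ): per-pixel MSE, and KL divergence between the predicted and ground-truth maps treated as probability distributions.

```python
import math

def mse(pred, target):
    # Per-pixel mean squared error on flattened saliency maps.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def kl_div(pred, target, eps=1e-8):
    # KL divergence: both maps are first normalized to sum to one,
    # so the loss compares them as probability distributions.
    ps, ts = sum(pred), sum(target)
    return sum((t / ts) * math.log((t / ts + eps) / (p / ps + eps))
               for p, t in zip(pred, target))
```

The two losses penalize errors very differently (KL is scale-invariant and heavily weights ground-truth-salient pixels), which is one way a loss swap can change results on a fixed architecture.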

Saliency Prediction

DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction

2 code implementations · 25 May 2019 · Hamed R. -Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala

Our results suggest that (1) audio is a strong contributing cue for saliency prediction, (2) a salient visible sound source is the natural cause of the superiority of our Audio-Visual model, (3) richer feature representations for the input space lead to more powerful predictions even in the absence of more sophisticated saliency decoders, and (4) the Audio-Visual model improves over 53.54% of the frames predicted by the best Visual model (our baseline).

Saliency Prediction · Video Saliency Prediction

Geometric Image Correspondence Verification by Dense Pixel Matching

no code implementations · 15 Apr 2019 · Zakaria Laskar, Iaroslav Melekhov, Hamed R. -Tavakoli, Juha Ylioinas, Juho Kannala

The main contribution is a geometric correspondence verification approach for re-ranking a shortlist of retrieved database images based on their dense pair-wise matching with the query image at a pixel level.
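The re-ranking idea can be sketched as follows (a hypothetical, heavily simplified verifier: it scores consistency of correspondences with a single 2D translation, whereas the paper uses dense pixel-level matching and a richer geometric model).

```python
def count_inliers(matches, tol=1.0):
    """matches: list of ((xq, yq), (xd, yd)) query->database correspondences.
    Returns the largest set of matches agreeing on one translation."""
    best = 0
    for (xq, yq), (xd, yd) in matches:
        dx, dy = xd - xq, yd - yq  # candidate translation from this match
        inliers = sum(
            1 for (ax, ay), (bx, by) in matches
            if abs((bx - ax) - dx) <= tol and abs((by - ay) - dy) <= tol
        )
        best = max(best, inliers)
    return best

def rerank(shortlist):
    """shortlist: list of (image_id, matches); returns ids best-first,
    ordered by geometric-consistency score."""
    return [img for img, m in
            sorted(shortlist, key=lambda x: count_inliers(x[1]), reverse=True)]
```

The key property survives the simplification: a database image whose matches are mutually consistent under one geometric transformation outranks one with the same number of scattered, inconsistent matches.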

Image Retrieval · Re-Ranking · +2

Digging Deeper into Egocentric Gaze Prediction

no code implementations · 12 Apr 2019 · Hamed R. -Tavakoli, Esa Rahtu, Juho Kannala, Ali Borji

Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and are outperformed by spatial biases, (3) deep features perform better than traditional features, (4) as opposed to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points, and, in particular, the manipulation point yields the best gaze prediction accuracy on egocentric videos, (6) knowledge transfer works best when the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction.

Activity Recognition · Gaze Prediction · +2

Understanding and Visualizing Deep Visual Saliency Models

1 code implementation · CVPR 2019 · Sen He, Hamed R. -Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault

Our analyses reveal that: 1) some visual regions (e.g., head, text, symbol, vehicle) are already encoded within various layers of a network pre-trained for object recognition; 2) using modern datasets, we find that fine-tuning pre-trained models for saliency prediction makes them favor some categories (e.g., head) over others (e.g., text); 3) although deep saliency models outperform classical models on natural images, the converse is true for synthetic stimuli (e.g., pop-out search arrays), evidence of a significant difference between human and data-driven saliency models; and 4) we confirm that, after fine-tuning, the change in inner representations is mostly due to the task and not to the domain shift in the data.

Object Recognition · Saliency Prediction · +1

Human Attention in Image Captioning: Dataset and Analysis

no code implementations · ICCV 2019 · Sen He, Hamed R. -Tavakoli, Ali Borji, Nicolas Pugeault

In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images.

Image Captioning · Sentence · +1

Semantic Matching by Weakly Supervised 2D Point Set Registration

no code implementations · 24 Jan 2019 · Zakaria Laskar, Hamed R. -Tavakoli, Juho Kannala

The problem is posed as finding the geometric transformation that aligns a given image pair.

Bottom-up Attention, Models of

no code implementations · 11 Oct 2018 · Ali Borji, Hamed R. -Tavakoli, Zoya Bylinskii

In this review, we examine recent progress in saliency prediction and propose several avenues for future research.

Saliency Prediction

Saliency Revisited: Analysis of Mouse Movements versus Fixations

no code implementations · CVPR 2017 · Hamed R. -Tavakoli, Fawad Ahmed, Ali Borji, Jorma Laaksonen

This paper revisits visual saliency prediction by evaluating the recent advancements in this field such as crowd-sourced mouse tracking-based databases and contextual annotations.

Model Selection · Saliency Prediction

Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition

no code implementations · 24 Apr 2017 · Hamed R. -Tavakoli, Jorma Laaksonen

The motivation behind such a problem formulation is (1) the benefits to knowledge-representation-based vision pipelines, and (2) the potential improvements in emulating bio-inspired vision systems by solving these three problems together.

Instance Segmentation · Object · +5

Paying Attention to Descriptions Generated by Image Captioning Models

2 code implementations · ICCV 2017 · Hamed R. -Tavakoli, Rakshith Shetty, Ali Borji, Jorma Laaksonen

To bridge the gap between humans and machines in image understanding and describing, we need further insight into how people describe a perceived scene.

Image Captioning

Investigating Natural Image Pleasantness Recognition using Deep Features and Eye Tracking for Loosely Controlled Human-computer Interaction

no code implementations · 7 Apr 2017 · Hamed R. -Tavakoli, Jorma Laaksonen, Esa Rahtu

To investigate the current status of affective image tagging, we (1) introduce a new eye movement dataset collected with an affordable eye tracker, (2) study the use of deep neural networks for pleasantness recognition, and (3) investigate the gap between deep features and eye movements.

Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features

1 code implementation · 20 Oct 2016 · Hamed R. -Tavakoli, Ali Borji, Jorma Laaksonen, Esa Rahtu

This paper presents a novel fixation prediction and saliency modeling framework based on inter-image similarities and an ensemble of Extreme Learning Machines (ELMs).
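A minimal ELM, the base learner of the proposed ensemble, can be sketched as follows (the standard ELM formulation with a random fixed hidden layer and closed-form output weights; a toy illustration, not the authors' code or features):

```python
import numpy as np

class ELM:
    """Toy Extreme Learning Machine: the input-to-hidden weights are random
    and never trained; only the output weights are solved, in closed form."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((n_in, n_hidden))  # fixed random weights
        self.b = rng.standard_normal(n_hidden)          # fixed random biases
        self.beta = None                                # learned output weights

    def _hidden(self, x):
        return np.tanh(x @ self.w + self.b)

    def fit(self, x, y):
        # Least-squares solve for the output weights via the pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(x)) @ y
        return self

    def predict(self, x):
        return self._hidden(x) @ self.beta
```

Because training reduces to one linear solve, many such learners are cheap to fit, which is what makes an ensemble of them practical for per-image saliency modeling.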
