Search Results for author: Armin Mustafa

Found 20 papers, 6 papers with code

ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet

1 code implementation 5 Dec 2023 Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

This paper introduces ViscoNet, a novel method that enhances text-to-image human generation models with visual prompting.

Image Generation Visual Prompting

CAD -- Contextual Multi-modal Alignment for Dynamic AVQA

no code implementations 25 Oct 2023 Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

In the context of Audio Visual Question Answering (AVQA) tasks, the audio visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic.

Audio-visual Question Answering Audio-Visual Question Answering (AVQA) +2

PAT: Position-Aware Transformer for Dense Multi-Label Action Detection

no code implementations 9 Aug 2023 Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to the recent transformer-based approaches that use a hierarchical structure.

Action Detection Event Detection +1

UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

1 code implementation 18 Apr 2023 Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

Text-to-image models (T2I) such as StableDiffusion have been used to generate high quality images of people.

Ranked #1 on Pose Transfer on Deep-Fashion (FID metric)

Disentanglement Pose Transfer +2

SEM-POS: Grammatically and Semantically Correct Video Captioning

no code implementations 26 Mar 2023 Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

Generating grammatically and semantically correct captions in video captioning is a challenging task.

POS Video Captioning

KPE: Keypoint Pose Encoding for Transformer-based Image Generation

1 code implementation 9 Mar 2022 Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

Therefore, we propose a new method, Keypoint Pose Encoding (KPE), which is 10 times more memory efficient and over 73% faster at generating high quality images from text input conditioned on the pose.

Image Generation

SILT: Self-supervised Lighting Transfer Using Implicit Image Decomposition

1 code implementation 25 Oct 2021 Nikolina Kubiak, Armin Mustafa, Graeme Phillipson, Stephen Jolly, Simon Hadfield

We then remap this unified input domain using a discriminator that is presented with the generated outputs and the style reference, i.e. images of the desired illumination conditions.

Multi-person Implicit Reconstruction from a Single Image

no code implementations CVPR 2021 Armin Mustafa, Akin Caliskan, Lourdes Agapito, Adrian Hilton

We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.

3D Human Reconstruction

Multi-View Consistency Loss for Improved Single-Image 3D Reconstruction of Clothed People

no code implementations 29 Sep 2020 Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton

This paper introduces two advances to overcome this limitation: firstly a new synthetic dataset of realistic clothed people, 3DVH; and secondly, a novel multiple-view loss function for training of monocular volumetric shape estimation, which is demonstrated to significantly improve generalisation and reconstruction accuracy.

3D Human Shape Estimation 3D Reconstruction

RealMonoDepth: Self-Supervised Monocular Depth Estimation for General Scenes

no code implementations 14 Apr 2020 Mertalp Ocal, Armin Mustafa

In this paper, we introduce RealMonoDepth, a self-supervised monocular depth estimation approach which learns to estimate the real scene depth for a diverse range of indoor and outdoor scenes.

Monocular Depth Estimation Self-Supervised Learning

Learning Dense Wide Baseline Stereo Matching for People

no code implementations 2 Oct 2019 Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton

We show that it is possible to learn stereo matching from a synthetic people dataset and improve performance on real datasets for stereo reconstruction of people from narrow and wide baseline stereo data.

Data Augmentation Stereo Matching

A*3D Dataset: Towards Autonomous Driving in Challenging Environments

1 code implementation 17 Sep 2019 Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, Jie Lin

With the increasing global popularity of self-driving cars, there is an immediate need for challenging real-world datasets for benchmarking and training various computer vision tasks such as 3D object detection.

3D Object Detection Autonomous Driving +4

U4D: Unsupervised 4D Dynamic Scene Understanding

no code implementations ICCV 2019 Armin Mustafa, Chris Russell, Adrian Hilton

We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video.

3D Pose Estimation Instance Segmentation +3

Temporally Coherent General Dynamic Scene Reconstruction

no code implementations 18 Jul 2019 Armin Mustafa, Marco Volino, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton

Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints.

Segmentation Semantic Segmentation

4D Temporally Coherent Light-field Video

no code implementations 30 Apr 2018 Armin Mustafa, Marco Volino, Jean-Yves Guillemaut, Adrian Hilton

Evaluation of the proposed light-field scene flow against existing multi-view dense correspondence approaches demonstrates a significant improvement in accuracy of temporal coherence.

Scene Flow Estimation

Semantically Coherent Co-Segmentation and Reconstruction of Dynamic Scenes

no code implementations CVPR 2017 Armin Mustafa, Adrian Hilton

Semantic co-segmentation exploits the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance.

3D Reconstruction Segmentation

Temporally coherent 4D reconstruction of complex dynamic scenes

no code implementations CVPR 2016 Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton

Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects.

4D reconstruction Camera Calibration +2

General Dynamic Scene Reconstruction from Multiple View Video

no code implementations ICCV 2015 Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton

The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras.

Scene Segmentation Segmentation
