no code implementations • 9 Mar 2022 • Soon Yau Cheong, Armin Mustafa, Andrew Gilbert
Transformers have recently been shown to generate high-quality images from text.
1 code implementation • 25 Oct 2021 • Nikolina Kubiak, Armin Mustafa, Graeme Phillipson, Stephen Jolly, Simon Hadfield
We then remap this unified input domain using a discriminator that is presented with the generated outputs and the style reference, i.e. images of the desired illumination conditions.
no code implementations • 19 Apr 2021 • Akin Caliskan, Armin Mustafa, Adrian Hilton
We present a novel method to learn temporally consistent 3D reconstruction of clothed people from a monocular video.
no code implementations • CVPR 2021 • Armin Mustafa, Akin Caliskan, Lourdes Agapito, Adrian Hilton
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
no code implementations • 29 Sep 2020 • Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton
This paper introduces two advances to overcome this limitation: firstly a new synthetic dataset of realistic clothed people, 3DVH; and secondly, a novel multiple-view loss function for training of monocular volumetric shape estimation, which is demonstrated to significantly improve generalisation and reconstruction accuracy.
no code implementations • 14 Apr 2020 • Mertalp Ocal, Armin Mustafa
In this paper, we introduce RealMonoDepth, a self-supervised monocular depth estimation approach that learns to estimate the real scene depth for a diverse range of indoor and outdoor scenes.
no code implementations • 2 Oct 2019 • Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton
We show that it is possible to learn stereo matching from a synthetic people dataset and improve performance on real datasets for stereo reconstruction of people from narrow- and wide-baseline stereo data.
1 code implementation • 17 Sep 2019 • Quang-Hieu Pham, Pierre Sevestre, Ramanpreet Singh Pahwa, Huijing Zhan, Chun Ho Pang, Yuda Chen, Armin Mustafa, Vijay Chandrasekhar, Jie Lin
With the increasing global popularity of self-driving cars, there is an immediate need for challenging real-world datasets for benchmarking and training various computer vision tasks such as 3D object detection.
no code implementations • ICCV 2019 • Armin Mustafa, Chris Russell, Adrian Hilton
We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video.
no code implementations • 18 Jul 2019 • Armin Mustafa, Marco Volino, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton
Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints.
no code implementations • 30 Apr 2018 • Armin Mustafa, Marco Volino, Jean-Yves Guillemaut, Adrian Hilton
Evaluation of the proposed light-field scene flow against existing multi-view dense correspondence approaches demonstrates a significant improvement in accuracy of temporal coherence.
no code implementations • CVPR 2017 • Armin Mustafa, Adrian Hilton
Semantic co-segmentation exploits the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance.
no code implementations • CVPR 2016 • Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton
Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects.
no code implementations • ICCV 2015 • Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton
The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras.