no code implementations • 15 Jul 2024 • Marco Pesavento, Marco Volino, Adrian Hilton
The generated 2D normal maps are then processed by a multi-view attention-based neural implicit model that estimates an implicit representation of the 3D shape, ensuring the reproduction of details in both observed and occluded regions.
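As background, the sketch below shows the general idea of a pixel-aligned neural implicit shape model: an MLP queried at 3D points, conditioned on image features, returning occupancy. It is a generic illustration with assumed dimensions and layer sizes, not the paper's multi-view attention architecture.

```python
import torch
import torch.nn as nn

class ImplicitShapeMLP(nn.Module):
    """Minimal pixel-aligned implicit function: maps a 3D query point plus an
    image feature sampled at its projection to an occupancy probability.
    Generic sketch only; not the paper's multi-view attention model."""
    def __init__(self, feat_dim=256, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, point_feats):
        # points: (B, N, 3) query locations; point_feats: (B, N, feat_dim)
        x = torch.cat([points, point_feats], dim=-1)
        return torch.sigmoid(self.mlp(x))  # (B, N, 1) occupancy in [0, 1]
```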
no code implementations • 20 Jun 2024 • Moira Shooter, Charles Malleson, Adrian Hilton
To address this, we created 3DDogs-Wild, a naturalised version of the dataset where the optical markers are in-painted and the subjects are placed in diverse environments, enhancing its utility for training RGB image-based pose detectors.
no code implementations • 10 Jun 2024 • Asmar Nadeem, Faegheh Sardari, Robert Dawes, Syed Sameed Husain, Adrian Hilton, Armin Mustafa
Existing video captioning benchmarks and models lack coherent representations of causal-temporal narrative, that is, sequences of events linked through cause and effect, unfolding over time and driven by characters or agents.
Ranked #1 on Video Captioning on MSVD-CTN
no code implementations • 10 Jun 2024 • Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton
In this paper, we address this issue by proposing a novel transformer-based network that (a) employs a non-hierarchical structure when modelling different ranges of temporal dependencies and (b) embeds relative positional encoding in its transformer layers.
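As background on point (b), the sketch below shows one common way to embed relative positional encoding in a self-attention layer: a learned bias indexed by the relative temporal offset between query and key positions. The class name, single head, and dimensions are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPosSelfAttention(nn.Module):
    """Single-head self-attention with a learned relative positional bias.
    Illustrative sketch of relative positional encoding; assumes T <= max_len."""
    def __init__(self, dim, max_len=512):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        # one learnable bias per relative offset in [-(max_len-1), max_len-1]
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))
        self.max_len = max_len

    def forward(self, x):
        # x: (B, T, dim) sequence of temporal features
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale          # (B, T, T)
        pos = torch.arange(T, device=x.device)
        rel = pos[:, None] - pos[None, :]                      # (T, T) relative offsets
        attn = attn + self.rel_bias[rel + self.max_len - 1]    # add relative bias
        return F.softmax(attn, dim=-1) @ v                     # (B, T, dim)
```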
no code implementations • 6 Jun 2024 • Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu
To address these limitations, we propose a Localized Point Management (LPM) strategy, capable of identifying the error-contributing zones most in need of both point addition and geometry calibration.
1 code implementation • 17 May 2024 • Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton
In this paper, we propose CoLeaF, a novel learning framework that optimizes the integration of cross-modal context in the embedding space such that the network explicitly learns to combine cross-modal information for audible-visible events while filtering them out for unaligned events.
no code implementations • CVPR 2024 • Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Ziyan Wang, Chun-Han Yao, Marco Volino, Edmond Boyer, Adrian Hilton, Tony Tung
In this paper, we explore the benefits of incorporating depth observations in the reconstruction process by introducing ANIM, a novel method that reconstructs arbitrary 3D human shapes from single-view RGB-D images with an unprecedented level of accuracy.
no code implementations • 25 Oct 2023 • Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa
In the context of Audio-Visual Question Answering (AVQA) tasks, the audio-visual modalities can be learnt at three levels: 1) Spatial, 2) Temporal, and 3) Semantic.
Ranked #3 on Audio-visual Question Answering on MUSIC-AVQA
no code implementations • 9 Aug 2023 • Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton
To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to recent transformer-based approaches that use a hierarchical structure.
Ranked #1 on Action Detection on MultiTHUMOS
no code implementations • 26 Mar 2023 • Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa
Generating grammatically and semantically correct captions is a challenging task in video captioning.
Ranked #3 on Video Captioning on MSVD-CTN
1 code implementation • 23 Aug 2022 • Marco Pesavento, Marco Volino, Adrian Hilton
The approach overcomes the limitations of existing methods that reconstruct 3D human shape from a single image, which require high-resolution images together with auxiliary data such as surface normals or a parametric model to reconstruct high-detail shape.
no code implementations • 7 Mar 2022 • Davide Berghi, Adrian Hilton, Philip J. B. Jackson
We propose to generate weak labels using a pre-trained active speaker detector on pre-extracted face tracks.
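A hypothetical sketch of this weak-labelling step is shown below: a pre-trained active speaker detector is run over pre-extracted face tracks and its scores are thresholded into per-frame weak labels. `load_face_tracks`, `asd_model`, and the threshold value are placeholders, not the paper's actual interface.

```python
# Hypothetical sketch: derive weak "speaking / not speaking" labels by running
# a pre-trained active speaker detector (ASD) over pre-extracted face tracks.
# `load_face_tracks` and `asd_model` are assumed placeholders, not a real API.

def generate_weak_labels(video_path, asd_model, threshold=0.5):
    weak_labels = []
    for track in load_face_tracks(video_path):          # cropped face sequences
        scores = asd_model(track.frames, track.audio)    # per-frame speaking scores
        weak_labels.append([float(s) > threshold for s in scores])
    return weak_labels  # one boolean sequence per face track
```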
no code implementations • 31 Aug 2021 • Marco Pesavento, Marco Volino, Adrian Hilton
Typically, the requirement to frame cameras to capture the volume of a dynamic performance ($>50\,\mathrm{m}^3$) results in the person occupying only a small proportion ($<10\%$) of the field of view.
1 code implementation • ICCV 2021 • Marco Pesavento, Marco Volino, Adrian Hilton
A novel hierarchical attention-based sampling approach is introduced to learn the similarity between low-resolution image features and multiple reference images based on a perceptual loss.
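For reference, a standard VGG-based perceptual loss of the kind typically used to compare image features is sketched below; the chosen layer index and L1 criterion are illustrative assumptions rather than the paper's exact configuration.

```python
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """VGG-based perceptual loss: compares deep features of two images rather
    than raw pixels. Generic sketch; layer choice and criterion are assumptions."""
    def __init__(self, layer_idx=16):
        super().__init__()
        features = vgg19(weights="IMAGENET1K_V1").features[:layer_idx].eval()
        for p in features.parameters():
            p.requires_grad_(False)
        self.features = features
        self.criterion = nn.L1Loss()

    def forward(self, pred, target):
        # pred, target: (B, 3, H, W) images normalised for VGG input
        return self.criterion(self.features(pred), self.features(target))
```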
no code implementations • 31 Jul 2021 • Moira Shooter, Charles Malleson, Adrian Hilton
Estimating the pose of animals can facilitate the understanding of animal motion which is fundamental in disciplines such as biomechanics, neuroscience, ethology, robotics and the entertainment industry.
Ranked #1 on Animal Pose Estimation on StanfordExtra
no code implementations • 19 Apr 2021 • Akin Caliskan, Armin Mustafa, Adrian Hilton
We present a novel method to learn temporally consistent 3D reconstruction of clothed people from a monocular video.
no code implementations • CVPR 2021 • Armin Mustafa, Akin Caliskan, Lourdes Agapito, Adrian Hilton
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
no code implementations • 29 Sep 2020 • Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton
This paper introduces two advances to overcome this limitation: firstly, a new synthetic dataset of realistic clothed people, 3DVH; and secondly, a novel multiple-view loss function for training monocular volumetric shape estimation, which is demonstrated to significantly improve generalisation and reconstruction accuracy.
no code implementations • 11 Sep 2020 • Jinghua Wang, Adrian Hilton, Jianmin Jiang
This paper proposes a new network structure for unsupervised deep representation learning based on spectral analysis, a popular technique with solid theoretical foundations.
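As background on the spectral analysis the method builds on, the sketch below implements classical spectral embedding (Laplacian eigenmaps) with NumPy/SciPy. It is shown only to illustrate the underlying technique, not the proposed network; the Gaussian kernel bandwidth is an assumed parameter.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def spectral_embedding(X, n_components=2, sigma=1.0):
    """Laplacian eigenmaps: build an affinity graph, form the normalised graph
    Laplacian, and use its smallest non-trivial eigenvectors as the embedding."""
    W = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))  # affinity matrix
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt        # normalised Laplacian
    _, eigvecs = eigh(L_sym)                                    # ascending eigenvalues
    return eigvecs[:, 1:n_components + 1]                       # drop trivial eigenvector
```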
no code implementations • 2 Oct 2019 • Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton
We show that it is possible to learn stereo matching from a synthetic people dataset and improve performance on real datasets for stereo reconstruction of people from narrow and wide baseline stereo data.
no code implementations • 8 Aug 2019 • Andrew Gilbert, Matthew Trumble, Adrian Hilton, John Collomosse
We aim to simultaneously estimate the 3D articulated pose and high fidelity volumetric occupancy of human performance, from multiple viewpoint video (MVV) with as few as two views.
Ranked #176 on 3D Human Pose Estimation on Human3.6M
1 code implementation • 8 Aug 2019 • Aloisio Dourado, Teofilo Emidio de Campos, Hansung Kim, Adrian Hilton
Semantic scene completion is the task of predicting a complete 3D representation of volumetric occupancy with corresponding semantic labels for a scene from a single point of view.
Ranked #22 on 3D Semantic Scene Completion on NYUv2
no code implementations • ICCV 2019 • Armin Mustafa, Chris Russell, Adrian Hilton
We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video.
no code implementations • 18 Jul 2019 • Armin Mustafa, Marco Volino, Hansung Kim, Jean-yves Guillemaut, Adrian Hilton
Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints.
no code implementations • ECCV 2018 • Andrew Gilbert, Marco Volino, John Collomosse, Adrian Hilton
We present a convolutional autoencoder that enables high fidelity volumetric reconstructions of human performance to be captured from multi-view video comprising only a small set of camera views.
no code implementations • ECCV 2018 • Matthew Trumble, Andrew Gilbert, Adrian Hilton, John Collomosse
We present a method for simultaneously estimating 3D human pose and body shape from a sparse set of wide-baseline camera views.
Ranked #9 on 3D Human Pose Estimation on Total Capture
no code implementations • 30 Apr 2018 • Armin Mustafa, Marco Volino, Jean-yves Guillemaut, Adrian Hilton
Evaluation of the proposed light-field scene flow against existing multi-view dense correspondence approaches demonstrates a significant improvement in the accuracy of temporal coherence.
no code implementations • 13 Feb 2018 • Andre Bernardes Soares Guedes, Teofilo Emidio de Campos, Adrian Hilton
Semantic scene completion is the task of producing a complete 3D voxel representation of volumetric occupancy with semantic labels for a scene from a single-view observation.
Ranked #23 on 3D Semantic Scene Completion on NYUv2
no code implementations • BMVC 2017 • Matthew Trumble, Andrew Gilbert, Charles Malleson, Adrian Hilton, John Collomosse
We incorporate this model within a dual stream network integrating pose embeddings derived from MVV and a forward kinematic solve of the IMU data.
Ranked #11 on 3D Human Pose Estimation on Total Capture
no code implementations • CVPR 2017 • Armin Mustafa, Adrian Hilton
Semantic co-segmentation exploits the coherence in semantic class labels both spatially, between views at a single time instant, and temporally, between widely spaced time instants of dynamic objects with similar shape and appearance.
no code implementations • CVPR 2016 • Armin Mustafa, Hansung Kim, Jean-yves Guillemaut, Adrian Hilton
Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects.
no code implementations • ICCV 2015 • Charles Malleson, Jean-Charles Bazin, Oliver Wang, Derek Bradley, Thabo Beeler, Adrian Hilton, Alexander Sorkine-Hornung
We present a method to continuously blend between multiple facial performances of an actor, which can contain different facial expressions or emotional states.
no code implementations • ICCV 2015 • Armin Mustafa, Hansung Kim, Jean-yves Guillemaut, Adrian Hilton
The primary contributions of this paper are twofold: an automatic method for initial coarse dynamic scene segmentation and reconstruction without prior knowledge of background appearance or structure; and a general robust approach for joint segmentation refinement and dense reconstruction of dynamic scenes from multiple wide-baseline static or moving cameras.