Monocular 3D human pose estimation technologies have the potential to greatly increase the availability of human movement data.
Ranked #39 on 3D Human Pose Estimation on Human3.6M
With frame-by-frame IK we obtain low errors in the case of bent elbows and knees; however, motion sequences with phases of extended/straight limbs result in ambiguity in the twist angle.
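The twist ambiguity can be made concrete with a swing-twist decomposition of a joint rotation. The sketch below (plain NumPy; function names are illustrative, not from the paper) splits a unit quaternion into a swing component and a twist component about a given bone axis — it is the twist part that becomes unobservable from joint positions when the limb is straight.

```python
import numpy as np

def quat_conj(q):
    """Conjugate of a quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def quat_mul(a, b):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def swing_twist(q, axis):
    """Decompose unit quaternion q as q = swing * twist,
    where twist is a rotation purely about `axis`."""
    axis = axis / np.linalg.norm(axis)
    proj = np.dot(q[1:], axis) * axis          # project vector part onto axis
    twist = np.array([q[0], *proj])
    n = np.linalg.norm(twist)
    if n < 1e-9:                               # 180-degree swing: twist undefined
        twist = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        twist = twist / n
    swing = quat_mul(q, quat_conj(twist))      # q * twist^-1
    return swing, twist
```

For a pure rotation about the axis itself, the twist recovers the full rotation and the swing is the identity; the twist angle is `2 * arctan2(||vector part||, w)` with the sign taken along the axis.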
Second, we propose a graph neural network architecture that takes this feature-graph and captures the semantic relationship between the different regions of the input image using visual attention.
Video compression (e.g., H.264, MPEG-4) reduces superfluous information by representing the raw video stream using the concept of Group of Pictures (GOP).
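As a toy illustration of the GOP concept, the sketch below generates the display-order frame types of one GOP: an intra-coded I-frame followed by predicted P-frames, each preceded by bidirectional B-frames. The fixed pattern and parameter defaults are assumptions for illustration, not tied to any specific codec configuration.

```python
def gop_pattern(gop_size=12, b_frames=2):
    """Frame types for one GOP in display order:
    an I-frame, then repeating groups of `b_frames` B-frames
    followed by one P-frame, truncated to `gop_size` frames."""
    frames = ["I"]
    while len(frames) < gop_size:
        frames.extend(["B"] * b_frames)  # bidirectionally predicted frames
        frames.append("P")               # forward-predicted anchor frame
    return frames[:gop_size]
```

For example, `gop_pattern(12, 2)` yields the classic `I B B P B B P B B P B B` structure; only the single I-frame is decodable on its own, which is why seeking in compressed video snaps to GOP boundaries.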
We address the problem of exposure correction of dark, blurry and noisy images captured in low-light conditions in the wild.
To this end, we propose a spAtio-temporal, Channel and moTion excitatION (ACTION) module consisting of three paths: Spatio-Temporal Excitation (STE) path, Channel Excitation (CE) path, and Motion Excitation (ME) path.
In order to enhance the angular resolution of light fields, view synthesis methods can be utilized to generate dense intermediate views from sparse light field input.
Depth map estimation is a crucial task in computer vision, and new approaches have recently emerged that take advantage of light fields, as this imaging modality captures much more information about the angular direction of light rays than common approaches based on stereoscopic or multi-view images.
To tackle this problem, in this paper, we study how to reduce the number of parameters and the computational cost of CNN-based SISR methods while maintaining super-resolution reconstruction accuracy.
Ranked #2 on Image Super-Resolution on BSDS100 - 2x upscaling
Egocentric gestures are the most natural form of communication for humans to interact with wearable devices such as VR/AR helmets and glasses.
The paper presents the first labelled dataset for a highly dense Aerial Laser Scanning (ALS) point cloud at city scale.
The success of training deep Convolutional Neural Networks (CNNs) heavily depends on a significant amount of labelled data.
We further introduce a two-column CNN architecture that performs better than the state-of-the-art (SoA) in photographic style classification.
Aesthetic image captioning (AIC) refers to the multi-modal task of generating critical textual feedbacks for photographs.
Measuring the colorfulness of a natural or virtual scene is critical for many applications in the image processing field, ranging from capture to display.
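One widely used colorfulness measure is the Hasler–Süsstrunk metric, which combines the spread and the magnitude of the red-green and yellow-blue opponent channels. The NumPy sketch below is a minimal implementation of that metric (assuming an H×W×3 RGB array); it is offered as background, not as the measure proposed in the work above.

```python
import numpy as np

def colorfulness(img):
    """Hasler-Suesstrunk colorfulness of an HxWx3 RGB image.
    Higher values indicate a more colorful image; 0 for pure gray."""
    R = img[..., 0].astype(float)
    G = img[..., 1].astype(float)
    B = img[..., 2].astype(float)
    rg = R - G                    # red-green opponent channel
    yb = 0.5 * (R + G) - B        # yellow-blue opponent channel
    sigma = np.hypot(rg.std(), yb.std())    # combined std deviation
    mu = np.hypot(rg.mean(), yb.mean())     # combined mean magnitude
    return sigma + 0.3 * mu
```

A uniform gray image scores exactly 0, since both opponent channels vanish, while saturated primaries score high through the mean term.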
An omnidirectional image (ODI) enables viewers to look in every direction from a fixed point through a head-mounted display, providing a more immersive experience than a standard image.
In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone mapping operator (DeepTMO) that yields a high-resolution tone-mapped output of high subjective quality.
In this paper, we present a new Light Field representation for efficient Light Field processing and rendering called Fourier Disparity Layers (FDL).
In this work, we present a novel pipeline that allows for coupled environment acquisition and virtual object rendering on a mobile device equipped with a depth sensor.
In recent years, light fields have become a major research topic and their applications span across the entire spectrum of classical image processing.
Predicting Visual Attention data from any kind of media is valuable to content creators and can be used to efficiently drive encoding algorithms.