Image restoration tasks have seen great performance improvements in recent years through the development of large deep models.
When motion blur is strong, however, it is hard for hidden states to deliver proper information due to the displacement between different frames.
However, we argue that occluded regions have strong correlations with hands and can therefore provide highly beneficial information for complete 3D hand mesh estimation.
The CVF module can output multiple decomposed variables of the input and take a combination of the outputs back as an input in a cyclic manner.
Extensive studies demonstrate that our method outperforms other self-supervised and even unpaired denoising methods by a large margin, without using any additional knowledge (e.g., the noise level) about the underlying unknown noise.
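The cyclic decomposition described above can be illustrated with a minimal PyTorch sketch: a module that splits its input into decomposed variables (here, a clean and a noise component), recombines them, and feeds the result back through itself. Layer sizes, the two-way split, and the module name are assumptions, not the paper's CVF architecture.

```python
import torch
import torch.nn as nn

class CyclicDecomposeModule(nn.Module):
    """Illustrative sketch: decompose an input image into two variables
    (e.g., a clean component and a noise component), recombine them, and
    cycle the combination back through the same module. Layer sizes and
    the two-way split are hypothetical, not the paper's CVF design."""
    def __init__(self, channels=3, features=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.clean_head = nn.Conv2d(features, channels, 3, padding=1)
        self.noise_head = nn.Conv2d(features, channels, 3, padding=1)

    def decompose(self, x):
        h = self.encoder(x)
        return self.clean_head(h), self.noise_head(h)

    def forward(self, noisy, cycles=2):
        outputs = []
        x = noisy
        for _ in range(cycles):
            clean, noise = self.decompose(x)
            outputs.append((clean, noise))
            x = clean + noise  # recombine the decomposed variables and cycle back
        return outputs

outs = CyclicDecomposeModule()(torch.rand(1, 3, 64, 64))  # list of (clean, noise) pairs
```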
State-of-the-art video deblurring methods often adopt recurrent neural networks to model the temporal dependency between the frames.
In a practical scenario, a noise generator should learn to simulate the general and complex noise distribution without using paired noisy and clean images.
The goal of filter pruning is to search for unimportant filters to remove in order to make convolutional neural networks (CNNs) efficient without sacrificing the performance in the process.
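As a hedged illustration of filter pruning in general (a simple L1-norm criterion, not the selection strategy proposed in this work), the sketch below ranks a convolution's filters by weight magnitude and rebuilds a smaller layer; any following layer would also need its input channels adjusted accordingly.

```python
import torch
import torch.nn as nn

def prune_filters_by_l1(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Illustrative baseline criterion (not this paper's): rank the filters
    of a convolution by the L1 norm of their weights and keep only the
    strongest keep_ratio fraction."""
    weight = conv.weight.data                        # (out_ch, in_ch, kH, kW)
    scores = weight.abs().sum(dim=(1, 2, 3))         # one importance score per filter
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.argsort(scores, descending=True)[:n_keep]

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = weight[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

smaller = prune_filters_by_l1(nn.Conv2d(3, 64, 3, padding=1), keep_ratio=0.5)
print(smaller.out_channels)  # 32; downstream layers must drop the removed channels too
```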
Accurate identification and localization of abnormalities in radiology images play a critical role in computer-aided diagnosis (CAD) systems.
3D shape representation and its processing have substantial effects on 3D shape recognition.
Ranked #1 on 3D Object Classification on ModelNet10
The problem lies in that each application and task may require a different auxiliary loss function, especially when tasks are diverse and distinct.
Most image super-resolution (SR) methods are developed on synthetic low-resolution (LR) and high-resolution (HR) image pairs that are constructed by a predetermined operation, e.g., bicubic downsampling.
Our experiments demonstrate the superiority of our method in terms of representation power compared to state-of-the-art methods in single RGB image 3D shape reconstruction.
no code implementations • 17 May 2021 • Andrey Ignatov, Andres Romero, Heewon Kim, Radu Timofte, Chiu Man Ho, Zibo Meng, Kyoung Mu Lee, Yuxiang Chen, Yutong Wang, Zeyu Long, Chenhao Wang, Yifei Chen, Boshen Xu, Shuhang Gu, Lixin Duan, Wen Li, Wang Bofei, Zhang Diankai, Zheng Chengjian, Liu Shaoli, Gao Si, Zhang Xiaofeng, Lu Kaidi, Xu Tianyu, Zheng Hui, Xinbo Gao, Xiumei Wang, Jiaming Guo, Xueyi Zhou, Hao Jia, Youliang Yan
Video super-resolution has recently become one of the most important mobile-related problems due to the rise of video communication and streaming services.
In this challenge report, we describe the challenge specifics and the evaluation results from the 2 competition tracks with the proposed solutions.
Super-Resolution (SR) is a fundamental computer vision task that aims to obtain a high-resolution clean image from the given low-resolution counterpart.
The supervised reblurring loss at training stage compares the amplified blur between the deblurred and the sharp images.
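A minimal sketch of what such a supervised reblurring loss could look like, assuming a small learned reblurring network whose output is compared between the deblurred result and the sharp target; the architecture and loss form here are hypothetical, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReblurLoss(nn.Module):
    """Hypothetical sketch of a supervised reblurring loss: a small learned
    reblurring network amplifies any residual blur in its input, and the loss
    compares the reblurred deblurred output with the reblurred sharp frame."""
    def __init__(self, channels=3, features=16):
        super().__init__()
        self.reblur = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, channels, 3, padding=1),
        )

    def forward(self, deblurred, sharp):
        # If `deblurred` still contains blur, the reblurring network amplifies it,
        # so matching the reblurred sharp image penalizes the remaining blur.
        return F.l1_loss(self.reblur(deblurred), self.reblur(sharp))

loss = ReblurLoss()(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```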
Second, we propose a joint-based regressor that distinguishes a target person's features from those of others.
Video frame interpolation aims to synthesize accurate intermediate frames given a low-frame-rate video.
Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs.
Diverse user preferences over images have recently led to a great amount of interest in controlling the imagery effects for image restoration tasks.
Using Pose2Pose, Hand4Whole utilizes hand MCP joint features to predict 3D wrists, as MCP joints largely contribute to 3D wrist rotations in the human kinematic chain.
Our TCMR significantly outperforms previous video-based methods in temporal consistency with better per-frame 3D pose and shape accuracy.
Ranked #13 on 3D Human Pose Estimation on 3DPW (using extra training data)
Most conventional supervised super-resolution (SR) algorithms assume that low-resolution (LR) data is obtained by downscaling high-resolution (HR) data with a fixed known kernel, but such an assumption often does not hold in real scenarios.
Despite its popularity, several recent works question the effectiveness of MAML when test tasks differ from training tasks, thus suggesting various task-conditioned methodologies to improve the initialization.
Real-world videos contain various dynamics and motions that may look unnaturally discontinuous in time when the recorded frame rate is low.
We tackle the problem of visual localization under changing conditions, such as time of day, weather, and seasons.
Therefore, we first propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image.
Most of the recent deep learning-based 3D human pose and mesh estimation methods regress the pose and shape parameters of human mesh models, such as SMPL and MANO, from an input image.
Ranked #4 on 3D Hand Pose Estimation on FreiHAND
We design our system to be trained in an end-to-end and weakly-supervised manner; therefore, it does not require ground-truth meshes.
Most of the previous image-based 3D human pose and mesh estimation methods estimate parameters of the human mesh model from an input image.
Ranked #4 on 3D Hand Pose Estimation on FreiHAND
However, extensive scale variations of the target object and distractor objects with similar categories have consistently posed challenges in visual tracking.
Most current action recognition methods heavily rely on appearance information by taking an RGB sequence of entire image regions as input.
Videos contain various types and strengths of motions that may look unnaturally discontinuous in time when the recorded frame rate is low.
Finally, we show that our meta-learning framework can be easily employed to any video frame interpolation network and can consistently improve its performance on multiple benchmark datasets.
We present an elegant framework of fine-grained neural architecture search (FGNAS), which allows employing multiple heterogeneous operations within a single layer and can even generate compositional feature maps using several different base operations.
This study presents a new network (i.e., PoseLifter) that can lift a 2D human pose to an absolute 3D pose in a camera coordinate system.
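A minimal sketch of 2D-to-3D pose lifting with a fully connected network is shown below; the joint count, layer widths, and the direct regression of camera-space coordinates are assumptions, not the PoseLifter design.

```python
import torch
import torch.nn as nn

class PoseLifterSketch(nn.Module):
    """Minimal sketch of lifting a 2D pose to an absolute 3D pose with a fully
    connected network; the joint count and layer widths are assumptions, not
    the PoseLifter architecture."""
    def __init__(self, num_joints=17, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_joints * 3),   # x, y, z per joint in camera coordinates
        )

    def forward(self, pose2d):                   # pose2d: (batch, num_joints, 2)
        b = pose2d.shape[0]
        return self.net(pose2d.reshape(b, -1)).reshape(b, -1, 3)

pose3d = PoseLifterSketch()(torch.rand(4, 17, 2))   # (4, 17, 3)
```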
Although significant improvement has been achieved recently in 3D human pose estimation, most of the previous methods only treat a single-person case.
Ranked #2 on Root Joint Localization on Human3.6M
Model-agnostic meta-learning (MAML) tackles the problem by formulating prior knowledge as a common initialization across tasks, which is then used to quickly adapt to unseen tasks.
Multi-person pose estimation from a 2D image is challenging because it requires not only keypoint localization but also human detection.
In this paper, we propose a human pose refinement network that estimates a refined pose from a tuple of an input image and input pose.
Ranked #2 on Multi-Person Pose Estimation on COCO (Test AP metric)
We propose an efficient Stereographic Projection Neural Network (SPNet) for learning representations of 3D objects.
In this paper, we propose a novel method to compress CNNs by reconstructing the network from a small set of spatial convolution kernels.
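A hedged sketch of the general idea of building convolutions from a small set of shared spatial kernels (a generic kernel-basis decomposition, not necessarily the paper's exact reconstruction scheme): each output filter is a learned linear combination of a few basis kernels, so the layer stores far fewer spatial weights than a full filter bank.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasisConv2d(nn.Module):
    """Generic kernel-basis convolution sketch: every output filter is a learned
    linear combination of a small set of shared spatial basis kernels. This is
    one way to reconstruct a layer from few kernels, not necessarily the
    paper's exact compression scheme."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_basis=8):
        super().__init__()
        self.basis = nn.Parameter(0.01 * torch.randn(num_basis, in_ch, kernel_size, kernel_size))
        self.coeff = nn.Parameter(0.1 * torch.randn(out_ch, num_basis))
        self.padding = kernel_size // 2

    def forward(self, x):
        # Reconstruct the full (out_ch, in_ch, k, k) filter bank from the shared basis.
        weight = torch.einsum('ob,bikl->oikl', self.coeff, self.basis)
        return F.conv2d(x, weight, padding=self.padding)

y = BasisConv2d(3, 64)(torch.rand(1, 3, 32, 32))   # stores 8 basis kernels + a 64x8 coefficient matrix
```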
We propose a novel deep-learning-based system for vessel segmentation.
Ranked #1 on Retinal Vessel Segmentation on HRF
In this paper, we propose an automatic seed generation technique with deep reinforcement learning to solve the interactive segmentation problem.
We propose a novel network that learns a part-aligned representation for person re-identification.
Ranked #4 on Person Re-Identification on UAV-Human
In this paper, we propose a novel on-line visual tracking framework based on the Siamese matching network and meta-learner network, which run at real-time speeds.
2 code implementations • Shanxin Yuan, Guillermo Garcia-Hernando, Bjorn Stenger, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee, Pavlo Molchanov, Jan Kautz, Sina Honari, Liuhao Ge, Junsong Yuan, Xinghao Chen, Guijin Wang, Fan Yang, Kai Akiyama, Yang Wu, Qingfu Wan, Meysam Madadi, Sergio Escalera, Shile Li, Dongheui Lee, Iason Oikonomidis, Antonis Argyros, Tae-Kyun Kim
Official Torch7 implementation of "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", CVPR 2018
Ranked #4 on Hand Pose Estimation on HANDS 2017
Removing camera motion blur from a single light field is a challenging task since it is a highly ill-posed inverse problem.
To overcome these weaknesses, we first cast the 3D hand and human pose estimation problem from a single depth map into a voxel-to-voxel prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood for each keypoint.
Ranked #2 on Hand Pose Estimation on NYU Hands
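A minimal sketch of the voxel-to-voxel formulation mentioned above: a small 3D CNN maps a voxelized occupancy grid to one per-voxel likelihood volume per keypoint. Channel sizes and depth are assumptions, not the V2V-PoseNet architecture.

```python
import torch
import torch.nn as nn

class VoxelToVoxelSketch(nn.Module):
    """Minimal voxel-to-voxel predictor: a small 3D CNN maps a voxelized
    occupancy grid to a per-voxel likelihood volume for each keypoint.
    Channel sizes and depth are assumptions, not V2V-PoseNet."""
    def __init__(self, num_keypoints=21, features=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(features, num_keypoints, 1),   # one likelihood volume per keypoint
        )

    def forward(self, voxels):                       # voxels: (batch, 1, D, H, W)
        return self.net(voxels)                      # (batch, num_keypoints, D, H, W)

likelihood = VoxelToVoxelSketch()(torch.rand(1, 1, 32, 32, 32))
```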
The results trained with only 10 strongly annotated images along with weakly annotated images were comparable to results trained with 800 strongly annotated images, with a 95% confidence interval of the difference of -3.00% to 5.00%, in terms of the correct localization (CorLoc) measure, which is the ratio of images whose intersection over union with the ground truth is higher than 0.5.
When a human matches two images, the viewer naturally examines a wide area around the target pixel to obtain clues about the right correspondence.
The conventional methods for estimating camera poses and scene structures from severely blurry or low resolution images often result in failure.
Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN).
We propose a novel approach to 3D human pose estimation from a single depth map.
The unary term of the proposed CRF model is defined based on a powerful heat-map regression network, which has been proposed for 2D human pose estimation.
In this paper, we introduce a novel real-time visual tracking algorithm based on a template selection strategy constructed by deep reinforcement learning methods.
To remove these complicated motion blurs, conventional energy optimization based methods rely on simple assumptions, such as that the blur kernel is partially uniform or locally linear.
Ranked #13 on Deblurring on HIDE (trained on GOPRO)
We also provide a novel analysis on the blur kernel at object boundaries, which shows the distinctive characteristics of the blur kernel that cannot be captured by conventional blur models.
We infer bidirectional optical flows to handle motion blurs, and also estimate Gaussian blur maps to remove optical blur from defocus in our new blur model.
We present a highly accurate single-image super-resolution (SR) method.
Ranked #4 on Image Super-Resolution on WebFace - 8x upscaling
We propose an image super-resolution (SR) method using a deeply-recursive convolutional network (DRCN).
Ranked #15 on Image Super-Resolution on Urban100 - 2x upscaling
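The deeply-recursive idea can be sketched as a single convolution applied repeatedly with shared weights between an embedding and a reconstruction layer; the recursion depth, widths, and the residual output below are assumptions, not the exact DRCN configuration.

```python
import torch
import torch.nn as nn

class DeeplyRecursiveSketch(nn.Module):
    """Illustrative deeply-recursive convolution: one convolutional layer is
    applied repeatedly with shared weights between an embedding layer and a
    reconstruction layer. Recursion depth, widths, and the residual output
    are assumptions, not the exact DRCN configuration."""
    def __init__(self, channels=1, features=64, recursions=8):
        super().__init__()
        self.embed = nn.Conv2d(channels, features, 3, padding=1)
        self.recursive = nn.Conv2d(features, features, 3, padding=1)   # weights shared across recursions
        self.reconstruct = nn.Conv2d(features, channels, 3, padding=1)
        self.recursions = recursions
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                        # x: interpolated low-resolution input
        h = self.relu(self.embed(x))
        for _ in range(self.recursions):
            h = self.relu(self.recursive(h))     # the same layer is reused at every step
        return self.reconstruct(h) + x           # predict a residual over the input

sr = DeeplyRecursiveSketch()(torch.rand(1, 1, 48, 48))
```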
By constructing a Markov chain on the restricted search space instead of the original solution space, our method approximates the solution effectively.
In this paper, we propose semantic relation transfer, a method that transfers high-order semantic relations of objects from annotated images to unlabeled images, analogous to label transfer techniques where label information is transferred.