Our paper focuses on automating the generation of medical reports from chest X-ray image inputs, a critical yet time-consuming task for radiologists.
The event camera is an emerging imaging sensor that captures the dynamics of moving objects as events, which motivates our work on estimating 3D human pose and shape from event signals.
This paper focuses on a new problem of estimating human pose and shape from single polarization images.
The pipeline of our approach starts by reconstructing and refining a 3-D mesh representation of the object of interest from an input image; its control joints are predicted by exploiting the semantic part segmentation information; the obtained object 3-D mesh is then rigged and animated by non-rigid deformation, and rendered to perform in-situ motions in its original image space.
Relying on large datasets with ground-truth annotations, recent learning-based approaches predict correspondences for every vertex of the point cloud; the Chamfer distance between a deformed template model and the input point cloud is usually minimized.
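The Chamfer distance mentioned above can be illustrated with a minimal sketch. It assumes the symmetric, mean-of-squared-nearest-neighbour variant (other papers use unsquared or summed variants), and the names `chamfer_distance`, `template` and `scan` are hypothetical, chosen only for illustration.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed variant: for each point, take the squared Euclidean distance
    to its nearest neighbour in the other set, average per set, and add
    the two directions.
    """
    def one_sided(src, dst):
        # Mean squared distance from each point in src to its closest point in dst.
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)


# Toy data (hypothetical): a two-vertex "template" and a two-point "scan".
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
scan = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(template, scan))
```

This brute-force version costs O(|a| x |b|) distance evaluations; practical pipelines typically use a spatial index or a batched GPU implementation instead.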
A strong visual object tracker nowadays relies on its well-crafted modules, which typically consist of manually-designed network architectures to deliver high-quality tracking results.
Complex backgrounds and similar appearances between objects and their surroundings are generally recognized as challenging scenarios in Salient Object Detection (SOD).
To our knowledge, our work is the first to produce calibrated predictions under different expertise levels for medical image segmentation.
High-resolution 3D medical images are important for analysis and diagnosis, but axial scanning to acquire them is very time-consuming.
Action recognition is a relatively established task, where given an input sequence of human motion, the goal is to predict its action category.
Inspired by the recent advances in human shape estimation from single color images, in this paper we attempt to estimate human body shapes by leveraging the geometric cues from single polarization images.
First, based on a generative human template, an initial pairwise alignment is performed for every two frames with sufficient overlap; this is followed by a global non-rigid registration procedure, in which partial results from RGBD frames are collected into a unified 3D shape under the guidance of correspondences from the pairwise alignment; finally, the texture map of the reconstructed human model is optimized to deliver a clear and spatially consistent texture.
To address this problem, we introduce a context-aware IoU-guided tracker (COMET) that exploits a multitask two-stream network and an offline reference proposal generation strategy.
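For readers unfamiliar with the quantity behind "IoU-guided", the following is a minimal sketch of intersection-over-union for two axis-aligned boxes. It is not the COMET network or its training procedure; the `(x1, y1, x2, y2)` box format and the function name `box_iou` are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate zero-area inputs.
    return inter / union if union > 0 else 0.0


# Hypothetical candidate box versus a reference box.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU-guided tracker would typically use a predicted version of this score to rank or refine candidate boxes, rather than computing it against ground truth at test time.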
Polarization images are known to capture polarized reflected light that preserves rich geometric cues of an object, which has motivated their recent application to reconstructing detailed surface normals of the objects of interest.
Generative adversarial networks (GANs), famous for their capability of learning complex underlying data distributions, are nevertheless known to be difficult to train, which can result in mode collapse or performance deterioration.
Existing methods usually perform feature selection and outlier scoring separately, which may select feature subsets that do not optimally serve outlier detection, leading to unsatisfactory performance.
Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized.
In this paper, a novel wavelet-driven deep neural network termed WaveletKernelNet (WKN) is presented, where a continuous wavelet convolutional (CWConv) layer is designed to replace the first convolutional layer of the standard CNN.
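As a rough illustration of the idea of a wavelet-parameterised first layer, the sketch below builds a 1-D convolution kernel by sampling a scaled and shifted mother wavelet and then applies it to a signal. Everything specific here is an assumption rather than the paper's design: the Mexican hat wavelet, the sampling grid on [-1, 1], and the names `wavelet_kernel`, `scale` and `shift` are illustrative only.

```python
import math


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, up to a normalisation constant."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def wavelet_kernel(length, scale, shift):
    """Sample psi((t - shift) / scale) / sqrt(scale) at `length` evenly spaced points in [-1, 1].

    In a learnable layer, `scale` and `shift` would be the trainable
    parameters instead of the individual kernel weights (assumption).
    """
    ts = [-1.0 + 2.0 * i / (length - 1) for i in range(length)]
    return [mexican_hat((t - shift) / scale) / math.sqrt(scale) for t in ts]


def conv1d_valid(signal, kernel):
    """Plain 'valid' cross-correlation, the operation CNN layers usually call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = wavelet_kernel(length=9, scale=0.5, shift=0.0)
signal = [math.sin(0.3 * n) for n in range(32)]  # toy input signal
features = conv1d_valid(signal, kernel)
print(len(features))
```

With this parameterisation each kernel is described by two numbers regardless of its length, which is the usual motivation for replacing free kernel weights with a wavelet family.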
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal.
A major bottleneck of pedestrian detection lies in the sharp performance deterioration in the presence of small pedestrians that are relatively far from the camera.
This paper aims at synthesizing filamentary structured images such as retinal fundus images and neuronal images; specifically, given a ground truth, the goal is to generate multiple realistic-looking phantoms.
Our model first takes a correction step on the grossly corrupted responses via geodesic curves on the manifold, and then performs multivariate linear regression on the corrected data.
The implementation of our approach and comparison methods as well as the involved datasets are made publicly available in support of the open-source and reproducible research initiatives.
We propose in this paper an atomic action-based Bayesian model that constructs Allen's interval relation networks to characterize complex activities with structural varieties in a probabilistic generative way. By introducing latent variables from the Chinese restaurant process, our approach is able to capture all possible styles of a particular complex activity as a unique set of distributions over atomic actions and relations.
This paper focuses on the challenging problem of 3D pose estimation of a diverse spectrum of articulated objects from single depth images.
Pose estimation, tracking, and action recognition of articulated objects from depth images are important and challenging problems, which are normally considered separately.
Detecting hand actions from ego-centric depth sequences is a practically challenging problem, owing mostly to the complex and dexterous nature of hand articulations as well as non-stationary camera motion.
We focus on the challenging problem of efficient mouse 3D pose estimation based on static images, and especially single depth images.
In this paper we consider the problem of graph-based transductive classification, and we are particularly interested in the directed graph scenario, which is a natural form for many real-world applications.