However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.
Considering the complexity of hair structure, we innovatively treat hair wisp extraction as an instance segmentation problem, where each hair wisp is regarded as an instance.
Abdominal organ and tumour segmentation has many important clinical applications, such as organ quantification, surgical planning, and disease diagnosis.
Therefore, 2D DSA segmentation methods are unable to capture the complete IA information needed for the treatment of cerebrovascular diseases.
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions.
1 code implementation • 10 Aug 2023 • Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Shihao Ma, Adamo Young, Cheng Zhu, Kangkang Meng, Xin Yang, Ziyan Huang, Fan Zhang, Wentao Liu, YuanKe Pan, Shoujin Huang, Jiacheng Wang, Mingze Sun, Weixin Xu, Dengqiang Jia, Jae Won Choi, Natália Alves, Bram de Wilde, Gregor Koehler, Yajun Wu, Manuel Wiesenfarth, Qiongjie Zhu, Guoqiang Dong, Jian He, the FLARE Challenge Consortium, Bo Wang
The best-performing algorithms successfully generalized to holdout external validation sets, achieving a median DSC of 89.5%, 90.9%, and 88.3% on North American, European, and Asian cohorts, respectively.
Automatic segmentation of the intracranial artery (IA) in digital subtraction angiography (DSA) sequence is an essential step in diagnosing IA-related diseases and guiding neuro-interventional surgery.
The embeddings of regions in a bag are treated as embeddings of words in a sentence, and they are sent to the text encoder of a VLM to obtain the bag-of-regions embedding, which is learned to be aligned to the corresponding features extracted by a frozen VLM.
Ranked #5 on Open Vocabulary Object Detection on MSCOCO (using extra training data)
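The bag-of-regions alignment above can be sketched numerically. In this minimal NumPy sketch, `bag_alignment_loss` is a hypothetical name, and mean pooling stands in for the VLM text encoder that the method actually uses to embed the bag; only the cosine-alignment objective against the frozen teacher embedding is illustrated.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Normalize embeddings to unit length, as CLIP-style models do.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def bag_alignment_loss(bag_embedding, teacher_embedding):
    # Cosine distance between the student's bag-of-regions embedding
    # and the frozen VLM's embedding of the same group of regions.
    b = l2_normalize(bag_embedding)
    t = l2_normalize(teacher_embedding)
    return 1.0 - float(b @ t)

# Toy example: region embeddings pooled into a single "bag" vector.
# Mean pooling is a stand-in for the VLM text encoder, for illustration only.
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8))          # 4 regions, 8-dim embeddings
bag = regions.mean(axis=0)                 # stand-in for text-encoder output
teacher = bag + 0.01 * rng.normal(size=8)  # frozen VLM target (toy)
loss = bag_alignment_loss(bag, teacher)
```

Minimizing this distance pulls the student's bag embedding toward the frozen teacher's representation, which is the distillation signal the abstract describes.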
We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.
Ranked #2 on 2D Human Pose Estimation on COCO-WholeBody
The goal of this paper is to interactively refine automatic segmentations of challenging structures that fall behind human performance, either due to the scarcity of available annotations or the difficult nature of the problem itself, for example, segmenting cancers or small organs.
Human pose estimation aims to accurately estimate a wide variety of human poses.
Abdominal organ segmentation has many important clinical applications, such as organ quantification, surgical planning, and disease diagnosis.
Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
Ranked #4 on 2D Pose Estimation on MP-100
Vision transformers have achieved great successes in many computer vision tasks.
Ranked #4 on 2D Human Pose Estimation on COCO-WholeBody
In this paper, we propose a novel hybrid architecture for medical image segmentation called PHTrans, which parallelly hybridizes Transformer and CNN in main building blocks to produce hierarchical representations from global and local features and adaptively aggregate them, aiming to fully exploit their strengths to obtain better segmentation performance.
Ranked #4 on Medical Image Segmentation on Synapse multi-organ CT
Pseudo-labeling (PL) approaches assign pseudo-labels to unlabeled data and then iteratively train the model on a combination of the labeled and pseudo-labeled data.
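The iterative pseudo-labeling loop can be sketched with a toy model. Everything here is illustrative: `fit_centroid_classifier` is a deliberately tiny stand-in for the real network, and the confidence threshold is an assumed hyperparameter.

```python
import numpy as np

def fit_centroid_classifier(X, y):
    # Tiny stand-in model: nearest class centroid.
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_with_confidence(X, classes, centroids):
    # Confidence = softmax over negative distances to the centroids.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
    return classes[p.argmax(axis=1)], p.max(axis=1)

def pseudo_label_rounds(X_lab, y_lab, X_unlab, threshold=0.6, rounds=3):
    # Iteratively add confident pseudo-labels to the training set,
    # then refit the model on labeled + pseudo-labeled data.
    X, y = X_lab.copy(), y_lab.copy()
    for _ in range(rounds):
        classes, centroids = fit_centroid_classifier(X, y)
        preds, conf = predict_with_confidence(X_unlab, classes, centroids)
        keep = conf >= threshold
        if not keep.any():
            break
        X = np.vstack([X_lab, X_unlab[keep]])
        y = np.concatenate([y_lab, preds[keep]])
    return fit_centroid_classifier(X, y)

# Toy 2-cluster data: two labeled points, forty unlabeled ones.
rng = np.random.default_rng(1)
X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])
y_lab = np.array([0, 1])
X_unlab = np.vstack([rng.normal(0, 0.3, (20, 2)),
                     rng.normal(4, 0.3, (20, 2))])
classes, centroids = pseudo_label_rounds(X_lab, y_lab, X_unlab)
pred, _ = predict_with_confidence(np.array([[0.1, -0.1], [3.9, 4.2]]),
                                  classes, centroids)
```

The confidence threshold is the usual safeguard: only predictions the current model is sure about are promoted to training labels for the next round.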
Prior plays an important role in providing the plausible constraint on human motion.
In this work, we first build a large 3D point cloud database for subjective and objective quality assessment of point clouds.
Based on this observation, we propose a novel normalization method called "HDR calibration" for HDR images stored in relative luminance, calibrating HDR images into a similar luminance scale according to the LDR images.
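One plausible reading of luminance calibration is a percentile-matching rescale. The sketch below is an illustrative take on that idea under stated assumptions (Rec. 709 luma weights, median matching), not the paper's exact procedure; `calibrate_hdr` is a hypothetical name.

```python
import numpy as np

def luminance(img):
    # Rec. 709 luma weights applied to linear RGB.
    return img @ np.array([0.2126, 0.7152, 0.0722])

def calibrate_hdr(hdr, ldr_linear, percentile=50.0):
    # Scale the relative-luminance HDR image so that its median
    # luminance matches the (linearized) LDR reference. Illustrative
    # percentile matching, not the paper's exact calibration.
    s = (np.percentile(luminance(ldr_linear), percentile)
         / np.percentile(luminance(hdr), percentile))
    return hdr * s

rng = np.random.default_rng(2)
ldr = rng.uniform(0.05, 1.0, size=(16, 16, 3))
hdr = ldr * 37.0   # same scene stored at an arbitrary relative scale
cal = calibrate_hdr(hdr, ldr)
```

Because relative-luminance HDR files carry no absolute scale, any such calibration only needs to bring images to a *similar* luminance range, which a single global scale factor achieves.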
Following the top-down paradigm, we decompose the task into two stages, i.e., person localization and pose estimation.
Ranked #2 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)
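The two-stage top-down decomposition can be shown as a minimal pipeline skeleton. Both stages are stubs with hypothetical names; a real system would plug in a person detector and a pose network.

```python
import numpy as np

def locate_people(image):
    # Stage 1 (stub): return person bounding boxes (x0, y0, x1, y1).
    # A real system would run a person detector here.
    return [(2, 2, 6, 6), (8, 1, 12, 7)]

def estimate_pose(image, box):
    # Stage 2 (stub): predict keypoints inside the cropped box.
    # Here we emit just the box centre as a single "keypoint".
    x0, y0, x1, y1 = box
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0)]

def top_down_pose(image):
    # Top-down paradigm: localize each person first, then estimate
    # the pose independently within each crop.
    return [estimate_pose(image, b) for b in locate_people(image)]

poses = top_down_pose(np.zeros((16, 16)))
```

The point of the decomposition is that stage 2 sees one normalized person crop at a time, which removes most of the scale and layout variation from the pose network's input.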
In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
Ranked #57 on 3D Human Pose Estimation on Human3.6M
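The likelihood-based regression idea behind RLE can be illustrated with a simplified baseline. RLE itself learns the residual of the output distribution with a normalizing flow; the Gaussian negative log-likelihood below only shows the underlying principle that the regressor predicts a distribution, not a point.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    # Negative log-likelihood of y under N(mu, sigma^2). RLE replaces
    # this fixed Gaussian with a flow-learned residual distribution;
    # this baseline only illustrates likelihood-based regression.
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)

# A prediction with honest uncertainty scores a lower NLL on noisy
# data than an overconfident one with the same mean error.
y = 1.3
nll_honest = gaussian_nll(y, mu=1.0, sigma=0.5)
nll_overconfident = gaussian_nll(y, mu=1.0, sigma=0.05)
```

Training with such a loss rewards the network for widening its predicted distribution exactly where the data is ambiguous, which is the behaviour a plain L2 loss cannot express.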
Human pose estimation has achieved significant progress in recent years.
Ranked #26 on Pose Estimation on COCO test-dev (using extra training data)
However, unlike human vision that is robust to various data corruptions such as blur and pixelation, current pose estimators are easily confused by these corruptions.
Image quality assessment (IQA) models aim to establish a quantitative relationship between visual images and their perceptual quality by human observers.
Recovering multi-person 3D poses with absolute scales from a single RGB image is a challenging problem due to the inherent depth and scale ambiguity from a single view.
Ranked #11 on 3D Multi-Person Pose Estimation (absolute) on MuPoTS-3D
The HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically, which captures body-part- and joint-level semantics while maintaining global consistency.
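An ordinal depth relation of the kind HMOR uses can be supervised with a margin ranking loss. The sketch below is a generic pairwise formulation under assumed names (`ordinal_depth_loss`, `margin`), not the paper's full hierarchical objective.

```python
import numpy as np

def ordinal_depth_loss(pred_depths, gt_depths, margin=0.05):
    # Penalize joint pairs whose predicted depth order contradicts
    # the ground-truth order, via a margin ranking (hinge) loss.
    loss, pairs = 0.0, 0
    n = len(pred_depths)
    for i in range(n):
        for j in range(i + 1, n):
            sign = np.sign(gt_depths[i] - gt_depths[j])
            if sign == 0:
                continue  # no ordinal constraint for equal depths
            loss += max(0.0, margin - sign * (pred_depths[i] - pred_depths[j]))
            pairs += 1
    return loss / max(pairs, 1)

gt = np.array([1.0, 2.0, 3.0])
good = np.array([0.5, 1.5, 2.5])   # correct ordering, clear gaps
bad = np.array([2.5, 1.5, 0.5])    # reversed ordering
```

Supervising order rather than metric depth sidesteps the absolute-scale ambiguity of a single view: the loss is zero whenever the predicted ranking agrees with the ground truth, regardless of the global scale.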
The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.
Ranked #3 on Keypoint Detection on OCHuman
This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.
Ranked #8 on 2D Human Pose Estimation on COCO-WholeBody
This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D meshes).
Ranked #1 on 3D Human Reconstruction on Surreal
We introduce a new benchmark dataset for face video forgery detection, of unprecedented quality.
Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.
Ranked #5 on Action Recognition on UCF101 (using extra training data)
In this paper, we introduce body part segmentation as critical supervision.
Ranked #84 on 3D Human Pose Estimation on Human3.6M (PA-MPJPE metric)
In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information.
Our lightweight setup allows operations in uncontrolled environments, and lends itself to telepresence applications such as video-conferencing from dynamic environments.
One of the biggest challenges in learning BIQA models is the conflict between the gigantic image space (which is in the dimension of the number of image pixels) and the extremely limited reliable ground truth data for training.
Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale indoor 3D datasets and sophisticated network architectures.
Human-object interactions (HOI) recognition and pose estimation are two closely related tasks.
In real-world applications, e.g., law enforcement and video retrieval, one often needs to search for a certain person in long videos with just one portrait.
In this paper, we propose a two-stage depth ranking based method (DRPose3D) to tackle the problem of 3D human pose estimation.