no code implementations • 4 Nov 2024 • Jie Yang, Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang
To bridge this gap, we introduce the novel challenge of Semantic Keypoint Comprehension, which aims to comprehend keypoints across different task scenarios, including keypoint semantic understanding, visual prompt-based keypoint detection, and textual prompt-based keypoint detection.
no code implementations • 26 Sep 2024 • Xin Cai, Zhiyuan You, Hailong Zhang, Wentao Liu, Jinwei Gu, Tianfan Xue
By conditioning on the low-frequency content retrieved in the first stage, the diffusion model effectively reconstructs the high-frequency details that are typically lost in the lensless imaging process, while also maintaining image fidelity.
1 code implementation • 4 Sep 2024 • Wentao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie zhou, Aimin Zhou, Qin Chen, Bo Jiang, Liang He
We train our model using three stages, including foundational pre-training, foundational fine-tuning, and mathematical fine-tuning.
1 code implementation • 7 Aug 2024 • Tianfang Zhang, Lei LI, Yang Zhou, Wentao Liu, Chen Qian, Xiangyang Ji
In this paper, we introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers, to achieve a balance between efficiency and performance in mobile applications.
Ranked #513 on Image Classification on ImageNet
1 code implementation • 16 Jul 2024 • Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
Our dynamic tokens possess two crucial characteristics: (1) Representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and represent them using fine tokens.
1 code implementation • 14 Jul 2024 • Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu
With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for specific sensor modality.
Ranked #1 on Object Detection on STCrowd
1 code implementation • 9 Jun 2024 • Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy
To address this issue, we present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations -- a straightforward yet effective design based on the fact that word-pixel correspondences conducive to visual grounding inherently exist in the attention weights of well-trained LMMs.
1 code implementation • 6 Jun 2024 • Xin Cai, Hailong Zhang, Chenchen Wang, Wentao Liu, Jinwei Gu, Tianfan Xue
Lensless cameras, innovatively replacing traditional lenses for ultra-thin, flat optics, encode light directly onto sensors, producing images that are not immediately recognizable.
no code implementations • 14 May 2024 • Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, Zeliang Ma, Dengyi Ji, Haiwen Li, Xingliang Huang, Yu Tian, Genghua Kou, Fan Jia, Yingfei Liu, Tiancai Wang, Ying Li, Xiaoshuai Hao, Yifan Yang, HUI ZHANG, Mengchuan Wei, Yi Zhou, Haimei Zhao, Jing Zhang, Jinke Li, Xiao He, Xiaoqiang Cheng, Bingyang Zhang, Lirong Zhao, Dianlei Ding, Fangsheng Liu, Yixiang Yan, Hongming Wang, Nanfei Ye, Lun Luo, Yubo Tian, Yiwei Zuo, Zhe Cao, Yi Ren, Yunfan Li, Wenjie Liu, Xun Wu, Yifan Mao, Ming Li, Jian Liu, Jiayang Liu, Zihan Qin, Cunxi Chu, Jialei Xu, Wenbo Zhao, Junjun Jiang, Xianming Liu, Ziyan Wang, Chiwei Li, Shilong Li, Chendong Yuan, Songyue Yang, Wentao Liu, Peng Chen, Bin Zhou, YuBo Wang, Chi Zhang, Jianhang Sun, Hai Chen, Xiao Yang, Lizhong Wang, Dongyi Fu, Yongchun Lin, Huitong Yang, Haoang Li, Yadan Luo, Xianjing Cheng, Yong Xu
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles.
1 code implementation • 30 Apr 2024 • Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo
In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework.
Ranked #1 on Few-Shot Object Detection on MS-COCO (5-shot)
no code implementations • 9 Mar 2024 • Wentao Liu, Bowen Liang, Weijin Xu, Tong Tian, Qingsheng Lu, Xipeng Pan, Haoyuan Li, Siyu Tian, Huihua Yang, Ruisheng Su
In this paper, we propose an unsupervised method, UDCR, for aortic DSA/CTA rigid registration based on deep reinforcement learning.
no code implementations • 4 Mar 2024 • Jingyu Gong, Min Wang, Wentao Liu, Chen Qian, Zhizhong Zhang, Yuan Xie, Lizhuang Ma
To handle this problem, we propose the first Dynamic Environment MOtion Synthesis framework (DEMOS) to predict future motion instantly according to the current scene, and use it to dynamically update the latent motion for final motion synthesis.
no code implementations • 23 Feb 2024 • Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu
Automated machine learning (AutoML) is a collection of techniques designed to automate the machine learning development process.
no code implementations • 9 Jan 2024 • Weijin Xu, Zhuang Sha, Huihua Yang, Rongcai Jiang, Zhanying Li, Wentao Liu, Ruisheng Su
Hemorrhagic Stroke (HS) has a rapid onset and is a serious condition that poses a great health threat.
no code implementations • CVPR 2024 • Chen Zhang, Wencheng Han, Yang Zhou, Jianbing Shen, Cheng-Zhong Xu, Wentao Liu
These methods utilize both the metadata and the sRGB image to perform sRGB-to-RAW de-rendering and recover high-quality single-frame RAW data.
1 code implementation • 18 Dec 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy
Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both OV-COCO and OV-LVIS benchmarks.
Ranked #8 on Open Vocabulary Object Detection on LVIS v1.0
no code implementations • 12 Dec 2023 • Wentao Liu, Hanglei Hu, Jie zhou, Yuyang Ding, Junsong Li, Jiayi Zeng, Mengliang He, Qin Chen, Bo Jiang, Aimin Zhou, Liang He
In recent years, there has been remarkable progress in leveraging Language Models (LMs), encompassing Pre-trained Language Models (PLMs) and Large-scale Language Models (LLMs), within the domain of mathematics.
1 code implementation • 9 Dec 2023 • Sheng Jin, Shuhuai Li, Tong Li, Wentao Liu, Chen Qian, Ping Luo
Human-centric perception (e. g. detection, segmentation, pose estimation, and attribute analysis) is a long-standing problem for computer vision.
1 code implementation • 2 Oct 2023 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy
However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.
no code implementations • ICCV 2023 • Wenpeng Xiao, Wentao Liu, Yitong Wang, Bernard Ghanem, Bing Li
Considering the complexity of hair structure, we innovatively treat hair wisp extraction as an instance segmentation problem, where a hair wisp is referred to as an instance.
no code implementations • 11 Sep 2023 • Wentao Liu, Tong Tian, Weijin Xu, Lemeng Wang, Haoyuan Li, Huihua Yang
Abdominal organ and tumour segmentation has many important clinical applications, such as organ quantification, surgical planning, and disease diagnosis.
no code implementations • 7 Sep 2023 • Lemeng Wang, Wentao Liu, Weijin Xu, Haoyuan Li, Huihua Yang, Feng Gao
Therefore, 2D DSA segmentation methods are unable to capture the complete IA information and treatment of cerebrovascular diseases.
1 code implementation • 28 Aug 2023 • Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions.
Ranked #3 on Multi-Label Classification on PASCAL VOC 2007
2 code implementations • 10 Aug 2023 • Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Shihao Ma, Adamo Young, Cheng Zhu, Kangkang Meng, Xin Yang, Ziyan Huang, Fan Zhang, Wentao Liu, YuanKe Pan, Shoujin Huang, Jiacheng Wang, Mingze Sun, Weixin Xu, Dengqiang Jia, Jae Won Choi, Natália Alves, Bram de Wilde, Gregor Koehler, Yajun Wu, Manuel Wiesenfarth, Qiongjie Zhu, Guoqiang Dong, Jian He, the FLARE Challenge Consortium, Bo wang
The best-performing algorithms successfully generalized to holdout external validation sets, achieving a median DSC of 89. 5\%, 90. 9\%, and 88. 3\% on North American, European, and Asian cohorts, respectively.
1 code implementation • 21 Jun 2023 • Wentao Liu, Tong Tian, Lemeng Wang, Weijin Xu, Lei LI, Haoyuan Li, Wenyi Zhao, Siyu Tian, Xipeng Pan, Huihua Yang, Feng Gao, Yiming Deng, Xin Yang, Ruisheng Su
In this work, we introduce DIAS, a dataset specifically developed for IA segmentation in DSA sequences.
1 code implementation • CVPR 2023 • Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy
The embeddings of regions in a bag are treated as embeddings of words in a sentence, and they are sent to the text encoder of a VLM to obtain the bag-of-regions embedding, which is learned to be aligned to the corresponding features extracted by a frozen VLM.
Ranked #8 on Open Vocabulary Object Detection on MSCOCO (using extra training data)
1 code implementation • 23 Aug 2022 • Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.
Ranked #5 on 2D Human Pose Estimation on COCO-WholeBody
no code implementations • 20 Aug 2022 • Wentao Liu, Chaofan Ma, Yuhuan Yang, Weidi Xie, Ya zhang
The goal of this paper is to interactively refine the automatic segmentation on challenging structures that fall behind human performance, either due to the scarcity of available annotations or the difficulty nature of the problem itself, for example, on segmenting cancer or small organs.
1 code implementation • 16 Aug 2022 • Wentao Jiang, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Si Liu
Human pose estimation aims to accurately estimate a wide variety of human poses.
2 code implementations • 23 Jul 2022 • Wentao Liu, Weijin Xu, Songlin Yan, Lemeng Wang, Haoyuan Li, Huihua Yang
Abdominal organ segmentation has many important clinical applications, such as organ quantification, surgical planning, and disease diagnosis.
1 code implementation • 22 Jul 2022 • Hao Meng, Sheng Jin, Wentao Liu, Chen Qian, Mengxiang Lin, Wanli Ouyang, Ping Luo
Unlike most previous works that directly predict the 3D poses of two interacting hands simultaneously, we propose to decompose the challenging interacting hand pose estimation task and estimate the pose of each hand separately.
1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
Ranked #1 on Category-Agnostic Pose Estimation on MP100
1 code implementation • CVPR 2022 • Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
Vision transformers have achieved great successes in many computer vision tasks.
Ranked #7 on 2D Human Pose Estimation on COCO-WholeBody
2 code implementations • 9 Mar 2022 • Wentao Liu, Tong Tian, Weijin Xu, Huihua Yang, Xipeng Pan, Songlin Yan, Lemeng Wang
In this paper, we propose a novel hybrid architecture for medical image segmentation called PHTrans, which parallelly hybridizes Transformer and CNN in main building blocks to produce hierarchical representations from global and local features and adaptively aggregate them, aiming to fully exploit their strengths to obtain better segmentation performance.
no code implementations • ICLR 2022 • Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang
PL approaches apply pseudo-labels to unlabeled data, and then train the model with a combination of the labeled and pseudo-labeled data iteratively.
1 code implementation • 25 Nov 2021 • Jiachen Xu, Min Wang, Jingyu Gong, Wentao Liu, Chen Qian, Yuan Xie, Lizhuang Ma
Prior plays an important role in providing the plausible constraint on human motion.
1 code implementation • 10 Nov 2021 • Honglei Su, Qi Liu, Zhengfang Duanmu, Wentao Liu, Zhou Wang
In this work, we first build a large 3D point cloud database for subjective and objective quality assessment of point clouds.
1 code implementation • 14 Sep 2021 • Hanning Yu, Wentao Liu, Chengjiang Long, Bo Dong, Qin Zou, Chunxia Xiao
Based on this observation, we propose a novel normalization method called " HDR calibration " for HDR images stored in relative luminance, calibrating HDR images into a similar luminance scale according to the LDR images.
1 code implementation • ICCV 2021 • Size Wu, Sheng Jin, Wentao Liu, Lei Bai, Chen Qian, Dong Liu, Wanli Ouyang
Following the top-down paradigm, we decompose the task into two stages, i. e. person localization and pose estimation.
Ranked #2 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)
no code implementations • 8 Aug 2021 • Rongrong Gao, Na Fan, Changlin Li, Wentao Liu, Qifeng Chen
We present a novel approach to joint depth and normal estimation for time-of-flight (ToF) sensors.
4 code implementations • ICCV 2021 • Jiefeng Li, Siyuan Bian, Ailing Zeng, Can Wang, Bo Pang, Wentao Liu, Cewu Lu
In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
Ranked #10 on Pose Estimation on COCO val2017
4 code implementations • CVPR 2021 • Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
Human pose estimation has achieved significant progress in recent years.
Ranked #23 on Pose Estimation on COCO test-dev
1 code implementation • CVPR 2021 • Jiahang Wang, Sheng Jin, Wentao Liu, Weizhong Liu, Chen Qian, Ping Luo
However, unlike human vision that is robust to various data corruptions such as blur and pixelation, current pose estimators are easily confused by these corruptions.
no code implementations • 30 Jan 2021 • Zhengfang Duanmu, Wentao Liu, Zhongling Wang, Zhou Wang
Image quality assessment (IQA) models aim to establish a quantitative relationship between visual images and their perceptual quality by human observers.
1 code implementation • ECCV 2020 • Jianan Zhen, Qi Fang, Jiaming Sun, Wentao Liu, Wei Jiang, Hujun Bao, Xiaowei Zhou
Recovering multi-person 3D poses with absolute scales from a single RGB image is a challenging problem due to the inherent depth and scale ambiguity from a single view.
Ranked #11 on 3D Multi-Person Pose Estimation (absolute) on MuPoTS-3D
no code implementations • ECCV 2020 • Jiefeng Li, Can Wang, Wentao Liu, Chen Qian, Cewu Lu
The HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically, which captures the body-part and joint level semantic and maintains global consistency at the same time.
3D Multi-Person Pose Estimation (absolute) 3D Multi-Person Pose Estimation (root-relative) +2
no code implementations • ECCV 2020 • Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo
The modules of HGG can be trained end-to-end with the keypoint detection network and is able to supervise the grouping process in a hierarchical manner.
Ranked #3 on Keypoint Detection on OCHuman
2 code implementations • ECCV 2020 • Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo
This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.
Ranked #12 on 2D Human Pose Estimation on COCO-WholeBody
3 code implementations • CVPR 2020 • Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang
This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i. e. a 2D space used for texture mapping of 3D mesh).
Ranked #1 on 3D Human Reconstruction on Surreal
no code implementations • 20 May 2020 • Gereon Fox, Wentao Liu, Hyeongwoo Kim, Hans-Peter Seidel, Mohamed Elgharib, Christian Theobalt
We introduce a new benchmark dataset for face video forgery detection, of unprecedented quality.
3 code implementations • ECCV 2020 • Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin
Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning.
Ranked #6 on Action Recognition on UCF101 (using extra training data)
no code implementations • 24 Mar 2020 • Min Wang, Feng Qiu, Wentao Liu, Chen Qian, Xiaowei Zhou, Lizhuang Ma
In this paper, we introduce body part segmentation as critical supervision.
Ranked #103 on 3D Human Pose Estimation on Human3.6M (PA-MPJPE metric)
2 code implementations • ICCV 2019 • Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang
In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information.
Conditional Image Generation Open-Ended Question Answering +1
no code implementations • 26 May 2019 • Mohamed Elgharib, Mallikarjun BR, Ayush Tewari, Hyeongwoo Kim, Wentao Liu, Hans-Peter Seidel, Christian Theobalt
Our lightweight setup allows operations in uncontrolled environments, and lends itself to telepresence applications such as video-conferencing from dynamic environments.
no code implementations • 13 Apr 2019 • Kede Ma, Wentao Liu, Tongliang Liu, Zhou Wang, DaCheng Tao
One of the biggest challenges in learning BIQA models is the conflict between the gigantic image space (which is in the dimension of the number of image pixels) and the extremely limited reliable ground truth data for training.
no code implementations • CVPR 2019 • Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian
Our framework consists of two main components,~\ie~SpatialNet and TemporalNet.
no code implementations • CVPR 2019 • Xipeng Chen, Kwan-Yee Lin, Wentao Liu, Chen Qian, Xiaogang Wang, Liang Lin
Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale in-door 3D datasets and sophisticated network architectures.
no code implementations • 15 Mar 2019 • Wei Feng, Wentao Liu, Tong Li, Jing Peng, Chen Qian, Xiaolin Hu
Human-object interactions (HOI) recognition and pose estimation are two closely related tasks.
2 code implementations • ECCV 2018 • Qingqiu Huang, Wentao Liu, Dahua Lin
In real-world applications, e. g. law enforcement and video retrieval, one often needs to search a certain person in long videos with just one portrait.
no code implementations • 23 May 2018 • Min Wang, Xipeng Chen, Wentao Liu, Chen Qian, Liang Lin, Lizhuang Ma
In this paper, we propose a two-stage depth ranking based method (DRPose3D) to tackle the problem of 3D human pose estimation.