no code implementations • ECCV 2020 • Xu Yan, Weibing Zhao, Kun Yuan, Ruimao Zhang, Zhen Li, Shuguang Cui
Recovering realistic textures from a largely down-sampled low resolution (LR) image with complicated patterns is a challenging problem in image super-resolution.
1 code implementation • 28 Nov 2023 • Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan
Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3).
no code implementations • 19 Oct 2023 • Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum
This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.
1 code implementation • 12 Oct 2023 • Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang
This work proposes a unified framework called UniPose to detect keypoints of any articulated (e. g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation.
Ranked #1 on
2D Human Pose Estimation
on Human-Art
(using extra training data)
1 code implementation • ICCV 2023 • Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang
In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance.
no code implementations • 12 Sep 2023 • Zihan Zhou, Ruiying Liu, Chaolong Ying, Ruimao Zhang, Tianshu Yu
Molecular conformation generation, a critical aspect of computational chemistry, involves producing the three-dimensional conformer geometry for a given molecule.
1 code implementation • 10 Sep 2023 • Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang
To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under the real-world conditions.
1 code implementation • 23 Aug 2023 • Siyue Yao, MingJie Sun, Bingliang Li, Fengyu Yang, Junle Wang, Ruimao Zhang
In this paper, we introduce a novel multi-dancer synthesis task called partner dancer generation, which involves synthesizing virtual human dancers capable of performing dance with users.
1 code implementation • ICCV 2023 • Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang
Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.
1 code implementation • NeurIPS 2023 • Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang
In this paper, we present Motion-X, a large-scale 3D expressive whole-body motion dataset.
no code implementations • 6 Jun 2023 • Yuncheng Jiang, Zixun Zhang, Ruimao Zhang, Guanbin Li, Shuguang Cui, Zhen Li
YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations.
no code implementations • 26 May 2023 • Rui Sun, Andi Zhang, Haiming Zhang, Jinke Ren, Yao Zhu, Ruimao Zhang, Shuguang Cui, Zhen Li
Specifically, our framework consists of two components: a sample repairing module and a detection module.
Out-of-Distribution Detection
Out of Distribution (OOD) Detection
1 code implementation • 20 May 2023 • Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang
Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i. e., 41. 50 mAP) and zero-shot detection.
Ranked #2 on
Zero-Shot Human-Object Interaction Detection
on HICO-DET
(using extra training data)
Human-Object Interaction Detection
Zero-Shot Human-Object Interaction Detection
no code implementations • 23 Apr 2023 • Xiaozhe Gu, Zixun Zhang, Yuncheng Jiang, Tao Luo, Ruimao Zhang, Shuguang Cui, Zhen Li
Despite the simplicity, stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs).
no code implementations • CVPR 2023 • Jie Yang, Chaoqun Wang, Zhen Li, Junle Wang, Ruimao Zhang
This paper presents Scalable Semantic Transfer (SST), a novel training paradigm, to explore how to leverage the mutual benefits of the data from different label domains (i. e. various levels of label granularity) to train a powerful human parsing network.
2 code implementations • 24 Mar 2023 • Ye Zhu, Jie Yang, Si-Qi Liu, Ruimao Zhang
Semi-supervised medical image segmentation has attracted much attention in recent years because of the high cost of medical image annotations.
3 code implementations • 3 Feb 2023 • Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.
Ranked #2 on
2D Human Pose Estimation
on Human-Art
no code implementations • 2 Jan 2023 • Ziyi Tang, Ruimao Zhang, Zhanglin Peng, Jinrui Chen, Liang Lin
We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract the informative and discriminative feature representations at different stages.
Representation Learning
Video-Based Person Re-Identification
2 code implementations • 9 Oct 2022 • Xu Yan, Heshen Zhan, Chaoda Zheng, Jiantao Gao, Ruimao Zhang, Shuguang Cui, Zhen Li
Specifically, this paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy, which utilizes view-images, i. e., rendered or projected 2D images of the 3D object, to boost point cloud analysis.
Ranked #10 on
3D Point Cloud Classification
on ModelNet40
2 code implementations • 21 Jul 2022 • Haotian Bai, Ruimao Zhang, Jiong Wang, Xiang Wan
Weakly Supervised Object Localization (WSOL), which aims to localize objects by only using image-level labels, has attracted much attention because of its low annotation cost in real applications.
1 code implementation • 10 Jul 2022 • Xu Yan, Jiantao Gao, Chaoda Zheng, Chao Zheng, Ruimao Zhang, Shenghui Cui, Zhen Li
As camera and LiDAR sensors capture complementary information used in autonomous driving, great efforts have been made to develop semantic segmentation algorithms through multi-modality data fusion.
Ranked #3 on
3D Semantic Segmentation
on SemanticKITTI
1 code implementation • 23 Jun 2022 • Weijie Ma, Ye Zhu, Ruimao Zhang, Jie Yang, Yiwen Hu, Zhen Li, Li Xiang
By aligning the class tokens and spatial attention maps of paired NBI and WL images at different levels, the Transformer achieves the ability to keep both global and local representation consistency for the above two modalities.
no code implementations • 21 Jun 2022 • Jie Yang, Ye Zhu, Chaoqun Wang, Zhen Li, Ruimao Zhang
Integrating multi-modal data to promote medical image analysis has recently gained great attention.
1 code implementation • 16 Jun 2022 • Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, Ping Luo
Constraint by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods.
no code implementations • 23 May 2022 • Hao Zhang, Ruimao Zhang, Zhanglin Peng, Junle Wang, Yanqing Jing
A simple pixel selection strategy followed with the construction of multi-level contrastive units is introduced to optimize the model for both domain adaptation and active supervised learning.
no code implementations • 13 Jan 2022 • Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.
no code implementations • 6 Dec 2021 • Yuying Ge, Ruimao Zhang, Ping Luo
This work proposes a novel framework named MetaCloth via meta-learning, which is able to learn unseen tasks of dense fashion landmark detection with only a few annotated samples.
2 code implementations • ICCV 2021 • Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo
Dense video captioning aims to generate multiple associated captions with their temporal locations from the video.
Ranked #6 on
Dense Video Captioning
on ActivityNet Captions
1 code implementation • 2 Aug 2021 • Jun Wei, Yiwen Hu, Ruimao Zhang, Zhen Li, S. Kevin Zhou, Shuguang Cui
To address the above issues, we propose the Shallow Attention Network (SANet) for polyp segmentation.
Ranked #9 on
Video Polyp Segmentation
on SUN-SEG-Easy (Unseen)
1 code implementation • 8 Jul 2021 • Zhaoyi Yan, Ruimao Zhang, Hongzhi Zhang, Qingfu Zhang, WangMeng Zuo
One of the main issues in this task is how to handle the dramatic scale variations of pedestrians caused by the perspective effect.
1 code implementation • 28 Jun 2021 • Yuanfeng Ji, Ruimao Zhang, Huijie Wang, Zhen Li, Lingyun Wu, Shaoting Zhang, Ping Luo
The recent vision transformer(i. e. for image classification) learns non-local attentive interaction of different patch tokens.
1 code implementation • 5 May 2021 • Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo
Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotate text detection and cell segmentation.
Ranked #78 on
Instance Segmentation
on COCO test-dev
(using extra training data)
1 code implementation • 30 Apr 2021 • Weibing Zhao, Xu Yan, Jiantao Gao, Ruimao Zhang, Jiayan Zhang, Zhen Li, Song Wu, Shuguang Cui
In this paper, we address a fundamental problem in PCSR: How to downsample the dense point cloud with arbitrary scales while preserving the local topology of discarding points in a case-agnostic manner (i. e. without additional storage for point relationship)?
2 code implementations • CVPR 2021 • Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo
A recent pioneering work employed knowledge distillation to reduce the dependency of human parsing, where the try-on images produced by a parser-based method are used as supervisions to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.
Ranked #1 on
Virtual Try-on
on MPV
1 code implementation • ICCV 2021 • Zhihao Yuan, Xu Yan, Yinghong Liao, Ruimao Zhang, Sheng Wang, Zhen Li, Shuguang Cui
Compared with the visual grounding on 2D images, the natural-language-guided 3D object localization on point clouds is more challenging.
2 code implementations • 7 Dec 2020 • Xu Yan, Jiantao Gao, Jie Li, Ruimao Zhang, Zhen Li, Rui Huang, Shuguang Cui
In practice, an initial semantic segmentation (SS) of a single sweep point cloud can be achieved by any appealing network and then flows into the semantic scene completion (SSC) module as the input.
Ranked #3 on
3D Semantic Scene Completion
on SemanticKITTI
3D Semantic Scene Completion from a single RGB image
3D Semantic Segmentation
+3
1 code implementation • 26 Nov 2020 • Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo
For example, without using polygon annotations, PSENet achieves an 80. 5% F-score on TotalText [3] (vs. 80. 9% of fully supervised counterpart), 31. 1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.
1 code implementation • 10 Nov 2020 • Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, WangMeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li, Byung-Hoon Kim, Joonyoung Song, Jong Chul Ye, JaeHyun Baek, Magauiya Zhussip, Yeskendir Koishekenov, Hwechul Cho Ye, Xin Liu, Xueying Hu, Jun Jiang, Jinwei Gu, Kai Li, Pengliang Tan, Bingxin Hou
This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results.
no code implementations • 16 Sep 2020 • Yuanfeng Ji, Ruimao Zhang, Zhen Li, Jiamin Ren, Shaoting Zhang, Ping Luo
Unlike the recent neural architecture search (NAS) methods that typically searched the optimal operators in each network layer, but missed a good strategy to search for feature aggregations, this paper proposes a novel NAS method for 3D medical image segmentation, named UXNet, which searches both the scale-wise feature aggregation strategies as well as the block-wise operators in the encoder-decoder network.
no code implementations • CVPR 2020 • Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo
This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.
3 code implementations • 12 Mar 2020 • Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, WangMeng Zuo, Ping Luo
First, a semantic layout generation module utilizes semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on.
Ranked #4 on
Virtual Try-on
on VITON
(IS metric)
no code implementations • ICCV 2019 • Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo
ResNeXt, still suffers from the sub-optimal performance due to manually defining the number of groups as a constant over all of the layers.
no code implementations • ICCV 2019 • Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang
Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoid the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.
no code implementations • 22 Jul 2019 • Ping Luo, Ruimao Zhang, Jiamin Ren, Zhanglin Peng, Jingyu Li
Analyses of SN are also presented to answer the following three questions: (a) Is it useful to allow each normalization layer to select its own normalizer?
1 code implementation • CVPR 2019 • Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo
Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.
5 code implementations • CVPR 2019 • Yuying Ge, Ruimao Zhang, Lingyun Wu, Xiaogang Wang, Xiaoou Tang, Ping Luo
A strong baseline is proposed, called Match R-CNN, which builds upon Mask R-CNN to solve the above four tasks in an end-to-end manner.
no code implementations • 19 Nov 2018 • Ping Luo, Zhanglin Peng, Jiamin Ren, Ruimao Zhang
Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choices of normalizers are more related to depth and batch size, but less relevant to parameter initialization, learning rate decay, and solver; (3) different tasks and datasets have different behaviors when learning to select normalizers.
no code implementations • 10 Oct 2018 • Lili Huang, Jiefeng Peng, Ruimao Zhang, Guanbin Li, Liang Lin
Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision.
no code implementations • 1 Sep 2018 • Lingbo Liu, Ruimao Zhang, Jiefeng Peng, Guanbin Li, Bowen Du, Liang Lin
Traffic flow prediction is crucial for urban traffic management and public safety.
no code implementations • 16 Jul 2018 • Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang
To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).
3 code implementations • ICLR 2019 • Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li
We hope SN will help ease the usage and understand the normalization techniques in deep learning.
no code implementations • 27 Sep 2017 • Ruimao Zhang, Liang Lin, Guangrun Wang, Meng Wang, WangMeng Zuo
Rather than relying on elaborative annotations (e. g., manually labeled semantic maps and relations), we train our deep model in a weakly-supervised learning manner by leveraging the descriptive sentences of the training images.
no code implementations • 20 Feb 2017 • Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin
This paper introduces Progressively Diffused Networks (PDNs) for unifying multi-scale context modeling with deep feature learning, by taking semantic image segmentation as an exemplar application.
3 code implementations • 13 Jan 2017 • Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, Liang Lin
In this paper, we propose a novel active learning framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner.
no code implementations • CVPR 2016 • Liang Lin, Guangrun Wang, Rui Zhang, Ruimao Zhang, Xiaodan Liang, WangMeng Zuo
This paper addresses a fundamental problem of scene understanding: How to parse the scene image into a structured configuration (i. e., a semantic object hierarchy with object interaction relations) that finely accords with human perception.
no code implementations • 7 Apr 2016 • Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin
This paper addresses the problem of geometric scene parsing, i. e. simultaneously labeling geometric surfaces (e. g. sky, ground and vertical plane) and determining the interaction relations (e. g. layering, supporting, siding and affinity) between main regions.
no code implementations • 19 Aug 2015 • Ruimao Zhang, Liang Lin, Rui Zhang, WangMeng Zuo, Lei Zhang
Furthermore, each bit of our hashing codes is unequally weighted so that we can manipulate the code lengths by truncating the insignificant bits.
no code implementations • 3 Feb 2015 • Zhanglin Peng, Liang Lin, Ruimao Zhang, Jing Xu
Constructing effective representations is a critical but challenging problem in multimedia understanding.
no code implementations • 2 Feb 2015 • Liang Lin, Ruimao Zhang, Xiaohua Duan
During the iterations of inference, the model of each category is analytically updated by a generative learning algorithm.