no code implementations • ECCV 2020 • Xu Yan, Weibing Zhao, Kun Yuan, Ruimao Zhang, Zhen Li, Shuguang Cui
Recovering realistic textures from a largely down-sampled low resolution (LR) image with complicated patterns is a challenging problem in image super-resolution.
no code implementations • 9 Jan 2025 • Yuhong Zhang, Jing Lin, Ailing Zeng, Guanlin Wu, Shunlin Lu, Yurong Fu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang
To address this issue, we develop a scalable annotation pipeline that can automatically capture 3D whole-body human motion and comprehensive textual labels from RGB videos, and build the Motion-X dataset comprising 81.1K text-motion pairs.
no code implementations • 8 Jan 2025 • Yuzhou Huang, Ziyang Yuan, Quande Liu, Qiulin Wang, Xintao Wang, Ruimao Zhang, Pengfei Wan, Di Zhang, Kun Gai
To address these challenges, we introduce ConceptMaster, an innovative framework that effectively tackles the critical issues of identity decoupling while maintaining concept fidelity in customized videos.
no code implementations • 19 Dec 2024 • Shunlin Lu, Jingbo Wang, Zeyu Lu, Ling-Hao Chen, Wenxun Dai, Junting Dong, Zhiyang Dou, Bo Dai, Ruimao Zhang
In this paper, we introduce a scalable motion generation framework that includes the motion tokenizer Motion FSQ-VAE and a text-prefix autoregressive transformer.
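For readers unfamiliar with the tokenizer family named here, finite scalar quantization (FSQ) replaces a learned VQ codebook with per-dimension rounding onto a small fixed grid, and an autoregressive transformer is then trained over the resulting discrete codes. The snippet below is a minimal sketch of the FSQ step only; the level counts and the straight-through trick are illustrative assumptions, not the paper's configuration.

    import torch

    def fsq_quantize(z: torch.Tensor, levels=(8, 8, 8, 5, 5)) -> torch.Tensor:
        """Finite scalar quantization: bound each latent dimension, then round it
        onto a small per-dimension grid. A straight-through estimator keeps
        gradients flowing to the encoder. Level counts are illustrative."""
        L = torch.tensor(levels, dtype=z.dtype, device=z.device)   # (D,)
        half = (L - 1) / 2
        z_bounded = torch.tanh(z) * half          # squash each dim into [-half, half]
        z_quant = torch.round(z_bounded)          # snap to the integer grid
        # forward pass uses the quantized values, backward pass is the identity
        return z_bounded + (z_quant - z_bounded).detach()

    # hypothetical usage: quantize per-frame motion latents before the
    # text-prefix autoregressive transformer consumes their grid indices
    codes = fsq_quantize(torch.randn(2, 16, 5))   # (batch, frames, latent_dim)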
no code implementations • 13 Dec 2024 • Lai Wei, Jiahua Ma, Yibo Hu, Ruimao Zhang
In practice, unlike previous approaches that concatenate visual and tactile data to generate future robot state sequences, our method employs tactile data as a calibration signal to adjust the robot's state within the state space implicitly.
no code implementations • 4 Nov 2024 • Jie Yang, Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang
To bridge this gap, we introduce the novel challenge of Semantic Keypoint Comprehension, which aims to comprehend keypoints across different task scenarios, including keypoint semantic understanding, visual prompt-based keypoint detection, and textual prompt-based keypoint detection.
no code implementations • 23 Oct 2024 • Yiran Qin, Zhelun Shi, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin, Xihui Liu, Lu Sheng, Jing Shao, Lei Bai, Wanli Ouyang, Ruimao Zhang
WorldSimBench includes Explicit Perceptual Evaluation and Implicit Manipulative Evaluation, encompassing human preference assessments from the visual perspective and action-level evaluations in embodied tasks, covering three representative embodied scenarios: Open-Ended Embodied Environment, Autonomous Driving, and Robot Manipulation.
no code implementations • 1 Oct 2024 • Hanqi Jiang, Xixuan Hao, Yuzhou Huang, Chong Ma, Jiaxun Zhang, Yi Pan, Ruimao Zhang
Moreover, our framework incorporates a generation decoder that employs two proxy tasks, responsible for generating the impression from (1) images, via a captioning branch, and (2) findings, through a summarization branch.
no code implementations • 21 Aug 2024 • Yuzhou Huang, Yiran Qin, Shunlin Lu, Xintao Wang, Rui Huang, Ying Shan, Ruimao Zhang
Traditional visual storytelling is complex, requiring specialized knowledge and substantial resources, yet it is often constrained by the limits of human creativity and precision.
no code implementations • 17 Jul 2024 • Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang, Siyuan Huang
Existing 3D human object interaction (HOI) datasets and models simply align global descriptions with the long HOI sequence, while lacking a detailed understanding of intermediate states and the transitions between states.
no code implementations • CVPR 2024 • Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang
In this paper, we develop MP-HOI, a powerful Multi-modal Prompt-based HOI detector designed to leverage both textual descriptions for open-set generalization and visual exemplars for handling high ambiguity in descriptions, realizing HOI detection in the open world.
1 code implementation • 30 May 2024 • Ling-Hao Chen, Shunlin Lu, Ailing Zeng, Hao Zhang, Benyou Wang, Ruimao Zhang, Lei Zhang
This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding by leveraging the powerful capabilities of Large Language Models (LLMs).
2 code implementations • 25 Apr 2024 • Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan
We hope that our work can serve as a valuable addition to existing MLLM benchmarks, providing insightful observations and inspiring further research in the area of text-rich visual comprehension with MLLMs.
1 code implementation • 18 Mar 2024 • Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao
It is a long-standing goal to design a generalist embodied agent that can follow diverse instructions in human-like ways.
no code implementations • 7 Feb 2024 • Chaoqun Wang, Yiran Qin, Zijian Kang, Ningning Ma, Ruimao Zhang
First, a depth estimation (DE) scheme leverages relative depth information to realize the effective feature lifting from 2D to 3D spaces.
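A common way to realize such depth-guided lifting is to predict a categorical distribution over discrete depth bins and use it to spread each pixel's 2D feature along its camera ray into a 3D frustum volume. The sketch below shows that general pattern; the shapes and the categorical-depth formulation are assumptions rather than the paper's exact scheme.

    import torch
    import torch.nn.functional as F

    def lift_2d_to_3d(feat_2d: torch.Tensor, depth_logits: torch.Tensor) -> torch.Tensor:
        """feat_2d: (B, C, H, W) image features.
        depth_logits: (B, D, H, W) logits over D discrete depth bins.
        Returns a frustum volume (B, C, D, H, W): each pixel's feature is
        distributed along its ray, weighted by the predicted depth distribution."""
        depth_prob = F.softmax(depth_logits, dim=1)                  # (B, D, H, W)
        return depth_prob.unsqueeze(1) * feat_2d.unsqueeze(2)        # (B, C, D, H, W)

    frustum = lift_2d_to_3d(torch.randn(1, 64, 32, 88), torch.randn(1, 48, 32, 88))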
1 code implementation • CVPR 2024 • Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan
Multimodal large language models (MLLMs) building upon the foundation of powerful large language models (LLMs) have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3).
1 code implementation • 12 Dec 2023 • Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li
The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences.
1 code implementation • CVPR 2024 • Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao
It is a long-standing goal to design an embodied system that can solve long-horizon open-world tasks in human-like ways.
1 code implementation • CVPR 2024 • Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan
Both quantitative and qualitative results on this evaluation dataset indicate that our SmartEdit surpasses previous methods, paving the way for the practical application of complex instruction-based image editing.
2 code implementations • 28 Nov 2023 • Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan
Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3).
1 code implementation • 19 Oct 2023 • Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum
This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.
Ranked #1 on Motion Synthesis on Motion-X
2 code implementations • 12 Oct 2023 • Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang
This work aims to address an advanced keypoint detection problem: how to accurately detect any keypoints in complex real-world scenarios, which involves massive, messy, and open-ended objects as well as their associated keypoint definitions.
Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)
1 code implementation • ICCV 2023 • Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang
In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance.
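Feature-level supervision of this kind can be pictured as an auxiliary regression term that pulls the fused LiDAR-camera features toward features produced by a stronger, frozen teacher, added on top of the usual detection loss. A hedged sketch, with the loss form and weighting assumed:

    import torch.nn.functional as F

    def supervised_fusion_loss(fused_feat, teacher_feat, detection_loss, weight=1.0):
        """Auxiliary L2 term between the student's fused features and the
        (detached) teacher features, combined with the normal detection loss."""
        aux = F.mse_loss(fused_feat, teacher_feat.detach())
        return detection_loss + weight * aux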
no code implementations • 12 Sep 2023 • Zihan Zhou, Ruiying Liu, Chaolong Ying, Ruimao Zhang, Tianshu Yu
Molecular conformation generation, a critical aspect of computational chemistry, involves producing the three-dimensional conformer geometry for a given molecule.
1 code implementation • CVPR 2024 • Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang
To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under real-world conditions.
1 code implementation • 23 Aug 2023 • Siyue Yao, MingJie Sun, Bingliang Li, Fengyu Yang, Junle Wang, Ruimao Zhang
In this paper, we introduce a novel multi-dancer synthesis task called partner dancer generation, which involves synthesizing virtual human dancers capable of performing dance with users.
1 code implementation • ICCV 2023 • Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang
Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.
1 code implementation • NeurIPS 2023 • Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang
In this paper, we present Motion-X, a large-scale 3D expressive whole-body motion dataset.
1 code implementation • 6 Jun 2023 • Yuncheng Jiang, Zixun Zhang, Ruimao Zhang, Guanbin Li, Shuguang Cui, Zhen Li
YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations.
no code implementations • 26 May 2023 • Rui Sun, Andi Zhang, Haiming Zhang, Jinke Ren, Yao Zhu, Ruimao Zhang, Shuguang Cui, Zhen Li
Specifically, our framework consists of two components: a sample repairing module and a detection module.
Generative Adversarial Network • Out-of-Distribution Detection
1 code implementation • 20 May 2023 • Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang
Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.
Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)
no code implementations • 23 Apr 2023 • Xiaozhe Gu, Zixun Zhang, Yuncheng Jiang, Tao Luo, Ruimao Zhang, Shuguang Cui, Zhen Li
Despite their simplicity, stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs).
no code implementations • CVPR 2023 • Jie Yang, Chaoqun Wang, Zhen Li, Junle Wang, Ruimao Zhang
This paper presents Scalable Semantic Transfer (SST), a novel training paradigm, to explore how to leverage the mutual benefits of the data from different label domains (i.e., various levels of label granularity) to train a powerful human parsing network.
2 code implementations • 24 Mar 2023 • Ye Zhu, Jie Yang, Si-Qi Liu, Ruimao Zhang
Semi-supervised medical image segmentation has attracted much attention in recent years because of the high cost of medical image annotations.
3 code implementations • 3 Feb 2023 • Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.
Ranked #2 on 2D Human Pose Estimation on Human-Art
no code implementations • 2 Jan 2023 • Ziyi Tang, Ruimao Zhang, Zhanglin Peng, Jinrui Chen, Liang Lin
We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract the informative and discriminative feature representations at different stages.
2 code implementations • 9 Oct 2022 • Xu Yan, Heshen Zhan, Chaoda Zheng, Jiantao Gao, Ruimao Zhang, Shuguang Cui, Zhen Li
Specifically, this paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy, which utilizes view images, i.e., rendered or projected 2D images of the 3D object, to boost point cloud analysis.
Ranked #12 on 3D Point Cloud Classification on ModelNet40
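Cross-modality training with rendered views typically amounts to knowledge distillation: an image-based teacher classifies the 2D views, and the point cloud student is trained to match the teacher's softened predictions in addition to the ground-truth labels. A sketch under those assumptions (the temperature and loss mix are illustrative, not PointCMT's exact formulation):

    import torch.nn.functional as F

    def cross_modal_distill_loss(point_logits, image_logits, labels, T=4.0, alpha=0.5):
        """Ground-truth cross-entropy for the point-cloud student plus a KD term
        matching its softened predictions to the image-view teacher's."""
        ce = F.cross_entropy(point_logits, labels)
        kd = F.kl_div(
            F.log_softmax(point_logits / T, dim=1),
            F.softmax(image_logits.detach() / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * ce + (1 - alpha) * kd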
2 code implementations • 21 Jul 2022 • Haotian Bai, Ruimao Zhang, Jiong Wang, Xiang Wan
Weakly Supervised Object Localization (WSOL), which aims to localize objects by only using image-level labels, has attracted much attention because of its low annotation cost in real applications.
Ranked #2 on Weakly-Supervised Object Localization on ImageNet
1 code implementation • 10 Jul 2022 • Xu Yan, Jiantao Gao, Chaoda Zheng, Chao Zheng, Ruimao Zhang, Shuguang Cui, Zhen Li
As camera and LiDAR sensors capture complementary information used in autonomous driving, great efforts have been made to develop semantic segmentation algorithms through multi-modality data fusion.
Ranked #4 on Robust 3D Semantic Segmentation on nuScenes-C
1 code implementation • 23 Jun 2022 • Weijie Ma, Ye Zhu, Ruimao Zhang, Jie Yang, Yiwen Hu, Zhen Li, Li Xiang
By aligning the class tokens and spatial attention maps of paired NBI and WL images at different levels, the Transformer learns to maintain both global and local representation consistency across the two modalities.
no code implementations • 21 Jun 2022 • Jie Yang, Ye Zhu, Chaoqun Wang, Zhen Li, Ruimao Zhang
Integrating multi-modal data to promote medical image analysis has recently gained great attention.
3 code implementations • 16 Jun 2022 • Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, Ping Luo
Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods.
no code implementations • 23 May 2022 • Hao Zhang, Ruimao Zhang, Zhanglin Peng, Junle Wang, Yanqing Jing
A simple pixel selection strategy followed with the construction of multi-level contrastive units is introduced to optimize the model for both domain adaptation and active supervised learning.
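One plausible reading of the pixel selection plus multi-level contrastive units is: keep a sparse set of informative pixels and apply a supervised contrastive loss over their embeddings so that same-class pixels from both domains are pulled together. The sketch below follows that reading; the selection rule and loss details are assumptions, not the paper's exact construction.

    import torch
    import torch.nn.functional as F

    def select_pixels(confidence, k=256):
        """Stand-in selection: keep the k most confident pixels of an image."""
        return torch.topk(confidence.flatten(), k).indices

    def pixel_contrastive_loss(emb, labels, temperature=0.1):
        """emb: (N, C) embeddings of the selected pixels; labels: (N,) class ids.
        Supervised contrastive loss pulling same-class pixels together."""
        emb = F.normalize(emb, dim=1)
        sim = emb @ emb.t() / temperature
        self_mask = torch.eye(len(emb), dtype=torch.bool, device=emb.device)
        pos_mask = (labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask).float()
        log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
        return -(log_prob * pos_mask).sum(1).div(pos_mask.sum(1).clamp(min=1)).mean()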
no code implementations • 13 Jan 2022 • Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.
no code implementations • 6 Dec 2021 • Yuying Ge, Ruimao Zhang, Ping Luo
This work proposes a novel framework named MetaCloth via meta-learning, which is able to learn unseen tasks of dense fashion landmark detection with only a few annotated samples.
2 code implementations • ICCV 2021 • Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo
Dense video captioning aims to generate multiple associated captions with their temporal locations from the video.
Ranked #6 on Dense Video Captioning on YouCook2
1 code implementation • 2 Aug 2021 • Jun Wei, Yiwen Hu, Ruimao Zhang, Zhen Li, S. Kevin Zhou, Shuguang Cui
To address the above issues, we propose the Shallow Attention Network (SANet) for polyp segmentation.
Ranked #11 on Video Polyp Segmentation on SUN-SEG-Easy (Unseen)
1 code implementation • 8 Jul 2021 • Zhaoyi Yan, Ruimao Zhang, Hongzhi Zhang, Qingfu Zhang, WangMeng Zuo
One of the main issues in this task is how to handle the dramatic scale variations of pedestrians caused by the perspective effect.
1 code implementation • 28 Jun 2021 • Yuanfeng Ji, Ruimao Zhang, Huijie Wang, Zhen Li, Lingyun Wu, Shaoting Zhang, Ping Luo
The recent vision transformer (i.e., for image classification) learns non-local attentive interactions among different patch tokens.
1 code implementation • 5 May 2021 • Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo
Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotate text detection and cell segmentation.
Ranked #83 on Instance Segmentation on COCO test-dev (using extra training data)
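PolarMask's core representation is an instance center plus a fixed number of rays in polar coordinates, so decoding a mask reduces to converting (angle, length) pairs back into contour vertices. A minimal sketch of that decoding step (the 36-ray setting follows the common configuration and is otherwise illustrative):

    import numpy as np

    def decode_polar_mask(center_xy, ray_lengths):
        """center_xy: (2,) instance center; ray_lengths: (n_rays,) predicted
        distances along equally spaced angles. Returns contour vertices (n_rays, 2)."""
        n = len(ray_lengths)
        angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
        xs = center_xy[0] + ray_lengths * np.cos(angles)
        ys = center_xy[1] + ray_lengths * np.sin(angles)
        return np.stack([xs, ys], axis=1)

    contour = decode_polar_mask(np.array([120.0, 80.0]), np.full(36, 25.0))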
1 code implementation • 30 Apr 2021 • Weibing Zhao, Xu Yan, Jiantao Gao, Ruimao Zhang, Jiayan Zhang, Zhen Li, Song Wu, Shuguang Cui
In this paper, we address a fundamental problem in PCSR: how to downsample a dense point cloud at arbitrary scales while preserving the local topology of the discarded points in a case-agnostic manner (i.e., without additional storage of point relationships)?
2 code implementations • CVPR 2021 • Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo
A recent pioneering work employed knowledge distillation to reduce the dependency of human parsing, where the try-on images produced by a parser-based method are used as supervisions to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.
Ranked #1 on Virtual Try-on on MPV
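The teacher-student scheme described above can be sketched as a training step in which the try-on image synthesized by the parser-based teacher is the reconstruction target for a parser-free student that sees only the person and garment images. The loss choice below is an assumption; perceptual and adversarial terms are omitted.

    import torch.nn.functional as F

    def distill_tryon_step(student, person_img, cloth_img, teacher_tryon, optimizer):
        """One step: the student receives no human-parsing maps and simply
        regresses the (detached) teacher's try-on result."""
        pred = student(person_img, cloth_img)
        loss = F.l1_loss(pred, teacher_tryon.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()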
1 code implementation • ICCV 2021 • Zhihao Yuan, Xu Yan, Yinghong Liao, Ruimao Zhang, Sheng Wang, Zhen Li, Shuguang Cui
Compared with the visual grounding on 2D images, the natural-language-guided 3D object localization on point clouds is more challenging.
2 code implementations • 7 Dec 2020 • Xu Yan, Jiantao Gao, Jie Li, Ruimao Zhang, Zhen Li, Rui Huang, Shuguang Cui
In practice, an initial semantic segmentation (SS) of a single sweep point cloud can be achieved by any appealing network and then flows into the semantic scene completion (SSC) module as the input.
Ranked #4 on 3D Semantic Scene Completion on SemanticKITTI
3D Semantic Scene Completion from a single RGB image • 3D Semantic Segmentation
1 code implementation • 26 Nov 2020 • Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo
For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% of fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.
1 code implementation • 10 Nov 2020 • Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, WangMeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li, Byung-Hoon Kim, Joonyoung Song, Jong Chul Ye, JaeHyun Baek, Magauiya Zhussip, Yeskendir Koishekenov, Hwechul Cho Ye, Xin Liu, Xueying Hu, Jun Jiang, Jinwei Gu, Kai Li, Pengliang Tan, Bingxin Hou
This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results.
no code implementations • 16 Sep 2020 • Yuanfeng Ji, Ruimao Zhang, Zhen Li, Jiamin Ren, Shaoting Zhang, Ping Luo
Unlike recent neural architecture search (NAS) methods, which typically search for the optimal operators in each network layer but lack a good strategy for searching feature aggregations, this paper proposes a novel NAS method for 3D medical image segmentation, named UXNet, which searches both the scale-wise feature aggregation strategies and the block-wise operators in the encoder-decoder network.
no code implementations • CVPR 2020 • Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo
This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.
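Conceptually, learning-to-normalize keeps several candidate normalizers and mixes their outputs with weights predicted per sample, so each image (and each layer) can favor a different normalization. The module below is a hedged sketch of that idea with three candidates; it is not the paper's exact EN design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ExemplarStyleNorm(nn.Module):
        """Mix IN-, LN-, and BN-style normalized features with per-sample weights."""
        def __init__(self, channels):
            super().__init__()
            self.gate = nn.Linear(channels, 3)   # per-sample weights over 3 normalizers
            self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
            self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

        def forward(self, x):                    # x: (B, C, H, W)
            x_in = F.instance_norm(x)
            x_ln = F.layer_norm(x, x.shape[1:])
            # BN-style branch uses batch statistics only (no running stats, for brevity)
            x_bn = (x - x.mean(dim=(0, 2, 3), keepdim=True)) / (x.var(dim=(0, 2, 3), keepdim=True) + 1e-5).sqrt()
            w = torch.softmax(self.gate(x.mean(dim=(2, 3))), dim=1)   # (B, 3), one weight vector per sample
            mixed = (w.view(-1, 3, 1, 1, 1) * torch.stack([x_in, x_ln, x_bn], dim=1)).sum(dim=1)
            return mixed * self.gamma + self.beta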
3 code implementations • 12 Mar 2020 • Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, WangMeng Zuo, Ping Luo
First, a semantic layout generation module utilizes semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on.
Ranked #4 on Virtual Try-on on VITON (IS metric)
no code implementations • ICCV 2019 • Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo
ResNeXt still suffers from sub-optimal performance due to manually defining the number of groups as a constant across all layers.
no code implementations • ICCV 2019 • Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang
Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoid the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.
no code implementations • 22 Jul 2019 • Ping Luo, Ruimao Zhang, Jiamin Ren, Zhanglin Peng, Jingyu Li
Analyses of SN are also presented to answer the following three questions: (a) Is it useful to allow each normalization layer to select its own normalizer?
1 code implementation • CVPR 2019 • Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo
Unlike ℓ1 and ℓ0 constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.
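As background for the sparse-softmax idea, sparsemax (Martins & Astudillo, 2016) projects the logits onto the probability simplex in a single feed-forward computation and can output exact zeros; the paper's SparsestMax goes further by driving the solution toward a fully sparse (one-hot) point, which this sketch does not implement.

    import torch

    def sparsemax(z: torch.Tensor) -> torch.Tensor:
        """Euclidean projection of logits onto the probability simplex (last dim)."""
        z_sorted, _ = torch.sort(z, descending=True, dim=-1)
        k = torch.arange(1, z.size(-1) + 1, dtype=z.dtype, device=z.device)
        cumsum = z_sorted.cumsum(dim=-1)
        support = (1 + k * z_sorted) > cumsum                 # coordinates that stay nonzero
        k_max = support.sum(dim=-1, keepdim=True)
        tau = (cumsum.gather(-1, k_max - 1) - 1) / k_max      # threshold for the projection
        return torch.clamp(z - tau, min=0)

    print(sparsemax(torch.tensor([2.0, 1.0, -1.0])))          # tensor([1., 0., 0.])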
5 code implementations • CVPR 2019 • Yuying Ge, Ruimao Zhang, Lingyun Wu, Xiaogang Wang, Xiaoou Tang, Ping Luo
A strong baseline is proposed, called Match R-CNN, which builds upon Mask R-CNN to solve the above four tasks in an end-to-end manner.
no code implementations • 19 Nov 2018 • Ping Luo, Zhanglin Peng, Jiamin Ren, Ruimao Zhang
Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choices of normalizers are more related to depth and batch size, but less relevant to parameter initialization, learning rate decay, and solver; (3) different tasks and datasets have different behaviors when learning to select normalizers.
no code implementations • 10 Oct 2018 • Lili Huang, Jiefeng Peng, Ruimao Zhang, Guanbin Li, Liang Lin
Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision.
no code implementations • 1 Sep 2018 • Lingbo Liu, Ruimao Zhang, Jiefeng Peng, Guanbin Li, Bowen Du, Liang Lin
Traffic flow prediction is crucial for urban traffic management and public safety.
no code implementations • 16 Jul 2018 • Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang
To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).
3 code implementations • ICLR 2019 • Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li
We hope SN will help ease the usage and understand the normalization techniques in deep learning.
no code implementations • 27 Sep 2017 • Ruimao Zhang, Liang Lin, Guangrun Wang, Meng Wang, WangMeng Zuo
Rather than relying on elaborate annotations (e.g., manually labeled semantic maps and relations), we train our deep model in a weakly-supervised learning manner by leveraging the descriptive sentences of the training images.
no code implementations • 20 Feb 2017 • Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin
This paper introduces Progressively Diffused Networks (PDNs) for unifying multi-scale context modeling with deep feature learning, by taking semantic image segmentation as an exemplar application.
4 code implementations • 13 Jan 2017 • Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, Liang Lin
In this paper, we propose a novel active learning framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner.
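A generic incremental active learning round matching this description: fit a classifier on the current labeled pool, score the unlabeled pool by prediction uncertainty, and query labels for the most uncertain samples. The entropy criterion and the classifier below are stand-ins, not the paper's cost-effective selection rule.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def active_learning_round(X_lab, y_lab, X_unlab, query_size=10):
        """One round: train on labeled data, return indices of the unlabeled
        samples with the highest predictive entropy (to be sent for annotation)."""
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        probs = clf.predict_proba(X_unlab)
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        return clf, np.argsort(-entropy)[:query_size]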
no code implementations • CVPR 2016 • Liang Lin, Guangrun Wang, Rui Zhang, Ruimao Zhang, Xiaodan Liang, WangMeng Zuo
This paper addresses a fundamental problem of scene understanding: How to parse the scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations) that finely accords with human perception.
no code implementations • 7 Apr 2016 • Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin
This paper addresses the problem of geometric scene parsing, i.e., simultaneously labeling geometric surfaces (e.g., sky, ground and vertical plane) and determining the interaction relations (e.g., layering, supporting, siding and affinity) between main regions.
no code implementations • 19 Aug 2015 • Ruimao Zhang, Liang Lin, Rui Zhang, WangMeng Zuo, Lei Zhang
Furthermore, each bit of our hashing codes is unequally weighted so that we can manipulate the code lengths by truncating the insignificant bits.
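The unequally weighted bits can be pictured with two small routines: a weight-aware Hamming distance in which a mismatched bit contributes its learned weight rather than a flat 1, and a truncation step that drops the lowest-weight bits to shorten the code. The values and function names below are illustrative.

    import numpy as np

    def weighted_hamming(code_a, code_b, bit_weights):
        """Binary codes (n_bits,) compared with per-bit weights (n_bits,)."""
        return float(np.sum(bit_weights * (code_a != code_b)))

    def truncate_code(code, bit_weights, keep_bits):
        """Keep only the most significant bits (largest weights), so one long
        learned code yields shorter codes by simple truncation."""
        order = np.argsort(-bit_weights)[:keep_bits]
        return code[order], bit_weights[order]

    a, b = np.array([1, 0, 1, 1, 0]), np.array([1, 1, 1, 0, 0])
    w = np.array([0.9, 0.1, 0.6, 0.3, 0.05])
    print(weighted_hamming(a, b, w))            # mismatches at bits 1 and 3 -> 0.1 + 0.3 = 0.4
    print(truncate_code(a, w, keep_bits=3))     # keeps bits 0, 2, 3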
no code implementations • 3 Feb 2015 • Zhanglin Peng, Liang Lin, Ruimao Zhang, Jing Xu
Constructing effective representations is a critical but challenging problem in multimedia understanding.
no code implementations • 2 Feb 2015 • Liang Lin, Ruimao Zhang, Xiaohua Duan
During the iterations of inference, the model of each category is analytically updated by a generative learning algorithm.