Search Results for author: Ruimao Zhang

Found 59 papers, 31 papers with code

SEED-Bench-2: Benchmarking Multimodal Large Language Models

1 code implementation 28 Nov 2023 Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan

Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3).

Benchmarking Image Generation +1

HumanTOMATO: Text-aligned Whole-body Motion Generation

no code implementations 19 Oct 2023 Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum

This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.

UniPose: Detecting Any Keypoints

1 code implementation 12 Oct 2023 Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang

This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation.

Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)

2D Human Pose Estimation 2D Pose Estimation +4

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

1 code implementation ICCV 2023 Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang

In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance.

3D Object Detection object-detection

Molecular Conformation Generation via Shifting Scores

no code implementations 12 Sep 2023 Zihan Zhou, Ruiying Liu, Chaolong Ying, Ruimao Zhang, Tianshu Yu

Molecular conformation generation, a critical aspect of computational chemistry, involves producing the three-dimensional conformer geometry for a given molecule.

FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions

1 code implementation 10 Sep 2023 Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang

To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under real-world conditions.

3D Human Pose Estimation 3D Pose Estimation +1

Dance with You: The Diversity Controllable Dancer Generation via Diffusion Models

1 code implementation 23 Aug 2023 Siyue Yao, MingJie Sun, Bingliang Li, Fengyu Yang, Junle Wang, Ruimao Zhang

In this paper, we introduce a novel multi-dancer synthesis task called partner dancer generation, which involves synthesizing virtual human dancers capable of performing dance with users.

Neural Interactive Keypoint Detection

1 code implementation ICCV 2023 Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang

Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.

Keypoint Detection

YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection

no code implementations 6 Jun 2023 Yuncheng Jiang, Zixun Zhang, Ruimao Zhang, Guanbin Li, Shuguang Cui, Zhen Li

YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations.

Contrastive Learning

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

1 code implementation 20 May 2023 Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang

Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.

Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection Zero-Shot Human-Object Interaction Detection

Hierarchical Weight Averaging for Deep Neural Networks

no code implementations 23 Apr 2023 Xiaozhe Gu, Zixun Zhang, Yuncheng Jiang, Tao Luo, Ruimao Zhang, Shuguang Cui, Zhen Li

Despite the simplicity, stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs).

Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains

no code implementations CVPR 2023 Jie Yang, Chaoqun Wang, Zhen Li, Junle Wang, Ruimao Zhang

This paper presents Scalable Semantic Transfer (SST), a novel training paradigm, to explore how to leverage the mutual benefits of the data from different label domains (i.e., various levels of label granularity) to train a powerful human parsing network.

Human Parsing Representation Learning

Inherent Consistent Learning for Accurate Semi-supervised Medical Image Segmentation

2 code implementations 24 Mar 2023 Ye Zhu, Jie Yang, Si-Qi Liu, Ruimao Zhang

Semi-supervised medical image segmentation has attracted much attention in recent years because of the high cost of medical image annotations.

Image Segmentation Segmentation +2

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

3 code implementations 3 Feb 2023 Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang

This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.

2D Human Pose Estimation Human Detection +3

Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification

no code implementations 2 Jan 2023 Ziyi Tang, Ruimao Zhang, Zhanglin Peng, Jinrui Chen, Liang Lin

We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract the informative and discriminative feature representations at different stages.

Representation Learning Video-Based Person Re-Identification

Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis

2 code implementations 9 Oct 2022 Xu Yan, Heshen Zhan, Chaoda Zheng, Jiantao Gao, Ruimao Zhang, Shuguang Cui, Zhen Li

Specifically, this paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy, which utilizes view-images, i.e., rendered or projected 2D images of the 3D object, to boost point cloud analysis.

3D Point Cloud Classification Knowledge Distillation +1

Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration

2 code implementations 21 Jul 2022 Haotian Bai, Ruimao Zhang, Jiong Wang, Xiang Wan

Weakly Supervised Object Localization (WSOL), which aims to localize objects by only using image-level labels, has attracted much attention because of its low annotation cost in real applications.

Long-range modeling Weakly-Supervised Object Localization

2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds

1 code implementation 10 Jul 2022 Xu Yan, Jiantao Gao, Chaoda Zheng, Chao Zheng, Ruimao Zhang, Shenghui Cui, Zhen Li

As camera and LiDAR sensors capture complementary information used in autonomous driving, great efforts have been made to develop semantic segmentation algorithms through multi-modality data fusion.

Autonomous Driving Knowledge Distillation +3

Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency

1 code implementation 23 Jun 2022 Weijie Ma, Ye Zhu, Ruimao Zhang, Jie Yang, Yiwen Hu, Zhen Li, Li Xiang

By aligning the class tokens and spatial attention maps of paired NBI and WL images at different levels, the Transformer achieves the ability to keep both global and local representation consistency for the above two modalities.

Classification Image Classification

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

1 code implementation 16 Jun 2022 Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, Ping Luo

Constrained by the high cost of collecting and labeling 3D medical data, most of the deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods.

Image Segmentation Medical Image Segmentation +3

Active Domain Adaptation with Multi-level Contrastive Units for Semantic Segmentation

no code implementations 23 May 2022 Hao Zhang, Ruimao Zhang, Zhanglin Peng, Junle Wang, Yanqing Jing

A simple pixel selection strategy followed by the construction of multi-level contrastive units is introduced to optimize the model for both domain adaptation and active supervised learning.

Active Learning Domain Adaptation +3

MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning

no code implementations 13 Jan 2022 Yuying Ge, Yibing Song, Ruimao Zhang, Ping Luo

Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person.


MetaCloth: Learning Unseen Tasks of Dense Fashion Landmark Detection from a Few Samples

no code implementations 6 Dec 2021 Yuying Ge, Ruimao Zhang, Ping Luo

This work proposes a novel framework named MetaCloth via meta-learning, which is able to learn unseen tasks of dense fashion landmark detection with only a few annotated samples.


Crowd Counting via Perspective-Guided Fractional-Dilation Convolution

1 code implementation 8 Jul 2021 Zhaoyi Yan, Ruimao Zhang, Hongzhi Zhang, Qingfu Zhang, WangMeng Zuo

One of the main issues in this task is how to handle the dramatic scale variations of pedestrians caused by the perspective effect.

Crowd Counting

Multi-Compound Transformer for Accurate Biomedical Image Segmentation

1 code implementation 28 Jun 2021 Yuanfeng Ji, Ruimao Zhang, Huijie Wang, Zhen Li, Lingyun Wu, Shaoting Zhang, Ping Luo

The recent vision transformer (i.e., for image classification) learns non-local attentive interaction of different patch tokens.

Image Classification Image Segmentation +2

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

1 code implementation 5 May 2021 Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo

Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotate text detection and cell segmentation.

Ranked #78 on Instance Segmentation on COCO test-dev (using extra training data)

Cell Segmentation Instance Segmentation +5

PointLIE: Locally Invertible Embedding for Point Cloud Sampling and Recovery

1 code implementation 30 Apr 2021 Weibing Zhao, Xu Yan, Jiantao Gao, Ruimao Zhang, Jiayan Zhang, Zhen Li, Song Wu, Shuguang Cui

In this paper, we address a fundamental problem in PCSR: How to downsample the dense point cloud with arbitrary scales while preserving the local topology of discarding points in a case-agnostic manner (i.e., without additional storage for point relationship)?

Parser-Free Virtual Try-on via Distilling Appearance Flows

2 code implementations CVPR 2021 Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo

A recent pioneering work employed knowledge distillation to reduce the dependency on human parsing, where the try-on images produced by a parser-based method are used as supervision to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model.

Human Parsing Knowledge Distillation +1

Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion

2 code implementations 7 Dec 2020 Xu Yan, Jiantao Gao, Jie Li, Ruimao Zhang, Zhen Li, Rui Huang, Shuguang Cui

In practice, an initial semantic segmentation (SS) of a single sweep point cloud can be achieved by any appealing network and then flows into the semantic scene completion (SSC) module as the input.

3D Semantic Scene Completion from a single RGB image 3D Semantic Segmentation +3

Polygon-free: Unconstrained Scene Text Detection with Box Annotations

1 code implementation 26 Nov 2020 Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo

For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% of fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.

Scene Text Detection Text Detection

UXNet: Searching Multi-level Feature Aggregation for 3D Medical Image Segmentation

no code implementations 16 Sep 2020 Yuanfeng Ji, Ruimao Zhang, Zhen Li, Jiamin Ren, Shaoting Zhang, Ping Luo

Unlike the recent neural architecture search (NAS) methods that typically searched the optimal operators in each network layer, but missed a good strategy to search for feature aggregations, this paper proposes a novel NAS method for 3D medical image segmentation, named UXNet, which searches both the scale-wise feature aggregation strategies as well as the block-wise operators in the encoder-decoder network.

Image Segmentation Neural Architecture Search +3

Exemplar Normalization for Learning Deep Representation

no code implementations CVPR 2020 Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo

This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.

Semantic Segmentation

Towards Photo-Realistic Virtual Try-On by Adaptively Generating$\leftrightarrow$Preserving Image Content

3 code implementations 12 Mar 2020 Han Yang, Ruimao Zhang, Xiaobao Guo, Wei Liu, WangMeng Zuo, Ping Luo

First, a semantic layout generation module utilizes semantic segmentation of the reference image to progressively predict the desired semantic layout after try-on.

Ranked #4 on Virtual Try-on on VITON (IS metric)

Semantic Segmentation Virtual Try-on

Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks

no code implementations ICCV 2019 Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo

ResNeXt, still suffers from the sub-optimal performance due to manually defining the number of groups as a constant over all of the layers.

Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once

no code implementations ICCV 2019 Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang

Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoid the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.

Classification General Classification

Switchable Normalization for Learning-to-Normalize Deep Representation

no code implementations 22 Jul 2019 Ping Luo, Ruimao Zhang, Jiamin Ren, Zhanglin Peng, Jingyu Li

Analyses of SN are also presented to answer the following three questions: (a) Is it useful to allow each normalization layer to select its own normalizer?

SSN: Learning Sparse Switchable Normalization via SparsestMax

1 code implementation CVPR 2019 Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo

Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.

Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?

no code implementations 19 Nov 2018 Ping Luo, Zhanglin Peng, Jiamin Ren, Ruimao Zhang

Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choices of normalizers are more related to depth and batch size, but less relevant to parameter initialization, learning rate decay, and solver; (3) different tasks and datasets have different behaviors when learning to select normalizers.

Learning Deep Representations for Semantic Image Parsing: a Comprehensive Overview

no code implementations 10 Oct 2018 Lili Huang, Jiefeng Peng, Ruimao Zhang, Guanbin Li, Liang Lin

Semantic image parsing, which refers to the process of decomposing images into semantic regions and constructing the structure representation of the input, has recently aroused widespread interest in the field of computer vision.

Representation Learning Segmentation +1

Attentive Crowd Flow Machines

no code implementations 1 Sep 2018 Lingbo Liu, Ruimao Zhang, Jiefeng Peng, Guanbin Li, Bowen Du, Liang Lin

Traffic flow prediction is crucial for urban traffic management and public safety.


SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification

no code implementations 16 Jul 2018 Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang

To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).

Video-Based Person Re-Identification

Differentiable Learning-to-Normalize via Switchable Normalization

3 code implementations ICLR 2019 Ping Luo, Jiamin Ren, Zhanglin Peng, Ruimao Zhang, Jingyu Li

We hope SN will help ease the use and understanding of normalization techniques in deep learning.

Hierarchical Scene Parsing by Weakly Supervised Learning with Image Descriptions

no code implementations 27 Sep 2017 Ruimao Zhang, Liang Lin, Guangrun Wang, Meng Wang, WangMeng Zuo

Rather than relying on elaborative annotations (e.g., manually labeled semantic maps and relations), we train our deep model in a weakly-supervised learning manner by leveraging the descriptive sentences of the training images.

Descriptive Scene Labeling +2

Progressively Diffused Networks for Semantic Image Segmentation

no code implementations 20 Feb 2017 Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin

This paper introduces Progressively Diffused Networks (PDNs) for unifying multi-scale context modeling with deep feature learning, by taking semantic image segmentation as an exemplar application.

Image Segmentation Segmentation +1

Cost-Effective Active Learning for Deep Image Classification

3 code implementations 13 Jan 2017 Keze Wang, Dongyu Zhang, Ya Li, Ruimao Zhang, Liang Lin

In this paper, we propose a novel active learning framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner.

Active Learning Classification +5

Deep Structured Scene Parsing by Learning with Image Descriptions

no code implementations CVPR 2016 Liang Lin, Guangrun Wang, Rui Zhang, Ruimao Zhang, Xiaodan Liang, WangMeng Zuo

This paper addresses a fundamental problem of scene understanding: How to parse the scene image into a structured configuration (i.e., a semantic object hierarchy with object interaction relations) that finely accords with human perception.

Descriptive Scene Labeling +1

Geometric Scene Parsing with Hierarchical LSTM

no code implementations 7 Apr 2016 Zhanglin Peng, Ruimao Zhang, Xiaodan Liang, Xiaobai Liu, Liang Lin

This paper addresses the problem of geometric scene parsing, i.e., simultaneously labeling geometric surfaces (e.g., sky, ground and vertical plane) and determining the interaction relations (e.g., layering, supporting, siding and affinity) between main regions.

3D Reconstruction Scene Labeling

Bit-Scalable Deep Hashing with Regularized Similarity Learning for Image Retrieval and Person Re-identification

no code implementations 19 Aug 2015 Ruimao Zhang, Liang Lin, Rui Zhang, WangMeng Zuo, Lei Zhang

Furthermore, each bit of our hashing codes is unequally weighted so that we can manipulate the code lengths by truncating the insignificant bits.

Deep Hashing Image Retrieval +1

Deep Boosting: Layered Feature Mining for General Image Classification

no code implementations 3 Feb 2015 Zhanglin Peng, Liang Lin, Ruimao Zhang, Jing Xu

Constructing effective representations is a critical but challenging problem in multimedia understanding.

Classification General Classification +1

Adaptive Scene Category Discovery with Generative Learning and Compositional Sampling

no code implementations 2 Feb 2015 Liang Lin, Ruimao Zhang, Xiaohua Duan

During the iterations of inference, the model of each category is analytically updated by a generative learning algorithm.

Image Categorization
