Search Results for author: Gang Yu

Found 85 papers, 49 papers with code

GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

no code implementations • 22 Apr 2024 • Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

In this paper, we propose GaussianTalker, a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting.

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

1 code implementation • Under review for Transaction 2024 • Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

Our method benefits various applications including in-the-wild metrology, monocular-SLAM, and 3D reconstruction, which highlight the versatility of Metric3D v2 models as geometric foundation models.

Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

3D Reconstruction Monocular Depth Estimation +3

591

MotionChain: Conversational Motion Controllers via Multimodal Prompts

1 code implementation • 2 Apr 2024 • Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan

However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models.

Language Modelling

Generative Motion Stylization within Canonical Motion Space

no code implementations • 18 Mar 2024 • Jiaxu Zhang, Xin Chen, Gang Yu, Zhigang Tu

Our key insight is to embed motion style into a cross-modality latent space and perceive the cross-structure skeleton topologies, allowing for motion stylization within a canonical motion space.

Motion Synthesis

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

1 code implementation • 8 Mar 2024 • XiWei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu

Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation.

Denoising Language Modelling +2

250

MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

no code implementations • 3 Mar 2024 • Zhende Song, Chenchen Wang, Jiamu Sheng, Chi Zhang, Gang Yu, Jiayuan Fan, Tao Chen

The development of multimodal models has marked a significant step forward in how machines understand videos.

Video Understanding

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

1 code implementation • 21 Dec 2023 • Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu

This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs.

505

AppAgent: Multimodal Agents as Smartphone Users

no code implementations • 21 Dec 2023 • Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu

Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks.

Navigate

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

1 code implementation • 17 Dec 2023 • Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen

Furthermore, we establish a new benchmark for assessing the performance of large models in understanding multi-modal 3D prompts.

Instruction Following

FaceStudio: Put Your Face Everywhere in Seconds

no code implementations • 5 Dec 2023 • Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu

This study investigates identity-preserving image synthesis, an intriguing task in image generation that seeks to maintain a subject's identity while adding a personalized, stylistic touch.

Image Generation

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

1 code implementation • 30 Nov 2023 • Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen

However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud 3D representations of the 3D scene.

3D dense captioning Dense Captioning +1

154

ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model

1 code implementation • 29 Nov 2023 • Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan, Gang Yu, Taihao Li, Tao Chen

The advent of large language models, enabling flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored.

3D Shape Generation Language Modelling +1

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

no code implementations • 27 Nov 2023 • Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

Next, we introduce ChartLlama, a multi-modal large language model that we've trained using our created dataset.

Language Modelling Large Language Model

VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations

no code implementations • 23 Oct 2023 • Yiying Yang, Wen Liu, Fukun Yin, Xin Chen, Gang Yu, Jiayuan Fan, Tao Chen

Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis.

Novel View Synthesis Quantization +1

TapMo: Shape-aware Motion Generation of Skeleton-free Characters

no code implementations • 19 Oct 2023 • Jiaxu Zhang, Shaoli Huang, Zhigang Tu, Xin Chen, Xiaohang Zhan, Gang Yu, Ying Shan

In this work, we present TapMo, a Text-driven Animation Pipeline for synthesizing Motion in a broad spectrum of skeleton-free 3D characters.

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

no code implementations • ICCV 2023 • Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen

In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.

Monocular Depth Estimation

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

1 code implementation • 6 Sep 2023 • Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen

Moreover, we argue that object localization and description generation require different levels of scene understanding, which could be challenging for a shared set of queries to capture.

3D dense captioning Caption Generation +4

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

1 code implementation • 22 Aug 2023 • YiWen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin

Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs).

3D Generation Text to 3D

201

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

1 code implementation • 20 Aug 2023 • Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei

However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models.

Ranked #53 on Visual Question Answering on MM-Vet

Visual Question Answering

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

1 code implementation • ICCV 2023 • Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen

State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.

Ranked #19 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Image Reconstruction Monocular Depth Estimation +1

591

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

1 code implementation • NeurIPS 2023 • Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, Shenghua Gao

We present a novel alignment-before-generation approach to tackle the challenging task of generating general 3D shapes based on 2D images or texts.

3D Shape Generation

270

MotionGPT: Human Motion as a Foreign Language

2 code implementations • NeurIPS 2023 • Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, Tao Chen

Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language.

Language Modelling Motion Captioning +2

1,223

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection

1 code implementation • CVPR 2023 • Zhenglin Zhou, Huaxia Li, Hong Liu, Nanyang Wang, Gang Yu, Rongrong Ji

To solve this problem, we propose a Self-adapTive Ambiguity Reduction (STAR) loss by exploiting the properties of semantic ambiguity.

Ranked #1 on Face Alignment on 300W

Face Alignment Facial Landmark Detection

131

Synchro-Transient-Extracting Transform for the Analysis of Signals with Both Harmonic and Impulsive Components

no code implementations • 2 Jun 2023 • Yunlong Ma, Gang Yu, Tianran Lin, Qingtang Jiang

This allows the technique to generate an energy-concentrated TF representation for both harmonics and impulses in the signal.

StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

2 code implementations • 30 May 2023 • Chi Zhang, YiWen Chen, Yijun Fu, Zhenglin Zhou, Gang Yu, Billzb Wang, Bin Fu, Tao Chen, Guosheng Lin, Chunhua Shen

The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models.

3D Generation Attribute +1

498

Disentangled Pre-training for Image Matting

1 code implementation • 3 Apr 2023 • Yanda Li, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao

The pre-training task is designed in a similar manner as image matting, where random trimap and alpha matte are generated to achieve an image disentanglement objective.

Disentanglement Image Matting

Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens

1 code implementation • 1 Mar 2023 • Sen Yang, Wen Heng, Gang Liu, Guozhong Luo, Wankou Yang, Gang Yu

In this paper we present a novel method to estimate 3D human pose and shape from monocular videos.

Ranked #35 on 3D Human Pose Estimation on 3DPW

3D human pose and shape estimation

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation

1 code implementation • 30 Jan 2023 • Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang

Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on the ARM-based mobile devices on the ADE20K and Cityscapes datasets.

Image Classification Segmentation +1

240

A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction

no code implementations • ICCV 2023 • Chongshan Lu, Fukun Yin, Xin Chen, Tao Chen, Gang Yu, Jiayuan Fan

Meanwhile, a new benchmark for several outdoor NeRF-based tasks is established, such as novel view synthesis, surface reconstruction, and multi-modal NeRF.

Novel View Synthesis Surface Reconstruction

End-to-End 3D Dense Captioning with Vote2Cap-DETR

1 code implementation • CVPR 2023 • Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu

Compared with prior arts, our framework has several appealing advantages: 1) Without resorting to numerous hand-crafted components, our method is based on a full transformer encoder-decoder architecture with a learnable vote query driven object decoder, and a caption decoder that produces the dense captions in a set-prediction manner.

3D dense captioning Dense Captioning +1

Executing your Commands via Motion Diffusion in Latent Space

1 code implementation • CVPR 2023 • Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu

We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors.

Ranked #2 on Motion Synthesis on HumanAct12

Motion Synthesis

508

Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

no code implementations • 7 Nov 2022 • Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Jiaqi Li, Yiran Wang, Zihao Huang, Zhiguo Cao, Marcos V. Conde, Denis Sapozhnikov, Byeong Hyun Lee, Dongwon Park, Seongmin Hong, Joonhee Lee, Seunggyu Lee, Se Young Chun

Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks.

Bokeh Effect Rendering Depth Estimation +3

Learning Variational Motion Prior for Video-based Motion Capture

no code implementations • 27 Oct 2022 • Xin Chen, Zhuo Su, Lingbo Yang, Pei Cheng, Lan Xu, Bin Fu, Gang Yu

To improve the generalization capacity of prior space, we propose a transformer-based variational autoencoder pretrained over marker-based 3D mocap data, with a novel style-mapping block to boost the generation quality.

Pose Estimation

Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations

1 code implementation • 20 Oct 2022 • Fukun Yin, Wen Liu, Zilong Huang, Pei Cheng, Tao Chen, Gang Yu

Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis, which typically uses the coordinate-based multi-layer perceptrons (MLPs) to learn a continuous scene representation.

Novel View Synthesis

Hierarchical Normalization for Robust Monocular Depth Estimation

no code implementations • 18 Oct 2022 • Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen

In this paper, we address monocular depth estimation with deep neural networks.

Monocular Depth Estimation

D&D: Learning Human Dynamics from Dynamic Camera

1 code implementation • 19 Sep 2022 • Jiefeng Li, Siyuan Bian, Chao Xu, Gang Liu, Gang Yu, Cewu Lu

In this work, we present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from the in-the-wild videos with a moving camera.

3D Human Pose Estimation Human Dynamics

107

Enhancing Quality of Pose-varied Face Restoration with Local Weak Feature Sensing and GAN Prior

no code implementations • 28 May 2022 • Kai Hu, Yu Liu, Renhe Liu, Wei Lu, Gang Yu, Bin Fu

In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of input images, which can better preserve the original facial features and avoid excessive fantasy.

Blind Face Restoration Super-Resolution

Preparing data for pathological artificial intelligence with clinical-grade performance

no code implementations • 22 May 2022 • Yuanqing Yang, Kai Sun, Yanhua Gao, Kuangsong Wang, Gang Yu

Digital pathology is fundamental to clinical-grade PAI, and data standardization techniques and weakly supervised learning methods based on whole slide images (WSIs) are effective ways to overcome obstacles to reproducing performance.

Weakly-supervised Learning

Designing thermal radiation metamaterials via hybrid adversarial autoencoder and Bayesian optimization

no code implementations • 26 Apr 2022 • Dezhao Zhu, Jiang Guo, Gang Yu, C. Y. Zhao, Hong Wang, Shenghong Ju

Designing thermal radiation metamaterials is challenging, especially for problems with high degrees of freedom and complex objectives.

Bayesian Optimization

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

2 code implementations • 20 Apr 2022 • Ren Yang, Radu Timofte, Meisong Zheng, Qunliang Xing, Minglang Qiao, Mai Xu, Lai Jiang, Huaida Liu, Ying Chen, Youcheng Ben, Xiao Zhou, Chen Fu, Pei Cheng, Gang Yu, Junyi Li, Renlong Wu, Zhilu Zhang, Wei Shang, Zhengyao Lv, Yunjin Chen, Mingcai Zhou, Dongwei Ren, Kai Zhang, WangMeng Zuo, Pavel Ostyakov, Vyal Dmitry, Shakarim Soltanayev, Chervontsev Sergey, Zhussip Magauiya, Xueyi Zou, Youliang Yan, Pablo Navarrete Michelini, Yunhua Lu, Diankai Zhang, Shaoli Liu, Si Gao, Biao Wu, Chengjian Zheng, Xiaofeng Zhang, Kaidi Lu, Ning Wang, Thuong Nguyen Canh, Thong Bach, Qing Wang, Xiaopeng Sun, Haoyu Ma, Shijie Zhao, Junlin Li, Liangbin Xie, Shuwei Shi, Yujiu Yang, Xintao Wang, Jinjin Gu, Chao Dong, Xiaodi Shi, Chunmei Nian, Dong Jiang, Jucai Lin, Zhihuai Xie, Mao Ye, Dengyan Luo, Liuhan Peng, Shengjie Chen, Qian Wang, Xin Liu, Boyang Liang, Hang Dong, Yuhao Huang, Kai Chen, Xingbei Guo, Yujing Sun, Huilei Wu, Pengxu Wei, Yulin Huang, Junying Chen, Ik Hyun Lee, Sunder Ali Khowaja, Jiseok Yoon

This challenge includes three tracks.

Super-Resolution

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

3 code implementations • CVPR 2022 • Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

Segmentation Semantic Segmentation

372

An Energy-concentrated Wavelet Transform for Time Frequency Analysis of Transient Signals

no code implementations • 22 Feb 2022 • Haoran Dong, Gang Yu

Transient signals are often composed of a series of modes that have multivalued time-dependent instantaneous frequency (IF), which brings challenges to the development of signal processing technology.

Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation

1 code implementation • 25 Nov 2021 • Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong Sang

Image manipulation with StyleGAN has been an increasing concern in recent years. Recent works have achieved tremendous success in analyzing several semantic latent spaces to edit the attributes of the generated images. However, due to the limited semantic and spatial manipulation precision in these latent spaces, the existing endeavors are defeated in fine-grained StyleGAN image manipulation, i.e., local attribute translation. To address this issue, we discover attribute-specific control units, which consist of multiple channels of feature maps and modulation styles.

Attribute Image Manipulation

Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment

no code implementations • 10 Oct 2021 • Haichao Zhang, Youcheng Ben, Weixi Zhang, Tao Chen, Gang Yu, Bin Fu

Recent face reenactment works are limited by the coarse reference landmarks, leading to unsatisfactory identity preserving performance due to the distribution gap between the manipulated landmarks and those sampled from a real person.

Face Reenactment

Sketch Me A Video

no code implementations • 10 Oct 2021 • Haichao Zhang, Gang Yu, Tao Chen, Guozhong Luo

Video creation has been an attractive yet challenging task for artists to explore.

A multi-stage semi-supervised improved deep embedded clustering method for bearing fault diagnosis under the situation of insufficient labeled samples

no code implementations • 28 Sep 2021 • Tongda Sun, Gang Yu

The labeled dataset can be augmented by these pseudo-labeled data and then leveraged to train a bearing fault diagnosis model.

Clustering Deep Clustering

Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification

no code implementations • 30 Aug 2021 • Yike Wu, Bo Zhang, Gang Yu, Weixi Zhang, Bin Wang, Tao Chen, Jiayuan Fan

The goal of few-shot fine-grained image classification is to recognize rarely seen fine-grained objects in the query set, given only a few samples of this class in the support set.

Fine-Grained Image Classification Object +3

Identification of Pediatric Respiratory Diseases Using Fine-grained Diagnosis System

no code implementations • 24 Aug 2021 • Gang Yu, Zhongzhi Yu, Yemin Shi, Yingshuo Wang, Xiaoqing Liu, Zheming Li, Yonggen Zhao, Fenglei Sun, Yizhou Yu, Qiang Shu

The first stage structuralizes test results by extracting relevant numerical values from clinical notes, and the disease identification stage provides a diagnosis based on text-form clinical notes and the structured data obtained from the first stage.

Shuffle Transformer with Feature Alignment for Video Face Parsing

no code implementations • 16 Jun 2021 • Rui Zhang, Yang Han, Zilong Huang, Pei Cheng, Guozhong Luo, Gang Yu, Bin Fu

This is a short technical report introducing the solution of the Team TCParser for Short-video Face Parsing Track of The 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021.

Face Parsing

Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer

4 code implementations • 7 Jun 2021 • Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu

In this work, we revisit the spatial shuffle as an efficient way to build connections among windows.

Ranked #45 on Semantic Segmentation on ADE20K val

Image Classification object-detection +3

1,677

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

no code implementations • 17 May 2021 • Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, Jin-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, Zhenyu Li, Xianming Liu, Junjun Jiang, Wei-Chi Chen, Shayan Joya, Huanhuan Fan, Zhaobing Kang, Ang Li, Tianpeng Feng, Yang Liu, Chuannan Sheng, Jian Yin, Fausto T. Benavide

While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference.

Depth Estimation

Multi-scale super-resolution generation of low-resolution scanned pathological images

1 code implementation • 15 May 2021 • Kai Sun, Yanhua Gao, Ting Xie, Xun Wang, Qingqing Yang, Le Chen, Kuansong Wang, Gang Yu

We design a strategy to scan slides at low resolution (5X) and propose a super-resolution method to restore the image details at diagnosis time.

Generative Adversarial Network SSIM +1

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

7 code implementations • 5 Apr 2020 • Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation.

Ranked #1 on Real-Time Semantic Segmentation on COCO-Stuff

Real-Time Semantic Segmentation Segmentation

8,252

Context Prior for Scene Segmentation

2 code implementations • CVPR 2020 • Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang

Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.

Ranked #1 on Scene Understanding on ADE20K val

Scene Segmentation Scene Understanding +1

245

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

2 code implementations • CVPR 2020 • Guan'an Wang, Shuo Yang, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, Erjin Zhou, Jian Sun

When aligning two groups of local features from two images, we view it as a graph matching problem and propose a cross-graph embedded-alignment (CGEA) layer to jointly learn and embed topology information to local features, and straightly predict similarity score.

Graph Matching Person Re-Identification +1

498

State-Aware Tracker for Real-Time Video Object Segmentation

1 code implementation • CVPR 2020 • Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jianxin Shen, Donglian Qi

For higher efficiency, SAT takes advantage of the inter-frame consistency and deals with each target object as a tracklet.

Segmentation Semantic Segmentation +2

811

Real-Time Semantic Segmentation via Multiply Spatial Fusion Network

no code implementations • 17 Nov 2019 • Haiyang Si, Zhiqiang Zhang, Feifan Lv, Gang Yu, Feng Lu

Specifically, it achieves 77.1% Mean IOU on the Cityscapes test dataset with the speed of 41 FPS for a 1024*2048 input, and 75.4% Mean IOU with the speed of 91 FPS on the Camvid test dataset.

Autonomous Driving Playing the Game of 2048 +1

SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines

6 code implementations • 14 Nov 2019 • Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, Gang Yu

Following these guidelines, we design our Fully Convolutional Siamese tracker++ (SiamFC++) by introducing both classification and target state estimation branch (G1), classification score without ambiguity (G2), tracking without prior knowledge (G3), and estimation quality score (G4).

Ranked #2 on Visual Object Tracking on VOT2017/18 (using extra training data)

Classification General Classification +3

811

Learnable Tree Filter for Structure-preserving Feature Transform

1 code implementation • NeurIPS 2019 • Lin Song, Yanwei Li, Zeming Li, Gang Yu, Hongbin Sun, Jian Sun, Nanning Zheng

To this end, tree filtering modules are embedded to formulate a unified framework for semantic segmentation.

Semantic Segmentation

140

Double Anchor R-CNN for Human Detection in a Crowd

no code implementations • 22 Sep 2019 • Kevin Zhang, Feng Xiong, Peize Sun, Li Hu, Boxun Li, Gang Yu

Double Anchor RPN is developed to capture body and head parts in pairs.

Human Detection

Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection

3 code implementations • 26 Aug 2019 • Benjin Zhu, Zhengkai Jiang, Xiangxin Zhou, Zeming Li, Gang Yu

This report presents our method which wins the nuScenes 3D Detection Challenge [17] held in Workshop on Autonomous Driving (WAD, CVPR 2019).

Ranked #5 on 3D Object Detection on nuScenes LiDAR only

3D Object Detection Autonomous Driving +1

1,463

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

6 code implementations • ICCV 2019 • Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen

Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.

Ranked #8 on Scene Text Detection on SCUT-CTW1500

Scene Text Detection Segmentation +1

4,068

Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation

no code implementations • Proceedings of the AAAI Conference on Artificial Intelligence 2019 • Tao Hu, Pengwan Yang, Chiliang Zhang, Gang Yu, Yadong Mu, Cees G. M. Snoek

Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data.

Ranked #1 on Few-Shot Semantic Segmentation on Pascal5i

Few-Shot Semantic Segmentation One-Shot Learning +1

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

no code implementations • CVPR 2019 • Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun

In this paper, we define these ambiguous samples as "transitional states", and propose a Transition-Aware Context Network (TACNet) to distinguish transitional states.

Ranked #7 on Action Detection on J-HMDB

Action Detection

ThunderNet: Towards Real-time Generic Object Detection

3 code implementations • 28 Mar 2019 • Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun

In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet.

Ranked #15 on Object Detection on PASCAL VOC 2007

Object object-detection +1

277

Shape Robust Text Detection with Progressive Scale Expansion Network

19 code implementations • CVPR 2019 • Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao

Due to the fact that there are large geometrical margins among the minimal scale kernels, our method is effective to split the close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances.

Ranked #12 on Scene Text Detection on SCUT-CTW1500

Optical Character Recognition (OCR) Scene Text Detection +1

38,458

An End-to-End Network for Panoptic Segmentation

no code implementations • CVPR 2019 • Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei Jiang

Panoptic segmentation, which needs to assign a category label to each pixel and segment each object instance simultaneously, is a challenging topic.

Panoptic Segmentation Segmentation

WIDER Face and Pedestrian Challenge 2018: Methods and Results

no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou

This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.

Face Detection Pedestrian Detection +2

Rethinking on Multi-Stage Networks for Human Pose Estimation

7 code implementations • 1 Jan 2019 • Wenbo Li, Zhicheng Wang, Binyi Yin, Qixiang Peng, Yuming Du, Tianzi Xiao, Gang Yu, Hongtao Lu, Yichen Wei, Jian Sun

Existing pose estimation approaches fall into two categories: single-stage and multi-stage methods.

Ranked #1 on Pose Estimation on COCO minival

Keypoint Detection

4,995

Paper
Code

Scene Text Detection with Supervised Pyramid Context Network

2 code implementations • 21 Nov 2018 • Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li

We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.

Ranked #2 on Scene Text Detection on ICDAR 2013

Instance Segmentation Scene Text Detection +2

120

Paper
Code

Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN

2 code implementations • CVPR 2019 • Shiyi Lan, Ruichi Yu, Gang Yu, Larry S. Davis

This encourages the network to preserve the geometric structure in Euclidean space throughout the feature extraction hierarchy.

Modeling Local Geometric Structure

Paper
Code

DetNet: Design Backbone for Object Detection

no code implementations • ECCV 2018 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

(1) Recent object detectors like FPN and RetinaNet usually involve extra stages against the task of image classification to handle the objects with various scales.

Classification General Classification +7

Paper
Add Code

Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation

no code implementations • ECCV 2018 • Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, Shi-Min Hu

We also combine our method with Mask R-CNN for instance segmentation, and demonstrated for the first time the ability of weakly supervised instance segmentation using only keyword annotations.

Ranked #4 on Image-level Supervised Instance Segmentation on COCO test-dev (using extra training data)

Clustering graph partitioning +6

Paper
Add Code

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

18 code implementations • ECCV 2018 • Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang

Semantic segmentation requires both rich spatial information and sizeable receptive field.

Ranked #4 on Semantic Segmentation on SkyScapes-Dense

Dichotomous Image Segmentation Real-Time Semantic Segmentation +2

8,252

Paper
Code

CrowdHuman: A Benchmark for Detecting Human in a Crowd

1 code implementation • 30 Apr 2018 • Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun

There are a total of $470K$ human instances from the train and validation subsets, and $\sim 22.6$ persons per image, with various kinds of occlusions in the dataset.

Ranked #7 on Pedestrian Detection on Caltech (using extra training data)

Human Detection Object Detection +1

Paper
Code

Learning a Discriminative Feature Network for Semantic Segmentation

3 code implementations • CVPR 2018 • Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang

Most existing methods of semantic segmentation still suffer from two aspects of challenges: intra-class inconsistency and inter-class indistinction.

Ranked #5 on Semantic Segmentation on PASCAL VOC 2012 test

Semantic Segmentation Thermal Image Segmentation

1,399

Paper
Code

SFace: An Efficient Network for Face Detection in Large Scale Variations

no code implementations • 18 Apr 2018 • Jianfeng Wang, Ye Yuan, Boxun Li, Gang Yu, Sun Jian

A new dataset called 4K-Face is also introduced to evaluate the performance of face detection with extreme large scale variations.

4k Face Detection +1

Paper
Add Code

DetNet: A Backbone network for Object Detection

2 code implementations • 17 Apr 2018 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

Due to the gap between the image classification and object detection, we propose DetNet in this paper, which is a novel backbone network specifically designed for object detection.

Classification General Classification +7

Paper
Code

SOT for MOT

no code implementations • 4 Dec 2017 • Qizheng He, Jia-Nan Wu, Gang Yu, Chi Zhang

Another contribution is that we show with a deep learning based appearance model, it is easy to associate detections of the same object efficiently and also with high accuracy.

Multiple Object Tracking Object

Paper
Add Code

Light-Head R-CNN: In Defense of Two-Stage Object Detector

5 code implementations • 20 Nov 2017 • Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun

More importantly, simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming the single-stage, fast detectors like YOLO and SSD on both speed and accuracy.

Vocal Bursts Valence Prediction

186

Paper
Code

Face Attention Network: An Effective Face Detector for the Occluded Faces

1 code implementation • 20 Nov 2017 • Jianfeng Wang, Ye Yuan, Gang Yu

The performance of face detection has been largely improved with the development of convolutional neural network.

Ranked #1 on Occluded Face Detection on MAFA

Data Augmentation Occluded Face Detection

314

Paper
Code

Cascaded Pyramid Network for Multi-Person Pose Estimation

5 code implementations • CVPR 2018 • Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun

In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which targets to relieve the problem from these "hard" keypoints.

Ranked #4 on Keypoint Detection on COCO test-challenge

Keypoint Detection Multi-Person Pose Estimation

790

Paper
Code

MegDet: A Large Mini-Batch Object Detector

6 code implementations • CVPR 2018 • Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun

The improvements in recent CNN-based object detection works, from R-CNN [11], Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly come from new network, new framework, or novel loss design.

Object object-detection +1

4,835

Paper
Code

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network

2 code implementations • CVPR 2017 • Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun

One of recent trends [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) in the entire network because the stacked small filters is more efficient than a large kernel, given the same computational complexity.

Ranked #8 on Semantic Segmentation on PASCAL VOC 2012 val

Semantic Segmentation

1,564

Paper
Code

Fast Action Proposals for Human Action Detection and Search

no code implementations • CVPR 2015 • Gang Yu, Junsong Yuan

Assuming each action is performed by a human with meaningful motion, both appearance and motion cues are utilized to measure the actionness of the video tubes.

Action Detection Video Segmentation +1

Paper
Add Code
