Search Results for author: Shenghua Gao

Found 79 papers, 46 papers with code

P²Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation

1 code implementation • ECCV 2020 • Zehao Yu, Lei Jin, Shenghua Gao

The task is extremely challenging because of the vast areas of non-texture regions in these scenes.

Depth Estimation Superpixels

148

Paper
Code

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

2 code implementations • 10 Apr 2024 • Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, Ying Shan

We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.

Image to 3D

698

Paper
Code

2D Gaussian Splatting for Geometrically Accurate Radiance Fields

no code implementations • 26 Mar 2024 • Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao

3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking.

Novel View Synthesis

Paper
Add Code

Bridging 3D Gaussian and Mesh for Freeview Video Rendering

no code implementations • 18 Mar 2024 • Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao

To this end, we propose a novel approach, GauMesh, to bridge the 3D Gaussian and Mesh for modeling and rendering the dynamic scenes.

Novel View Synthesis

Paper
Add Code

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

2 code implementations • 19 Jan 2024 • Chenyu Wang, Weixin Luo, Qianyu Chen, Haonan Mai, Jindi Guo, Sixun Dong, Xiaohua, Xuan, Zhengxin Li, Lin Ma, Shenghua Gao

Recently, the astonishing performance of large language models (LLMs) in natural language comprehension and generation tasks triggered lots of exploration of using them as central controllers to build agent systems.

Language Modelling Large Language Model

Paper
Code

Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network

1 code implementation • 25 Nov 2023 • Wenqiao Li, Xiaohao Xu, Yao Gu, Bozhong Zheng, Shenghua Gao, Yingna Wu

During testing, the point cloud repeatedly goes through the Mask Reconstruction Network, with each iteration's output becoming the next input.

3D Anomaly Detection Representation Learning +1

Paper
Code

TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding

1 code implementation • 6 Nov 2023 • Shuo Wang, Jing Li, Zibo Zhao, Dongze Lian, Binbin Huang, Xiaomei Wang, Zhengxin Li, Shenghua Gao

Holistic scene understanding includes semantic segmentation, surface normal estimation, object boundary detection, depth estimation, etc.

Boundary Detection Depth Estimation +5

Paper
Code

RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation

1 code implementation • 16 Oct 2023 • Yiqun Zhao, Zibo Zhao, Jing Li, Sixun Dong, Shenghua Gao

Indoor scene generation aims at creating shape-compatible, style-consistent furniture arrangements within a spatially reasonable layout.

Quantization Scene Generation

Paper
Code

LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation

1 code implementation • ICCV 2023 • YiHao Zhi, Xiaodong Cun, Xuelin Chen, Xi Shen, Wen Guo, Shaoli Huang, Shenghua Gao

While previous methods are able to generate speech rhythm-synchronized gestures, the semantic context of the speech is generally lacking in the gesticulations.

Gesture Generation

Paper
Code

DebSDF: Delving into the Details and Bias of Neural Indoor Scene Reconstruction

no code implementations • 29 Aug 2023 • Yuting Xiao, Jingwei Xu, Zehao Yu, Shenghua Gao

This paper presents DebSDF to address these challenges, focusing on the utilization of uncertainty in monocular priors and the bias in SDF-based volume rendering.

Indoor Scene Reconstruction Surface Reconstruction

Paper
Add Code

Revisiting Event-based Video Frame Interpolation

no code implementations • 24 Jul 2023 • Jiaben Chen, Yichen Zhu, Dongze Lian, Jiaqi Yang, Yifu Wang, Renrui Zhang, Xinhang Liu, Shenhan Qian, Laurent Kneip, Shenghua Gao

We therefore propose to incorporate RGB information in an event-guided optical flow refinement strategy.

Optical Flow Estimation Video Frame Interpolation

Paper
Add Code

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

1 code implementation • NeurIPS 2023 • Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, Shenghua Gao

We present a novel alignment-before-generation approach to tackle the challenging task of generating general 3D shapes based on 2D images or texts.

3D Shape Generation

270

Paper
Code

InstructP2P: Learning to Edit 3D Point Clouds with Text Instructions

no code implementations • 12 Jun 2023 • Jiale Xu, Xintao Wang, Yan-Pei Cao, Weihao Cheng, Ying Shan, Shenghua Gao

Enhancing AI systems to perform tasks following human instructions can significantly boost productivity.

Language Modelling Large Language Model

Paper
Add Code

Omni-Line-of-Sight Imaging for Holistic Shape Reconstruction

no code implementations • 21 Apr 2023 • Binbin Huang, Xingyue Peng, Siyuan Shen, Suan Xia, Ruiqian Li, Yanhua Yu, Yuehan Wang, Shenghua Gao, Wenzheng Chen, Shiying Li, Jingyi Yu

The core of our method is to place the object near diffuse walls and augment the LOS scan in the front view with the NLOS scans from the surrounding walls, which serve as virtual "mirrors" to trap light toward the object.

Object

Paper
Add Code

Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

1 code implementation • CVPR 2023 • Sixun Dong, Huazhang Hu, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao

Sequential video understanding, as an emerging video understanding task, has drawn much attention from researchers because of its goal-oriented nature.

Representation Learning Sentence +1

Paper
Code

High-level Feature Guided Decoding for Semantic Segmentation

no code implementations • 15 Mar 2023 • Ye Huang, Di Kang, Shenghua Gao, Wen Li, Lixin Duan

One crucial design of the HFG is to protect the high-level features from being contaminated by using proper stop-gradient operations so that the backbone does not update according to the noisy gradient from the upsampler.

Semantic Segmentation Vocal Bursts Intensity Prediction

Paper
Add Code

P²SDF for Neural Indoor Scene Reconstruction

no code implementations • 1 Mar 2023 • Jing Li, Jinpeng Yu, Ruoyu Wang, Zhengxin Li, Zhengyu Zhang, Lina Cao, Shenghua Gao

As the unsupervised plane segments are usually noisy and inaccurate, we propose to assign different weights to the sampled points on the plane in plane estimation as well as the regularization loss.

Indoor Scene Reconstruction Surface Reconstruction

Paper
Add Code

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models

no code implementations • CVPR 2023 • Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, XiaoHu Qie, Shenghua Gao

Specifically, we first generate a high-quality 3D shape from the input text in the text-to-shape stage as a 3D shape prior.

Image Generation Text to 3D +1

Paper
Add Code

VertMatch: A Semi-supervised Framework for Vertebral Structure Detection in 3D Ultrasound Volume

no code implementations • 28 Dec 2022 • Hongye Zeng, Kang Zhou, Songhan Ge, Yuchong Gao, Jianhao Zhao, Shenghua Gao, Rui Zheng

We propose VertMatch, a two-step framework to detect vertebral structures in 3D ultrasound volume by utilizing unlabeled data in a semi-supervised manner.

Paper
Add Code

Lifelong Person Re-Identification via Knowledge Refreshing and Consolidation

1 code implementation • 29 Nov 2022 • Chunlin Yu, Ye Shi, Zimo Liu, Shenghua Gao, Jingya Wang

Lifelong person re-identification (LReID) is in significant demand for real-world development as a large amount of ReID data is captured from diverse locations over time and cannot be accessed at once inherently.

Continual Learning Person Re-Identification

Paper
Code

ResNeRF: Geometry-Guided Residual Neural Radiance Field for Indoor Scene Novel View Synthesis

no code implementations • 26 Nov 2022 • Yuting Xiao, Yiqun Zhao, Yanyu Xu, Shenghua Gao

In the first stage, we focus on geometry reconstruction based on SDF representation, which would lead to a good geometry surface of the scene and also a sharp density.

Novel View Synthesis

Paper
Add Code

PlaneDepth: Self-supervised Depth Estimation via Orthogonal Planes

1 code implementation • CVPR 2023 • Ruoyu Wang, Zehao Yu, Shenghua Gao

PlaneDepth estimates the depth distribution using a Laplacian Mixture Model based on orthogonal planes for an input image.

Ranked #3 on Monocular Depth Estimation on KITTI Eigen split unsupervised

Autonomous Driving Data Augmentation +1

Paper
Code

Dual-Space NeRF: Learning Animatable Avatars and Scene Lighting in Separate Spaces

1 code implementation • 31 Aug 2022 • YiHao Zhi, Shenhan Qian, Xinhao Yan, Shenghua Gao

Previous methods alleviate the inconsistency of lighting by learning a per-frame embedding, but this operation does not generalize to unseen poses.

Paper
Code

UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation

1 code implementation • 20 Jul 2022 • Shenhan Qian, Jiale Xu, Ziwei Liu, Liqian Ma, Shenghua Gao

We propose united implicit functions (UNIF), a part-based method for clothed human reconstruction and animation with raw scans and skeletons as the input.

Position

Paper
Code

PREF: Phasorial Embedding Fields for Compact Neural Representations

1 code implementation • 26 May 2022 • Binbin Huang, Xinhao Yan, Anpei Chen, Shenghua Gao, Jingyi Yu

We present an efficient frequency-based neural representation termed PREF: a shallow MLP augmented with a phasor volume that covers significantly broader spectra than previous Fourier feature mapping or Positional Encoding.

109

Paper
Code

DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers

no code implementations • CVPR 2022 • Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, DaCheng Tao

Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.

Knowledge Distillation

Paper
Add Code

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

1 code implementation • CVPR 2022 • Huazhang Hu, Sixun Dong, Yiqun Zhao, Dongze Lian, Zhengxin Li, Shenghua Gao

Existing methods focus on performing repetitive action counting in short videos, which is tough for dealing with longer videos in more realistic scenarios.

Ranked #2 on Repetitive Action Counting on RepCount

Repetitive Action Counting

107

Paper
Code

Taylor3DNet: Fast 3D Shape Inference With Landmark Points Based Taylor Series

no code implementations • 18 Jan 2022 • Yuting Xiao, Jiale Xu, Shenghua Gao

Taylor3DNet exploits a set of discrete landmark points and their corresponding Taylor series coefficients to represent the implicit field of a 3D shape, and the number of landmark points is independent of the resolution of the iso-surface extraction.

3D Shape Reconstruction 3D Shape Representation

Paper
Add Code

SVIP: Sequence VerIfication for Procedures in Videos

1 code implementation • CVPR 2022 • Yicheng Qian, Weixin Luo, Dongze Lian, Xu Tang, Peilin Zhao, Shenghua Gao

In this paper, we propose a novel sequence verification task that aims to distinguish positive video pairs performing the same action sequence from negative ones with step-level transformations but still conducting the same task.

Action Detection Action Recognition

Paper
Code

Raw Bayer Pattern Image Synthesis for Computer Vision-oriented Image Signal Processing Pipeline Design

no code implementations • 25 Oct 2021 • Wei Zhou, Xiangyu Zhang, Hongyu Wang, Shenghua Gao, Xin Lou

It is shown that by adding another transformation, the proposed method is able to synthesize high-quality RAW Bayer images with arbitrary size.

Demosaicking Image Generation +3

Paper
Add Code

Cross-domain Trajectory Prediction with CTP-Net

no code implementations • 22 Oct 2021 • Pingxuan Huang, Zhenhua Cui, Jing Li, Shenghua Gao, Bo Hu, Yanyan Fang

Further, considering the consistency between the observed and the predicted trajectories, a target domain offset discriminator is utilized to adversarially regularize the future trajectory predictions to be in line with the observed trajectories.

Domain Adaptation Pedestrian Trajectory Prediction +1

Paper
Add Code

MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation

1 code implementation • 21 Oct 2021 • Yepeng Liu, Zaiwang Gu, Shenghua Gao, Dong Wang, Yusheng Zeng, Jun Cheng

Very often, the pose is estimated after the face detection.

Face Detection Face Recognition +1

132

Paper
Code

Proxy-bridged Image Reconstruction Network for Anomaly Detection in Medical Images

no code implementations • 5 Oct 2021 • Kang Zhou, Jing Li, Weixin Luo, Zhengxin Li, Jianlong Yang, Huazhu Fu, Jun Cheng, Jiang Liu, Shenghua Gao

To mitigate this problem, in this paper, we propose a novel Proxy-bridged Image Reconstruction Network (ProxyAno) for anomaly detection in medical images.

Anomaly Detection Image Reconstruction

Paper
Add Code

OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification

no code implementations • 23 Sep 2021 • Xianing Chen, Chunlin Xu, Qiong Cao, Jialang Xu, Yujie Zhong, Jiale Xu, Zhengxin Li, Jingya Wang, Shenghua Gao

Transformers have shown preferable performance on many vision tasks.

Person Re-Identification Vocal Bursts Intensity Prediction

Paper
Add Code

Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates

1 code implementation • ICCV 2021 • Shenhan Qian, Zhi Tu, YiHao Zhi, Wen Liu, Shenghua Gao

Co-speech gesture generation is to synthesize a gesture sequence that not only looks real but also matches with the input speech audio.

Gesture Generation

Paper
Code

AS-MLP: An Axial Shifted MLP Architecture for Vision

2 code implementations • ICLR 2022 • Dongze Lian, Zehao Yu, Xing Sun, Shenghua Gao

Our proposed AS-MLP obtains 51.5 mAP on the COCO validation set and 49.5 MS mIoU on the ADE20K dataset, which is competitive compared to the transformer-based architectures.

Ranked #13 on Semantic Segmentation on DensePASS

object-detection Object Detection +1

160

Paper
Code

Prior Based Human Completion

no code implementations • CVPR 2021 • Zibo Zhao, Wen Liu, Yanyu Xu, Xianing Chen, Weixin Luo, Lei Jin, Bohui Zhu, Tong Liu, Binqiang Zhao, Shenghua Gao

One is a structure prior, which uses a human parsing map to represent the human body structure.

Human Parsing

Paper
Add Code

Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding

1 code implementation • CVPR 2021 • Binbin Huang, Dongze Lian, Weixin Luo, Shenghua Gao

Then we combine the contextual information from the landmark feature convolution module with the target's visual features for grounding.

Descriptive Object +1

Paper
Code

Layout-Guided Novel View Synthesis from a Single Indoor Panorama

1 code implementation • CVPR 2021 • Jiale Xu, Jia Zheng, Yanyu Xu, Rui Tang, Shenghua Gao

Then, we leverage the room layout prior, a strong structural constraint of the indoor scene, to guide the generation of target views.

Novel View Synthesis

Paper
Code

Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

1 code implementation • CVPR 2021 • Zhaoyuan Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanling Zhang, Shenghua Gao

This paper proposes a framework for the interactive video object segmentation (VOS) in the wild where users can choose some frames for annotations iteratively.

Interactive Video Object Segmentation

Paper
Code

Crowd Counting With Partial Annotations in an Image

1 code implementation • ICCV 2021 • Yanyu Xu, Ziming Zhong, Dongze Lian, Jing Li, Zhengxin Li, Xinxing Xu, Shenghua Gao

To fully leverage the data captured from different scenes with different view angles while reducing the annotation cost, this paper studies a novel crowd counting setting, i.e., only using partial annotations in each image as training data.

Active Learning Crowd Counting

Paper
Code

Amodal Segmentation Based on Visible Region Segmentation and Shape Prior

1 code implementation • 10 Dec 2020 • Yuting Xiao, Yanyu Xu, Ziming Zhong, Weixin Luo, Jiawei Li, Shenghua Gao

In this way, features corresponding to background and occlusion can be suppressed for amodal mask estimation.

Segmentation

Paper
Code

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

2 code implementations • 18 Nov 2020 • Wen Liu, Zhixin Piao, Zhi Tu, Wenhan Luo, Lin Ma, Shenghua Gao

Also, we build a new dataset, namely iPER dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis.

Denoising Image Generation +1

2,425

Paper
Code

SIRI: Spatial Relation Induced Network For Spatial Description Resolution

no code implementations • NeurIPS 2020 • Peiyao Wang, Weixin Luo, Yanyu Xu, Haojie Li, Shugong Xu, Jianyu Yang, Shenghua Gao

Spatial Description Resolution, as a language-guided localization task, is proposed for target location in a panoramic street view, given corresponding language descriptions.

Relation

Paper
Add Code

Encoding Structure-Texture Relation with P-Net for Anomaly Detection in Retinal Images

1 code implementation • ECCV 2020 • Kang Zhou, Yuting Xiao, Jianlong Yang, Jun Cheng, Wen Liu, Weixin Luo, Zaiwang Gu, Jiang Liu, Shenghua Gao

In the end, we further utilize the reconstructed image to extract the structure and measure the difference between the structures extracted from the original and the reconstructed images.

Anatomy Anomaly Detection +2

Paper
Code

P²Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation

1 code implementation • 15 Jul 2020 • Zehao Yu, Lei Jin, Shenghua Gao

Furthermore, because those textureless regions in indoor scenes (e.g., wall, floor, roof, etc.) usually correspond to planar regions, we propose to leverage superpixels as a plane prior.

Ranked #5 on Monocular Depth Estimation on NYU-Depth V2 self-supervised

Monocular Depth Estimation Superpixels

148

Paper
Code

Learning to Parse Wireframes in Images of Man-Made Environments

1 code implementation • CVPR 2018 • Kun Huang, Yifan Wang, Zihan Zhou, Tianjiao Ding, Shenghua Gao, Yi Ma

To this end, we have built a very large new dataset of over 5,000 images with wireframes thoroughly labelled by humans.

3D Reconstruction Junction Detection +1

203

Paper
Code

Towards Fast Adaptation of Neural Architectures with Meta Learning

1 code implementation • ICLR 2020 • Dongze Lian, Yin Zheng, Yintao Xu, Yanxiong Lu, Leyu Lin, Peilin Zhao, Junzhou Huang, Shenghua Gao

Recently, Neural Architecture Search (NAS) has been successfully applied to multiple artificial intelligence areas and shows better performance compared with hand-designed networks.

Few-Shot Learning Neural Architecture Search

Paper
Code

Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement

1 code implementation • CVPR 2020 • Zehao Yu, Shenghua Gao

On one hand, the high-resolution depth map, the data-adaptive propagation method and the Gauss-Newton layer jointly guarantee the effectiveness of our method.

Depth Estimation

243

Paper
Code

BioNet: Infusing Biomarker Prior into Global-to-Local Network for Choroid Segmentation in Optical Coherence Tomography Images

no code implementations • 11 Dec 2019 • Huihong Zhang, Jianlong Yang, Kang Zhou, Zhenjie Chai, Jun Cheng, Shenghua Gao, Jiang Liu

Firstly, our method trains a biomarker prediction network to learn the features of the biomarker.

Segmentation

Paper
Add Code

Sparse-GAN: Sparsity-constrained Generative Adversarial Network for Anomaly Detection in Retinal OCT Image

no code implementations • 28 Nov 2019 • Kang Zhou, Shenghua Gao, Jun Cheng, Zaiwang Gu, Huazhu Fu, Zhi Tu, Jianlong Yang, Yitian Zhao, Jiang Liu

With the development of convolutional neural networks, deep learning has shown its success for retinal disease detection from optical coherence tomography (OCT) images.

Anomaly Detection Generative Adversarial Network

Paper
Add Code

Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis

2 code implementations • ICCV 2019 • Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, Shenghua Gao

In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape, which can not only model the joint location and rotation but also characterize the personalized body shape.

Denoising Novel View Synthesis

1,727

Paper
Code

SkrGAN: Sketching-rendering Unconditional Generative Adversarial Networks for Medical Image Synthesis

no code implementations • 6 Aug 2019 • Tianyang Zhang, Huazhu Fu, Yitian Zhao, Jun Cheng, Mengjie Guo, Zaiwang Gu, Bing Yang, Yuting Xiao, Shenghua Gao, Jiang Liu

Generative Adversarial Networks (GANs) have the capability of synthesizing images, which have been successfully applied to medical image synthesis tasks.

Computed Tomography (CT) Data Augmentation +6

Paper
Add Code

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

1 code implementation • ECCV 2020 • Jia Zheng, Junfei Zhang, Jing Li, Rui Tang, Shenghua Gao, Zihan Zhou

Recently, there has been growing interest in developing learning-based methods to detect and utilize salient semi-global or global structures, such as junctions, lines, planes, cuboids, smooth surfaces, and all types of symmetries, for 3D scene modeling and understanding.

Room Layout Estimation

495

Paper
Code

Locality-constrained Spatial Transformer Network for Video Crowd Counting

1 code implementation • 18 Jul 2019 • Yanyan Fang, Biyun Zhan, Wandi Cai, Shenghua Gao, Bo Hu

Then, to relate the density maps between neighbouring frames, a Locality-constrained Spatial Transformer (LST) module is introduced to estimate the density map of the next frame from that of the current frame.

Crowd Counting Translation

Paper
Code

Believe It or Not, We Know What You Are Looking at!

1 code implementation • 4 Jul 2019 • Dongze Lian, Zehao Yu, Shenghua Gao

There are two merits to our two-stage solution for gaze following: i) our solution mimics human behavior in gaze following and is therefore more psychologically plausible; ii) besides using a heatmap to supervise the output of our network, we can also leverage gaze direction to facilitate the training of the gaze direction pathway, so our network can be trained more robustly.

100

Paper
Code

Learning Semantics-aware Distance Map with Semantics Layering Network for Amodal Instance Segmentation

1 code implementation • 30 May 2019 • Ziheng Zhang, Anpei Chen, Ling Xie, Jingyi Yu, Shenghua Gao

Specifically, we first introduce a new representation, namely a semantics-aware distance map (sem-dist map), to serve as our target for amodal segmentation instead of the commonly used masks and heatmaps.

Amodal Instance Segmentation Segmentation +1

Paper
Code

Domain Adversarial Reinforcement Learning for Partial Domain Adaptation

no code implementations • 10 May 2019 • Jin Chen, Xinxiao Wu, Lixin Duan, Shenghua Gao

In this more general and practical scenario, a major challenge is how to select source instances in the shared classes across different domains for positive transfer.

Partial Domain Adaptation Q-Learning +2

Paper
Add Code

PPGNet: Learning Point-Pair Graph for Line Segment Detection

1 code implementation • CVPR 2019 • Ziheng Zhang, Zhengxin Li, Ning Bi, Jia Zheng, Jinlei Wang, Kun Huang, Weixin Luo, Yanyu Xu, Shenghua Gao

In this paper, we present a novel framework to detect line segments in man-made environments.

Line Segment Detection

173

Paper
Code

Generic Multiview Visual Tracking

no code implementations • 4 Apr 2019 • Minye Wu, Haibin Ling, Ning Bi, Shenghua Gao, Hao Sheng, Jingyi Yu

A natural solution to these challenges is to use multiple cameras with multiview inputs, though existing systems are mostly limited to specific targets (e.g., humans), static cameras, and/or camera calibration.

Camera Calibration Trajectory Prediction +1

Paper
Add Code

CE-Net: Context Encoder Network for 2D Medical Image Segmentation

3 code implementations • 7 Mar 2019 • Zaiwang Gu, Jun Cheng, Huazhu Fu, Kang Zhou, Huaying Hao, Yitian Zhao, Tianyang Zhang, Shenghua Gao, Jiang Liu

In this paper, we propose a context encoder network (referred to as CE-Net) to capture more high-level information and preserve spatial information for 2D medical image segmentation.

Ranked #1 on Optic Disc Segmentation on Messidor

Cell Segmentation Image Segmentation +4

223

Paper
Code

Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding

1 code implementation • CVPR 2019 • Zehao Yu, Jia Zheng, Dongze Lian, Zihan Zhou, Shenghua Gao

In the first stage, we train a CNN to map each pixel to an embedding space where pixels from the same plane instance have similar embeddings.

Ranked #1 on Plane Instance Segmentation on NYU Depth v2

3D Plane Detection 3D Reconstruction +4

354

Paper
Code

Deep Surface Light Fields

no code implementations • 15 Oct 2018 • Anpei Chen, Minye Wu, Yingliang Zhang, Nianyi Li, Jie Lu, Shenghua Gao, Jingyi Yu

A surface light field represents the radiance of rays originating from any point on the surface in any direction.

Data Compression Image Registration

Paper
Add Code

Evaluating Capability of Deep Neural Networks for Image Classification via Information Plane

no code implementations • ECCV 2018 • Hao Cheng, Dongze Lian, Shenghua Gao, Yanlin Geng

Inspired by the pioneering work of information bottleneck principle for Deep Neural Networks (DNNs) analysis, we design an information plane based framework to evaluate the capability of DNNs for image classification tasks, which not only helps understand the capability of DNNs, but also helps us choose a neural network which leads to higher classification accuracy more efficiently.

General Classification Image Classification +1

Paper
Add Code

Multi-Cell Multi-Task Convolutional Neural Networks for Diabetic Retinopathy Grading

no code implementations • 31 Aug 2018 • Kang Zhou, Zaiwang Gu, Wen Liu, Weixin Luo, Jun Cheng, Shenghua Gao, Jiang Liu

To consider the relationships of images with different stages, we propose a Multi-Task learning strategy which predicts the label with both classification and regression.

Diabetic Retinopathy Grading General Classification +1

Paper
Add Code

Gaze Prediction in Dynamic 360° Immersive Videos

no code implementations • CVPR 2018 • Yanyu Xu, Yanbing Dong, Junru Wu, Zhengzhong Sun, Zhiru Shi, Jingyi Yu, Shenghua Gao

This paper explores gaze prediction in dynamic 360° immersive videos, i.e., based on the history scan path and VR contents, we predict where a viewer will look at an upcoming time.

Gaze Prediction

Paper
Add Code

Face Aging With Identity-Preserved Conditional Generative Adversarial Networks

2 code implementations • CVPR 2018 • Zongwei Wang, Xu Tang, Weixin Luo, Shenghua Gao

By grouping faces with target age together, the objective of face aging is equivalent to transferring aging patterns of faces within the target age group to the face whose aged face is to be synthesized.

285

Paper
Code

Future Frame Prediction for Anomaly Detection -- A New Baseline

1 code implementation • CVPR 2018 • Wen Liu, Weixin Luo, Dongze Lian, Shenghua Gao

To predict a future frame with higher quality for normal events, other than the commonly used appearance (spatial) constraints on intensity and gradient, we also introduce a motion (temporal) constraint in video prediction by enforcing the optical flow between predicted frames and ground truth frames to be consistent, and this is the first work that introduces a temporal constraint into the video prediction task.

Anomaly Detection Optical Flow Estimation +1

426

Paper
Code

Encoding Crowd Interaction With Deep Neural Network for Pedestrian Trajectory Prediction

1 code implementation • CVPR 2018 • Yanyu Xu, Zhixin Piao, Shenghua Gao

Specifically, motivated by the residual learning in deep learning, we propose to predict displacement between neighboring frames for each pedestrian sequentially.

Pedestrian Trajectory Prediction Trajectory Prediction

Paper
Code
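The displacement-based prediction mentioned in the entry above can be illustrated with a minimal sketch. It assumes a model has already produced per-frame displacements for one pedestrian; the function name `positions_from_displacements` and the sample values are hypothetical and are not taken from the paper's code. The sketch only shows how sequential displacements accumulate into absolute positions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def positions_from_displacements(
    last_observed: Point, displacements: List[Point]
) -> List[Point]:
    """Turn per-frame displacements into absolute positions.

    Each displacement is the offset between two neighboring frames, so the
    position at frame t is the last observed position plus the running sum
    of all displacements up to t.
    """
    x, y = last_observed
    trajectory: List[Point] = []
    for dx, dy in displacements:
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory


if __name__ == "__main__":
    # Hypothetical displacements standing in for a network's output.
    predicted = positions_from_displacements(
        (2.0, 1.0), [(0.5, 0.0), (0.5, 0.25), (0.25, 0.25)]
    )
    print(predicted)
```

Predicting offsets rather than absolute coordinates keeps the regression targets small and centered, which is the usual motivation behind residual-style formulations.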

Future Frame Prediction for Anomaly Detection -- A New Baseline

1 code implementation • 28 Dec 2017 • Wen Liu, Weixin Luo, Dongze Lian, Shenghua Gao

Ranked #2 on Traffic Accident Detection on SA

Anomaly Detection Optical Flow Estimation +2

426

Paper
Code

Personalized Saliency and its Prediction

1 code implementation • 9 Oct 2017 • Yanyu Xu, Shenghua Gao, Junru Wu, Nianyi Li, Jingyi Yu

Specifically, we propose to decompose a personalized saliency map (referred to as PSM) into a universal saliency map (referred to as USM) predictable by existing saliency detection models and a new discrepancy map across users that characterizes personalized saliency.

Saliency Detection

Paper
Code

A Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework

1 code implementation • ICCV 2017 • Weixin Luo, Wen Liu, Shenghua Gao

Motivated by the capability of sparse coding based anomaly detection, we propose a Temporally-coherent Sparse Coding (TSC) where we enforce similar neighbouring frames to be encoded with similar reconstruction coefficients.

Ranked #22 on Anomaly Detection on ShanghaiTech

Anomaly Detection

Paper
Code

Graph Construction with Label Information for Semi-Supervised Learning

no code implementations • 8 Jul 2016 • Liansheng Zhuang, Zihan Zhou, Jingwen Yin, Shenghua Gao, Zhouchen Lin, Yi Ma, Nenghai Yu

In the literature, most existing graph-based semi-supervised learning (SSL) methods only use the label information of observed samples in the label propagation stage, while ignoring such valuable information when learning the graph.

graph construction Graph Learning

Paper
Add Code

Progressively Parsing Interactional Objects for Fine Grained Action Detection

no code implementations • CVPR 2016 • Bingbing Ni, Xiaokang Yang, Shenghua Gao

Fine grained video action analysis often requires reliable detection and tracking of various interacting objects and human body parts, denoted as interactional object parsing.

Action Analysis Action Recognition +5

Paper
Add Code

Single-Image Crowd Counting via Multi-Column Convolutional Neural Network

5 code implementations • Conference 2016 • Yingying Zhang, Desen Zhou, Siqin Chen, Shenghua Gao, Yi Ma

To this end, we have proposed a simple but effective Multi-column Convolutional Neural Network (MCNN) architecture to map the image to its crowd density map.

Ranked #5 on Crowd Counting on Venice

Crowd Counting

491

Paper
Code

Constructing a Non-Negative Low Rank and Sparse Graph with Data-Adaptive Features

no code implementations • 3 Sep 2014 • Liansheng Zhuang, Shenghua Gao, Jinhui Tang, Jingjing Wang, Zhouchen Lin, Yi Ma

This paper aims at constructing a good graph for discovering intrinsic data structures in a semi-supervised learning setting.

graph construction

Paper
Add Code

PCANet: A Simple Deep Learning Baseline for Image Classification?

2 code implementations • 14 Apr 2014 • Tsung-Han Chan, Kui Jia, Shenghua Gao, Jiwen Lu, Zinan Zeng, Yi Ma

In this work, we propose a very simple deep learning network for image classification which comprises only the very basic data processing components: cascaded principal component analysis (PCA), binary hashing, and block-wise histograms.

Ranked #46 on Image Classification on MNIST

Classification Face Recognition +5

Paper
Code
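Two of the components named in the entry above, binary hashing and block-wise histograms, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the cascaded PCA filtering stage is omitted, the input feature maps are hypothetical hand-written values, and the helper names `binary_hash` and `block_histograms` are invented for this sketch rather than taken from any released implementation.

```python
from typing import List

FeatureMap = List[List[float]]


def binary_hash(feature_maps: List[FeatureMap]) -> List[List[int]]:
    """Merge several real-valued maps into one integer-coded map.

    Bit l of each output pixel is set when map l is positive at that pixel,
    so L maps yield codes in the range [0, 2**L - 1].
    """
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    codes = [[0] * cols for _ in range(rows)]
    for bit, fmap in enumerate(feature_maps):
        for r in range(rows):
            for c in range(cols):
                if fmap[r][c] > 0:
                    codes[r][c] += 1 << bit
    return codes


def block_histograms(
    codes: List[List[int]], block: int, num_bins: int
) -> List[List[int]]:
    """Count code occurrences inside each non-overlapping block x block tile."""
    hists: List[List[int]] = []
    for r0 in range(0, len(codes) - block + 1, block):
        for c0 in range(0, len(codes[0]) - block + 1, block):
            hist = [0] * num_bins
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    hist[codes[r][c]] += 1
            hists.append(hist)
    return hists


if __name__ == "__main__":
    # Hypothetical filter responses; a real pipeline would obtain these
    # by convolving the image with learned PCA filters.
    map0 = [[1.0, -1.0, 0.5, -0.5], [2.0, 3.0, -0.1, -0.2]]
    map1 = [[-1.0, -1.0, 2.0, 2.0], [1.0, 1.0, -3.0, -3.0]]
    codes = binary_hash([map0, map1])
    print(codes)
    print(block_histograms(codes, block=2, num_bins=4))
```

Concatenating the per-block histograms would give the final feature vector that a downstream classifier consumes; that step is left out here to keep the sketch short.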

ROML: A Robust Feature Correspondence Approach for Matching Objects in A Set of Images

no code implementations • 31 Mar 2014 • Kui Jia, Tsung-Han Chan, Zinan Zeng, Shenghua Gao, Gang Wang, Tianzhu Zhang, Yi Ma

The task is to identify the inlier features and establish their consistent correspondences across the image set.

3D Reconstruction Distributed Optimization +4

Paper
Add Code

Learning by Associating Ambiguously Labeled Images

no code implementations • CVPR 2013 • Zinan Zeng, Shijie Xiao, Kui Jia, Tsung-Han Chan, Shenghua Gao, Dong Xu, Yi Ma

Our framework is motivated by the observation that samples from the same class repetitively appear in the collection of ambiguously labeled training images, while they are just ambiguously labeled in each image.

Paper
Add Code
