Search Results for author: Hengshuang Zhao

Found 85 papers, 54 papers with code

OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation

no code implementations • 28 Mar 2024 • Zhenyu Wang, YaLi Li, Taichi Liu, Hengshuang Zhao, Shengjin Wang

Specifically, we propose cycle-modality propagation, which propagates knowledge between the 2D and 3D modalities to support the aforementioned functionalities.

3D Object Detection Novel Class Discovery +1

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

no code implementations • 22 Mar 2024 • Zheng Zhang, WenBo Hu, Yixing Lao, Tong He, Hengshuang Zhao

3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance.

Novel View Synthesis

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

1 code implementation • 21 Mar 2024 • Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia

This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module to greatly enhance the adaptivity of sparse CNNs at minimal computational cost.

Ranked #5 on 3D Semantic Segmentation on ScanNet200

3D Semantic Segmentation LIDAR Semantic Segmentation

1,097

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

1 code implementation • 14 Mar 2024 • Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia

To address this issue, we propose GroupContrast, a novel approach that combines segment grouping and semantic-aware contrastive learning.

Contrastive Learning Representation Learning +2

UniMODE: Unified Monocular 3D Object Detection

no code implementations • 28 Feb 2024 • Zhuoling Li, Xiaogang Xu, SerNam Lim, Hengshuang Zhao

To address these challenges, we build a detector based on the bird's-eye-view (BEV) detection paradigm, where the explicit feature projection helps resolve the geometry-learning ambiguity that arises when detectors are trained on data from multiple scenarios.

Monocular 3D Object Detection Object +2

OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

no code implementations • 23 Feb 2024 • Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari, Marc Pollefeys, Leonidas Guibas, Hongbo Tian, Chunjie Wang, Xiaosheng Yan, Bingwen Wang, Xuanyang Zhang, Xiao Liu, Phuc Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham, Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby

This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023.

Scene Understanding

Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery

no code implementations • 24 Jan 2024 • Yuanpeng Tu, Zhun Zhong, Yuxi Li, Hengshuang Zhao

Generalized category discovery (GCD) addresses a more realistic and challenging setting of semi-supervised learning, in which labels from only a subset of the categories are assigned to a portion of the training samples.

Contrastive Learning

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

3 code implementations • 19 Jan 2024 • Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M images), which significantly enlarges the data coverage and thus reduces the generalization error.

Ranked #2 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Data Augmentation Monocular Depth Estimation +1

5,507

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

1 code implementation • 22 Dec 2023 • Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao

We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility.

182

Self-supervised Learning for Enhancing Geometrical Modeling in 3D-Aware Generative Adversarial Network

no code implementations • 19 Dec 2023 • Jiarong Guo, Xiaogang Xu, Hengshuang Zhao

To address this, we present a Self-Supervised Learning (SSL) technique tailored as an auxiliary loss for any 3D-GAN, designed to improve its 3D geometrical modeling capabilities.

Generative Adversarial Network Self-Supervised Learning +1

Point Transformer V3: Simpler, Faster, Stronger

3 code implementations • 15 Dec 2023 • Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao

This paper does not seek to innovate within the attention mechanism.

Ranked #1 on Semantic Segmentation on S3DIS (using extra training data)

3D Semantic Segmentation LIDAR Semantic Segmentation +1

1,988

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

1 code implementation • 14 Dec 2023 • Jinguo Zhu, Xiaohan Ding, Yixiao Ge, Yuying Ge, Sijie Zhao, Hengshuang Zhao, Xiaohua Wang, Ying Shan

In combination with the existing text tokenizer and detokenizer, this framework allows for the encoding of interleaved image-text data into a multimodal sequence, which can subsequently be fed into the transformer model.

Image Captioning In-Context Learning +4

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

1 code implementation • NeurIPS 2023 • Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao

What we possess are numerous isolated field-specific datasets; it is therefore appealing to jointly train models on the aggregation of these datasets to enhance data volume and diversity.

Instance Segmentation Semantic Segmentation +1

CorresNeRF: Image Correspondence Priors for Neural Radiance Fields

1 code implementation • NeurIPS 2023 • Yixing Lao, Xiaogang Xu, Zhipeng Cai, Xihui Liu, Hengshuang Zhao

We present CorresNeRF, a novel method that leverages image correspondence priors computed by off-the-shelf methods to supervise NeRF training.

Novel View Synthesis Surface Reconstruction

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

no code implementations • 6 Dec 2023 • Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu

However, due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views.

3D Object Reconstruction Novel View Synthesis +1

LivePhoto: Real Image Animation with Text-guided Motion Control

no code implementations • 5 Dec 2023 • Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao

In particular, considering that (1) text can describe motion only roughly (e.g., it does not convey the moving speed) and (2) text may include both content and motion descriptions, we introduce a motion intensity estimation module as well as a text re-weighting module to reduce the ambiguity of text-to-motion mapping.

Image Animation Text-to-Video Generation +1

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

1 code implementation • 5 Dec 2023 • Zhangyang Qi, Ye Fang, Zeyi Sun, Xiaoyang Wu, Tong Wu, Jiaqi Wang, Dahua Lin, Hengshuang Zhao

Multimodal Large Language Models (MLLMs) have excelled in 2D image-text comprehension and image generation, but their understanding of the 3D world is notably deficient, limiting progress in 3D language understanding and generation.

3D Generation Reading Comprehension

250

A Lightweight Clustering Framework for Unsupervised Semantic Segmentation

no code implementations • 30 Nov 2023 • Yau Shing Jonathan Cheung, Xi Chen, Lihe Yang, Hengshuang Zhao

We thus propose a lightweight clustering framework for unsupervised semantic segmentation.

Clustering Segmentation +1

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

no code implementations • 26 Nov 2023 • Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li

Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules.

Object Visual Grounding

FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models

1 code implementation • NeurIPS 2023 • Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao

Then, we investigate the role of synthetic images either by jointly training with real images or by pre-training for real images.

Segmentation Semantic Segmentation

115

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

2 code implementations • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #2 on Semantic Segmentation on ScanNet (using extra training data)

3D Object Detection 3D Reconstruction +5

1,097

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

1 code implementation • 12 Oct 2023 • Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang

In the context of autonomous driving, the significance of effective feature learning is widely acknowledged.

3D Object Detection 3D Semantic Segmentation +3

120

Uni3DETR: Unified 3D Detection Transformer

1 code implementation • NeurIPS 2023 • Zhenyu Wang, YaLi Li, Xi Chen, Hengshuang Zhao, Shengjin Wang

In this paper, we propose Uni3DETR, a unified 3D detector that addresses indoor and outdoor 3D detection within the same framework.

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

no code implementations • 2 Oct 2023 • Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao

Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos.

Autonomous Driving Language Modelling +2

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

1 code implementation • 1 Sep 2023 • Zhening Huang, Xiaoyang Wu, Xi Chen, Hengshuang Zhao, Lei Zhu, Joan Lasenby

When integrated with powerful 2D open-world models such as ODISE and GroundingDINO, OpenIns3D achieves excellent results on open-vocabulary instance segmentation.

Ranked #1 on 3D Open-Vocabulary Object Detection on ScanNet on unseen classes

3D Open-Vocabulary Instance Segmentation 3D Open-Vocabulary Object Detection +5

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

1 code implementation • 18 Aug 2023 • Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao

In contrast, 3D deep learning has not yet fully enjoyed this privilege, mainly due to the limited availability of large-scale 3D datasets.

Ranked #3 on 3D Semantic Segmentation on ScanNet200 (using extra training data)

3D Semantic Segmentation LIDAR Semantic Segmentation +1

1,097

InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping

no code implementations • 16 Aug 2023 • Zhenhua Xu, Kwan-Yee K. Wong, Hengshuang Zhao

Vectorized high-definition (HD) maps contain detailed information about surrounding road elements, which are crucial for various downstream tasks in modern autonomous vehicles, such as motion planning and vehicle control.

Autonomous Driving Line Detection +1

Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning

1 code implementation • ICCV 2023 • Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi, Hengshuang Zhao

To mitigate potentially incorrect pseudo labels, recent frameworks mostly set a fixed confidence threshold to discard uncertain samples.
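
The snippet says recent frameworks commonly use fixed-threshold filtering. The sketch below illustrates that baseline only and is not the paper's class-space shrinking method. The function name `select_pseudo_labels` and the threshold of 0.95 are illustrative assumptions.

```python
def select_pseudo_labels(probs, threshold=0.95):
    """Keep (sample_index, predicted_class) pairs whose top-class probability
    reaches a fixed confidence threshold; all other samples are discarded."""
    kept = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            kept.append((i, p.index(conf)))  # argmax class becomes the pseudo label
    return kept


# Hypothetical softmax outputs for three unlabeled samples.
batch = [[0.97, 0.02, 0.01], [0.50, 0.40, 0.10], [0.01, 0.01, 0.98]]
print(select_pseudo_labels(batch))  # [(0, 0), (2, 2)]
```

The middle sample is dropped entirely. Under a fixed threshold, uncertain samples contribute no training signal at all.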

Ranked #1 on Semi-Supervised Image Classification on SVHN, 40 Labels

Semi-Supervised Image Classification

AnyDoor: Zero-shot Object-level Image Customization

2 code implementations • 18 Jul 2023 • Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao

This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.

Object Virtual Try-on

5,985

GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping

no code implementations • 18 Jul 2023 • Zhuoling Li, Chunrui Han, Zheng Ge, Jinrong Yang, En Yu, Haoqian Wang, Hengshuang Zhao, Xiangyu Zhang

Besides, GroupLane with ResNet18 still surpasses PersFormer by 4.9% F1 score, while its inference speed is nearly 7x faster and its FLOPs are only 13.3% of PersFormer's.

3D Lane Detection

SAM3D: Segment Anything in 3D Scenes

1 code implementation • 6 Jun 2023 • Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu

In this work, we propose SAM3D, a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.

Segmentation

848

OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection

no code implementations • 2 Jun 2023 • Zhangyang Qi, Jiaqi Wang, Xiaoyang Wu, Hengshuang Zhao

Multi-view 3D object detection is becoming popular in autonomous driving due to its high effectiveness and low cost.

3D Object Detection Autonomous Driving +2

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

1 code implementation • 20 Apr 2023 • Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu

We address this challenge by formulating, to the best of our knowledge, the first differentiable end-to-end LiDAR rendering framework, LiDAR-NeRF, leveraging a neural radiance field (NeRF) to facilitate the joint learning of geometry and the attributes of 3D points.

3D Reconstruction Novel LiDAR View Synthesis +1

110

VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection

1 code implementation • 3 Apr 2023 • Zhuoling Li, Chuanrui Zhang, Wei-Chiu Ma, Yipin Zhou, Linyan Huang, Haoqian Wang, SerNam Lim, Hengshuang Zhao

In recent years, transformer-based detectors have demonstrated remarkable performance in 2D visual perception tasks.

3D Object Detection object-detection

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

1 code implementation • CVPR 2023 • Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao

As a pioneering work, PointContrast conducts unsupervised 3D representation learning by leveraging contrastive learning over raw RGB-D frames and demonstrates its effectiveness on various downstream tasks.

Ranked #9 on Semantic Segmentation on ScanNet (val mIoU metric, using extra training data)

Contrastive Learning Data Augmentation +3

1,097

Detecting Everything in the Open World: Towards Universal Object Detection

1 code implementation • CVPR 2023 • Zhenyu Wang, YaLi Li, Xi Chen, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao, Shengjin Wang

In this paper, we formally address universal object detection, which aims to detect every scene and predict every category.

object-detection Open World Object Detection +1

480

Influencer Backdoor Attack on Semantic Segmentation

1 code implementation • 21 Mar 2023 • Haoheng Lan, Jindong Gu, Philip Torr, Hengshuang Zhao

In this work, we explore backdoor attacks on segmentation models that cause all pixels of a victim class to be misclassified by injecting a specific trigger on non-victim pixels during inference; we call this the Influencer Backdoor Attack (IBA).

Backdoor Attack Position +2

Open-vocabulary Panoptic Segmentation with Embedding Modulation

no code implementations • ICCV 2023 • Xi Chen, Shuang Li, Ser-Nam Lim, Antonio Torralba, Hengshuang Zhao

Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world.

Image Segmentation Open Vocabulary Panoptic Segmentation +2

ScribbleSeg: Scribble-based Interactive Image Segmentation

no code implementations • 20 Mar 2023 • Xi Chen, Yau Shing Jonathan Cheung, Ser-Nam Lim, Hengshuang Zhao

We hope this could serve as a more powerful and general solution for interactive segmentation.

Image Segmentation Interactive Segmentation +2

GeoSpark: Sparking up Point Cloud Segmentation with Geometry Clue

no code implementations • 14 Mar 2023 • Zhening Huang, Xiaoyang Wu, Hengshuang Zhao, Lei Zhu, Shujun Wang, Georgios Hadjidemetriou, Ioannis Brilakis

For feature aggregation, it improves feature modeling by allowing the network to learn from both local points and neighboring geometry partitions, resulting in an enlarged data-tailored receptive field.

Point Cloud Segmentation

Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation

no code implementations • 11 Mar 2023 • Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H. S. Torr

Referring image segmentation segments the object described by a language expression from an image.

Image Segmentation Object +1

PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer

no code implementations • 7 Feb 2023 • Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Yawen Cui, Jiehua Zhang, Philip Torr, Guoying Zhao

As key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal difference guided global attention, and then refine the local spatio-temporal representation against interference.

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners

no code implementations • CVPR 2023 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik G. Learned-Miller, Chuang Gan

To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').

Multi-Task Learning

BT^2: Backward-compatible Training with Basis Transformation

1 code implementation • ICCV 2023 • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim

In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners

no code implementations • 15 Dec 2022 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik Learned-Miller, Chuang Gan

To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').

Multi-Task Learning

General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments

no code implementations • 11 Dec 2022 • Xiaogang Xu, Hengshuang Zhao, Philip Torr, Jiaya Jia

In this paper, we use Deep Generative Networks (DGNs) with a novel training mechanism to eliminate the distribution gap between adversarial and clean samples.

Adversarial Attack Adversarial Defense +4

BT^2: Backward-compatible Training with Basis Transformation

1 code implementation • 8 Nov 2022 • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim

In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.

Retrieval

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

2 code implementations • 11 Oct 2022 • Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao

In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work.

Ranked #1 on 3D Semantic Segmentation on nuScenes

3D Point Cloud Classification 3D Semantic Segmentation +5

1,097

SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness

1 code implementation • 25 Jul 2022 • Jindong Gu, Hengshuang Zhao, Volker Tresp, Philip Torr

Since SegPGD can create more effective adversarial examples, the adversarial training with our SegPGD can boost the robustness of segmentation models.

Adversarial Attack Segmentation +1

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

1 code implementation • 20 Jul 2022 • Xin Lai, Zhuotao Tian, Xiaogang Xu, Yingcong Chen, Shu Liu, Hengshuang Zhao, LiWei Wang, Jiaya Jia

Unsupervised domain adaptation in semantic segmentation has been proposed to alleviate the reliance on expensive pixel-wise annotations.

Segmentation Semantic Segmentation +2

Universal Adaptive Data Augmentation

no code implementations • 14 Jul 2022 • Xiaogang Xu, Hengshuang Zhao

Unlike existing methods, UADA adaptively updates the data augmentation (DA) parameters according to the target model's gradient information during training: given a predefined set of DA operations, we randomly choose the types and magnitudes of DA operations for every data batch, and adaptively update the DA parameters along the gradient direction of the loss with respect to those parameters.

Data Augmentation Image Classification +3

FocalClick: Towards Practical Interactive Image Segmentation

1 code implementation • CVPR 2022 • Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao

To make the model work with preexisting masks, we formulate a sub-task termed Interactive Mask Correction, and propose Progressive Merge as the solution.

Ranked #1 on Interactive Segmentation on GrabCut (using extra training data)

Image Segmentation Interactive Segmentation +2

174

Stratified Transformer for 3D Point Cloud Segmentation

4 code implementations • CVPR 2022 • Xin Lai, Jianhui Liu, Li Jiang, LiWei Wang, Hengshuang Zhao, Shu Liu, Xiaojuan Qi, Jiaya Jia

In this paper, we propose the Stratified Transformer, which captures long-range contexts and demonstrates strong generalization ability and high performance.

Ranked #14 on Semantic Segmentation on ScanNet

Point Cloud Segmentation Position +1

1,097

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

1 code implementation • CVPR 2022 • Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H. S. Torr

Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image.

Ranked #3 on Generalized Referring Expression Segmentation on gRefCOCO

Generalized Referring Expression Segmentation Image Segmentation +2

168

PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer

1 code implementation • CVPR 2022 • Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Philip Torr, Guoying Zhao

Remote photoplethysmography (rPPG), which aims at measuring heart activities and physiological signals from facial video without any contact, has great potential in many applications (e.g., remote healthcare and affective computing).

Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions

no code implementations • British Machine Vision Conference 2021 • Zhao Yang, Yansong Tang, Luca Bertinetto, Hengshuang Zhao, Philip Torr

In this paper, we investigate the problem of video object segmentation from referring expressions (VOSRE).

Ranked #1 on Referring Expression Segmentation on J-HMDB (Precision@0.9 metric)

Optical Flow Estimation Referring Expression Segmentation +3

Adversarial Examples on Segmentation Models Can be Easy to Transfer

no code implementations • 22 Nov 2021 • Jindong Gu, Hengshuang Zhao, Volker Tresp, Philip Torr

The high transferability achieved by our method shows that, in contrast to the observations in previous work, adversarial examples on a segmentation model can be easy to transfer to other segmentation models.

Adversarial Robustness Attribute +5

Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision

1 code implementation • 17 Aug 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.
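
The sketch below shows how predictions can be produced by convolving a high-resolution feature map with generated kernels. It assumes each generated kernel is 1x1, that is, a length-C vector, so the convolution reduces to a per-pixel dot product. The kernel generator itself is not shown, and the function name `dynamic_1x1_conv` is hypothetical.

```python
def dynamic_1x1_conv(feature_map, kernels):
    """feature_map: C x H x W nested lists; kernels: N generated vectors of length C.
    Returns N maps of size H x W, one per encoded instance or stuff category."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [[sum(k[c] * feature_map[c][y][x] for c in range(channels)) for x in range(width)]
         for y in range(height)]
        for k in kernels
    ]


# Two-channel 1 x 2 feature map and three hypothetical generated kernels.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
kernels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(dynamic_1x1_conv(features, kernels))
# [[[1.0, 2.0]], [[3.0, 4.0]], [[4.0, 6.0]]]
```

Each kernel yields its own full-resolution map directly, so no box-based cropping step is involved in this sketch.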

Panoptic Segmentation Segmentation +1

388

Open-World Entity Segmentation

2 code implementations • 29 Jul 2021 • Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia

By removing the need for class label prediction, the models trained for such a task can focus more on improving segmentation quality.

Image Manipulation Image Segmentation +2

656

Do Different Tracking Tasks Require Different Appearance Models?

1 code implementation • NeurIPS 2021 • Zhongdao Wang, Hengshuang Zhao, Ya-Li Li, Shengjin Wang, Philip H. S. Torr, Luca Bertinetto

We show how most tracking tasks can be solved within this framework, and that the same appearance model can be successfully used to obtain results that are competitive against specialised methods for most of the tasks considered.

Ranked #2 on Video Object Segmentation on DAVIS 2017 (mIoU metric)

Multi-Object Tracking Multi-Object Tracking and Segmentation +10

335

Semi-supervised Semantic Segmentation with Directional Context-aware Consistency

2 code implementations • CVPR 2021 • Xin Lai, Zhuotao Tian, Li Jiang, Shu Liu, Hengshuang Zhao, LiWei Wang, Jiaya Jia

Semantic segmentation has made tremendous progress in recent years.

Semi-Supervised Semantic Segmentation

177

Dual-Cross Central Difference Network for Face Anti-Spoofing

1 code implementation • 4 May 2021 • Zitong Yu, Yunxiao Qin, Hengshuang Zhao, Xiaobai Li, Guoying Zhao

In this paper, we propose two Cross Central Difference Convolutions (C-CDC), which exploit the difference of the center and surround sparse local features from the horizontal/vertical and diagonal directions, respectively.
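
The snippet does not give the operator's formula. The sketch below assumes the central difference convolution form from earlier CDC work, in which a central-difference term and a vanilla convolution term are mixed by a factor `theta`. It restricts the neighborhood to the center plus either the four horizontal/vertical or the four diagonal neighbors. The single-channel input, the names, and the default `theta` are assumptions.

```python
# Center plus four horizontal/vertical (HV) or four diagonal (DG) neighbors, as (row, col) offsets.
HV_OFFSETS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
DG_OFFSETS = [(0, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def cross_cdc_at(img, r, c, weights, offsets, theta=0.7):
    """Single-channel output at interior pixel (r, c):

    theta * sum_n w_n * (x[p0 + p_n] - x[p0])   (central-difference term)
    + (1 - theta) * sum_n w_n * x[p0 + p_n]     (vanilla convolution term)
    """
    center = img[r][c]
    diff = sum(w * (img[r + dr][c + dc] - center) for w, (dr, dc) in zip(weights, offsets))
    vanilla = sum(w * img[r + dr][c + dc] for w, (dr, dc) in zip(weights, offsets))
    return theta * diff + (1.0 - theta) * vanilla


# A vertical edge: the HV and DG variants respond differently at the center pixel.
edge = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
ones = [1.0] * 5
print(cross_cdc_at(edge, 1, 1, ones, HV_OFFSETS, theta=1.0))  # 1.0
print(cross_cdc_at(edge, 1, 1, ones, DG_OFFSETS, theta=1.0))  # 2.0
```

With `theta=1.0` a flat region produces zero response, so only local intensity differences (the sparse gradient cues in each direction) survive.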

Face Anti-Spoofing Face Recognition

545

Distilling Knowledge via Knowledge Review

7 code implementations • CVPR 2021 • Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia

Knowledge distillation transfers knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network.
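
As a generic illustration of distillation, the classic soft-target loss is sketched below. It compares temperature-softened teacher and student distributions with a KL divergence scaled by T^2. This is not the paper's knowledge-review mechanism, and the default temperature of 4.0 is an arbitrary assumption.

```python
import math


def softened_softmax(logits, temperature):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(p_teacher || p_student) over temperature-softened class distributions."""
    p_t = softened_softmax(teacher_logits, temperature)
    p_s = softened_softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


teacher = [2.0, 1.0, 0.1]
print(kd_loss(teacher, teacher))               # 0.0: identical outputs, nothing to transfer
print(kd_loss([0.1, 1.0, 2.0], teacher) > 0)   # True
```

The T^2 factor keeps the gradient scale of the soft-target term roughly comparable across temperatures when it is combined with an ordinary cross-entropy loss.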

Ranked #12 on Knowledge Distillation on CIFAR-100

Instance Segmentation Knowledge Distillation +3

1,258

Bidirectional Projection Network for Cross Dimension Scene Understanding

1 code implementation • CVPR 2021 • WenBo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia, Tien-Tsin Wong

Via the BPM, complementary 2D and 3D information can interact with each other at multiple architectural levels, so that the advantages of these two visual domains can be combined for better scene recognition.

Ranked #11 on Semantic Segmentation on ScanNet

2D Semantic Segmentation 3D Semantic Segmentation +3

163

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

2 code implementations • CVPR 2021 • Mutian Xu, Runyu Ding, Hengshuang Zhao, Xiaojuan Qi

The key idea of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in a Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet.
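
The sketch below shows dynamic kernel assembly from a weight bank under several assumptions. The ScoreNet here is a hypothetical single linear layer plus softmax on the relative position; the real ScoreNet's inputs and depth may differ. Features are plain lists, and neighbor contributions are summed rather than max-pooled. All function names are illustrative.

```python
import math


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_net(rel_pos, score_weights):
    """Hypothetical ScoreNet: one linear layer on (dx, dy, dz) plus softmax,
    giving one coefficient per weight matrix in the bank."""
    logits = [sum(w * p for w, p in zip(row, rel_pos)) for row in score_weights]
    return softmax(logits)


def assemble_kernel(scores, weight_bank):
    """Weighted sum of the bank's basic c_out x c_in matrices."""
    c_out, c_in = len(weight_bank[0]), len(weight_bank[0][0])
    return [[sum(s * bank[o][i] for s, bank in zip(scores, weight_bank))
             for i in range(c_in)] for o in range(c_out)]


def paconv_point(center, neighbors, weight_bank, score_weights):
    """neighbors: list of (xyz, feature) pairs. Each neighbor gets its own kernel,
    assembled from its position relative to the center; contributions are summed."""
    c_out = len(weight_bank[0])
    out = [0.0] * c_out
    for xyz, feat in neighbors:
        rel = [a - b for a, b in zip(xyz, center)]
        kernel = assemble_kernel(score_net(rel, score_weights), weight_bank)
        for o in range(c_out):
            out[o] += sum(k * f for k, f in zip(kernel[o], feat))
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
score_w = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.0]]  # hypothetical ScoreNet weights
neighbors = [([1.0, 0.0, 0.0], [1.0, 2.0]), ([0.0, 1.0, 0.0], [0.5, 0.5])]
print(paconv_point([0.0, 0.0, 0.0], neighbors, [identity, swap], score_w))
```

Only the small score vector depends on position, while the bank is shared. This is how the kernel adapts per neighbor without a per-point weight tensor.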

Ranked #2 on Point Cloud Segmentation on PointCloud-C

3D Point Cloud Classification Point Cloud Classification +2

529

General Adversarial Defense via Pixel Level and Feature Level Distribution Alignment

no code implementations • 1 Jan 2021 • Xiaogang Xu, Hengshuang Zhao, Philip Torr, Jiaya Jia

Specifically, compared with previous methods, we propose a more efficient pixel-level training constraint that eases the difficulty of aligning adversarial samples to clean samples, and thus noticeably enhances robustness on adversarial samples.

Adversarial Defense Image Classification +3

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

5 code implementations • CVPR 2021 • Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr, Li Zhang

In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task.

Ranked #2 on Semantic Segmentation on FoodSeg103 (using extra training data)

Medical Image Segmentation Segmentation +1

8,216

Point Transformer

24 code implementations • ICCV 2021 • Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun

For example, on the challenging S3DIS dataset for large-scale semantic scene segmentation, the Point Transformer attains an mIoU of 70.4% on Area 5, outperforming the strongest prior model by 3.3 absolute percentage points and crossing the 70% mIoU threshold for the first time.

Ranked #3 on 3D Semantic Segmentation on STPLS3D

3D Part Segmentation 3D Point Cloud Classification +8

1,654

Fully Convolutional Networks for Panoptic Segmentation

6 code implementations • CVPR 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.

Ranked #1 on Panoptic Segmentation on COCO minival (SQ metric)

Panoptic Segmentation Segmentation

388

Generalized Few-shot Semantic Segmentation

1 code implementation • CVPR 2022 • Zhuotao Tian, Xin Lai, Li Jiang, Shu Liu, Michelle Shu, Hengshuang Zhao, Jiaya Jia

Then, since context is essential for semantic segmentation, we propose the Context-Aware Prototype Learning (CAPL) that significantly improves performance by 1) leveraging the co-occurrence prior knowledge from support samples, and 2) dynamically enriching contextual information to the classifier, conditioned on the content of each query image.

Ranked #3 on Generalized Few-Shot Semantic Segmentation on COCO-20i (1-shot)

Generalized Few-Shot Semantic Segmentation Segmentation +1

Prior Guided Feature Enrichment Network for Few-Shot Segmentation

3 code implementations • 4 Aug 2020 • Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Zhicheng Yang, Ruiyu Li, Jiaya Jia

It introduces two novel designs: (1) a training-free prior mask generation method that not only retains generalization power but also improves model performance, and (2) a Feature Enrichment Module (FEM) that overcomes spatial inconsistency by adaptively enriching query features with support features and prior masks.
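
The sketch below shows a training-free prior mask. It assumes, as we understand the approach, that the prior for each query position is its maximum cosine similarity to any foreground support feature, min-max normalized over the query. Features are plain lists of vectors; the function names and `eps` are assumptions.

```python
import math


def cosine(a, b, eps=1e-7):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def prior_mask(query_feats, support_feats, support_mask, eps=1e-7):
    """query_feats / support_feats: one feature vector per spatial position.
    support_mask: 0/1 flags marking foreground support positions.
    Returns one prior value in [0, 1] per query position; nothing is learned."""
    fg = [f for f, m in zip(support_feats, support_mask) if m]
    if not fg:
        return [0.0] * len(query_feats)
    corr = [max(cosine(q, s) for s in fg) for q in query_feats]
    lo, hi = min(corr), max(corr)
    return [(v - lo) / (hi - lo + eps) for v in corr]


query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
support = [[1.0, 0.0], [0.0, 1.0]]
print(prior_mask(query, support, [1, 0]))  # roughly [1.0, 0.0, 0.71]
```

Because the computation uses only fixed features and a similarity measure, it adds no parameters. This is what "training-free" refers to here.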

Ranked #63 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)

Few-Shot Semantic Segmentation Semantic Segmentation

293

Paper
Code
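
The training-free prior mask described in the entry above can be sketched roughly as follows. This is a minimal pure-Python illustration, assuming the formulation commonly described for this method: each query location is scored by its highest cosine similarity to any foreground support feature, and the scores are then min-max normalized to [0, 1]. Features are given as one vector per spatial location; the function names and the choice to drop background support features (rather than zeroing them) are assumptions for illustration, not the authors' code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-7) -> float:
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def prior_mask(query: List[Vector], support: List[Vector],
               support_mask: List[int], eps: float = 1e-7) -> List[float]:
    """Score each query location by its best match to foreground support features.

    query:        one feature vector per query location
    support:      one feature vector per support location
    support_mask: 1 for foreground support locations, 0 for background
    Returns one prior value in [0, 1] per query location.
    """
    # Assumption: background support features are dropped rather than zeroed out.
    foreground = [f for f, m in zip(support, support_mask) if m]
    if not foreground:
        return [0.0] * len(query)
    # Highest cosine similarity to any foreground support location.
    raw = [max(cosine(q, s) for s in foreground) for q in query]
    # Min-max normalization over the query locations.
    lo, hi = min(raw), max(raw)
    return [(r - lo) / (hi - lo + eps) for r in raw]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    support = [[1.0, 0.0], [0.0, 1.0]]
    print(prior_mask(query, support, [1, 0]))
```

No learned parameters are involved, which is what makes such a prior "training-free": it only compares features that a frozen backbone already produces.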

Exploring Self-attention for Image Recognition

1 code implementation • CVPR 2020 • Hengshuang Zhao, Jiaya Jia, Vladlen Koltun

Recent work has shown that self-attention can serve as a basic building block for image recognition models.

746

Paper
Code

PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

2 code implementations • CVPR 2020 • Li Jiang, Hengshuang Zhao, Shaoshuai Shi, Shu Liu, Chi-Wing Fu, Jiaya Jia

Instance segmentation is an important task for scene understanding.

Ranked #5 on 3D Instance Segmentation on STPLS3D

3D Instance Segmentation Clustering +3

1,097

Paper
Code

Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation

1 code implementation • ICCV 2021 • Xiaogang Xu, Hengshuang Zhao, Jiaya Jia

Adversarial training is a promising way to improve the robustness of deep neural networks against adversarial perturbations, especially on the classification task.

Segmentation Semantic Segmentation

Paper
Code

GridMask Data Augmentation

7 code implementations • 13 Jan 2020 • Pengguang Chen, Shu Liu, Hengshuang Zhao, Xingquan Wang, Jiaya Jia

We then show the limitations of existing information-dropping algorithms and propose our structured method, which is simple yet very effective.

Data Augmentation object-detection +4

5,245

Paper
Code
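
The structured information dropping described in the GridMask entry can be illustrated with a small sketch. It assumes square dropped regions of side `drop` tiled on a regular grid with period `unit` and shifted by integer offsets; these parameter names are hypothetical and do not reproduce the paper's own ratio-based parameterization. A mask value of 1 keeps a pixel and 0 removes it.

```python
from typing import List


def grid_mask(height: int, width: int, unit: int, drop: int,
              offset_y: int = 0, offset_x: int = 0) -> List[List[int]]:
    """Build an H x W GridMask-style mask (1 = keep, 0 = drop).

    A drop x drop square is removed at the top-left corner of every
    unit x unit grid cell; the offsets shift the whole grid.
    """
    if unit <= 0 or not 0 <= drop <= unit:
        raise ValueError("require unit > 0 and 0 <= drop <= unit")
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Position inside the current grid cell after applying the offsets.
            cy = (y - offset_y) % unit
            cx = (x - offset_x) % unit
            row.append(0 if cy < drop and cx < drop else 1)
        mask.append(row)
    return mask


def apply_mask(image: List[List[float]], mask: List[List[int]]) -> List[List[float]]:
    """Zero out the dropped pixels of a single-channel image."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


if __name__ == "__main__":
    for row in grid_mask(4, 4, unit=2, drop=1):
        print(row)
```

With `unit=2` and `drop=1`, one pixel in each 2x2 cell is removed, so 1 - (drop/unit)^2 = 75% of the image survives. During training one would typically resample the unit size and offsets for each image; that randomization is left out here.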

Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation

no code implementations • ICCV 2019 • Li Jiang, Hengshuang Zhao, Shu Liu, Xiaoyong Shen, Chi-Wing Fu, Jiaya Jia

To incorporate point features in the edge branch, we establish a hierarchical graph framework, where the graph is initialized from a coarse layer and gradually enriched along the point decoding process.

Ranked #41 on Semantic Segmentation on S3DIS Area5

Scene Labeling Semantic Segmentation

Paper

Region Refinement Network for Salient Object Detection

no code implementations • 27 Jun 2019 • Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Jiaze Wang, Ruiyu Li, Xiaoyong Shen, Jiaya Jia

Although salient object detection has been intensively studied, false predictions and unclear boundaries remain major issues.

Object object-detection +5

Paper

PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing

1 code implementation • CVPR 2019 • Hengshuang Zhao, Li Jiang, Chi-Wing Fu, Jiaya Jia

Unlike previous work, we densely connect each point with every other point in a local neighborhood, aiming to specify the feature of each point based on the local region characteristics so as to better represent the region.

Ranked #2 on Semantic Segmentation on S3DIS Area5 (Number of params metric)

3D Point Cloud Classification General Classification +3

191

Paper
Code
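
The idea of densely connecting every pair of points in a local region can be illustrated with a toy residual update. This is only a sketch under stated assumptions: the update F_i + Σ_j w_ij (F_i − F_j) is modeled on PointWeb's adaptive feature adjustment as we understand it, but the paper learns the impact weights w_ij with an MLP, whereas the hypothetical `impact_weights` function below uses a fixed softmax over negative squared feature distances. The whole input list is treated as a single local region.

```python
import math
from typing import List

Feature = List[float]


def impact_weights(feats: List[Feature], i: int) -> List[float]:
    """Hypothetical fixed impact rule: softmax over -||F_i - F_j||^2 for every j != i.

    PointWeb learns these weights; a fixed rule keeps the sketch self-contained.
    """
    others = [j for j in range(len(feats)) if j != i]
    scores = [-sum((a - b) ** 2 for a, b in zip(feats[i], feats[j])) for j in others]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def adjust_features(feats: List[Feature]) -> List[Feature]:
    """Residual update F_i' = F_i + sum_j w_ij * (F_i - F_j) over a fully connected region."""
    if len(feats) < 2:
        return [list(f) for f in feats]
    adjusted = []
    for i, fi in enumerate(feats):
        others = [j for j in range(len(feats)) if j != i]
        weights = impact_weights(feats, i)
        delta = [0.0] * len(fi)
        # Every other point in the region contributes: the "web" is complete.
        for w, j in zip(weights, others):
            for c in range(len(fi)):
                delta[c] += w * (fi[c] - feats[j][c])
        adjusted.append([fi[c] + delta[c] for c in range(len(fi))])
    return adjusted


if __name__ == "__main__":
    print(adjust_features([[0.0], [1.0], [3.0]]))
```

The pairwise loop costs O(k^2) per region of k points, which is why such dense connections are applied inside small neighborhoods rather than across the whole cloud.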

UPSNet: A Unified Panoptic Segmentation Network

1 code implementation • CVPR 2019 • Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun

More importantly, we introduce a parameter-free panoptic head that solves panoptic segmentation via pixel-wise classification.

Ranked #3 on Panoptic Segmentation on Indian Driving Dataset

Instance Segmentation Panoptic Segmentation +1

639

Paper
Code
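
A simplified sketch of how a parameter-free head can turn panoptic segmentation into per-pixel classification: stuff channels come straight from semantic logits, each detected instance contributes one channel formed by adding its mask logits to the semantic logits of its class inside its box, and every pixel takes the highest-scoring channel. The box-restricted sum and all names here are assumptions for illustration; UPSNet's actual head also has an unknown-class channel and works on tensors, both omitted in this pure-Python version, which also assumes at least one stuff class.

```python
import math
from typing import Dict, List, Sequence, Tuple

Plane = List[List[float]]        # H x W logits
Box = Tuple[int, int, int, int]  # (y0, x0, y1, x1), half-open


def panoptic_labels(stuff_logits: Sequence[Plane],
                    thing_logits: Dict[int, Plane],
                    instances: Sequence[Tuple[int, Box, Plane]]) -> List[List[int]]:
    """Assign each pixel one channel: 0..S-1 are stuff classes, S+k is instance k.

    stuff_logits: one H x W plane per stuff class (at least one required)
    thing_logits: semantic logits per thing class id
    instances:    (class_id, box, mask_logits) for each detected instance
    """
    num_stuff = len(stuff_logits)
    height, width = len(stuff_logits[0]), len(stuff_logits[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            best, best_score = -1, -math.inf
            # Stuff channels: semantic logits used as-is.
            for s in range(num_stuff):
                if stuff_logits[s][y][x] > best_score:
                    best, best_score = s, stuff_logits[s][y][x]
            # Instance channels: mask logit plus the semantic logit of the
            # instance's class, only inside its box (outside, it cannot win).
            for k, (cls, (y0, x0, y1, x1), mask) in enumerate(instances):
                if y0 <= y < y1 and x0 <= x < x1:
                    score = mask[y][x] + thing_logits[cls][y][x]
                    if score > best_score:
                        best, best_score = num_stuff + k, score
            row.append(best)
        labels.append(row)
    return labels


if __name__ == "__main__":
    stuff = [[[0.0, 0.0]]]
    things = {7: [[1.0, 1.0]]}
    insts = [(7, (0, 0, 1, 1), [[0.5, 5.0]])]
    print(panoptic_labels(stuff, things, insts))
```

Because the head only rearranges and sums logits that the semantic and instance branches already produce, it introduces no learnable weights of its own.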

Compositing-aware Image Search

no code implementations • ECCV 2018 • Hengshuang Zhao, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Brian Price, Jiaya Jia

We present a new image search technique that, given a background image, returns compatible foreground objects for image compositing tasks.

Image Retrieval Object

Paper

PSANet: Point-wise Spatial Attention Network for Scene Parsing

4 code implementations • ECCV 2018 • Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia

We observe that information flow in convolutional neural networks is restricted to local neighborhood regions by the physical design of convolutional filters, which limits the overall understanding of complex scenes.

Ranked #51 on Semantic Segmentation on Cityscapes test

Position Scene Parsing +1

7,353

Paper
Code

SegStereo: Exploiting Semantic Information for Disparity Estimation

no code implementations • ECCV 2018 • Guorun Yang, Hengshuang Zhao, Jianping Shi, Zhidong Deng, Jiaya Jia

Disparity estimation for binocular stereo images has a wide range of applications.

Ranked #6 on Semantic Segmentation on KITTI Semantic Segmentation

Disparity Estimation Semantic Segmentation

Paper

Automatic Real-time Background Cut for Portrait Videos

no code implementations • 28 Apr 2017 • Xiaoyong Shen, RuiXing Wang, Hengshuang Zhao, Jiaya Jia

A spatial-temporal refinement network is developed to further correct segmentation errors in each frame and to ensure temporal coherence of the segmentation map.

Segmentation Semantic Segmentation +2