Search Results for author: Jiaya Jia

Found 172 papers, 96 papers with code

Memory Selection Network for Video Propagation

no code implementations ECCV 2020 Ruizheng Wu, Huaijia Lin, Xiaojuan Qi, Jiaya Jia

Video propagation is a fundamental problem in video processing where guidance frame predictions are propagated to guide predictions of the target frame.

Colorization Semantic Segmentation +3

Particularity beyond Commonality: Unpaired Identity Transfer with Multiple References

no code implementations ECCV 2020 Ruizheng Wu, Xin Tao, Ying-Cong Chen, Xiaoyong Shen, Jiaya Jia

Unpaired image-to-image translation aims to translate images from the source class to target one by providing sufficient data for these classes.

Image-to-Image Translation Translation

CN: Channel Normalization For Point Cloud Recognition

no code implementations ECCV 2020 Zetong Yang, Yanan Sun, Shu Liu, Xiaojuan Qi, Jiaya Jia

In 3D recognition, to fuse multi-scale structure information, existing methods apply hierarchical frameworks stacked by multiple fusion layers for integrating current relative locations with structure information from the previous level.

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

no code implementations 29 Feb 2024 Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.

reinforcement-learning Reinforcement Learning (RL)

VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning

no code implementations 22 Feb 2024 Jingyao Li, Pengguang Chen, Xuan Ju, Hong Xu, Jiaya Jia

Our research aims to bridge the domain gap between natural and artificial scenarios with efficient tuning strategies.

Pose Estimation

MOODv2: Masked Image Modeling for Out-of-Distribution Detection

no code implementations 5 Jan 2024 Jingyao Li, Pengguang Chen, Shaozuo Yu, Shu Liu, Jiaya Jia

The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-distribution (ID) representation, distinct from OOD samples.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

MR-GSM8K: A Meta-Reasoning Revolution in Large Language Model Evaluation

2 code implementations 28 Dec 2023 Zhongshen Zeng, Pengguang Chen, Shu Liu, Haiyun Jiang, Jiaya Jia

In this work, we introduce a novel evaluation paradigm for Large Language Models, one that challenges them to engage in meta-reasoning.

GSM8K Language Modelling +2

LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model

no code implementations 28 Dec 2023 Senqiao Yang, Tianyuan Qu, Xin Lai, Zhuotao Tian, Bohao Peng, Shu Liu, Jiaya Jia

While LISA effectively bridges the gap between segmentation and large language models to enable reasoning segmentation, it poses certain limitations: unable to distinguish different instances of the target region, and constrained by the pre-defined textual response formats.

Instance Segmentation Language Modelling +3

BAL: Balancing Diversity and Novelty for Active Learning

1 code implementation 26 Dec 2023 Jingyao Li, Pengguang Chen, Shaozuo Yu, Shu Liu, Jiaya Jia

Experimental results demonstrate that, when labeling 80% of the samples, the performance of the current SOTA method declines by 0.74%, whereas our proposed BAL achieves performance comparable to the full dataset.

Active Learning Self-Supervised Learning

MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks

1 code implementation 26 Dec 2023 Jingyao Li, Pengguang Chen, Jiaya Jia

Large Language Models (LLMs) have showcased impressive capabilities in handling straightforward programming tasks.

Ranked #1 on Code Generation on CodeContests (Test Set pass@1 metric)

Code Generation

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

1 code implementation 7 Dec 2023 Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia

Without tuning on LLaVA-v1.5, our method secured 69.5 in the MMBench test and 1552.5 in MME-perception.

Text Generation

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

2 code implementations 28 Nov 2023 Yanwei Li, Chengyao Wang, Jiaya Jia

Current VLMs, while proficient in tasks like image captioning and visual question answering, face computational burdens when processing long videos due to the excessive visual tokens.

Image Captioning Video-based Generative Performance Benchmarking +2

LLMGA: Multimodal Large Language Model based Generation Assistant

1 code implementation 27 Nov 2023 Bin Xia, Shiyin Wang, Yingfan Tao, Yitong Wang, Jiaya Jia

In the first stage, we train the MLLM to grasp the properties of image generation and editing, enabling it to generate detailed prompts.

Image Generation Language Modelling +4

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

2 code implementations 21 Sep 2023 Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

For example, training on a context length of 8192 needs 16x the computational cost in self-attention layers compared with a context length of 2048.

Instruction Following Question Answering +1
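
The 16x figure in the LongLoRA abstract above follows if self-attention cost is assumed to grow with the square of the context length, since (8192 / 2048)^2 = 16. A minimal sketch of that arithmetic; `attention_cost_ratio` is a hypothetical helper name used only for illustration and is not part of LongLoRA:

```python
def attention_cost_ratio(long_len: int, short_len: int) -> float:
    """Relative self-attention cost of two context lengths.

    Assumes cost scales quadratically with sequence length, which is the
    usual model for full (dense) self-attention.
    """
    return (long_len / short_len) ** 2


# Context length 8192 versus 2048, as quoted in the abstract.
print(attention_cost_ratio(8192, 2048))  # 16.0
```

This only models the attention layers; other parts of a transformer typically scale linearly with sequence length, so the end-to-end ratio would be smaller.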

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation 8 Aug 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection Autonomous Driving +2

Hierarchical Dense Correlation Distillation for Few-Shot Segmentation-Extended Abstract

no code implementations 27 Jun 2023 Bohao Peng, Zhuotao Tian, Xiaoyang Wu, Chengyao Wang, Shu Liu, Jingyong Su, Jiaya Jia

We hope our work can benefit broader industrial applications where novel classes with limited annotations are required to be decently identified.

Few-Shot Semantic Segmentation Segmentation +2

Real-World Image Variation by Aligning Diffusion Inversion Chain

2 code implementations NeurIPS 2023 Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia

Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.

Image-Variation Semantic Similarity +2

Self-supervised Learning by View Synthesis

no code implementations 22 Apr 2023 Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia

In each iteration, the input to VSA is one view (or multiple views) of a 3D object and the output is a synthesized image in another target pose.

3D Classification Self-Supervised Learning

TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation

no code implementations 15 Apr 2023 Jingyao Li, Pengguang Chen, Shengju Qian, Jiaya Jia

However, existing models easily misidentify input pixels from unseen classes, thus confusing novel classes with semantically-similar ones.

Language Modelling Open Vocabulary Semantic Segmentation +2

Point2Pix: Photo-Realistic Point Cloud Rendering via Neural Radiance Fields

no code implementations CVPR 2023 Tao Hu, Xiaogang Xu, Shu Liu, Jiaya Jia

Also, we present Point Encoding to build Multi-scale Radiance Fields that provide discriminative 3D point features.

valid

TriVol: Point Cloud Rendering via Triple Volumes

1 code implementation CVPR 2023 Tao Hu, Xiaogang Xu, Ruihang Chu, Jiaya Jia

However, artifacts still appear in rendered images, due to the challenges in extracting continuous and discriminative 3D features from point clouds.

Spherical Transformer for LiDAR-based 3D Recognition

2 code implementations CVPR 2023 Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia

In this work, we study the varying-sparsity distribution of LiDAR points and present SphereFormer to directly aggregate information from dense close points to the sparse distant ones.

3D Object Detection 3D Semantic Segmentation +3

Learning Context-aware Classifier for Semantic Segmentation

2 code implementations 21 Mar 2023 Zhuotao Tian, Jiequan Cui, Li Jiang, Xiaojuan Qi, Xin Lai, Yixin Chen, Shu Liu, Jiaya Jia

Semantic segmentation is still a challenging task for parsing diverse contexts in different scenes, thus the fixed classifier might not be able to well address varying feature distributions during testing.

Segmentation Semantic Segmentation

Video-P2P: Video Editing with Cross-attention Control

1 code implementation 8 Mar 2023 Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control.

Image Generation Video Editing +1

StraIT: Non-autoregressive Generation with Stratified Image Transformer

no code implementations 1 Mar 2023 Shengju Qian, Huiwen Chang, Yuanzhen Li, Zizhao Zhang, Jiaya Jia, Han Zhang

We propose Stratified Image Transformer (StraIT), a pure non-autoregressive (NAR) generative model that demonstrates superiority in high-quality image synthesis over existing autoregressive (AR) and diffusion models (DMs).

Image Generation

Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need

1 code implementation CVPR 2023 Jingyao Li, Pengguang Chen, Shaozuo Yu, Zexin He, Shu Liu, Jiaya Jia

The core of out-of-distribution (OOD) detection is to learn the in-distribution (ID) representation, which is distinguishable from OOD samples.

Out-of-Distribution Detection

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

2 code implementations CVPR 2023 Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, Jiaya Jia

Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers.

3D Semantic Segmentation Segmentation

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation ICCV 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection Autonomous Driving +2

Command-Driven Articulated Object Understanding and Manipulation

no code implementations CVPR 2023 Ruihang Chu, Zhengzhe Liu, Xiaoqing Ye, Xiao Tan, Xiaojuan Qi, Chi-Wing Fu, Jiaya Jia

The key of Cart is to utilize the prediction of object structures to connect visual observations with user commands for effective manipulations.

motion prediction Object

Removing Anomalies as Noises for Industrial Defect Localization

no code implementations ICCV 2023 Fanbin Lu, Xufeng Yao, Chi-Wing Fu, Jiaya Jia

Our denoising model outperforms the state-of-the-art reconstruction-based anomaly detection methods for precise anomaly localization and high-quality normal image reconstruction on the MVTec-AD benchmark.

Denoising Image Reconstruction +1

High Quality Entity Segmentation

no code implementations ICCV 2023 Lu Qi, Jason Kuen, Tiancheng Shen, Jiuxiang Gu, Wenbo Li, Weidong Guo, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

Given the high-quality and -resolution nature of the dataset, we propose CropFormer which is designed to tackle the intractability of instance-level segmentation on high-resolution images.

Image Segmentation Segmentation +1

What Makes for Good Tokenizers in Vision Transformer?

no code implementations 21 Dec 2022 Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia

The architecture of transformers, which recently witness booming applications in vision tasks, has pivoted against the widespread convolutional paradigm.

Image Inpainting via Iteratively Decoupled Probabilistic Modeling

2 code implementations 6 Dec 2022 Wenbo Li, Xin Yu, Kun Zhou, Yibing Song, Zhe Lin, Jiaya Jia

To achieve high-quality results with low computational cost, we present a novel pixel spread model (PSM) that iteratively employs decoupled probabilistic modeling, combining the optimization efficiency of GANs with the prediction tractability of probabilistic models.

Denoising Image Inpainting

High-Quality Entity Segmentation

1 code implementation 10 Nov 2022 Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.

Image Segmentation Segmentation +2

Generalized Parametric Contrastive Learning

4 code implementations 26 Sep 2022 Jiequan Cui, Zhisheng Zhong, Zhuotao Tian, Shu Liu, Bei Yu, Jiaya Jia

Based on theoretical analysis, we observe that supervised contrastive loss tends to bias high-frequency classes and thus increases the difficulty of imbalanced learning.

Contrastive Learning Domain Generalization +3

End-to-end View Synthesis via NeRF Attention

no code implementations 29 Jul 2022 Zelin Zhao, Jiaya Jia

On the one hand, NeRFA considers the volumetric rendering equation as a soft feature modulation procedure.

Inductive Bias Novel View Synthesis
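
For context on the volumetric rendering equation mentioned in the NeRFA abstract above, the sketch below shows the standard discrete NeRF compositing rule along one ray: each sample contributes its color weighted by its opacity and by the transmittance accumulated before it. This is the generic NeRF formulation, not NeRFA's attention-based variant; the function name `composite_ray` and the scalar (grayscale) colors are assumptions made for brevity.

```python
import math


def composite_ray(densities, colors, deltas):
    """Composite samples along one ray with the discrete volume rendering rule.

    densities: volume density (sigma) at each sample
    colors:    scalar color at each sample (grayscale, for simplicity)
    deltas:    distance between consecutive samples
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = 0.0
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return pixel


# Empty space contributes nothing; a fully opaque first sample hides the rest.
print(composite_ray([0.0, 0.0], [1.0, 1.0], [0.1, 0.1]))
print(composite_ray([1e9, 1.0], [0.7, 0.2], [1.0, 1.0]))
```

Viewed this way, the per-sample weights `transmittance * alpha` act as a soft weighting over features along the ray, which is the reading the abstract alludes to.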

DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation

1 code implementation 20 Jul 2022 Xin Lai, Zhuotao Tian, Xiaogang Xu, Yingcong Chen, Shu Liu, Hengshuang Zhao, LiWei Wang, Jiaya Jia

Unsupervised domain adaptation in semantic segmentation has been raised to alleviate the reliance on expensive pixel-wise annotations.

Segmentation Semantic Segmentation +2

Tracking Objects as Pixel-wise Distributions

1 code implementation 12 Jul 2022 Zelin Zhao, Ze Wu, Yueqing Zhuang, Boxun Li, Jiaya Jia

During inference, a pixel-wise association procedure is proposed to recover object connections through frames based on the pixel-wise prediction.

Multi-Object Tracking Object

Deep Parametric 3D Filters for Joint Video Denoising and Illumination Enhancement in Video Super Resolution

1 code implementation 5 Jul 2022 Xiaogang Xu, RuiXing Wang, Chi-Wing Fu, Jiaya Jia

Despite the quality improvement brought by the recent methods, video super-resolution (SR) is still very challenging, especially for videos that are low-light and noisy.

Denoising Video Denoising +1

Towards Real-World Video Denoising: A Practical Video Denoising Dataset and Network

no code implementations 4 Jul 2022 Xiaogang Xu, Yitong Yu, Nianjuan Jiang, Jiangbo Lu, Bei Yu, Jiaya Jia

Moreover, we also propose a new video denoising framework, called Recurrent Video Denoising Transformer (RVDT), which can achieve SOTA performance on PVDD and other current video denoising benchmarks.

Denoising Video Denoising

EfficientNeRF: Efficient Neural Radiance Fields

1 code implementation 2 Jun 2022 Tao Hu, Shu Liu, Yilun Chen, Tiancheng Shen, Jiaya Jia

Neural Radiance Fields (NeRF) has been widely applied to various tasks for its high-quality representation of 3D scenes.

valid

Unifying Voxel-based Representation with Transformer for 3D Object Detection

1 code implementation 1 Jun 2022 Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia

To this end, the modality-specific space is first designed to represent different inputs in the voxel feature space.

3D Object Detection Object +3

Voxel Field Fusion for 3D Object Detection

1 code implementation CVPR 2022 Yanwei Li, Xiaojuan Qi, Yukang Chen, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.

3D Object Detection Data Augmentation +2

Video Frame Interpolation with Transformer

1 code implementation CVPR 2022 Liying Lu, Ruizheng Wu, Huaijia Lin, Jiangbo Lu, Jiaya Jia

Video frame interpolation (VFI), which aims to synthesize intermediate frames of a video, has made remarkable progress with development of deep convolutional networks over past years.

Video Frame Interpolation

Focal Sparse Convolutional Networks for 3D Object Detection

2 code implementations CVPR 2022 Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both are based on making feature sparsity learnable with position-wise importance prediction.

3D Object Detection Object +1

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

1 code implementation 6 Apr 2022 Yilun Chen, Shijia Huang, Shu Liu, Bei Yu, Jiaya Jia

First, to effectively lift the 2D information to stereo volume, we propose depth-wise plane sweeping (DPS) that allows denser connections and extracts depth-guided features.

3D Object Detection From Stereo Images Relation

Multi-View Transformer for 3D Visual Grounding

1 code implementation CVPR 2022 Shijia Huang, Yilun Chen, Jiaya Jia, LiWei Wang

The multi-view space enables the network to learn a more robust multi-modal representation for 3D visual grounding and eliminates the dependence on specific views.

Visual Grounding

Stratified Transformer for 3D Point Cloud Segmentation

4 code implementations CVPR 2022 Xin Lai, Jianhui Liu, Li Jiang, LiWei Wang, Hengshuang Zhao, Shu Liu, Xiaojuan Qi, Jiaya Jia

In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance.

Point Cloud Segmentation Position +1

Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition

2 code implementations 22 Mar 2022 Zhisheng Zhong, Jiequan Cui, Zeming Li, Eric Lo, Jian Sun, Jiaya Jia

Given the promising performance of contrastive learning, we propose Rebalanced Siamese Contrastive Mining (ResCom) to tackle imbalanced recognition.

Contrastive Learning Long-tail Learning +1

A Unified Query-based Paradigm for Point Cloud Understanding

1 code implementation CVPR 2022 Zetong Yang, Li Jiang, Yanan Sun, Bernt Schiele, Jiaya Jia

This is achieved by introducing an intermediate representation, i.e., Q-representation, in the querying stage to serve as a bridge between the embedding stage and task heads.

Autonomous Driving object-detection +2

SEA: Bridging the Gap Between One- and Two-stage Detector Distillation via SEmantic-aware Alignment

no code implementations 2 Mar 2022 Yixin Chen, Zhuotao Tian, Pengguang Chen, Shu Liu, Jiaya Jia

We revisit the one- and two-stage detector distillation tasks and present a simple and efficient semantic-aware framework to fill the gap between them.

Instance Segmentation object-detection +2

SNR-Aware Low-Light Image Enhancement

1 code implementation CVPR 2022 Xiaogang Xu, RuiXing Wang, Chi-Wing Fu, Jiaya Jia

They are long-range operations for image regions of extremely low Signal-to-Noise-Ratio (SNR) and short-range operations for other regions.

Low-Light Image Enhancement

EfficientNeRF: Efficient Neural Radiance Fields

no code implementations CVPR 2022 Tao Hu, Shu Liu, Yilun Chen, Tiancheng Shen, Jiaya Jia

Neural Radiance Fields (NeRF) has been widely applied to various tasks for its high-quality representation of 3D scenes.

valid

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

1 code implementation 19 Dec 2021 Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia

Pre-training has marked numerous state of the arts in high-level computer vision, while few attempts have ever been made to investigate how pre-training acts in image processing systems.

Ranked #5 on Image Super-Resolution on Set5 - 2x upscaling (using extra training data)

Denoising Image Super-Resolution

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

1 code implementation 9 Dec 2021 Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.

object-detection Object Detection +2

Blending Anti-Aliasing into Vision Transformer

no code implementations NeurIPS 2021 Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

In this work, we analyze the uncharted problem of aliasing in vision transformer and explore to incorporate anti-aliasing properties.

Guided Point Contrastive Learning for Semi-supervised Point Cloud Semantic Segmentation

1 code implementation ICCV 2021 Li Jiang, Shaoshuai Shi, Zhuotao Tian, Xin Lai, Shu Liu, Chi-Wing Fu, Jiaya Jia

To address the high cost and challenges of 3D point-level labeling, we present a method for semi-supervised point cloud semantic segmentation to adopt unlabeled point clouds in training to boost the model performance.

3D Semantic Segmentation Contrastive Learning +1

Deep Structured Instance Graph for Distilling Object Detectors

1 code implementation ICCV 2021 Yixin Chen, Pengguang Chen, Shu Liu, LiWei Wang, Jiaya Jia

Effectively structuring deep knowledge plays a pivotal role in transfer from teacher to student, especially in semantic vision tasks.

Instance Segmentation Knowledge Distillation +5

Image Synthesis via Semantic Composition

no code implementations ICCV 2021 Yi Wang, Lu Qi, Ying-Cong Chen, Xiangyu Zhang, Jiaya Jia

In this paper, we present a novel approach to synthesize realistic images based on their semantic layouts.

Image Generation Semantic Composition

Exploring and Improving Mobile Level Vision Transformers

no code implementations 30 Aug 2021 Pengguang Chen, Yixin Chen, Shu Liu, MingChang Yang, Jiaya Jia

We analyze the reason behind this phenomenon, and propose a novel irregular patch embedding module and adaptive patch fusion module to improve the performance.

Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision

1 code implementation 17 Aug 2021 Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.

Panoptic Segmentation Segmentation +1

Conditional Temporal Variational AutoEncoder for Action Video Prediction

no code implementations 12 Aug 2021 Xiaogang Xu, Yi Wang, LiWei Wang, Bei Yu, Jiaya Jia

To synthesize a realistic action sequence based on a single human image, it is crucial to model both motion patterns and diversity in the action video.

motion prediction Video Prediction

Open-World Entity Segmentation

2 code implementations 29 Jul 2021 Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia

By removing the need of class label prediction, the models trained for such task can focus more on improving segmentation quality.

Image Manipulation Image Segmentation +2

Self-Supervised 3D Mesh Reconstruction From Single Images

no code implementations CVPR 2021 Tao Hu, LiWei Wang, Xiaogang Xu, Shu Liu, Jiaya Jia

Recent single-view 3D reconstruction methods reconstruct object's shape and texture from a single image with only 2D image-level annotation.

3D Reconstruction Attribute +2

LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-Resolution and Beyond

2 code implementations NeurIPS 2020 Wenbo Li, Kun Zhou, Lu Qi, Nianjuan Jiang, Jiangbo Lu, Jiaya Jia

Single image super-resolution (SISR) deals with a fundamental problem of upsampling a low-resolution (LR) image to its high-resolution (HR) version.

Image Deblocking Image Denoising +2

Distilling Knowledge via Knowledge Review

7 code implementations CVPR 2021 Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia

Knowledge distillation transfers knowledge from the teacher network to the student one, with the goal of greatly improving the performance of the student network.

Instance Segmentation Knowledge Distillation +3
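
As background for the teacher-to-student transfer described in the abstract above, here is a minimal sketch of the classic soft-target distillation loss: the student is trained to match the teacher's temperature-softened class probabilities. This is the generic formulation, not the knowledge review mechanism of the paper; the function names and the default temperature of 4.0 are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened probabilities, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # skip terms that contribute nothing
    )
    return kl * temperature ** 2


# The loss is zero when the student matches the teacher and positive otherwise.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice this term is usually combined with the ordinary cross-entropy loss on ground-truth labels.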

Improving Calibration for Long-Tailed Recognition

5 code implementations CVPR 2021 Zhisheng Zhong, Jiequan Cui, Shu Liu, Jiaya Jia

Motivated by the fact that predicted probability distributions of classes are highly related to the numbers of class instances, we propose label-aware smoothing to deal with different degrees of over-confidence for classes and improve classifier learning.

Long-tail Learning Representation Learning

Best-Buddy GANs for Highly Detailed Image Super-Resolution

2 code implementations 29 Mar 2021 Wenbo Li, Kun Zhou, Lu Qi, Liying Lu, Nianjuan Jiang, Jiangbo Lu, Jiaya Jia

We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input.

Image Super-Resolution

Bidirectional Projection Network for Cross Dimension Scene Understanding

1 code implementation CVPR 2021 WenBo Hu, Hengshuang Zhao, Li Jiang, Jiaya Jia, Tien-Tsin Wong

Via the BPM, complementary 2D and 3D information can interact with each other in multiple architectural levels, such that advantages in these two visual domains can be combined for better scene recognition.

2D Semantic Segmentation 3D Semantic Segmentation +3

Video Instance Segmentation with a Propose-Reduce Paradigm

1 code implementation ICCV 2021 Huaijia Lin, Ruizheng Wu, Shu Liu, Jiangbo Lu, Jiaya Jia

Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos.

Instance Segmentation Segmentation +3

ResLT: Residual Learning for Long-tailed Recognition

5 code implementations 26 Jan 2021 Jiequan Cui, Shu Liu, Zhuotao Tian, Zhisheng Zhong, Jiaya Jia

From this perspective, the trivial solution that uses different branches for the head, medium, and tail classes respectively, and then sums their outputs as the final result, is not feasible.

Long-tail Learning

General Adversarial Defense via Pixel Level and Feature Level Distribution Alignment

no code implementations 1 Jan 2021 Xiaogang Xu, Hengshuang Zhao, Philip Torr, Jiaya Jia

Specifically, compared with previous methods, we propose a more efficient pixel-level training constraint to weaken the hardness of aligning adversarial samples to clean samples, which can thus obviously enhance the robustness on adversarial samples.

Adversarial Defense Image Classification +3

Point Transformer

23 code implementations ICCV 2021 Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun

For example, on the challenging S3DIS dataset for large-scale semantic scene segmentation, the Point Transformer attains an mIoU of 70.4% on Area 5, outperforming the strongest prior model by 3.3 absolute percentage points and crossing the 70% mIoU threshold for the first time.

3D Part Segmentation 3D Point Cloud Classification +8

GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation

2 code implementations 13 Dec 2020 Xiaojuan Qi, Zhengzhe Liu, Renjie Liao, Philip H. S. Torr, Raquel Urtasun, Jiaya Jia

Note that GeoNet++ is generic and can be used in other depth/normal prediction frameworks to improve the quality of 3D reconstruction and pixel-wise accuracy of depth and surface normals.

3D Reconstruction Depth Estimation +2

Fully Convolutional Networks for Panoptic Segmentation

6 code implementations CVPR 2021 Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.

Panoptic Segmentation Segmentation

Learnable Boundary Guided Adversarial Training

3 code implementations ICCV 2021 Jiequan Cui, Shu Liu, LiWei Wang, Jiaya Jia

Previous adversarial training raises model robustness under the compromise of accuracy on natural data.

Adversarial Defense

Generalized Few-shot Semantic Segmentation

1 code implementation CVPR 2022 Zhuotao Tian, Xin Lai, Li Jiang, Shu Liu, Michelle Shu, Hengshuang Zhao, Jiaya Jia

Then, since context is essential for semantic segmentation, we propose the Context-Aware Prototype Learning (CAPL) that significantly improves performance by 1) leveraging the co-occurrence prior knowledge from support samples, and 2) dynamically enriching contextual information to the classifier, conditioned on the content of each query image.

Generalized Few-Shot Semantic Segmentation Segmentation +1

Prior Guided Feature Enrichment Network for Few-Shot Segmentation

3 code implementations 4 Aug 2020 Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Zhicheng Yang, Ruiyu Li, Jiaya Jia

It consists of novel designs of (1) a training-free prior mask generation method that not only retains generalization power but also improves model performance and (2) Feature Enrichment Module (FEM) that overcomes spatial inconsistency by adaptively enriching query features with support features and prior masks.

Few-Shot Semantic Segmentation Semantic Segmentation

MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution

1 code implementation ECCV 2020 Wenbo Li, Xin Tao, Taian Guo, Lu Qi, Jiangbo Lu, Jiaya Jia

Motivated by these findings, we propose a temporal multi-correspondence aggregation strategy to leverage similar patches across frames, and a cross-scale nonlocal-correspondence aggregation scheme to explore self-similarity of images across scales.

Optical Flow Estimation Video Super-Resolution

Exploring Self-attention for Image Recognition

1 code implementation CVPR 2020 Hengshuang Zhao, Jiaya Jia, Vladlen Koltun

Recent work has shown that self-attention can serve as a basic building block for image recognition models.

Dynamic Scale Training for Object Detection

4 code implementations 26 Apr 2020 Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, Jiaya Jia

We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection.

Instance Segmentation Model Optimization +4

Attentive Normalization for Conditional Image Generation

1 code implementation CVPR 2020 Yi Wang, Ying-Cong Chen, Xiangyu Zhang, Jian Sun, Jiaya Jia

Traditional convolution-based generative adversarial networks synthesize images based on hierarchical local operations, where long-range dependency relation is implicitly modeled with a Markov chain.

Conditional Image Generation Semantic correspondence +2

VCNet: A Robust Approach to Blind Image Inpainting

2 code implementations ECCV 2020 Yi Wang, Ying-Cong Chen, Xin Tao, Jiaya Jia

Blind inpainting is a task to automatically complete visual contents without specifying masks for missing areas in an image.

Image Inpainting

Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation

1 code implementation ICCV 2021 Xiaogang Xu, Hengshuang Zhao, Jiaya Jia

Adversarial training is promising for improving robustness of deep neural networks towards adversarial perturbations, especially on the classification task.

Segmentation Semantic Segmentation

PointINS: Point-based Instance Segmentation

no code implementations 13 Mar 2020 Lu Qi, Yi Wang, Yukang Chen, Yingcong Chen, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we explore the mask representation in instance segmentation with Point-of-Interest (PoI) features.

Instance Segmentation Object Detection +3

3DSSD: Point-based 3D Single Stage Object Detector

2 code implementations CVPR 2020 Zetong Yang, Yanan Sun, Shu Liu, Jiaya Jia

Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well, with inference speed more than 25 FPS, 2x faster than former state-of-the-art point-based methods.

Object

GridMask Data Augmentation

7 code implementations 13 Jan 2020 Pengguang Chen, Shu Liu, Hengshuang Zhao, Xingquan Wang, Jiaya Jia

Then we show limitation of existing information dropping algorithms and propose our structured method, which is simple and yet very effective.

Data Augmentation object-detection +4

DSGN: Deep Stereo Geometry Network for 3D Object Detection

1 code implementation CVPR 2020 Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia

Most state-of-the-art 3D object detectors heavily rely on LiDAR sensors because there is a large performance gap between image-based and LiDAR-based methods.

3D Object Detection From Stereo Images Object +2

Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation

no code implementations ICCV 2019 Li Jiang, Hengshuang Zhao, Shu Liu, Xiaoyong Shen, Chi-Wing Fu, Jiaya Jia

To incorporate point features in the edge branch, we establish a hierarchical graph framework, where the graph is initialized from a coarse layer and gradually enriched along the point decoding process.

Scene Labeling Semantic Segmentation

Fast Point R-CNN

no code implementations ICCV 2019 Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia

We present a unified, efficient and effective framework for point-cloud based 3D object detection.

3D Object Detection object-detection

Attribute-Driven Spontaneous Motion in Unpaired Image Translation

1 code implementation ICCV 2019 Ruizheng Wu, Xin Tao, Xiaodong Gu, Xiaoyong Shen, Jiaya Jia

Current image translation methods, albeit effective at producing high-quality results in various applications, still give little consideration to geometric transformation.

Attribute Motion Estimation +1

Landmark Assisted CycleGAN for Cartoon Face Generation

no code implementations 2 Jul 2019 Ruizheng Wu, Xiaodong Gu, Xin Tao, Xiaoyong Shen, Yu-Wing Tai, Jiaya Jia

In this paper, we are interested in generating a cartoon face of a person by using unpaired training data between real faces and cartoon ones.

Face Generation

Region Refinement Network for Salient Object Detection

no code implementations 27 Jun 2019 Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Jiaze Wang, Ruiyu Li, Xiaoyong Shen, Jiaya Jia

Albeit intensively studied, false prediction and unclear boundaries are still major issues of salient object detection.

Object object-detection +5

Associatively Segmenting Instances and Semantics in Point Clouds

3 code implementations CVPR 2019 Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia

A 3D point cloud describes the real scene precisely and intuitively. To date, how to segment diversified elements in such an informative 3D scene has rarely been discussed.

Ranked #15 on 3D Instance Segmentation on S3DIS (mRec metric)

3D Instance Segmentation 3D Semantic Segmentation +1

Human Pose Estimation with Spatial Contextual Information

no code implementations 7 Jan 2019 Hong Zhang, Hao Ouyang, Shu Liu, Xiaojuan Qi, Xiaoyong Shen, Ruigang Yang, Jiaya Jia

With this principle, we present two conceptually simple yet computationally efficient modules, namely Cascade Prediction Fusion (CPF) and Pose Graph Neural Network (PGNN), to exploit underlying contextual information.

Pose Estimation

Sequential Context Encoding for Duplicate Removal

no code implementations NeurIPS 2018 Lu Qi, Shu Liu, Jianping Shi, Jiaya Jia

Duplicate removal is a critical step to accomplish a reasonable amount of predictions in prevalent proposal-based object detection frameworks.

Object object-detection +1

PSANet: Point-wise Spatial Attention Network for Scene Parsing

4 code implementations ECCV 2018 Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia

We notice that information flow in convolutional neural networks is restricted to local neighborhood regions due to the physical design of convolutional filters, which limits the overall understanding of complex scenes.

Position Scene Parsing +1

Compositing-aware Image Search

no code implementations ECCV 2018 Hengshuang Zhao, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Brian Price, Jiaya Jia

We present a new image search technique that, given a background image, returns compatible foreground objects for image compositing tasks.

Image Retrieval Object

Facelet-Bank for Fast Portrait Manipulation

no code implementations CVPR 2018 Ying-Cong Chen, Huaijia Lin, Michelle Shu, Ruiyu Li, Xin Tao, Yangang Ye, Xiaoyong Shen, Jiaya Jia

Digital face manipulation has become a popular and fascinating way to touch images with the prevalence of smartphones and social networks.

Facial Editing

Scale-recurrent Network for Deep Image Deblurring

4 code implementations CVPR 2018 Xin Tao, Hongyun Gao, Yi Wang, Xiaoyong Shen, Jue Wang, Jiaya Jia

In single image deblurring, the "coarse-to-fine" scheme, i.e., gradually restoring the sharp image on different resolutions in a pyramid, is very successful in both traditional optimization-based methods and recent neural-network-based approaches.

Ranked #3 on Image Deblurring on GoPro (Params (M) metric, using extra training data)

Deblurring Image Deblurring +1

SGN: Sequential Grouping Networks for Instance Segmentation

no code implementations ICCV 2017 Shu Liu, Jiaya Jia, Sanja Fidler, Raquel Urtasun

By exploiting two-directional information, the second network groups horizontal and vertical lines into connected components.

Instance Segmentation Object +1

Unsupervised Learning of Stereo Matching

no code implementations ICCV 2017 Chao Zhou, Hong Zhang, Xiaoyong Shen, Jiaya Jia

However, due to the limitations of these datasets and the difficulty of collecting new stereo data, current methods fail in real-life cases.

Stereo Matching Stereo Matching Hand

3D Graph Neural Networks for RGBD Semantic Segmentation

2 code implementations ICCV 2017 Xiaojuan Qi, Renjie Liao, Jiaya Jia, Sanja Fidler, Raquel Urtasun

Each node in the graph corresponds to a set of points and is associated with a hidden representation vector initialized with an appearance feature extracted by a unary CNN from 2D images.

RGBD Semantic Segmentation Semantic Segmentation

Makeup-Go: Blind Reversion of Portrait Edit

no code implementations ICCV 2017 Ying-Cong Chen, Xiaoyong Shen, Jiaya Jia

In this paper, we propose the task of restoring a portrait image from this process.

regression

Automatic Real-time Background Cut for Portrait Videos

no code implementations 28 Apr 2017 Xiaoyong Shen, RuiXing Wang, Hengshuang Zhao, Jiaya Jia

A spatial-temporal refinement network is developed to further refine the segmentation errors in each frame and ensure temporal coherence in the segmentation map.

Segmentation Semantic Segmentation +2

Zero-order Reverse Filtering

1 code implementation ICCV 2017 Xin Tao, Chao Zhou, Xiaoyong Shen, Jue Wang, Jiaya Jia

In this paper, we study an unconventional but practically meaningful reversibility problem of commonly used image filters.

Convolutional Neural Pyramid for Image Processing

no code implementations 7 Apr 2017 Xiaoyong Shen, Ying-Cong Chen, Xin Tao, Jiaya Jia

We propose a principled convolutional neural pyramid (CNP) framework for general low-level vision and image processing tasks.

Colorization Image Enhancement +2

High-Quality Correspondence and Segmentation Estimation for Dual-Lens Smart-Phone Portraits

no code implementations ICCV 2017 Xiaoyong Shen, Hongyun Gao, Xin Tao, Chao Zhou, Jiaya Jia

Estimating correspondence between two images and extracting the foreground object are two challenges in computer vision.

Multi-Scale Patch Aggregation (MPA) for Simultaneous Detection and Segmentation

no code implementations CVPR 2016 Shu Liu, Xiaojuan Qi, Jianping Shi, Hong Zhang, Jiaya Jia

Aiming at simultaneous detection and segmentation (SDS), we propose a proposal-free framework, which detects and segments object instances via mid-level patches.

Object Object Proposal Generation +2

ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation

no code implementations CVPR 2016 Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun

Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure.

Image Segmentation Segmentation +1

A Closed-Form Solution to Tensor Voting: Theory and Applications

no code implementations 19 Jan 2016 Tai-Pang Wu, Sai-Kit Yeung, Jiaya Jia, Chi-Keung Tang, Gerard Medioni

We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimensions, our closed-form solution provides an exact, continuous and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation.

Stereo Matching Stereo Matching Hand

Mutual-Structure for Joint Filtering

no code implementations ICCV 2015 Xiaoyong Shen, Chao Zhou, Li Xu, Jiaya Jia

Previous joint/guided filters directly transfer the structural information in the reference image to the target one.

Depth Completion Image Enhancement +3

Video Super-Resolution via Deep Draft-Ensemble Learning

no code implementations ICCV 2015 Renjie Liao, Xin Tao, Ruiyu Li, Ziyang Ma, Jiaya Jia

We propose a new direction for fast video super-resolution (VideoSR) via a SR draft ensemble, which is defined as the set of high-resolution patch candidates before final image deconvolution.

Ensemble Learning Image Deconvolution +1

ENFT: Efficient Non-Consecutive Feature Tracking for Robust Structure-from-Motion

3 code implementations 27 Oct 2015 Guofeng Zhang, Hao-Min Liu, Zilong Dong, Jiaya Jia, Tien-Tsin Wong, Hujun Bao

Our framework consists of steps for solving the feature 'dropout' problem when indistinctive structures, noise, or large image distortion exist, and for rapidly recognizing and joining common features located in different subsequences.