Search Results for author: Xinggang Wang

Found 133 papers, 86 papers with code

ViTGaze: Gaze Following with Interaction Features in Vision Transformers

1 code implementation 19 Mar 2024 Yuehao Song, Xinggang Wang, Jingfeng Yao, Wenyu Liu, Jinglin Zhang, Xiangmin Xu

Our method achieves state-of-the-art (SOTA) performance among all single-modality methods (3.4% improvement on AUC, 5.1% improvement on AP) and very comparable performance against multi-modality methods with 59% fewer parameters.

MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning

1 code implementation 13 Mar 2024 Jialv Zou, Bencheng Liao, Qian Zhang, Wenyu Liu, Xinggang Wang

Learning robust and scalable visual representations from massive multi-view video data remains a challenge in computer vision and autonomous driving.

3D Object Detection Autonomous Driving +2

WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition

1 code implementation 22 Feb 2024 Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang

Weakly supervised visual recognition using inexact supervision is a critical yet challenging learning problem.

object-detection Segmentation +2

VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

no code implementations 20 Feb 2024 Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging.

Autonomous Driving

YOLO-World: Real-Time Open-Vocabulary Object Detection

1 code implementation 30 Jan 2024 Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools.

Instance Segmentation Language Modelling +4

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

4 code implementations 17 Jan 2024 Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang

The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images and it has great potential to be the next-generation backbone for vision foundation models.

object-detection Object Detection +3

Fast High Dynamic Range Radiance Fields for Dynamic Scenes

no code implementations 11 Jan 2024 Guanjun Wu, Taoran Yi, Jiemin Fang, Wenyu Liu, Xinggang Wang

To extend HDR NeRF methods to wider applications, we propose a dynamic HDR NeRF framework, named HDR-HexPlane, which can learn 3D scenes from dynamic 2D images captured with various exposures.

Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views

no code implementations 7 Dec 2023 Yabo Chen, Jiemin Fang, YuYang Huang, Taoran Yi, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

We propose a cascade generation framework constructed with two Zero-1-to-3 models, named Cascade-Zero123, to tackle this issue, which progressively extracts 3D information from the source image.

Transparent objects

Circuit as Set of Points

1 code implementation NeurIPS 2023 Jialv Zou, Xinggang Wang, Jiahao Guo, Wenyu Liu, Qian Zhang, Chang Huang

In our work, we propose a novel perspective for circuit design by treating circuit components as point clouds and using Transformer-based point cloud perception methods to extract features from the circuit.

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

1 code implementation 26 Oct 2023 Lianghui Zhu, Xinggang Wang, Xinlong Wang

To address this problem, we propose to fine-tune LLMs as scalable judges (JudgeLM) to evaluate LLMs efficiently and effectively in open-ended benchmarks.

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

1 code implementation 12 Oct 2023 Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang

Representing and rendering dynamic scenes has been an important but challenging task.

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

1 code implementation ICCV 2023 Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi Chen, Xinggang Wang, Hongyang Chao, Han Hu

In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.

TouchStone: Evaluating Vision-Language Models by Language Models

1 code implementation 31 Aug 2023 Shuai Bai, Shusheng Yang, Jinze Bai, Peng Wang, Xingxuan Zhang, Junyang Lin, Xinggang Wang, Chang Zhou, Jingren Zhou

Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting visual receptor with large language models (LLMs).

Visual Storytelling

MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction

1 code implementation 10 Aug 2023 Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process.

Autonomous Driving

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

1 code implementation 27 Jun 2023 Haoyi Jiang, Tianheng Cheng, Naiyu Gao, Haoyang Zhang, Tianwei Lin, Wenyu Liu, Xinggang Wang

3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal undertaking in autonomous driving, aiming to predict voxel occupancy within volumetric scenes.

3D Semantic Scene Completion from a single RGB image Autonomous Driving

ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration

1 code implementation 23 Jun 2023 Jiaqi Ma, Tianheng Cheng, Guoli Wang, Qian Zhang, Xinggang Wang, Lefei Zhang

We then leverage degradation-aware visual prompts to establish a controllable and universal model for image restoration, called ProRes, which is applicable to an extensive range of image restoration tasks.

Deblurring Denoising +1

Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification

1 code implementation 8 Jun 2023 Ruijie Zhang, Qiaozhe Zhang, Yingzhuang Liu, Hao Xin, Yan Liu, Xinggang Wang

Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD).

Image Classification Multiple Instance Learning

SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth

2 code implementations 8 Jun 2023 Zelin Liu, Xinggang Wang, Cheng Wang, Wenyu Liu, Xiang Bai

By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.

Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)

Depth Estimation Multi-Object Tracking +1

Matte Anything: Interactive Natural Image Matting with Segment Anything Models

1 code implementation 7 Jun 2023 Jingfeng Yao, Xinggang Wang, Lang Ye, Wenyu Liu

In our work, we leverage vision foundation models to enhance the performance of natural image matting.

Image Matting

GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition

no code implementations 31 May 2023 Haijun Xiong, Yunze Deng, Xiaohu Huang, Xinggang Wang, Wenyu Liu, Bin Feng

In order to fully harness the potential of gait recognition, it is crucial to consider temporal features at various granularities and spans.

Gait Recognition

ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers

4 code implementations 24 May 2023 Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang

Recently, plain vision Transformers (ViTs) have shown impressive performance on various computer vision tasks, thanks to their strong modeling capacity and large-scale pretraining.

Image Matting

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

2 code implementations 18 May 2023 Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou

In this work, we explore a scalable way for building a general representation model toward unlimited modalities.

Ranked #1 on Semantic Segmentation on ADE20K (using extra training data)

Action Classification AudioCaps +16

VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

2 code implementations 19 Apr 2023 Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns as unified point sequence representation, which can be extended to most map elements in the driving scene.

Autonomous Driving

TinyDet: Accurate Small Object Detection in Lightweight Generic Detectors

no code implementations 7 Apr 2023 Shaoyu Chen, Tianheng Cheng, Jiemin Fang, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang

Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors.

object-detection Small Object Detection

WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation

1 code implementation 3 Apr 2023 Lianghui Zhu, Yingyue Li, Jiemin Fang, Yan Liu, Hao Xin, Wenyu Liu, Xinggang Wang

Thus a novel weight-based method is proposed to end-to-end estimate the importance of attention heads, while the self-attention maps are adaptively fused for high-quality CAM results that tend to have more complete objects.

Weakly-supervised Learning Weakly supervised Semantic Segmentation +1

RPTQ: Reorder-based Post-training Quantization for Large Language Models

1 code implementation 3 Apr 2023 Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers.

Quantization

MobileInst: Video Instance Segmentation on the Mobile

no code implementations 30 Mar 2023 Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang

To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices.

Instance Segmentation Segmentation +2

OpenInst: A Simple Query-Based Method for Open-World Instance Segmentation

no code implementations 28 Mar 2023 Cheng Wang, Guoli Wang, Qian Zhang, Peng Guo, Wenyu Liu, Xinggang Wang

Fortunately, we have identified two observations that help us achieve the best of both worlds: 1) query-based methods demonstrate superiority over dense proposal-based methods in open-world instance segmentation, and 2) learning localization cues is sufficient for open world instance segmentation.

Autonomous Driving Open-World Instance Segmentation +2

ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box

no code implementations 27 Mar 2023 Yifu Zhang, Xinggang Wang, Xiaoqing Ye, Wei Zhang, Jincheng Lu, Xiao Tan, Errui Ding, Peize Sun, Jingdong Wang

We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes, which alleviates the problems of object missing and fragmented trajectories.

3D Multi-Object Tracking motion prediction +1

Generalizable Neural Voxels for Fast Human Radiance Fields

no code implementations 27 Mar 2023 Taoran Yi, Jiemin Fang, Xinggang Wang, Wenyu Liu

Rendering moving human bodies at free viewpoints only from a monocular video is quite a challenging problem.

Novel View Synthesis

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance

no code implementations 23 Mar 2023 Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu

Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.

Benchmarking Data Augmentation +1

VAD: Vectorized Scene Representation for Efficient Autonomous Driving

2 code implementations ICCV 2023 Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation.

Autonomous Driving Trajectory Planning

EVA-02: A Visual Representation for Neon Genesis

6 code implementations 20 Mar 2023 Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao

We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling.

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

1 code implementation 15 Mar 2023 Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

We present a path-based online lane graph construction method, termed LaneGAP, which end-to-end learns the path and recovers the lane graph via a Path2Graph algorithm.

Autonomous Driving graph construction +1

Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt

no code implementations CVPR 2023 Hao Li, Dingwen Zhang, Nian Liu, Lechao Cheng, Yalun Dai, Chao Zhang, Xinggang Wang, Junwei Han

Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models by giving Saliency Prompt for queries/kernels.

Instance Segmentation Semantic Segmentation +1

Understanding Self-Supervised Pretraining with Part-Aware Representation Learning

1 code implementation 27 Jan 2023 Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang

The study is mainly motivated by the observation that random views, used in contrastive learning, and random masked (visible) patches, used in masked image modeling, are often about object parts.

Contrastive Learning Object +1

Graph Contrastive Learning for Skeleton-based Action Recognition

1 code implementation 26 Jan 2023 Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng

In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences.

Action Recognition Contrastive Learning +2

A Simple Adaptive Unfolding Network for Hyperspectral Image Reconstruction

1 code implementation 24 Jan 2023 Junyu Wang, Shijie Wang, Wenyu Liu, Zengqiang Zheng, Xinggang Wang

We present a simple, efficient, and scalable unfolding network, SAUNet, to simplify the network design with an adaptive alternate optimization framework for hyperspectral image (HSI) reconstruction.

Image Reconstruction

RILS: Masked Visual Reconstruction in Language Semantic Space

1 code implementation CVPR 2023 Shusheng Yang, Yixiao Ge, Kun Yi, Dian Li, Ying Shan, XiaoHu Qie, Xinggang Wang

Both masked image modeling (MIM) and natural language supervision have facilitated the progress of transferable visual pre-training.

Sentence

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

1 code implementation CVPR 2023 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

It determines the quantization parameters by using the information of differences between network prediction before and after quantization.

Neural Network Compression Quantization

MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

1 code implementation 30 Aug 2022 Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang

High-definition (HD) map provides abundant and precise environmental information of the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving system.

3D Lane Detection Autonomous Driving

Robust Multi-Object Tracking by Marginal Inference

no code implementations 7 Aug 2022 Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, Wenyu Liu

To address the problem, we present an efficient approach to compute a marginal probability for each pair of objects in real time.

Multi-Object Tracking Object

AiATrack: Attention in Attention for Transformer Visual Tracking

1 code implementation 20 Jul 2022 Shenyuan Gao, Chunluan Zhou, Chao Ma, Xinggang Wang, Junsong Yuan

However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement.

Visual Object Tracking Visual Tracking

Polar Parametrization for Vision-based Surround-View 3D Detection

1 code implementation 22 Jun 2022 Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Chang Huang, Wenyu Liu

Based on Polar Parametrization, we propose a surround-view 3D DEtection TRansformer, named PolarDETR.

Inductive Bias Position

Featurized Query R-CNN

1 code implementation 13 Jun 2022 Wenqiang Zhang, Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Qian Zhang, Wenyu Liu

The query mechanism introduced in the DETR method is changing the paradigm of object detection, and recently many query-based methods have obtained strong object detection performance.

Object object-detection +1

Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer

1 code implementation 9 Jun 2022 Shaoyu Chen, Tianheng Cheng, Xinggang Wang, Wenming Meng, Qian Zhang, Wenyu Liu

GKT leverages the geometric priors to guide the transformer to focus on discriminative regions and unfolds kernel features to generate BEV representation.

Autonomous Driving Representation Learning

Fast Dynamic Radiance Fields with Time-Aware Neural Voxels

1 code implementation 30 May 2022 Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Matthias Nießner, Qi Tian

A multi-distance interpolation method is proposed and applied on voxel features to model both small and large motions.

Temporally Efficient Vision Transformer for Video Instance Segmentation

3 code implementations CVPR 2022 Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan

To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).

Instance Segmentation Semantic Segmentation +1

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

3 code implementations CVPR 2022 Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

Segmentation Semantic Segmentation

Multi-scale Context-aware Network with Transformer for Gait Recognition

1 code implementation ICCV 2021 Duowang Zhu, Xiaohu Huang, Xinggang Wang, Bo Yang, Botao He, Wenyu Liu, Bin Feng

Although gait recognition has drawn increasing research attention recently, since the silhouette differences are quite subtle in spatial domain, temporal feature representation is crucial for gait recognition.

Multiview Gait Recognition Relation

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

2 code implementations ICCV 2023 Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan, Xinggang Wang

We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla Vision Transformer (ViT) for object detection, which is based on our two novel observations: (i) A MIM pre-trained vanilla ViT encoder can work surprisingly well in the challenging object-level recognition scenario even with randomly sampled partial observations, e.g., only 25% to 50% of the input embeddings.

Instance Segmentation Object +2

Corrupted Image Modeling for Self-Supervised Visual Pre-Training

no code implementations 7 Feb 2022 Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei

Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not.

Image Classification Semantic Segmentation

NeuSample: Neural Sample Field for Efficient View Synthesis

1 code implementation 30 Nov 2021 Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian

Neural radiance fields (NeRF) have shown great potentials in representing 3D scenes and synthesizing novel views, but the computational overhead of NeRF at the inference stage is still heavy.

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

no code implementations 15 Nov 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.

Instance Segmentation Object Recognition +3

VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild

no code implementations 5 Aug 2021 Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Wenjun Zeng

We estimate 3D poses from the voxel representation by predicting whether each voxel contains a particular body joint.

Ranked #7 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)

3D Multi-Person Pose Estimation 3D Pose Estimation

What Makes for Hierarchical Vision Transformer?

no code implementations 5 Jul 2021 Yuxin Fang, Xinggang Wang, Rui Wu, Wenyu Liu

Recent studies indicate that hierarchical Vision Transformer with a macro architecture of interleaved non-overlapped window-based self-attention & shifted-window operation is able to achieve state-of-the-art performance in various visual recognition tasks, and challenges the ubiquitous convolutional neural networks (CNNs) using densely slid kernels.

Instance Segmentation object-detection +3

Bag of Instances Aggregation Boosts Self-supervised Distillation

1 code implementation ICLR 2022 Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian

Here, a bag of instances is a set of similar samples that the teacher constructs and groups within a bag, and the goal of distillation is to aggregate compact representations over the student with respect to instances in a bag.

Contrastive Learning Self-Supervised Learning

Tracking Instances as Queries

1 code implementation 22 Jun 2021 Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Ying Shan, Bin Feng, Wenyu Liu

Recently, query based deep networks catch lots of attention owing to their end-to-end pipeline and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation.

Instance Segmentation object-detection +4

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

2 code implementations NeurIPS 2021 Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu

Can Transformer perform 2D object- and region-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure?

Object object-detection +1

Instances as Queries

5 code implementations ICCV 2021 Yuxin Fang, Shusheng Yang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.

Ranked #13 on Object Detection on COCO-O (using extra training data)

Instance Segmentation Object +4

Crossover Learning for Fast Online Video Instance Segmentation

1 code implementation ICCV 2021 Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to pixel-wisely localize the same instance in other frames.

Instance Segmentation Semantic Segmentation +2

Weakly-supervised Instance Segmentation via Class-agnostic Learning with Salient Images

no code implementations CVPR 2021 Xinggang Wang, Jiapei Feng, Bin Hu, Qi Ding, Longjin Ran, Xiaoxin Chen, Wenyu Liu

Humans have a strong class-agnostic object segmentation ability and can outline boundaries of unknown objects precisely, which motivates us to propose a box-supervised class-agnostic object segmentation (BoxCaseg) based solution for weakly-supervised instance segmentation.

Ranked #5 on Box-supervised Instance Segmentation on COCO test-dev (using extra training data)

Box-supervised Instance Segmentation Multi-Task Learning +5

Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation

no code implementations 2 Apr 2021 Zilong Huang, Wentian Hao, Xinggang Wang, Mingyuan Tao, Jianqiang Huang, Wenyu Liu, Xian-Sheng Hua

Despite their success for semantic segmentation, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original segmentation model as new classes are available but the initial training data is not retained.

Class-Incremental Semantic Segmentation Incremental Learning +1

Noise Modulation: Let Your Model Interpret Itself

no code implementations 19 Mar 2021 Haoyang Li, Xinggang Wang

Given the great success of deep neural networks (DNNs) and their black-box nature, the interpretability of these models has become an important issue. Most previous research addresses the post-hoc interpretation of a trained model. Recently, however, adversarial training has shown that a model can acquire an interpretable input gradient through training. Even so, adversarial training lacks efficiency for interpretability. To resolve this problem, we construct an approximation of the adversarial perturbations and discover a connection between adversarial training and amplitude modulation.

Deep Online Correction for Monocular Visual Odometry

no code implementations 18 Mar 2021 Jiaxin Zhang, Wei Sui, Xinggang Wang, Wenming Meng, Hongmei Zhu, Qian Zhang

Second, the poses predicted by CNNs are further improved by minimizing photometric errors via gradient updates of poses during inference phases.

Monocular Visual Odometry RTE

Occluded Video Instance Segmentation: A Benchmark

2 code implementations 2 Feb 2021 Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.

Instance Segmentation Segmentation +3

Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition

no code implementations 13 Jan 2021 Mengting Chen, Xinggang Wang, Heng Luo, Yifeng Geng, Wenyu Liu

By applying the proposed feature matching block in different layers of the few-shot recognition network, multi-scale information among the compared images can be incorporated into the final cascaded matching feature, which boosts the recognition performance further and generalizes better by learning on relationships.

Few-Shot Learning

ResizeMix: Mixing Data with Preserved Object Information and True Labels

1 code implementation 21 Dec 2020 Jie Qin, Jiemin Fang, Qian Zhang, Wenyu Liu, Xingang Wang, Xinggang Wang

Especially, CutMix uses a simple but effective method to improve the classifiers by randomly cropping a patch from one image and pasting it on another image.

Data Augmentation Image Classification +3

Medial Injury/Dysfunction Induced Granulation Tissue Repair is the Pathogenesis of Atherosclerosis

no code implementations 13 Oct 2020 Xinggang Wang, Aijun Sun, Junbo Ge

Myofibroblasts, ECM and lumen (intima)/vasa vasorum (VV) (adventitia) constitute granulation tissue repair.

Hemodynamic Bigger Hydrostatic Pressure Instead of Lower Shear Stress Aggravates Atherosclerosis

no code implementations 20 Aug 2020 Xinggang Wang, Junbo Ge

When a blood micro cluster flows over a very short distance or the same transection of the artery, previous studies did not consider the conversion between $\frac{1}{2}\rho v^2$ and $P$. Therefore, the claim that low shear stress aggravates atherosclerosis describes only the appearance; the essence is that these areas with smaller blood velocity have much bigger hydrostatic pressure, which aggravates atherosclerosis.

Myofibroblast Forms Atherosclerotic Plaques

no code implementations 21 Jul 2020 Xinggang Wang, Junbo Ge

This is the first time that lipid-rich plaques with lots of foam cells, extracellular lipids and collagen fibers have been formed in vitro.

Boundary-preserving Mask R-CNN

1 code implementation ECCV 2020 Tianheng Cheng, Xinggang Wang, Lichao Huang, Wenyu Liu

Besides, it is not surprising to observe that BMask R-CNN obtains more obvious improvement when the evaluation criterion requires better localization (e.g., AP$_{75}$) as shown in Fig. 1.

Instance Segmentation Object +1

Deep multi-metric learning for text-independent speaker verification

1 code implementation 17 Jul 2020 Jiwei Xu, Xinggang Wang, Bin Feng, Wenyu Liu

Text-independent speaker verification is an important artificial intelligence problem that has a wide spectrum of applications, such as criminal investigation, payment certification, and interest-based customer services.

Metric Learning Text-Independent Speaker Verification

FNA++: Fast Network Adaptation via Parameter Remapping and Architecture Search

2 code implementations 21 Jun 2020 Jiemin Fang, Yuzhu Sun, Qian Zhang, Kangjian Peng, Yuan Li, Wenyu Liu, Xinggang Wang

In this paper, we propose a Fast Network Adaptation (FNA++) method, which can adapt both the architecture and parameters of a seed network (e.g., an ImageNet pre-trained network) to become a network with different depths, widths, or kernel sizes via a parameter remapping technique, making it possible to use NAS for segmentation and detection tasks a lot more efficiently.

Image Classification Neural Architecture Search +5

FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking

31 code implementations 4 Apr 2020 Yifu Zhang, Chunyu Wang, Xinggang Wang, Wen-Jun Zeng, Wenyu Liu

Formulating MOT as multi-task learning of object detection and re-ID in a single network is appealing since it allows joint optimization of the two tasks and enjoys high computation efficiency.

Ranked #1 on Multi-Object Tracking on 2DMOT15 (using extra training data)

Fairness Multi-Object Tracking +4

Deep Learning-based Detection for COVID-19 from Chest CT using Weak Label

1 code implementation medRxiv 2020 Chuansheng Zheng, Xianbo Deng, Qing Fu, Qiang Zhou, Jiapei Feng, Hui Ma, Wenyu Liu, Xinggang Wang

Our weakly-supervised deep learning model can accurately predict the COVID-19 infectious probability in chest CT volumes without the need for annotating the lesions for training.

COVID-19 Diagnosis Specificity

AlignSeg: Feature-Aligned Segmentation Networks

1 code implementation 24 Feb 2020 Zilong Huang, Yunchao Wei, Xinggang Wang, Wenyu Liu, Thomas S. Huang, Humphrey Shi

Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation.

Segmentation Semantic Segmentation

Fast Neural Network Adaptation via Parameter Remapping and Architecture Search

no code implementations ICLR 2020 Jiemin Fang, Yuzhu Sun, Kangjian Peng, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang

In our experiments, we conduct FNA on MobileNetV2 to obtain new networks for both segmentation and detection that clearly outperform existing networks designed both manually and by NAS.

Image Classification Neural Architecture Search +4

Diversity Transfer Network for Few-Shot Learning

1 code implementation 31 Dec 2019 Mengting Chen, Yuxin Fang, Xinggang Wang, Heng Luo, Yifeng Geng, Xin-Yu Zhang, Chang Huang, Wenyu Liu, Bo Wang

The learning problem of the sample generation (i.e., diversity transfer) is solved via minimizing an effective meta-classification loss in a single-stage network, instead of the generative loss in previous works.

Few-Shot Learning

IoU-aware Single-stage Object Detector for Accurate Localization

2 code implementations 12 Dec 2019 Shengkai Wu, Xiaoping Li, Xinggang Wang

The detection confidence is then used as the input of the subsequent NMS and COCO AP computation, which will substantially improve the localization accuracy of models.

General Classification Object

Deep High-Resolution Representation Learning for Visual Recognition

42 code implementations 20 Aug 2019 Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Dichotomous Image Segmentation Face Alignment +7

IoU-balanced Loss Functions for Single-stage Object Detection

no code implementations 15 Aug 2019 Shengkai Wu, Jinrong Yang, Xinggang Wang, Xiaoping Li

The IoU-balanced localization loss decreases the gradient of examples with low IoU and increases the gradient of examples with high IoU, which can improve the localization accuracy of models.

Classification General Classification +3

Object Detection in Video with Spatial-temporal Context Aggregation

no code implementations 11 Jul 2019 Hao Luo, Lichao Huang, Han Shen, Yuan Li, Chang Huang, Xinggang Wang

Without any bells and whistles, our method obtains 80.3% mAP on the ImageNet VID dataset, which is superior to previous state-of-the-art methods.

Object object-detection +1

Direct Object Recognition Without Line-of-Sight Using Optical Coherence

no code implementations CVPR 2019 Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu

Visual object recognition under situations in which the direct line-of-sight is blocked, such as when it is occluded around the corner, is of practical importance in a wide range of applications.

Object Object Recognition

Mask Scoring R-CNN

3 code implementations CVPR 2019 Zhaojin Huang, Lichao Huang, Yongchao Gong, Chang Huang, Xinggang Wang

In this paper, we study this problem and propose Mask Scoring R-CNN which contains a network block to learn the quality of the predicted instance masks.

General Classification Instance Segmentation +2

CCNet: Criss-Cross Attention for Semantic Segmentation

4 code implementations ICCV 2019 Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, Thomas S. Huang

Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage.

Ranked #7 on Semantic Segmentation on FoodSeg103 (using extra training data)

Computational Efficiency Human Parsing +8

Mancs: A Multi-task Attentional Network with Curriculum Sampling for Person Re-identification

no code implementations ECCV 2018 Cheng Wang, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

We propose a novel deep network called Mancs that solves the person re-identification problem from the following aspects: fully utilizing the attention mechanism for the person misalignment problem and properly sampling for the ranking loss to obtain more stable person representation.

Person Re-Identification

Weakly Supervised Region Proposal Network and Object Detection

no code implementations ECCV 2018 Peng Tang, Xinggang Wang, Angtian Wang, Yongluan Yan, Wenyu Liu, Junzhou Huang, Alan Yuille

The Convolutional Neural Network (CNN) based region proposal generation method (i.e., region proposal network), trained using bounding box annotations, is an essential component in modern fully supervised object detectors.

Object object-detection +2

Reinforced Evolutionary Neural Architecture Search

1 code implementation 1 Aug 2018 Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, Xinggang Wang

To address this issue, we propose the Reinforced Evolutionary Neural Architecture Search (RE-NAS), which is an evolutionary method with the reinforced mutation for NAS.

Neural Architecture Search Semantic Segmentation

Unsupervised Domain Adaptive Re-Identification: Theory and Practice

3 code implementations 30 Jul 2018 Liangchen Song, Cheng Wang, Lefei Zhang, Bo Du, Qian Zhang, Chang Huang, Xinggang Wang

We study the problem of unsupervised domain adaptive re-identification (re-ID) which is an active topic in computer vision but lacks a theoretical foundation.

General Classification Unsupervised Domain Adaptation

PCL: Proposal Cluster Learning for Weakly Supervised Object Detection

4 code implementations 9 Jul 2018 Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille

The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one.

Multiple Instance Learning Object +3

Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing

1 code implementation CVPR 2018 Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, Jingdong Wang

Inspired by the traditional image segmentation methods of seeded region growing, we propose to train a semantic segmentation network starting from the discriminative regions and progressively increase the pixel-level supervision using seeded region growing.

Ranked #37 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)

Image Segmentation Segmentation +2

Object Detection in Videos by High Quality Object Linking

no code implementations 30 Jan 2018 Peng Tang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Wen-Jun Zeng, Jingdong Wang

In particular, our method improves results by 8.8% over the static image detector for fast moving objects.

General Classification Object +3

Point Linking Network for Object Detection

no code implementations 12 Jun 2017 Xinggang Wang, Kaibing Chen, Zilong Huang, Cong Yao, Wenyu Liu

The deep ConvNets based object detectors mainly focus on regressing the coordinates of bounding boxes, e.g., Faster-R-CNN, YOLO and SSD.

Object object-detection +1

Deep Patch Learning for Weakly Supervised Object Classification and Discovery

1 code implementation 6 May 2017 Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu

Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered background.

Classification General Classification +3

Multiple Instance Detection Network with Online Instance Classifier Refinement

4 code implementations CVPR 2017 Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu

We propose a novel online instance classifier refinement algorithm to integrate MIL and the instance classifier refinement procedure into a single deep network, and train the network end-to-end with only image-level supervision, i.e., without object location information.

Multiple Instance Learning Object +3

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

3 code implementations 21 Nov 2016 Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression.

Revisiting Multiple Instance Neural Networks

no code implementations 8 Oct 2016 Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, Wenyu Liu

We propose a new multiple instance neural network to learn bag representations, which is different from the existing multiple instance neural networks that focus on estimating instance labels.

Multiple Instance Learning Weakly-supervised Learning

Deep FisherNet for Object Classification

no code implementations 31 Jul 2016 Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu

Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.

Classification Computational Efficiency +3

Shape Recognition by Bag of Skeleton-associated Contour Parts

no code implementations 20 May 2016 Wei Shen, Yuan Jiang, Wenjing Gao, Dan Zeng, Xinggang Wang

Contour and skeleton are two complementary representations for shape recognition.

Bag Reference Vector for Multi-instance Learning

no code implementations 3 Dec 2015 Hanqiang Song, Zhuotun Zhu, Xinggang Wang

Multi-instance learning (MIL) has a wide range of applications due to its distinctive characteristics.

Text Categorization

Relaxed Multiple-Instance SVM with Application to Object Discovery

no code implementations ICCV 2015 Xinggang Wang, Zhuotun Zhu, Cong Yao, Xiang Bai

Multiple-instance learning (MIL) has served as an important tool for a wide range of vision applications, for instance, image classification, object detection, and visual tracking.

General Classification Image Classification +6

Deep Learning Representation using Autoencoder for 3D Shape Retrieval

no code implementations 25 Sep 2014 Zhuotun Zhu, Xinggang Wang, Song Bai, Cong Yao, Xiang Bai

By combining the global deep learning representation and the local descriptor representation, our method can obtain state-of-the-art performance on 3D shape retrieval benchmarks.

3D Shape Classification 3D Shape Recognition +5
