Search Results for author: Wenhai Wang

Found 52 papers, 44 papers with code

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

1 code implementation 24 May 2023 Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.

Image Captioning Language Modelling +2

Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

no code implementations 18 May 2023 Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Wenhai Wang, Shouling Ji

Most previous approaches rely on the sentence-level retrieval and combination paradigm (retrieval of similar code snippets and use of the corresponding code and summary pairs) on the encoder side.

Code Summarization Retrieval +1

VideoChat: Chat-Centric Video Understanding

1 code implementation 10 May 2023 Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, LiMin Wang, Yu Qiao

In this study, we initiate an exploration into video understanding by introducing VideoChat, an end-to-end chat-centric video understanding system.

Video Understanding

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations 9 May 2023 Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

A Survey of Historical Learning: Learning Models with Learning History

1 code implementation 23 Mar 2023 Xiang Li, Ge Wu, Lingfeng Yang, Wenhai Wang, RenJie Song, Jian Yang

The various types of elements deposited in the training history are a rich resource for improving the learning of deep models.

Ensemble Learning

Champion Solution for the WSDM2023 Toloka VQA Challenge

1 code implementation 22 Jan 2023 Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this report, we present our champion solution to the WSDM2023 Toloka Visual Question Answering (VQA) Challenge.

Question Answering Visual Grounding +1

Planning-oriented Autonomous Driving

1 code implementation CVPR 2023 Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

Oriented at this, we revisit the key components within perception and prediction, and prioritize the tasks such that all these tasks contribute to planning.

Autonomous Driving Philosophy

VLG: General Video Recognition with Web Textual Knowledge

1 code implementation 3 Dec 2022 Jintao Lin, Zhaoyang Liu, Wenhai Wang, Wayne Wu, LiMin Wang

Our VLG is first pre-trained on video and language datasets to learn a shared feature space; we then devise a flexible bi-modal attention head to combine high-level semantic concepts under different settings.

Video Recognition

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

2 code implementations CVPR 2023 Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai

In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.

Language Modelling Multi-Task Learning

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2 code implementations CVPR 2023 Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state.

Ranked #1 on Instance Segmentation on COCO minival (using extra training data)

2D object detection Classification +4

Demystify Transformers & Convolutions in Modern Image Deep Networks

1 code implementation 10 Nov 2022 Jifeng Dai, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Xiaowei Hu

Although the novel feature transformation designs are often claimed as the source of gain, some backbones may benefit from advanced engineering techniques, which makes it hard to identify the real gain from the key feature transformation operators.

Image Deep Networks Spatial Token Mixer

Incremental Few-Shot Semantic Segmentation via Embedding Adaptive-Update and Hyper-class Representation

no code implementations 26 Jul 2022 Guangchen Shi, Yirui Wu, Jun Liu, Shaohua Wan, Wenhai Wang, Tong Lu

Second, to resist overfitting caused by the small number of training samples, a hyper-class embedding is learned by clustering all category embeddings for initialization and is aligned with the category embedding of the new class for enhancement, so that previously learned knowledge assists in learning new knowledge, thus alleviating the dependence of performance on the scale of the training data.

Few-Shot Semantic Segmentation Semantic Segmentation

Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

1 code implementation 20 May 2022 Xiang Li, Wenhai Wang, Lingfeng Yang, Jian Yang

Masked AutoEncoder (MAE) has recently led the trend in visual self-supervision with an elegant asymmetric encoder-decoder design, which significantly improves both pre-training efficiency and fine-tuning accuracy.

Object Detection

Hybrid Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks

no code implementations 21 Apr 2022 Tao Yang, Jinming Wang, Weijie Hao, Qiang Yang, Wenhai Wang

The sensor data detection model based on Gaussian and Bayesian algorithms can detect the anomalous sensor data in real-time and upload them to the cloud for further analysis, filtering the normal sensor data and reducing traffic load.

Anomaly Detection

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

2 code implementations 31 Mar 2022 Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.

3D Object Detection Autonomous Driving

WegFormer: Transformers for Weakly Supervised Semantic Segmentation

no code implementations 16 Mar 2022 Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping Luo

Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of CNN is insufficient to capture global context information, leading to sub-optimal results.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

2 code implementations 3 Nov 2021 Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).

Image Classification Scene Text Detection

An empirical evaluation of attention-based multi-head models for improved turbofan engine remaining useful life prediction

1 code implementation 4 Sep 2021 Abiodun Ayodeji, Wenhai Wang, Jianzhong Su, Jianquan Yuan, Xinggao Liu

The results presented in this study demonstrate the importance of multi-head models and attention mechanisms to an improved understanding of the remaining useful life of industrial assets.

Time Series Analysis

Learning Class-level Prototypes for Few-shot Learning

no code implementations 25 Aug 2021 Minglei Yuan, Wenhai Wang, Tao Wang, Chunhao Cai, Qian Xu, Tong Lu

Few-shot learning aims to recognize new categories using very few labeled samples.

Few-Shot Learning

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

2 code implementations 16 Aug 2021 Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, Ling Shao

Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations.

Medical Image Segmentation

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

14 code implementations NeurIPS 2021 Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.

Semantic Segmentation Thermal Image Segmentation

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

1 code implementation 5 May 2021 Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo

Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotated text detection and cell segmentation.

Ranked #78 on Instance Segmentation on COCO test-dev (using extra training data)

Cell Segmentation Instance Segmentation +3

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation 2 May 2021 Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.

Scene Text Detection Text Spotting

An Introduction of mini-AlphaStar

1 code implementation 14 Apr 2021 Ruo-Ze Liu, Wenhai Wang, Yanjie Shen, Zhiqi Li, Yang Yu, Tong Lu

StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against the opponent's units.

Starcraft Starcraft II

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

1 code implementation 22 Mar 2021 Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo

(1) We divide the input image into small patches and adopt Thumbnail Instance Normalization (TIN), successfully transferring image style at arbitrarily high resolution.

Style Transfer

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

9 code implementations ICCV 2021 Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.

Image Classification Instance Segmentation +3

DetCo: Unsupervised Contrastive Learning for Object Detection

2 code implementations ICCV 2021 Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo

Unlike most recent methods that focused on improving accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches to learn discriminative representations for object detection.

Contrastive Learning Image Classification +1

Segmenting Transparent Object in the Wild with Transformer

2 code implementations 21 Jan 2021 Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset.

Semantic Segmentation Transparent objects

Polygon-free: Unconstrained Scene Text Detection with Box Annotations

1 code implementation 26 Nov 2020 Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo

For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% of fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.

Scene Text Detection

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations ECCV 2020 Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Text Spotting

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

no code implementations ECCV 2020 Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo

The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.

2D Human Pose Estimation Graph Clustering +3

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

7 code implementations NeurIPS 2020 Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang

Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent an arbitrary distribution of box locations.

Dense Object Detection General Classification
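
One way to picture "a vector to represent an arbitrary distribution of box locations" is sketched below. It is only an illustration under stated assumptions: the vector is read as logits over integer offset bins, and a single offset is recovered as the expected value of the resulting discrete distribution. The function names `softmax` and `expected_offset` and the bin layout are hypothetical, not taken from the listing.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits):
    """Expected value of a discrete distribution over bins 0..n-1.

    Assumption: bin i stands for an offset of i units from a reference
    point to one side of the box.
    """
    probs = softmax(logits)
    return sum(p * i for i, p in enumerate(probs))


# A flat vector spreads mass evenly; a peaked one concentrates it.
flat = expected_offset([0.0, 0.0, 0.0, 0.0, 0.0])
peaked = expected_offset([50.0, 0.0, 0.0])
```

A flat vector yields the middle of the bin range, while a sharply peaked vector yields an offset close to the peaked bin, so the shape of the vector also carries information about how certain the location is.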

Segmenting Transparent Objects in the Wild

1 code implementation ECCV 2020 Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than the existing datasets.

Semantic Segmentation Transparent objects

False Data Injection Attacks and the Distributed Countermeasure in DC Microgrids

no code implementations 7 Jan 2020 Mengxiang Liu, Peng Cheng, Chengcheng Zhao, Ruilong Deng, Wenhai Wang, Jiming Chen

In this paper, we consider a hierarchical control based DC microgrid (DCmG) equipped with unknown input observer (UIO) based detectors, where the potential false data injection (FDI) attacks and the distributed countermeasure are investigated.

PolarMask: Single Shot Instance Segmentation with Polar Representation

2 code implementations CVPR 2020 Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.

Instance Segmentation Object Detection +2

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

1 code implementation 16 Sep 2019 Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo

Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.

Scene Text Recognition Super-Resolution

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

6 code implementations ICCV 2019 Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen

Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.

Scene Text Detection

Shape Robust Text Detection with Progressive Scale Expansion Network

14 code implementations CVPR 2019 Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao

Because there are large geometrical margins among the minimal scale kernels, our method is effective at splitting close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances.

Optical Character Recognition (OCR) Scene Text Detection

Selective Kernel Networks

16 code implementations CVPR 2019 Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang

A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches.

Ranked #95 on Image Classification on CIFAR-100 (using extra training data)

Image Classification
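
The fusion step described for the Selective Kernel unit, softmax attention across branches, can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the attention logits are passed in directly rather than computed from the branch features, and the names `softmax` and `selective_fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_fuse(branches, logits):
    """Fuse branch outputs channel by channel.

    branches[b][c]: feature of channel c from branch b (one branch per
    kernel size). logits[b][c]: attention score for that branch and
    channel; supplied by the caller here as a simplifying assumption.
    """
    n_branches = len(branches)
    n_channels = len(branches[0])
    fused = []
    for c in range(n_channels):
        # Softmax runs across branches, separately for each channel.
        weights = softmax([logits[b][c] for b in range(n_branches)])
        fused.append(sum(weights[b] * branches[b][c] for b in range(n_branches)))
    return fused


# Equal logits weight both branches equally, giving a per-channel average.
out = selective_fuse([[1.0, 2.0], [3.0, 6.0]], [[0.0, 0.0], [0.0, 0.0]])
```

Because the weights are normalized across branches, raising the logit of one branch for a channel shifts that channel toward that branch's kernel size.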

Shape Robust Text Detection with Progressive Scale Expansion Network

10 code implementations 7 Jun 2018 Xiang Li, Wenhai Wang, Wenbo Hou, Ruo-Ze Liu, Tong Lu, Jian Yang

To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance.

Curved Text Detection

Mixed Link Networks

1 code implementation 6 Feb 2018 Wenhai Wang, Xiang Li, Jian Yang, Tong Lu

Based on an analysis that reveals the equivalence of modern networks, we find that both ResNet and DenseNet are essentially derived from the same "dense topology", yet they only differ in the form of connection -- addition (dubbed "inner link") vs. concatenation (dubbed "outer link").

Representation Learning
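
The two connection forms named in the snippet can be contrasted in a minimal sketch. Feature maps are reduced to flat lists of channel values, and the function names `inner_link` and `outer_link` are hypothetical labels chosen to match the snippet's terms, not an API from the paper's code.

```python
def inner_link(existing, new):
    """Addition: new features are summed into existing channels.

    The channel count stays the same (ResNet-style connection).
    """
    if len(existing) != len(new):
        raise ValueError("addition needs matching channel counts")
    return [a + b for a, b in zip(existing, new)]


def outer_link(existing, new):
    """Concatenation: new features are appended as extra channels.

    The channel count grows (DenseNet-style connection).
    """
    return list(existing) + list(new)


added = inner_link([1.0, 2.0], [3.0, 4.0])          # 2 channels in, 2 out
concatenated = outer_link([1.0, 2.0], [3.0, 4.0])   # 2 channels in, 4 out
```

The practical difference visible here is the channel count: addition keeps it fixed, while concatenation increases it with every link.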
