Search Results for author: Wenhai Wang

Found 74 papers, 62 papers with code

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

2 code implementations • 9 Apr 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang

The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.

Ranked #11 on Visual Question Answering on MM-Vet

4k Language Modelling +1

1,636

Does Knowledge Graph Really Matter for Recommender Systems?

1 code implementation • 4 Apr 2024 • Haonan Zhang, Dongxia Wang, Zhu Sun, Yanhui Li, Youcheng Sun, HuiZhi Liang, Wenhai Wang

We consider the scenarios where knowledge in a KG gets completely removed, randomly distorted and decreased, and also where recommendations are for cold-start users.

Knowledge Graphs Recommendation Systems

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

1 code implementation • 20 Mar 2024 • Yang Yang, Wenhai Wang, Zhe Chen, Jifeng Dai, Liang Zheng

However, in the real world, where test ground truths are not provided, it is non-trivial to determine whether bounding boxes are accurate, which prevents us from assessing a detector's generalization ability.

object-detection Object Detection +1

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

1 code implementation • 4 Mar 2024 • Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang

Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification and has significantly faster speeds and lower memory usage when processing high-resolution inputs.

Image Classification

214

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

1 code implementation • 29 Feb 2024 • Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai

In addition, we design a new benchmark, termed Circular-based Relation Probing Evaluation (CRPE) for comprehensively evaluating the relation comprehension capabilities of MLLMs.

Ranked #30 on Visual Question Answering on MM-Vet

Hallucination Object Localization +3

373

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

no code implementations • 25 Feb 2024 • Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI.

Ranked #72 on Visual Question Answering on MM-Vet

Code Generation Multimodal Reasoning +1

FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution

no code implementations • 6 Feb 2024 • Qi Zhou, Dongxia Wang, Tianlin Li, Zhihong Xu, Yang Liu, Kui Ren, Wenhai Wang, Qing Guo

To expose this potential vulnerability, we aim to build an adversarial attack forcing SDEdit to generate a specific data distribution aligned with a specified attribute (e.g., female), without changing the input's attribute characteristics.

Adversarial Attack Attribute +1

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

1 code implementation • 18 Jan 2024 • Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie zhou, Hongsheng Li, Yu Qiao, Jifeng Dai

Developing generative models for interleaved image-text data has both research and practical value.

157

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

1 code implementation • 11 Jan 2024 • Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie zhou, Jifeng Dai

The advancements in speed and efficiency of DCNv4, combined with its robust performance across diverse vision tasks, show its potential as a foundational building block for future vision models.

Image Classification Image Generation +1

326

Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion

2 code implementations • 8 Jan 2024 • Minglong Xue, Jinhong He, Wenhai Wang, Mingliang Zhou

Moreover, to further promote the effective recovery of the image details, we combine the Fourier transform based on the wavelet transform and construct a Hybrid High Frequency Perception Module (HFPM) with a significant perception of the detailed features.

Low-Light Image Enhancement

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

2 code implementations • 21 Dec 2023 • Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT-full (using extra training data)

Image Retrieval Image-to-Text Retrieval +10

844

A Survey of Reasoning with Foundation Models

1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.

Medical Diagnosis

344

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

1 code implementation • 14 Dec 2023 • Wenhai Wang, Jiangwei Xie, Chuanyang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai

In this work, we delve into the potential of large language models (LLMs) in autonomous driving (AD).

Autonomous Driving Motion Planning

119

How ChatGPT is Solving Vulnerability Management Problem

no code implementations • 11 Nov 2023 • Peiyu Liu, Junming Liu, Lirong Fu, Kangjie Lu, Yifan Xia, Xuhong Zhang, Wenzhi Chen, Haiqin Weng, Shouling Ji, Wenhai Wang

Prior works show that ChatGPT has the capabilities of processing foundational code analysis tasks, such as abstract syntax tree generation, which indicates the potential of using ChatGPT to comprehend code syntax and static behaviors.

Management

ControlLLM: Augment Language Models with Tools by Searching on Graphs

1 code implementation • 26 Oct 2023 • Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks.

Scheduling

161

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

1 code implementation • NeurIPS 2023 • Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert).

Ranked #6 on 3D Object Detection on nuScenes Camera Only

3D Object Detection object-detection

1,070

CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code

1 code implementation • 24 Oct 2023 • Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

Automatically generating function summaries for binaries is an extremely valuable but challenging task, since it involves translating the execution behavior and semantics of the low-level language (assembly code) into human-readable natural language.

Code Summarization

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

1 code implementation • 11 Oct 2023 • Zeqiang Lai, Xizhou Zhu, Jifeng Dai, Yu Qiao, Wenhai Wang

The revolution of artificial intelligence content generation has been rapidly accelerated with the booming text-to-image (T2I) diffusion models.

Code Generation Image Generation +2

287

FB-BEV: BEV Representation from Forward-Backward View Transformations

1 code implementation • ICCV 2023 • Zhiqi Li, Zhiding Yu, Wenhai Wang, Anima Anandkumar, Tong Lu, Jose M. Alvarez

Currently, the two most prominent VTM paradigms are forward projection and backward projection.

542

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

1 code implementation • 3 Aug 2023 • Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao

We present the All-Seeing (AS) project: a large-scale dataset and model for recognizing and understanding everything in the open world.

Question Answering Retrieval +1

373

AVSegFormer: Audio-Visual Segmentation with Transformer

1 code implementation • 3 Jul 2023 • Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this paper, we propose AVSegFormer, a novel framework for AVS tasks that leverages the transformer architecture.

Scene Understanding Segmentation

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

no code implementations • 2 Jun 2023 • Zeqiang Lai, Yuchen Duan, Jifeng Dai, Ziheng Li, Ying Fu, Hongsheng Li, Yu Qiao, Wenhai Wang

In this paper, we propose to ameliorate the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a recently-developed denoising diffusion generative model.

Denoising Segmentation +1

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

no code implementations • NeurIPS 2023 • Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.

Image Captioning Language Modelling +3

Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

1 code implementation • 18 May 2023 • Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

In this paper, we propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side rather than the encoder side to enhance the performance of neural models and produce more low-frequency tokens in generating summaries.

Code Summarization Retrieval +2

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

2 code implementations • NeurIPS 2023 • Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie zhou, Yu Qiao, Jifeng Dai

We hope this model can set a new baseline for generalist vision and language models.

Language Modelling Large Language Model

3,121

VideoChat: Chat-Centric Video Understanding

1 code implementation • 10 May 2023 • Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, LiMin Wang, Yu Qiao

In this paper, we make an initial attempt at developing an end-to-end chat-centric video understanding system, named VideoChat.

Ranked #6 on Video Question Answering on MVBench

Video-based Generative Performance Benchmarking (Consistency) Video-based Generative Performance Benchmarking (Contextual Understanding) +5

2,667

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations • 9 May 2023 • Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

3,121

A Survey of Historical Learning: Learning Models with Learning History

1 code implementation • 23 Mar 2023 • Xiang Li, Ge Wu, Lingfeng Yang, Wenhai Wang, RenJie Song, Jian Yang

The various types of elements deposited in the training history are a rich resource for improving the learning of deep models.

Ensemble Learning

Champion Solution for the WSDM2023 Toloka VQA Challenge

1 code implementation • 22 Jan 2023 • Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this report, we present our champion solution to the WSDM2023 Toloka Visual Question Answering (VQA) Challenge.

Question Answering Visual Grounding +1

1,118

Planning-oriented Autonomous Driving

1 code implementation • CVPR 2023 • Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li

Oriented at this, we revisit the key components within perception and prediction, and prioritize the tasks such that all these tasks contribute to planning.

Autonomous Driving Philosophy

2,816

VLG: General Video Recognition with Web Textual Knowledge

1 code implementation • 3 Dec 2022 • Jintao Lin, Zhaoyang Liu, Wenhai Wang, Wayne Wu, LiMin Wang

Our VLG is first pre-trained on video and language datasets to learn a shared feature space; we then devise a flexible bi-modal attention head to combine high-level semantic concepts under different settings.

Video Recognition

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

2 code implementations • CVPR 2023 • Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai

In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.

Language Modelling Multi-Task Learning

2,310

Demystify Transformers & Convolutions in Modern Image Deep Networks

1 code implementation • 10 Nov 2022 • Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai

Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.

Image Deep Networks Spatial Token Mixer

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state.

Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)

Classification Image Classification +3

2,310

On Efficient Reinforcement Learning for Full-length Game of StarCraft II

2 code implementations • 23 Sep 2022 • Ruo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng, Wenhai Wang, Yang Yu, Tong Lu

In this work, we investigate a set of RL techniques for the full-length game of StarCraft II.

reinforcement-learning Reinforcement Learning (RL) +3

291

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

2 code implementations • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.

Autonomous Driving

2,870

Incremental Few-Shot Semantic Segmentation via Embedding Adaptive-Update and Hyper-class Representation

no code implementations • 26 Jul 2022 • Guangchen Shi, Yirui Wu, Jun Liu, Shaohua Wan, Wenhai Wang, Tong Lu

Second, to resist overfitting caused by few training samples, a hyper-class embedding is learned by clustering all category embeddings for initialization and is aligned with the category embedding of the new class for enhancement, so that learned knowledge assists in learning new knowledge, thus alleviating the dependence of performance on training data scale.

Few-Shot Semantic Segmentation Segmentation +1

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

1 code implementation • 9 Jun 2022 • Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai

To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.

Image Captioning Image Classification +6

250

Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality

1 code implementation • 20 May 2022 • Xiang Li, Wenhai Wang, Lingfeng Yang, Jian Yang

Masked AutoEncoder (MAE) has recently led the trend in visual self-supervision with an elegant asymmetric encoder-decoder design, which significantly optimizes both the pre-training efficiency and fine-tuning accuracy.

Ranked #37 on Object Detection on COCO minival

Object Detection

231

Vision Transformer Adapter for Dense Predictions

1 code implementation • 17 May 2022 • Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao

This work investigates a simple yet powerful dense prediction task adapter for Vision Transformer (ViT).

Ranked #4 on Semantic Segmentation on PASCAL Context

Instance Segmentation Panoptic Segmentation +1

1,118

Hybrid Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks

no code implementations • 21 Apr 2022 • Tao Yang, Jinming Wang, Weijie Hao, Qiang Yang, Wenhai Wang

The sensor data detection model based on Gaussian and Bayesian algorithms can detect the anomalous sensor data in real-time and upload them to the cloud for further analysis, filtering the normal sensor data and reducing traffic load.

Anomaly Detection

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3 code implementations • 31 Mar 2022 • Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.

Ranked #2 on Bird's-Eye View Semantic Segmentation on Lyft Level 5

3D Object Detection Autonomous Driving +2

2,870

WegFormer: Transformers for Weakly Supervised Semantic Segmentation

no code implementations • 16 Mar 2022 • Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping Luo

Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of CNN is insufficient to capture global context information, leading to sub-optimal results.

Segmentation Weakly supervised Semantic Segmentation +1

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

1 code implementation • 26 Nov 2021 • Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao

Deep learning-based models encounter challenges when processing long-tailed data in the real world.

Ranked #2 on Long-tail Learning on iNaturalist 2018 (using extra training data)

Image Classification Long-tail Learning +1

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

2 code implementations • 3 Nov 2021 • Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).

Ranked #2 on Scene Text Detection on MSRA-TD500

Image Classification Scene Text Detection +1

433

ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

no code implementations • 20 Oct 2021 • Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu

Recent approaches for end-to-end text spotting have achieved promising results.

Text Detection Text Spotting

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

1 code implementation • CVPR 2022 • Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, Tong Lu

Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner.

Ranked #4 on Panoptic Segmentation on COCO test-dev

Instance Segmentation Panoptic Segmentation +1

196

An empirical evaluation of attention-based multi-head models for improved turbofan engine remaining useful life prediction

1 code implementation • 4 Sep 2021 • Abiodun Ayodeji, Wenhai Wang, Jianzhong Su, Jianquan Yuan, Xinggao Liu

The results presented in this study demonstrate the importance of multi-head models and attention mechanisms to an improved understanding of the remaining useful life of industrial assets.

Time Series Analysis

Learning Class-level Prototypes for Few-shot Learning

no code implementations • 25 Aug 2021 • Minglei Yuan, Wenhai Wang, Tao Wang, Chunhao Cai, Qian Xu, Tong Lu

Few-shot learning aims to recognize new categories using very few labeled samples.

Few-Shot Learning

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

2 code implementations • 16 Aug 2021 • Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, Ling Shao

Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations.

Ranked #9 on Medical Image Segmentation on CVC-ColonDB

Medical Image Segmentation

1,646

PVT v2: Improved Baselines with Pyramid Vision Transformer

16 code implementations • 25 Jun 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

We hope this work will facilitate state-of-the-art Transformer research in computer vision.

Ranked #23 on Object Detection on COCO-O

Image Classification Object Detection +1

29,758

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

24 code implementations • NeurIPS 2021 • Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.

Ranked #1 on Semantic Segmentation on COCO-Stuff full

C++ code Semantic Segmentation +1

124,984

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

1 code implementation • 5 May 2021 • Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo

Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotated text detection and cell segmentation.

Ranked #81 on Instance Segmentation on COCO test-dev (using extra training data)

Cell Segmentation Instance Segmentation +5

869

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.

Scene Text Detection Text Detection +1

433

An Introduction of mini-AlphaStar

1 code implementation • 14 Apr 2021 • Ruo-Ze Liu, Wenhai Wang, Yanjie Shen, Zhiqi Li, Yang Yu, Tong Lu

StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against opponent's units.

Starcraft Starcraft II

291

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

1 code implementation • 22 Mar 2021 • Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo

(1) We divide the input image into small patches and adopt TIN, successfully transferring image style at arbitrarily high resolution.

Style Transfer

176

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

9 code implementations • ICCV 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.

Ranked #5 on Semantic Segmentation on SynPASS

Image Classification Instance Segmentation +3

27,790

DetCo: Unsupervised Contrastive Learning for Object Detection

2 code implementations • ICCV 2021 • Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo

Unlike most recent methods that focused on improving accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches to learn discriminative representations for object detection.

Contrastive Learning Image Classification +2

264

Segmenting Transparent Object in the Wild with Transformer

2 code implementations • 21 Jan 2021 • Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo

This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset.

Ranked #3 on Semantic Segmentation on Trans10K

Object Segmentation +2

1,185

Polygon-free: Unconstrained Scene Text Detection with Box Annotations

1 code implementation • 26 Nov 2020 • Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo

For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% of fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.

Scene Text Detection Text Detection

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

5 code implementations • CVPR 2021 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang

Such a property makes the distribution statistics of a bounding box highly correlated to its real localization quality.

Ranked #26 on Object Detection on COCO-O

Dense Object Detection object-detection

12,059

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

no code implementations • ECCV 2020 • Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo

The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.

Ranked #3 on Keypoint Detection on OCHuman

2D Human Pose Estimation Clustering +4

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

7 code implementations • NeurIPS 2020 • Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang

Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations.

Ranked #93 on Object Detection on COCO test-dev

Dense Object Detection General Classification

27,790

Scene Text Image Super-Resolution in the Wild

4 code implementations • ECCV 2020 • Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai

For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN.

Image Super-Resolution

415

Segmenting Transparent Objects in the Wild

1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, making it 10 times larger than existing datasets.

Ranked #4 on Semantic Segmentation on Trans10K

Segmentation Semantic Segmentation +1

False Data Injection Attacks and the Distributed Countermeasure in DC Microgrids

no code implementations • 7 Jan 2020 • Mengxiang Liu, Peng Cheng, Chengcheng Zhao, Ruilong Deng, Wenhai Wang, Jiming Chen

In this paper, we consider a hierarchical control based DC microgrid (DCmG) equipped with unknown input observer (UIO) based detectors, where the potential false data injection (FDI) attacks and the distributed countermeasure are investigated.

PolarMask: Single Shot Instance Segmentation with Polar Representation

2 code implementations • CVPR 2020 • Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.

Ranked #100 on Instance Segmentation on COCO test-dev

Distance regression Instance Segmentation +4

869

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

1 code implementation • 16 Sep 2019 • Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo

Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.

Scene Text Recognition Super-Resolution

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

6 code implementations • ICCV 2019 • Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen

Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.

Ranked #8 on Scene Text Detection on SCUT-CTW1500

Scene Text Detection Segmentation +1

4,068

Shape Robust Text Detection with Progressive Scale Expansion Network

19 code implementations • CVPR 2019 • Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao

Because there are large geometrical margins among the minimal-scale kernels, our method effectively separates close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances.

Ranked #12 on Scene Text Detection on SCUT-CTW1500

Optical Character Recognition (OCR) Scene Text Detection +1

38,458

Selective Kernel Networks

20 code implementations • CVPR 2019 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang

A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches.

Ranked #98 on Image Classification on CIFAR-100 (using extra training data)

Image Classification

29,758

Shape Robust Text Detection with Progressive Scale Expansion Network

9 code implementations • 7 Jun 2018 • Xiang Li, Wenhai Wang, Wenbo Hou, Ruo-Ze Liu, Tong Lu, Jian Yang

To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance.

Ranked #12 on Scene Text Detection on ICDAR 2017 MLT

Curved Text Detection Text Detection

1,163

Mixed Link Networks

1 code implementation • 6 Feb 2018 • Wenhai Wang, Xiang Li, Jian Yang, Tong Lu

Based on an analysis revealing the equivalence of modern networks, we find that both ResNet and DenseNet are essentially derived from the same "dense topology", differing only in the form of connection -- addition (dubbed "inner link") vs. concatenation (dubbed "outer link").

Representation Learning

