1 code implementation • 24 May 2023 • Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo
In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.
2 code implementations • 18 May 2023 • Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie zhou, Yu Qiao, Jifeng Dai
We hope this model can set a new baseline for generalist vision and language models.
no code implementations • 18 May 2023 • Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Wenhai Wang, Shouling Ji
Most previous approaches rely on the sentence-level retrieval and combination paradigm (retrieval of similar code snippets and use of the corresponding code and summary pairs) on the encoder side.
1 code implementation • 10 May 2023 • Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, LiMin Wang, Yu Qiao
In this study, we initiate an exploration into video understanding by introducing VideoChat, an end-to-end chat-centric video understanding system.
2 code implementations • 9 May 2023 • Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao
Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.
1 code implementation • 23 Mar 2023 • Xiang Li, Ge Wu, Lingfeng Yang, Wenhai Wang, RenJie Song, Jian Yang
The many types of elements deposited in the training history are a rich resource for improving the learning of deep models.
1 code implementation • 22 Jan 2023 • Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu
In this report, we present our champion solution to the WSDM2023 Toloka Visual Question Answering (VQA) Challenge.
1 code implementation • CVPR 2023 • Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li
Oriented at this, we revisit the key components within perception and prediction, and prioritize the tasks such that all these tasks contribute to planning.
1 code implementation • 3 Dec 2022 • Jintao Lin, Zhaoyang Liu, Wenhai Wang, Wayne Wu, LiMin Wang
Our VLG is first pre-trained on video and language datasets to learn a shared feature space; we then devise a flexible bi-modal attention head to make high-level semantic concepts collaborate under different settings.
2 code implementations • CVPR 2023 • Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai
In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.
2 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state.
Ranked #1 on Instance Segmentation on COCO minival (using extra training data)
1 code implementation • 10 Nov 2022 • Jifeng Dai, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie zhou, Xiaogang Wang, Yu Qiao, Xiaowei Hu
Although the novel feature transformation designs are often claimed as the source of gain, some backbones may benefit from advanced engineering techniques, which makes it hard to identify the real gain from the key feature transformation operators.
2 code implementations • 23 Sep 2022 • Ruo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng, Wenhai Wang, Yang Yu, Tong Lu
In this work, we investigate a set of RL techniques for the full-length game of StarCraft II.
1 code implementation • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Enze Xie, Zhiqi Li, Hanming Deng, Hao Tian, Xizhou Zhu, Li Chen, Yulu Gao, Xiangwei Geng, Jia Zeng, Yang Li, Jiazhi Yang, Xiaosong Jia, Bohan Yu, Yu Qiao, Dahua Lin, Si Liu, Junchi Yan, Jianping Shi, Ping Luo
As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
no code implementations • 26 Jul 2022 • Guangchen Shi, Yirui Wu, Jun Liu, Shaohua Wan, Wenhai Wang, Tong Lu
Second, to resist overfitting caused by the few training samples, a hyper-class embedding is learned by clustering all category embeddings for initialization and is aligned with the category embedding of the new class for enhancement, so that learned knowledge assists in learning new knowledge, thus alleviating the dependence of performance on training data scale.
1 code implementation • 9 Jun 2022 • Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai
To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.
1 code implementation • 20 May 2022 • Xiang Li, Wenhai Wang, Lingfeng Yang, Jian Yang
Masked AutoEncoder (MAE) has recently led the trend in visual self-supervision with an elegant asymmetric encoder-decoder design, which significantly improves both pre-training efficiency and fine-tuning accuracy.
Ranked #33 on Object Detection on COCO minival
1 code implementation • 17 May 2022 • Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao
This work investigates a simple yet powerful dense prediction task adapter for Vision Transformer (ViT).
Ranked #1 on Semantic Segmentation on COCO-Stuff full
no code implementations • 21 Apr 2022 • Tao Yang, Jinming Wang, Weijie Hao, Qiang Yang, Wenhai Wang
The sensor data detection model based on Gaussian and Bayesian algorithms can detect the anomalous sensor data in real-time and upload them to the cloud for further analysis, filtering the normal sensor data and reducing traffic load.
2 code implementations • 31 Mar 2022 • Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai
In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.
Ranked #6 on 3D Object Detection on DAIR-V2X-I
no code implementations • 16 Mar 2022 • Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping Luo
Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of CNNs is insufficient to capture global context information, leading to sub-optimal results.
Weakly-Supervised Semantic Segmentation
1 code implementation • 26 Nov 2021 • Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao
Deep learning-based models encounter challenges when processing long-tailed data in the real world.
Ranked #2 on Long-tail Learning on Places-LT (using extra training data)
2 code implementations • 3 Nov 2021 • Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu
We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).
Ranked #1 on Scene Text Detection on MSRA-TD500
no code implementations • 20 Oct 2021 • Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu
Recent approaches for end-to-end text spotting have achieved promising results.
1 code implementation • CVPR 2022 • Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, Tong Lu
Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner.
Ranked #4 on Panoptic Segmentation on COCO test-dev
1 code implementation • 4 Sep 2021 • Abiodun Ayodeji, Wenhai Wang, Jianzhong Su, Jianquan Yuan, Xinggao Liu
The results presented in this study demonstrate the importance of multi-head models and attention mechanisms to an improved understanding of the remaining useful life of industrial assets.
no code implementations • 25 Aug 2021 • Minglei Yuan, Wenhai Wang, Tao Wang, Chunhao Cai, Qian Xu, Tong Lu
Few-shot learning aims to recognize new categories using very few labeled samples.
2 code implementations • 16 Aug 2021 • Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, Ling Shao
Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations.
Ranked #6 on Medical Image Segmentation on CVC-ColonDB
15 code implementations • 25 Jun 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
We hope this work will facilitate state-of-the-art Transformer research in computer vision.
Ranked #70 on Object Detection on COCO minival
14 code implementations • NeurIPS 2021 • Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo
We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.
Ranked #2 on Semantic Segmentation on SELMA
1 code implementation • 5 May 2021 • Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo
Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation on the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotated text detection and cell segmentation.
Ranked #78 on Instance Segmentation on COCO test-dev (using extra training data)
1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen
By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also distinguish adjacent text well.
1 code implementation • 14 Apr 2021 • Ruo-Ze Liu, Wenhai Wang, Yanjie Shen, Zhiqi Li, Yang Yu, Tong Lu
StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against opponent's units.
1 code implementation • 22 Mar 2021 • Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo
(1) We divide the input image into small patches and adopt TIN, successfully transferring style to images of arbitrarily high resolution.
9 code implementations • ICCV 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.
Ranked #5 on Semantic Segmentation on SynPASS
2 code implementations • ICCV 2021 • Enze Xie, Jian Ding, Wenhai Wang, Xiaohang Zhan, Hang Xu, Peize Sun, Zhenguo Li, Ping Luo
Unlike most recent methods that focused on improving accuracy of image classification, we present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches to learn discriminative representations for object detection.
2 code implementations • 21 Jan 2021 • Enze Xie, Wenjia Wang, Wenhai Wang, Peize Sun, Hang Xu, Ding Liang, Ping Luo
This work presents a new fine-grained transparent object segmentation dataset, termed Trans10K-v2, extending Trans10K-v1, the first large-scale transparent object segmentation dataset.
Ranked #3 on Semantic Segmentation on Trans10K
1 code implementation • 26 Nov 2020 • Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo
For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText (vs. 80.9% for its fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs.
5 code implementations • CVPR 2021 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Such a property makes the distribution statistics of a bounding box highly correlated to its real localization quality.
Ranked #57 on Object Detection on COCO test-dev
2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo
Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.
no code implementations • ECCV 2020 • Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo
The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.
Ranked #3 on Keypoint Detection on OCHuman
7 code implementations • NeurIPS 2020 • Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations.
Ranked #106 on Object Detection on COCO test-dev
3 code implementations • ECCV 2020 • Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai
For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN.
1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo
To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than the existing datasets.
Ranked #4 on Semantic Segmentation on Trans10K
no code implementations • 7 Jan 2020 • Mengxiang Liu, Peng Cheng, Chengcheng Zhao, Ruilong Deng, Wenhai Wang, Jiming Chen
In this paper, we consider a hierarchical control based DC microgrid (DCmG) equipped with unknown input observer (UIO) based detectors, where the potential false data injection (FDI) attacks and the distributed countermeasure are investigated.
2 code implementations • CVPR 2020 • Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo
In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.
Ranked #97 on Instance Segmentation on COCO test-dev
1 code implementation • 16 Sep 2019 • Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo
Nonetheless, most previous methods may not work well in recognizing low-resolution text, which is often seen in natural scene images.
6 code implementations • ICCV 2019 • Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.
Ranked #6 on Scene Text Detection on SCUT-CTW1500
14 code implementations • CVPR 2019 • Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
Because there are large geometrical margins among the minimal-scale kernels, our method effectively splits close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances.
Ranked #10 on Scene Text Detection on SCUT-CTW1500
16 code implementations • CVPR 2019 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang
A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches.
Ranked #95 on Image Classification on CIFAR-100 (using extra training data)
10 code implementations • 7 Jun 2018 • Xiang Li, Wenhai Wang, Wenbo Hou, Ruo-Ze Liu, Tong Lu, Jian Yang
To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance.
Ranked #12 on Scene Text Detection on ICDAR 2017 MLT
1 code implementation • 6 Feb 2018 • Wenhai Wang, Xiang Li, Jian Yang, Tong Lu
Based on an analysis that reveals the equivalence of modern networks, we find that both ResNet and DenseNet are essentially derived from the same "dense topology" and differ only in the form of connection: addition (dubbed "inner link") vs. concatenation (dubbed "outer link").