no code implementations • 6 Jan 2025 • Jiawei Liu, Yuanzhi Zhu, Feiyu Gao, Zhibo Yang, Peng Wang, Junyang Lin, Xinggang Wang, Wenyu Liu
), the text in natural scene images needs to meet the following four key criteria: (1) Fidelity: the generated text should appear as realistic as a photograph and be completely accurate, with no errors in any of the strokes.
1 code implementation • 2 Jan 2025 • Jingfeng Yao, Xinggang Wang
The integrated system achieves state-of-the-art (SOTA) performance on ImageNet 256x256 generation with an FID score of 1.35 while demonstrating remarkable training efficiency by reaching an FID score of 2.11 in just 64 epochs, representing an over 21 times convergence speedup compared to the original DiT.
Ranked #2 on
Image Generation
on ImageNet 256x256
1 code implementation • 18 Dec 2024 • Ziyang Xu, Huangxuan Zhao, Wenyu Liu, Xinggang Wang
Extensive experiments demonstrate that GaraMoSt achieves the SOTA performance in accuracy, robustness, visual effects, and noise suppression, comprehensively surpassing MoSt-DSA and other natural scene VFI methods.
1 code implementation • 17 Dec 2024 • Haoyi Jiang, Liu Liu, Tianheng Cheng, Xinjie Wang, Tianwei Lin, Zhizhong Su, Wenyu Liu, Xinggang Wang
In this paper, we introduce GaussTR, a novel Gaussian Transformer that leverages alignment with foundation models to advance self-supervised 3D spatial understanding.
1 code implementation • 5 Dec 2024 • Yongkang Li, Tianheng Cheng, Wenyu Liu, Xinggang Wang
Mask-Adapter integrates seamlessly into open-vocabulary segmentation methods based on mask pooling in a plug-and-play manner, delivering more accurate classification results.
Ranked #1 on
Open Vocabulary Semantic Segmentation
on ADE20K-847
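The Mask-Adapter entry above refers to segmentation methods "based on mask pooling". As a rough illustration of that generic step only (not Mask-Adapter itself, whose design is not described here), the sketch below averages per-pixel feature vectors inside a binary mask to obtain one embedding per mask. The function name `mask_pool`, the nested-list layout, and the zero vector returned for an empty mask are assumptions made for this example.

```python
def mask_pool(features, mask):
    """Average the feature vectors at pixels where the mask is non-zero.

    features: H x W x C nested lists (hypothetical layout for this sketch).
    mask:     H x W nested lists of 0/1 values.
    Returns a list of C floats; all zeros if the mask is empty (an assumption).
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, inside in zip(feat_row, mask_row):
            if inside:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total
    return [t / count for t in total]


# Toy 2 x 2 feature map with 2 channels and a diagonal mask.
features = [[[1.0, 0.0], [3.0, 2.0]],
            [[5.0, 4.0], [7.0, 6.0]]]
mask = [[1, 0],
        [0, 1]]
pooled = mask_pool(features, mask)
```

In an open-vocabulary pipeline, a pooled embedding like this would typically be compared against text embeddings of class names; that comparison is omitted here.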
1 code implementation • 22 Nov 2024 • Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, Xinggang Wang
However, the numerous denoising steps in the robotic diffusion policy and the more dynamic, open-world nature of traffic scenes pose substantial challenges for generating diverse driving actions at a real-time speed.
1 code implementation • 29 Oct 2024 • Bo Jiang, Shaoyu Chen, Bencheng Liao, Xingyu Zhang, Wei Yin, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang
In contrast, Large Vision-Language Models (LVLMs) excel in scene understanding and reasoning.
1 code implementation • 14 Oct 2024 • Jingfeng Yao, Wang Cheng, Wenyu Liu, Xinggang Wang
(3) We develop a new supervision method that further accelerates the training process of DiT.
Ranked #29 on
Image Generation
on ImageNet 256x256
no code implementations • 9 Oct 2024 • Zeyu Zhang, Sixu Yan, Muzhi Han, Zaijin Wang, Xinggang Wang, Song-Chun Zhu, Hangxin Liu
We propose M^3Bench, a new benchmark of whole-body motion generation for mobile manipulation tasks.
no code implementations • 5 Oct 2024 • Dingwen Zhang, Liangbo Cheng, Yi Liu, Xinggang Wang, Junwei Han
These type-level mamba capsules are fed into the EM routing algorithm to obtain the high-layer mamba capsules, which greatly reduces the computation and parameters required by pixel-level capsule routing for part-whole relationship exploration.
1 code implementation • 3 Oct 2024 • Zongming Li, Tianheng Cheng, Shoufa Chen, Peize Sun, Haocheng Shen, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang
Firstly, we explore control encoding for AR models and propose a lightweight control encoder to transform spatial inputs (e.g., Canny edges or depth maps) into control tokens.
1 code implementation • 21 Sep 2024 • Shuai Zhang, Guanjun Wu, Xinggang Wang, Bin Feng, Wenyu Liu
In this paper, we propose a novel representation that can reconstruct accurate meshes from sparse image input, named Dynamic 2D Gaussians (D-2DGS).
1 code implementation • 10 Aug 2024 • Bin Hu, Xinggang Wang, Wenyu Liu
To this end, this article introduces the recently emerged Masked Image Modeling (MIM) self-supervised learning method into person ReID. It extracts high-quality global and local features through large-scale unsupervised pre-training that combines masked image modeling with discriminative contrastive learning, and then conducts supervised fine-tuning on the person ReID task.
no code implementations • 25 Jul 2024 • Jiahao Guo, Ziyang Xu, Lianjun Wu, Fei Gao, Wenyu Liu, Xinggang Wang
Small Video Object Detection (SVOD) is a crucial subfield in modern computer vision, essential for early object discovery and detection.
1 code implementation • 25 Jul 2024 • Ziwei Cui, Jingfeng Yao, Lunbin Zeng, Juan Yang, Wenyu Liu, Xinggang Wang
We evaluate our method on the most challenging benchmark and achieve state-of-the-art results (0.5080 mPQ) in cell nuclei instance segmentation with only 21.6% of the FLOPs of the previous leading method.
Ranked #1 on
Panoptic Segmentation
on PanNuke
1 code implementation • 19 Jul 2024 • Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang
However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in the given conditions; (2) Reasonability: the regions and contents of the generated text should cohere with the scene; (3) Utility: the generated text images can facilitate related tasks (e.g., text detection and recognition).
no code implementations • 17 Jul 2024 • Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu
Gait recognition is a biometric technology that distinguishes individuals by their walking patterns.
1 code implementation • 9 Jul 2024 • Ziyang Xu, Huangxuan Zhao, Ziwei Cui, Wenyu Liu, Chuansheng Zheng, Xinggang Wang
Artificial intelligence has become a crucial tool for medical image analysis.
no code implementations • 5 Jul 2024 • Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang
However, there is a dearth of research focusing on segmentation within 4D representations.
1 code implementation • 4 Jul 2024 • Yiang Shi, Tianheng Cheng, Qian Zhang, Wenyu Liu, Xinggang Wang
Owing to the inherent flexibility of the point-based representation, OSP achieves strong performance compared with existing methods and excels in terms of training and inference adaptability.
1 code implementation • 28 Jun 2024 • Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang
Surprisingly, we observe that: (1) multimodal prompts and (2) vision-language models with early fusion (e.g., BEIT-3) are beneficial for prompting SAM for accurate referring segmentation.
Ranked #3 on
Referring Expression Segmentation
on RefCOCO testA
no code implementations • 26 Jun 2024 • Taoran Yi, Jiemin Fang, Zanwei Zhou, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Xinggang Wang, Qi Tian
The main idea is to bind Gaussians to reasonable geometry, which evolves over the whole generation process.
no code implementations • 13 Jun 2024 • Zhengqi Zhao, Xiaohu Huang, Hao Zhou, Kun Yao, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng
The key to action counting is accurately locating each video's repetitive actions.
1 code implementation • 28 May 2024 • Bencheng Liao, Xinggang Wang, Lianghui Zhu, Qian Zhang, Chang Huang
Recently, linear complexity sequence modeling networks have achieved modeling capabilities similar to Vision Transformers on a variety of computer vision tasks, while using fewer FLOPs and less memory.
1 code implementation • 28 May 2024 • Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang
In this paper, we aim to incorporate the sub-quadratic modeling capability of Gated Linear Attention (GLA) into the 2D diffusion backbone.
2 code implementations • 17 May 2024 • Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui
With the help of a contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures.
1 code implementation • 8 May 2024 • Jingfeng Yao, Xinggang Wang, Yuehao Song, Huangxuan Zhao, Jun Ma, Yajie Chen, Wenyu Liu, Bo Wang
The diagnosis and treatment of chest diseases play a crucial role in maintaining human health.
1 code implementation • 28 Mar 2024 • Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu
To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA.
1 code implementation • 19 Mar 2024 • Yuehao Song, Xinggang Wang, Jingfeng Yao, Wenyu Liu, Jinglin Zhang, Xiangmin Xu
Gaze following aims to interpret human-scene interactions by predicting the person's focal point of gaze.
Ranked #1 on
Gaze Target Estimation
on VideoAttentionTarget
(using extra training data)
1 code implementation • 13 Mar 2024 • Jialv Zou, Bencheng Liao, Qian Zhang, Wenyu Liu, Xinggang Wang
Learning robust and scalable visual representations from massive multi-view video data remains a challenge in computer vision and autonomous driving.
1 code implementation • 22 Feb 2024 • Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang
Weakly supervised visual recognition using inexact supervision is a critical yet challenging learning problem.
Image-level Supervised Instance Segmentation
object-detection
1 code implementation • 20 Feb 2024 • Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang
Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging.
3 code implementations • CVPR 2024 • Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan
The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools.
Ranked #6 on
Zero-Shot Object Detection
on MSCOCO
(AP metric, using extra training data)
14 code implementations • 17 Jan 2024 • Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang
The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images and it has great potential to be the next-generation backbone for vision foundation models.
no code implementations • 11 Jan 2024 • Guanjun Wu, Taoran Yi, Jiemin Fang, Wenyu Liu, Xinggang Wang
To extend HDR NeRF methods to wider applications, we propose a dynamic HDR NeRF framework, named HDR-HexPlane, which can learn 3D scenes from dynamic 2D images captured with various exposures.
no code implementations • 7 Dec 2023 • Yabo Chen, Jiemin Fang, YuYang Huang, Taoran Yi, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian
However, due to the high sparsity of the single input image, Zero-1-to-3 tends to produce geometry and appearance inconsistency across views, especially for complex objects.
2 code implementations • 26 Oct 2023 • Lianghui Zhu, Xinggang Wang, Xinlong Wang
To address this problem, we propose to fine-tune LLMs as scalable judges (JudgeLM) to evaluate LLMs efficiently and effectively in open-ended benchmarks.
1 code implementation • NeurIPS 2023 • Jialv Zou, Xinggang Wang, Jiahao Guo, Wenyu Liu, Qian Zhang, Chang Huang
In our work, we propose a novel perspective for circuit design by treating circuit components as point clouds and using Transformer-based point cloud perception methods to extract features from the circuit.
1 code implementation • CVPR 2024 • Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang
Representing and rendering dynamic scenes has been an important but challenging task.
1 code implementation • CVPR 2024 • Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang
In recent times, the generation of 3D assets from text prompts has shown impressive results.
1 code implementation • ICCV 2023 • Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi Chen, Xinggang Wang, Hongyang Chao, Han Hu
In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.
no code implementations • 5 Sep 2023 • Zhenghong Zhou, Huangxuan Zhao, Jiemin Fang, Dongqiao Xiang, Lei Chen, Lingxia Wu, Feihong Wu, Wenyu Liu, Chuansheng Zheng, Xinggang Wang
Additionally, 2D and 3D DSA imaging results can be generated from the reconstructed 4D DSA images.
1 code implementation • 31 Aug 2023 • Shuai Bai, Shusheng Yang, Jinze Bai, Peng Wang, Xingxuan Zhang, Junyang Lin, Xinggang Wang, Chang Zhou, Jingren Zhou
Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting a visual receptor with large language models (LLMs).
1 code implementation • 13 Aug 2023 • Xiaohu Huang, Xinggang Wang, Zhidianqiu Jin, Bo Yang, Botao He, Bin Feng, Wenyu Liu
Graph convolutional networks have been widely applied in skeleton-based gait recognition.
2 code implementations • 10 Aug 2023 • Bencheng Liao, Shaoyu Chen, Yunchi Zhang, Bo Jiang, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang
We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process.
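To make the idea of "a point set with a group of equivalent permutations" concrete, here is a minimal sketch. It assumes, as one natural reading rather than a confirmed detail of the method, that an open polyline has two equivalent orderings (forward and reversed) and that a closed polygon with N points has 2N (every starting point, in both directions). The function name `equivalent_permutations` and the `closed` flag are hypothetical.

```python
def equivalent_permutations(points, closed):
    """List point orderings that describe the same geometric shape.

    Assumption for this sketch: open polylines are equivalent under
    reversal; closed polygons are equivalent under reversal and any
    cyclic shift of the starting point.
    """
    forward = list(points)
    backward = forward[::-1]
    if not closed:
        return [forward, backward]
    orderings = []
    for sequence in (forward, backward):
        for shift in range(len(sequence)):
            orderings.append(sequence[shift:] + sequence[:shift])
    return orderings


lane_divider = [(0, 0), (1, 0), (2, 0)]        # open polyline
crossing = [(0, 0), (1, 0), (1, 1)]            # closed polygon (triangle)
lane_orderings = equivalent_permutations(lane_divider, closed=False)
crossing_orderings = equivalent_permutations(crossing, closed=True)
```

A training loss could then be taken as the minimum over these orderings, so that no single arbitrary ordering is imposed as the ground truth; that step is not shown.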
1 code implementation • CVPR 2024 • Haoyi Jiang, Tianheng Cheng, Naiyu Gao, Haoyang Zhang, Tianwei Lin, Wenyu Liu, Xinggang Wang
3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal undertaking in autonomous driving, aiming to predict voxel occupancy within volumetric scenes.
Ranked #1 on
3D Semantic Scene Completion
on KITTI-360
3D Semantic Scene Completion from a single RGB image
Autonomous Driving
1 code implementation • 23 Jun 2023 • Jiaqi Ma, Tianheng Cheng, Guoli Wang, Qian Zhang, Xinggang Wang, Lefei Zhang
We then leverage degradation-aware visual prompts to establish a controllable and universal model for image restoration, called ProRes, which is applicable to an extensive range of image restoration tasks.
1 code implementation • 8 Jun 2023 • Ruijie Zhang, Qiaozhe Zhang, Yingzhuang Liu, Hao Xin, Yan Liu, Xinggang Wang
Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD).
2 code implementations • 8 Jun 2023 • Zelin Liu, Xinggang Wang, Cheng Wang, Wenyu Liu, Xiang Bai
By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.
Ranked #6 on
Multi-Object Tracking
on MOT20
(using extra training data)
1 code implementation • 7 Jun 2023 • Jingfeng Yao, Xinggang Wang, Lang Ye, Wenyu Liu
In our work, we leverage vision foundation models to enhance the performance of natural image matting.
1 code implementation • 31 May 2023 • Haijun Xiong, Yunze Deng, Bin Feng, Xinggang Wang, Wenyu Liu
Gait recognition, a growing field in biological recognition technology, utilizes distinct walking patterns for accurate individual identification.
9 code implementations • 24 May 2023 • Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang
Recently, plain vision Transformers (ViTs) have shown impressive performance on various computer vision tasks, thanks to their strong modeling capacity and large-scale pretraining.
Ranked #2 on
Image Matting
on Distinctions-646
2 code implementations • 18 May 2023 • Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou
In this work, we explore a scalable way for building a general representation model toward unlimited modalities.
Ranked #1 on
Semantic Segmentation
on ADE20K
(using extra training data)
1 code implementation • 16 May 2023 • Junyu Wang, Shijie Wang, Ruijie Zhang, Zengqiang Zheng, Wenyu Liu, Xinggang Wang
We present RND-SCI, a novel framework for compressive hyperspectral image (HSI) reconstruction.
1 code implementation • 19 Apr 2023 • Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang
We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns as unified point sequence representation, which can be extended to most map elements in the driving scene.
no code implementations • 19 Apr 2023 • Lin Niu, Jiawei Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
PTQ optimizes the quantization parameters by different metrics to minimize the perturbation of quantization.
no code implementations • 7 Apr 2023 • Shaoyu Chen, Tianheng Cheng, Jiemin Fang, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang
Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors.
1 code implementation • 3 Apr 2023 • Lianghui Zhu, Yingyue Li, Jiemin Fang, Yan Liu, Hao Xin, Wenyu Liu, Xinggang Wang
Thus a novel weight-based method is proposed to end-to-end estimate the importance of attention heads, while the self-attention maps are adaptively fused for high-quality CAM results that tend to have more complete objects.
1 code implementation • 3 Apr 2023 • Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu
In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers.
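The observation above, that activation ranges vary across channels, can be illustrated with a small sketch. It only demonstrates the problem and is not the method proposed in the paper: a single per-tensor scale sized for the widest channel rounds a narrow channel to zero, while a per-channel scale preserves it. The symmetric 8-bit scheme, the helper names, and the toy values are assumptions made for this example.

```python
def quantize(values, scale, bits=8):
    """Symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    restored = []
    for v in values:
        q = max(-qmax - 1, min(qmax, round(v / scale)))
        restored.append(q * scale)
    return restored


def max_abs(values):
    return max(abs(v) for v in values)


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# Two toy activation channels with very different ranges.
channels = [[0.01, -0.02, 0.015], [50.0, -80.0, 120.0]]
qmax = 127

# One scale for the whole tensor, set by the widest channel.
tensor_scale = max(max_abs(ch) for ch in channels) / qmax
per_tensor = [quantize(ch, tensor_scale) for ch in channels]

# One scale per channel.
per_channel = [quantize(ch, max_abs(ch) / qmax) for ch in channels]

small_err_tensor = max_error(channels[0], per_tensor[0])
small_err_channel = max_error(channels[0], per_channel[0])
```

With the shared scale, every value in the narrow channel collapses to zero, so its error equals its largest magnitude; with its own scale, the error stays within half a quantization step.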
no code implementations • 30 Mar 2023 • Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang
To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices.
1 code implementation • 28 Mar 2023 • Cheng Wang, Guoli Wang, Qian Zhang, Peng Guo, Wenyu Liu, Xinggang Wang
Fortunately, we have identified two observations that help us achieve the best of both worlds: 1) query-based methods demonstrate superiority over dense proposal-based methods in open-world instance segmentation, and 2) learning localization cues is sufficient for open-world instance segmentation.
no code implementations • 27 Mar 2023 • Yifu Zhang, Xinggang Wang, Xiaoqing Ye, Wei Zhang, Jincheng Lu, Xiao Tan, Errui Ding, Peize Sun, Jingdong Wang
We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes, which alleviates the problems of object missing and fragmented trajectories.
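A hierarchical association of this kind can be sketched in two stages: tracks are first matched to high-score detections, and tracks left unmatched are then matched to low-score detections instead of discarding those boxes. The sketch below is a simplified illustration under stated assumptions: the thresholds (`high_thresh`, `min_iou`), the greedy IoU matching, and all function names are hypothetical, and real trackers commonly add motion prediction and optimal assignment.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, boxes, min_iou):
    """Greedily pair track ids with box indices, best IoU first."""
    pairs = sorted(((iou(box, det), tid, j)
                    for tid, box in tracks.items()
                    for j, det in enumerate(boxes)), reverse=True)
    matches, used = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches


def associate(tracks, detections, high_thresh=0.6, min_iou=0.3):
    """Stage 1: high-score detections. Stage 2: low-score ones for leftovers."""
    high = [d for d in detections if d["score"] >= high_thresh]
    low = [d for d in detections if d["score"] < high_thresh]
    first = greedy_match(tracks, [d["box"] for d in high], min_iou)
    result = {tid: high[j] for tid, j in first.items()}
    leftover = {tid: box for tid, box in tracks.items() if tid not in first}
    second = greedy_match(leftover, [d["box"] for d in low], min_iou)
    result.update({tid: low[j] for tid, j in second.items()})
    return result


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [{"box": (1, 1, 11, 11), "score": 0.9},
              {"box": (21, 21, 31, 31), "score": 0.2}]  # e.g. an occluded object
matched = associate(tracks, detections)
```

In this toy case track 2 is only recovered through the low-score box, which is the situation the second stage is meant to handle.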
no code implementations • 27 Mar 2023 • Taoran Yi, Jiemin Fang, Xinggang Wang, Wenyu Liu
Rendering moving human bodies at free viewpoints only from a monocular video is quite a challenging problem.
no code implementations • 23 Mar 2023 • Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu
Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.
2 code implementations • ICCV 2023 • Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang
In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation.
Ranked #8 on
Bench2Drive
on Bench2Drive
6 code implementations • 20 Mar 2023 • Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling.
1 code implementation • 15 Mar 2023 • Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang
Motivated by this, we propose to model the lane graph in a novel path-wise manner, which well preserves the continuity of the lane and encodes traffic information for planning.
no code implementations • CVPR 2023 • Hao Li, Dingwen Zhang, Nian Liu, Lechao Cheng, Yalun Dai, Chao Zhang, Xinggang Wang, Junwei Han
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models by giving Saliency Prompt for queries/kernels.
1 code implementation • 27 Jan 2023 • Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang
The study is mainly motivated by the observation that random views, used in contrastive learning, and random masked (visible) patches, used in masked image modeling, are often about object parts.
1 code implementation • 26 Jan 2023 • Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng
In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences.
Ranked #16 on
Skeleton Based Action Recognition
on NTU RGB+D
1 code implementation • 24 Jan 2023 • Junyu Wang, Shijie Wang, Wenyu Liu, Zengqiang Zheng, Xinggang Wang
We present a simple, efficient, and scalable unfolding network, SAUNet, to simplify the network design with an adaptive alternate optimization framework for hyperspectral image (HSI) reconstruction.
1 code implementation • CVPR 2023 • Shusheng Yang, Yixiao Ge, Kun Yi, Dian Li, Ying Shan, XiaoHu Qie, Xinggang Wang
Both masked image modeling (MIM) and natural language supervision have facilitated the progress of transferable visual pre-training.
1 code implementation • ICCV 2023 • Ruiqi Wang, Xinggang Wang, Te Li, Rong Yang, Minhong Wan, Wenyu Liu
Category-level 6DoF object pose estimation intends to estimate the rotation, translation, and size of unseen objects.
1 code implementation • CVPR 2023 • Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu
It determines the quantization parameters by using the information of differences between network prediction before and after quantization.
no code implementations • 5 Dec 2022 • Bo Jiang, Shaoyu Chen, Xinggang Wang, Bencheng Liao, Tianheng Cheng, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang
Motion prediction is highly relevant to the perception of dynamic objects and static map elements in the scenarios of autonomous driving.
6 code implementations • CVPR 2023 • Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.
Ranked #1 on
Object Detection
on COCO-O
1 code implementation • CVPR 2023 • Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Qian Zhang, Wenyu Liu
Most existing methods for weakly supervised instance segmentation focus on designing heuristic losses with priors from bounding boxes.
1 code implementation • 30 Aug 2022 • Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang
High-definition (HD) maps provide abundant and precise environmental information about the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving systems.
Ranked #8 on
3D Lane Detection
on OpenLane-V2 val
no code implementations • 7 Aug 2022 • Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, Wenyu Liu
To address the problem, we present an efficient approach to compute a marginal probability for each pair of objects in real time.
1 code implementation • 20 Jul 2022 • Shenyuan Gao, Chunluan Zhou, Chao Ma, Xinggang Wang, Junsong Yuan
However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement.
Ranked #2 on
Visual Object Tracking
on OTB-100
1 code implementation • 5 Jul 2022 • Zhi Liu, Shaoyu Chen, Xiaojie Guo, Xinggang Wang, Tianheng Cheng, Hongmei Zhu, Qian Zhang, Wenyu Liu, Yi Zhang
In this work, we propose PolarBEV for vision-based uneven BEV representation learning.
1 code implementation • 22 Jun 2022 • Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Chang Huang, Wenyu Liu
Based on Polar Parametrization, we propose a surround-view 3D DEtection TRansformer, named PolarDETR.
1 code implementation • 13 Jun 2022 • Wenqiang Zhang, Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Qian Zhang, Wenyu Liu
The query mechanism introduced in the DETR method is changing the paradigm of object detection, and many recent query-based methods have obtained strong object detection performance.
1 code implementation • 9 Jun 2022 • Shaoyu Chen, Tianheng Cheng, Xinggang Wang, Wenming Meng, Qian Zhang, Wenyu Liu
GKT leverages the geometric priors to guide the transformer to focus on discriminative regions and unfolds kernel features to generate BEV representation.
1 code implementation • 30 May 2022 • Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Matthias Nießner, Qi Tian
A multi-distance interpolation method is proposed and applied on voxel features to model both small and large motions.
3 code implementations • CVPR 2022 • Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan
To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).
Ranked #37 on
Video Instance Segmentation
on OVIS validation
3 code implementations • CVPR 2022 • Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.
1 code implementation • ICCV 2021 • Duowang Zhu, Xiaohu Huang, Xinggang Wang, Bo Yang, Botao He, Wenyu Liu, Bin Feng
Gait recognition has drawn increasing research attention recently. Because silhouette differences are quite subtle in the spatial domain, temporal feature representation is crucial for gait recognition.
Ranked #3 on
Gait Recognition
on OUMVLP
2 code implementations • ICCV 2023 • Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan, Xinggang Wang
We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla Vision Transformer (ViT) for object detection, which is based on our two novel observations: (i) A MIM pre-trained vanilla ViT encoder can work surprisingly well in the challenging object-level recognition scenario even with randomly sampled partial observations, e.g., only 25% to 50% of the input embeddings.
1 code implementation • CVPR 2022 • Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Wenqiang Zhang, Qian Zhang, Chang Huang, Wenyu Liu
For segmentation, we integrate AziNorm into KPConv.
2 code implementations • CVPR 2022 • Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Wenqiang Zhang, Qian Zhang, Chang Huang, Zhaoxiang Zhang, Wenyu Liu
In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.
Ranked #7 on
Real-time Instance Segmentation
on MSCOCO
no code implementations • 7 Feb 2022 • Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei
Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not.
1 code implementation • 30 Nov 2021 • Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian
Neural radiance fields (NeRF) have shown great potential in representing 3D scenes and synthesizing novel views, but the computational overhead of NeRF at the inference stage is still heavy.
no code implementations • 15 Nov 2021 • Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai
To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario.
10 code implementations • arXiv 2021 • Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang
ByteTrack also achieves state-of-the-art performance on MOT20, HiEve and BDD100K tracking benchmarks.
Ranked #1 on
Multiple Object Tracking
on BDD100K val
5 code implementations • 25 Aug 2021 • Dong Wu, Manwen Liao, Weitian Zhang, Xinggang Wang, Xiang Bai, Wenqing Cheng, Wenyu Liu
A panoptic driving perception system is an essential part of autonomous driving.
Ranked #3 on
Drivable Area Detection
on BDD100K val
1 code implementation • ICCV 2021 • Shaoyu Chen, Jiemin Fang, Qian Zhang, Wenyu Liu, Xinggang Wang
Instance segmentation on point clouds is a fundamental task in 3D scene perception.
Ranked #4 on
3D Instance Segmentation
on S3DIS
(mCov metric, using extra training data)
no code implementations • 5 Aug 2021 • Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Wenjun Zeng
We estimate 3D poses from the voxel representation by predicting whether each voxel contains a particular body joint.
Ranked #7 on
3D Multi-Person Pose Estimation
on Campus
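The voxel-based pose entry above describes predicting, for each voxel, whether it contains a particular body joint. A minimal sketch of how a joint position might be read out of such a per-voxel score grid follows: take the highest-scoring voxel and convert its index to the coordinates of the voxel centre. The function name `locate_joint`, the grid layout, and the `origin` and `voxel_size` parameters are assumptions for illustration, not details from the paper.

```python
def locate_joint(heatmap, origin, voxel_size):
    """Return the centre of the highest-scoring voxel and its score.

    heatmap:    X x Y x Z nested lists of per-voxel joint scores.
    origin:     world coordinates of the grid corner (assumed convention).
    voxel_size: edge length of a cubic voxel.
    """
    best_index, best_score = None, float("-inf")
    for i, plane in enumerate(heatmap):
        for j, row in enumerate(plane):
            for k, score in enumerate(row):
                if score > best_score:
                    best_index, best_score = (i, j, k), score
    position = tuple(o + (idx + 0.5) * voxel_size
                     for o, idx in zip(origin, best_index))
    return position, best_score


# Toy 2 x 2 x 2 grid with the strongest response at index (1, 0, 1).
heatmap = [[[0.1, 0.2], [0.0, 0.1]],
           [[0.3, 0.9], [0.2, 0.4]]]
position, score = locate_joint(heatmap, origin=(0.0, 0.0, 0.0), voxel_size=0.5)
```

A hard arg-max like this limits precision to the voxel size; smoother read-outs are possible but are beyond this sketch.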
no code implementations • 5 Jul 2021 • Yuxin Fang, Xinggang Wang, Rui Wu, Wenyu Liu
Recent studies indicate that hierarchical Vision Transformer with a macro architecture of interleaved non-overlapped window-based self-attention and shifted-window operation is able to achieve state-of-the-art performance in various visual recognition tasks, and challenges the ubiquitous convolutional neural networks (CNNs) using densely slid kernels.
1 code implementation • ICLR 2022 • Haohang Xu, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Xinggang Wang, Wenrui Dai, Hongkai Xiong, Qi Tian
Here, a bag of instances denotes a set of similar samples constructed by the teacher and grouped within a bag, and the goal of distillation is to aggregate compact representations over the student with respect to instances in a bag.
1 code implementation • 22 Jun 2021 • Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Ying Shan, Bin Feng, Wenyu Liu
Recently, query-based deep networks have attracted much attention owing to their end-to-end pipeline and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation.
2 code implementations • NeurIPS 2021 • Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu
Can Transformer perform 2D object- and region-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure?
Ranked #30 on
Object Detection
on COCO-O
3 code implementations • CVPR 2022 • Jiemin Fang, Lingxi Xie, Xinggang Wang, Xiaopeng Zhang, Wenyu Liu, Qi Tian
Transformers have offered a new methodology of designing neural networks for visual recognition.
5 code implementations • ICCV 2021 • Yuxin Fang, Shusheng Yang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu
The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.
Ranked #13 on
Object Detection
on COCO-O
1 code implementation • ICCV 2021 • Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu
For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to pixel-wisely localize the same instance in other frames.
Ranked #28 on
Video Instance Segmentation
on YouTube-VIS validation
no code implementations • CVPR 2021 • Xinggang Wang, Jiapei Feng, Bin Hu, Qi Ding, Longjin Ran, Xiaoxin Chen, Wenyu Liu
Humans have a strong class-agnostic object segmentation ability and can outline boundaries of unknown objects precisely, which motivates us to propose a box-supervised class-agnostic object segmentation (BoxCaseg) based solution for weakly-supervised instance segmentation.
Ranked #5 on
Box-supervised Instance Segmentation
on COCO test-dev
(using extra training data)
no code implementations • 2 Apr 2021 • Zilong Huang, Wentian Hao, Xinggang Wang, Mingyuan Tao, Jianqiang Huang, Wenyu Liu, Xian-Sheng Hua
Despite their success for semantic segmentation, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original segmentation model as new classes are available but the initial training data is not retained.
1 code implementation • 25 Mar 2021 • Xinggang Wang, Zhaojin Huang, Bencheng Liao, Lichao Huang, Yongchao Gong, Chang Huang
Based on deep networks, video object detection is actively studied for pushing the limits of detection speed and accuracy.
no code implementations • CVPR 2021 • Qiang Zhou, Shiyin Wang, Yitong Wang, Zilong Huang, Xinggang Wang
Besides, an Amodal Human Perception dataset (AHP) is collected to settle the task of human de-occlusion.
no code implementations • 19 Mar 2021 • Haoyang Li, Xinggang Wang
Given the great success of deep neural networks (DNNs) and their black-box nature, the interpretability of these models has become an important issue. Most previous research focuses on post-hoc interpretation of a trained model. Recently, however, adversarial training has shown that a model can acquire an interpretable input gradient through training. Yet adversarial training is inefficient as a route to interpretability. To resolve this problem, we construct an approximation of the adversarial perturbations and discover a connection between adversarial training and amplitude modulation.
no code implementations • 18 Mar 2021 • Jiaxin Zhang, Wei Sui, Xinggang Wang, Wenming Meng, Hongmei Zhu, Qian Zhang
Second, the poses predicted by CNNs are further improved by minimizing photometric errors via gradient updates of poses during inference phases.
2 code implementations • 2 Feb 2021 • Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai
On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.
Ranked #33 on
Video Instance Segmentation
on YouTube-VIS validation
no code implementations • 13 Jan 2021 • Mengting Chen, Xinggang Wang, Heng Luo, Yifeng Geng, Wenyu Liu
By applying the proposed feature matching block in different layers of the few-shot recognition network, multi-scale information among the compared images can be incorporated into the final cascaded matching feature, which boosts the recognition performance further and generalizes better by learning on relationships.
1 code implementation • 21 Dec 2020 • Jie Qin, Jiemin Fang, Qian Zhang, Wenyu Liu, Xingang Wang, Xinggang Wang
Especially, CutMix uses a simple but effective method to improve the classifiers by randomly cropping a patch from one image and pasting it on another image.
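The crop-and-paste step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of this entry: images are plain nested lists of grayscale values, the function name `cutmix` and the parameter `alpha` are illustrative, and the patch size is drawn from a Beta distribution as in the commonly used CutMix formulation.

```python
import math
import random


def cutmix(img_a, img_b, alpha=1.0, rng=random):
    """Paste a random rectangular patch of img_b onto a copy of img_a.

    Both images are lists of rows with identical height and width.
    Returns (mixed, lam), where lam is the fraction of pixels that
    still come from img_a and can be used to mix the two labels.
    """
    h, w = len(img_a), len(img_a[0])

    # Sample the target mixing ratio, then derive the patch size from it.
    lam = rng.betavariate(alpha, alpha)
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)

    # Pick the patch centre uniformly and clip the box to the image.
    cy, cx = rng.randrange(h), rng.randrange(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    # Copy img_a row by row and overwrite the patch with pixels from img_b.
    mixed = [row[:] for row in img_a]
    for y in range(y1, y2):
        mixed[y][x1:x2] = img_b[y][x1:x2]

    # Recompute lam from the clipped box so it matches the real pixel ratio.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return mixed, lam
```

In a classifier, the training target for the mixed image would then typically be `lam * label_a + (1 - lam) * label_b`.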
1 code implementation • 13 Dec 2020 • Wenqiang Zhang, Jiemin Fang, Xinggang Wang, Wenyu Liu
Human pose estimation from image and video is a vital task in many multimedia applications.
no code implementations • 13 Oct 2020 • Xinggang Wang, Aijun Sun, Junbo Ge
Myofibroblasts, ECM and lumen (intima)/vasa vasorum (VV) (adventitia) constitute granulation tissue repair.
no code implementations • 20 Aug 2020 • Xinggang Wang, Junbo Ge
When a blood micro-cluster flows over a very short distance or across the same transection of the artery, previous studies did not consider the conversion between $\frac{1}{2}\rho v^2$ and $P$. Therefore, the observation that low shear stress aggravates atherosclerosis is only an appearance; the essence is that these areas of lower blood velocity have much higher hydrostatic pressure, which aggravates atherosclerosis.
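The conversion between dynamic pressure and static pressure mentioned above can be illustrated with the textbook Bernoulli relation. This is a sketch under idealized assumptions (steady, incompressible, inviscid flow at constant height), not the authors' own derivation; the subscripts 1 and 2 are illustrative labels for two nearby points on the same streamline.

```latex
P_1 + \tfrac{1}{2}\rho v_1^2 = P_2 + \tfrac{1}{2}\rho v_2^2
\quad\Longrightarrow\quad
P_2 - P_1 = \tfrac{1}{2}\rho\,\bigl(v_1^2 - v_2^2\bigr)
```

Under these assumptions, a point with lower velocity ($v_2 < v_1$) has higher static pressure ($P_2 > P_1$).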
no code implementations • 21 Jul 2020 • Xinggang Wang, Junbo Ge
This is the first time that lipid-rich plaques with many foam cells, extracellular lipids and collagen fibers have been formed in vitro.
1 code implementation • ECCV 2020 • Tianheng Cheng, Xinggang Wang, Lichao Huang, Wenyu Liu
Besides, it is not surprising to observe that BMask R-CNN obtains more obvious improvement when the evaluation criterion requires better localization (e.g., AP$_{75}$) as shown in Fig. 1.
1 code implementation • 17 Jul 2020 • Jiwei Xu, Xinggang Wang, Bin Feng, Wenyu Liu
Text-independent speaker verification is an important artificial intelligence problem that has a wide spectrum of applications, such as criminal investigation, payment certification, and interest-based customer services.
2 code implementations • 21 Jun 2020 • Jiemin Fang, Yuzhu Sun, Qian Zhang, Kangjian Peng, Yuan Li, Wenyu Liu, Xinggang Wang
In this paper, we propose a Fast Network Adaptation (FNA++) method, which can adapt both the architecture and parameters of a seed network (e.g., an ImageNet pre-trained network) to become a network with different depths, widths, or kernel sizes via a parameter remapping technique, making it possible to use NAS for segmentation and detection tasks much more efficiently.
33 code implementations • 4 Apr 2020 • Yifu Zhang, Chunyu Wang, Xinggang Wang, Wen-Jun Zeng, Wenyu Liu
Formulating MOT as multi-task learning of object detection and re-ID in a single network is appealing since it allows joint optimization of the two tasks and enjoys high computation efficiency.
Ranked #1 on
Multi-Object Tracking
on 2DMOT15
(using extra training data)
1 code implementation • medRxiv 2020 • Chuansheng Zheng, Xianbo Deng, Qing Fu, Qiang Zhou, Jiapei Feng, Hui Ma, Wenyu Liu, Xinggang Wang
Our weakly-supervised deep learning model can accurately predict the COVID-19 infectious probability in chest CT volumes without the need for annotating the lesions for training.
1 code implementation • 24 Feb 2020 • Zilong Huang, Yunchao Wei, Xinggang Wang, Wenyu Liu, Thomas S. Huang, Humphrey Shi
Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation.
no code implementations • ICLR 2020 • Jiemin Fang, Yuzhu Sun, Kangjian Peng, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang
In our experiments, we conduct FNA on MobileNetV2 to obtain new networks for both segmentation and detection that clearly outperform existing networks designed both manually and by NAS.
1 code implementation • 31 Dec 2019 • Mengting Chen, Yuxin Fang, Xinggang Wang, Heng Luo, Yifeng Geng, Xin-Yu Zhang, Chang Huang, Wenyu Liu, Bo Wang
The learning problem of the sample generation (i.e., diversity transfer) is solved by minimizing an effective meta-classification loss in a single-stage network, instead of the generative loss in previous works.
2 code implementations • 12 Dec 2019 • Shengkai Wu, Xiaoping Li, Xinggang Wang
The detection confidence is then used as the input of the subsequent NMS and COCO AP computation, which will substantially improve the localization accuracy of models.
42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Ranked #1 on
Object Detection
on COCO test-dev
(Hardware Burden metric)
no code implementations • 15 Aug 2019 • Shengkai Wu, Jinrong Yang, Xinggang Wang, Xiaoping Li
The IoU-balanced localization loss decreases the gradient of examples with low IoU and increases the gradient of examples with high IoU, which can improve the localization accuracy of models.
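The reweighting idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the power-law weight `iou ** eta`, the default `eta=1.5`, the function names, and the normalization (weights rescaled to average 1) are all assumptions chosen for the example. Because each example's gradient scales with its weight, low-IoU examples contribute less and high-IoU examples contribute more.

```python
def iou_balanced_weights(ious, eta=1.5):
    """Return one weight per positive example, proportional to IoU ** eta.

    The weights are rescaled so that they sum to len(ious), which keeps
    the overall loss magnitude comparable to the unweighted case.
    """
    raw = [iou ** eta for iou in ious]
    total = sum(raw)
    if total == 0.0:
        # Degenerate case (all IoUs are zero): fall back to uniform weights.
        return [1.0] * len(ious)
    scale = len(ious) / total
    return [r * scale for r in raw]


def iou_balanced_loss(per_example_losses, ious, eta=1.5):
    """Sum the per-example localization losses with IoU-dependent weights."""
    weights = iou_balanced_weights(ious, eta)
    return sum(w * loss for w, loss in zip(weights, per_example_losses))
```

For example, `iou_balanced_weights([0.5, 0.9])` gives the second example a larger weight than the first while the two weights still sum to 2.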
no code implementations • 11 Jul 2019 • Hao Luo, Lichao Huang, Han Shen, Yuan Li, Chang Huang, Xinggang Wang
Without any bells and whistles, our method obtains 80.3% mAP on the ImageNet VID dataset, which is superior to previous state-of-the-art methods.
1 code implementation • 2 Jul 2019 • Qiang Zhou, Zilong Huang, Lichao Huang, Yongchao Gong, Han Shen, Chang Huang, Wenyu Liu, Xinggang Wang
Video object segmentation (VOS) aims at pixel-level object tracking given only the annotations in the first frame.
Ranked #1 on
Visual Object Tracking
on YouTube-VOS 2018
(Jaccard (Seen) metric)
1 code implementation • CVPR 2020 • Jiemin Fang, Yuzhu Sun, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang
We revisit the search space design in most previous NAS methods and find that the number and widths of blocks are set manually.
Ranked #91 on
Neural Architecture Search
on ImageNet
39 code implementations • 9 Apr 2019 • Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang
The proposed approach achieves superior results to existing single-model networks on COCO object detection.
Ranked #7 on
Semantic Segmentation
on LIP val
no code implementations • 25 Mar 2019 • Zonglin Yang, Xinggang Wang
Capsule networks have shown various advantages over convolutional neural networks (CNNs).
no code implementations • CVPR 2019 • Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu
Visual object recognition under situations in which the direct line-of-sight is blocked, such as when it is occluded around the corner, is of practical importance in a wide range of applications.
3 code implementations • CVPR 2019 • Zhaojin Huang, Lichao Huang, Yongchao Gong, Chang Huang, Xinggang Wang
In this paper, we study this problem and propose Mask Scoring R-CNN which contains a network block to learn the quality of the predicted instance masks.
Ranked #77 on
Instance Segmentation
on COCO minival
1 code implementation • 17 Jan 2019 • Jiemin Fang, Yukang Chen, Xinbang Zhang, Qian Zhang, Chang Huang, Gaofeng Meng, Wenyu Liu, Xinggang Wang
In our implementations, architectures are first searched on a small dataset, e.g., CIFAR-10.
4 code implementations • ICCV 2019 • Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, Thomas S. Huang
Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage.
Ranked #7 on
Semantic Segmentation
on FoodSeg103
(using extra training data)
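The memory comparison with the non-local block in the criss-cross attention entry above can be made concrete with a rough count. This sketch only counts attention-map entries for a single pass over an H x W feature map: a non-local block attends from every position to every position, while criss-cross attention attends from every position to the H + W - 1 positions in its row and column. The function name is illustrative, and the count ignores features, weights, and other buffers, so it is not expected to reproduce the measured 11x figure.

```python
def attention_entries(height, width):
    """Count attention-map entries for one pass over a height x width map.

    Returns (non_local, criss_cross):
      non_local   - every position attends to all height * width positions
      criss_cross - every position attends to its own row and column,
                    i.e. height + width - 1 positions
    """
    positions = height * width
    non_local = positions * positions
    criss_cross = positions * (height + width - 1)
    return non_local, criss_cross


# Illustrative feature-map size; the value 97 is an assumption for the example.
non_local, criss_cross = attention_entries(97, 97)
print(f"non-local: {non_local}, criss-cross: {criss_cross}")
```

The ratio between the two counts is `height * width / (height + width - 1)`, so the saving in attention-map size grows with the resolution of the feature map.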
no code implementations • 13 Nov 2018 • Hao Luo, Wenxuan Xie, Xinggang Wang, Wen-Jun Zeng
Trackers are in general more efficient than detectors but bear the risk of drifting.
no code implementations • ECCV 2018 • Peng Tang, Xinggang Wang, Angtian Wang, Yongluan Yan, Wenyu Liu, Junzhou Huang, Alan Yuille
The Convolutional Neural Network (CNN) based region proposal generation method (i.e., the region proposal network), trained using bounding box annotations, is an essential component in modern fully supervised object detectors.