Search Results for author: Mengchen Liu

Found 36 papers, 18 papers with code

CvT: Introducing Convolutions to Vision Transformers

14 code implementations • ICCV 2021 • Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

Ranked #3 on Image Classification on Flowers-102 (using extra training data)

Image Classification

124,527


TinyViT: Fast Pretraining Distillation for Small Vision Transformers

2 code implementations • 21 Jul 2022 • Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

It achieves a top-1 accuracy of 84.8% on ImageNet-1k with only 21M parameters, being comparable to Swin-B pretrained on ImageNet-21k while using 4.2 times fewer parameters.

Ranked #133 on Image Classification on ImageNet

Image Classification Knowledge Distillation

29,671


Dynamic Head: Unifying Object Detection Heads with Attentions

3 code implementations • CVPR 2021 • Xiyang Dai, Yinpeng Chen, Bin Xiao, Dongdong Chen, Mengchen Liu, Lu Yuan, Lei Zhang

In this paper, we present a novel dynamic head framework to unify object detection heads with attentions.

Ranked #2 on Object Detection on COCO 2017 val (AP75 metric)

Object object-detection +1

27,708


Dynamic Convolution: Attention over Convolution Kernels

5 code implementations • CVPR 2020 • Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dong-Dong Chen, Lu Yuan, Zicheng Liu

Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability.

Ranked #905 on Image Classification on ImageNet

Image Classification Keypoint Detection

10,797


MiniViT: Compressing Vision Transformers with Weight Multiplexing

2 code implementations • CVPR 2022 • Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks.

Ranked #209 on Image Classification on ImageNet (using extra training data)

Image Classification

1,558


TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

1 code implementation • ICCV 2023 • Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi Chen, Xinggang Wang, Hongyang Chao, Han Hu

In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.

1,558


Mobile-Former: Bridging MobileNet and Transformer

4 code implementations • CVPR 2022 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, Zicheng Liu

This structure leverages the advantages of MobileNet at local processing and transformer at global interaction.

Object Detection

1,183


Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale dataset and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Ranked #1 on Action Recognition In Videos on Kinetics-600

Action Classification Action Recognition In Videos +12

367


MicroNet: Improving Image Recognition with Extremely Low FLOPs

1 code implementation • ICCV 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g. 5M FLOPs on ImageNet classification).

328


Dynamic ReLU

2 code implementations • ECCV 2020 • Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dong-Dong Chen, Lu Yuan, Zicheng Liu

Rectified linear units (ReLU) are commonly used in deep neural networks.

204


BEVT: BERT Pretraining of Video Transformers

1 code implementation • CVPR 2022 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan

This design is motivated by two observations: 1) transformers learned on image datasets provide decent spatial priors that can ease the learning of video transformers, which are often times computationally-intensive if trained from scratch; 2) discriminative clues, i.e., spatial and temporal information, needed to make correct predictions vary among different videos due to large intra-class and inter-class variations.

Ranked #8 on Action Recognition on Diving-48

Action Recognition Representation Learning

153


Reduce Information Loss in Transformers for Pluralistic Image Inpainting

1 code implementation • CVPR 2022 • Qiankun Liu, Zhentao Tan, Dongdong Chen, Qi Chu, Xiyang Dai, Yinpeng Chen, Mengchen Liu, Lu Yuan, Nenghai Yu

The indices of quantized pixels are used as tokens for the inputs and prediction targets of transformer.

Ranked #6 on Seeing Beyond the Visible on KITTI360-EX

Image Inpainting Quantization +1

147


Revisiting Dynamic Convolution via Matrix Decomposition

1 code implementation • ICLR 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos

It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging.

Dimensionality Reduction

129


Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

4 code implementations • CVPR 2023 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Yu-Gang Jiang

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

Ranked #1 on Self-Supervised Action Recognition on HMDB51

Action Classification Representation Learning +1


Should All Proposals be Treated Equally in Object Detection?

1 code implementation • 7 Jul 2022 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Pei Yu, Jing Yin, Lu Yuan, Zicheng Liu, Nuno Vasconcelos

We formulate this as a learning problem where the goal is to assign operators to proposals, in the detection head, so that the total computational cost is constrained and the precision is maximized.

Object Object Detection


Stronger NAS with Weaker Predictors

1 code implementation • NeurIPS 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors.

Neural Architecture Search


Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations

1 code implementation • 27 Feb 2023 • Ziyu Jiang, Yinpeng Chen, Mengchen Liu, Dongdong Chen, Xiyang Dai, Lu Yuan, Zicheng Liu, Zhangyang Wang

This motivates us to shift the paradigm from combining loss at the end, to choosing the proper learning method per network layer.

Contrastive Learning Few-Shot Learning


Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

1 code implementation • 15 Mar 2024 • Hongyuan Yu, Cheng Wan, Mengchen Liu, Dongdong Chen, Bin Xiao, Xiyang Dai

Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly overhead in memory to maintain high resolution.

Autonomous Driving Image Segmentation +2


Towards Better Analysis of Machine Learning Models: A Visual Analytics Perspective

no code implementations • 4 Feb 2017 • Shixia Liu, Xiting Wang, Mengchen Liu, Jun Zhu

Interactive model analysis, the process of understanding, diagnosing, and refining a machine learning model with the help of interactive visualization, is very important for users to efficiently solve real-world artificial intelligence and data mining problems.

BIG-bench Machine Learning


Towards Better Analysis of Deep Convolutional Neural Networks

no code implementations • 24 Apr 2016 • Mengchen Liu, Jiaxin Shi, Zhen Li, Chongxuan Li, Jun Zhu, Shixia Liu

Deep convolutional neural networks (CNNs) have achieved breakthrough performance in many pattern recognition tasks such as image classification.

Image Classification


Analyzing the Noise Robustness of Deep Neural Networks

no code implementations • 9 Oct 2018 • Mengchen Liu, Shixia Liu, Hang Su, Kelei Cao, Jun Zhu

Deep neural networks (DNNs) are vulnerable to maliciously generated adversarial examples.


Analyzing the Noise Robustness of Deep Neural Networks

no code implementations • 26 Jan 2020 • Kelei Cao, Mengchen Liu, Hang Su, Jing Wu, Jun Zhu, Shixia Liu

The key is to compare and analyze the datapaths of both the adversarial and normal examples.

Adversarial Attack


DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search

no code implementations • ECCV 2020 • Xiyang Dai, Dong-Dong Chen, Mengchen Liu, Yinpeng Chen, Lu Yuan

One common way is searching on a smaller proxy dataset (e.g., CIFAR-10) and then transferring to the target task (e.g., ImageNet).

Neural Architecture Search


Diagnosing Concept Drift with Visual Analytics

no code implementations • 28 Jul 2020 • Weikai Yang, Zhen Li, Mengchen Liu, Yafeng Lu, Kelei Cao, Ross Maciejewski, Shixia Liu

Concept drift is a phenomenon in which the distribution of a data stream changes over time in unforeseen ways, causing prediction models built on historical data to become inaccurate.

Text Classification


Weak NAS Predictor Is All You Need

no code implementations • 1 Jan 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

Rather than expecting a single strong predictor to model the whole space, we seek a progressive line of weak predictors that can connect a path to the best architecture, thus greatly simplifying the learning task of each predictor.

Neural Architecture Search


MicroNet: Towards Image Recognition with Extremely Low FLOPs

no code implementations • 24 Nov 2020 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

In this paper, we present MicroNet, which is an efficient convolutional neural network using extremely low computational cost (e.g. 6 MFLOPs on ImageNet classification).


Residual Mixture of Experts

no code implementations • 20 Apr 2022 • Lemeng Wu, Mengchen Liu, Yinpeng Chen, Dongdong Chen, Xiyang Dai, Lu Yuan

In this paper, we propose Residual Mixture of Experts (RMoE), an efficient training pipeline for MoE vision transformers on downstream tasks, such as segmentation and detection.

Object Detection


Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

no code implementations • CVPR 2023 • Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, JianFeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

Detection Hub further achieves SoTA performance on UODB benchmark with wide variety of datasets.

Object object-detection +1


Visual Analysis of Neural Architecture Spaces for Summarizing Design Principles

no code implementations • 20 Aug 2022 • Jun Yuan, Mengchen Liu, Fengyuan Tian, Shixia Liu

To ease this process, we develop ArchExplorer, a visual analysis method for understanding a neural architecture space and summarizing design principles.


Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling

no code implementations • 25 Aug 2022 • Rui Wang, Zuxuan Wu, Dongdong Chen, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Luowei Zhou, Lu Yuan, Yu-Gang Jiang

To avoid significant computational cost incurred by computing self-attention between the large number of local patches in videos, we propose to use very few global tokens (e.g., 6) for a whole video in Transformers to exchange information with 3D-CNNs with a cross-attention mechanism.

Video Recognition


Self-Supervised Learning based on Heat Equation

no code implementations • 23 Nov 2022 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

When transferring to object detection with frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP respectively.

Image Classification object-detection +2


Image as First-Order Norm+Linear Autoregression: Unveiling Mathematical Invariance

no code implementations • 25 May 2023 • Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

This paper introduces a novel mathematical property applicable to diverse images, referred to as FINOLA (First-Order Norm+Linear Autoregressive).

Image Classification Image Reconstruction +3


Foundation Models Meet Visualizations: Challenges and Opportunities

no code implementations • 9 Oct 2023 • Weikai Yang, Mengchen Liu, Zheng Wang, Shixia Liu

Recent studies have indicated that foundation models, such as BERT and GPT, excel in adapting to a variety of downstream tasks.

Fairness


On the Hidden Waves of Image

no code implementations • 19 Oct 2023 • Yinpeng Chen, Dongdong Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Zicheng Liu, Youzuo Lin

We term this phenomenon hidden waves, as it reveals that, although the speeds of the set of wave equations and autoregressive coefficient matrices are latent, they are both learnable and shared across images.


Fully Authentic Visual Question Answering Dataset from Online Communities

no code implementations • 27 Nov 2023 • Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari

Visual Question Answering (VQA) entails answering questions about images.

Question Answering Visual Question Answering


An Evaluation of GPT-4V and Gemini in Online VQA

no code implementations • 17 Dec 2023 • Mengchen Liu, Chongyan Chen, Danna Gurari

While there is much excitement about the potential of large multimodal models (LMM), a comprehensive evaluation is critical to establish their true capabilities and limitations.

Question Answering Visual Question Answering

