Search Results for author: Yuxin Fang

Found 12 papers, 10 papers with code

EVA-CLIP: Improved Training Techniques for CLIP at Scale

3 code implementations • 27 Mar 2023 • Quan Sun, Yuxin Fang, Ledell Wu, Xinlong Wang, Yue Cao

Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly smaller training costs.

Ranked #4 on Zero-Shot Transfer Image Classification on Food-101

Image Classification Representation Learning +2

1,961

EVA-02: A Visual Representation for Neon Genesis

6 code implementations • 20 Mar 2023 • Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao

We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling.

29,735

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

6 code implementations • CVPR 2023 • Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao

We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.

Ranked #1 on Self-Supervised Image Classification (with CLIP) on ImageNet (zero-shot)

Action Classification Action Recognition +9

29,735

Temporally Efficient Vision Transformer for Video Instance Segmentation

3 code implementations • CVPR 2022 • Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan

To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).

Ranked #35 on Video Instance Segmentation on OVIS validation

Instance Segmentation Semantic Segmentation +1

400

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

2 code implementations • ICCV 2023 • Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan, Xinggang Wang

We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla Vision Transformer (ViT) for object detection, which is based on our two novel observations: (i) A MIM pre-trained vanilla ViT encoder can work surprisingly well in the challenging object-level recognition scenario even with randomly sampled partial observations, e.g., only 25% to 50% of the input embeddings.

Instance Segmentation Object +2

1,961

Corrupted Image Modeling for Self-Supervised Visual Pre-Training

no code implementations • 7 Feb 2022 • Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei

Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not.

Image Classification Semantic Segmentation

What Makes for Hierarchical Vision Transformer?

no code implementations • 5 Jul 2021 • Yuxin Fang, Xinggang Wang, Rui Wu, Wenyu Liu

Recent studies indicate that hierarchical Vision Transformer with a macro architecture of interleaved non-overlapped window-based self-attention and shifted-window operation is able to achieve state-of-the-art performance in various visual recognition tasks, and challenges the ubiquitous convolutional neural networks (CNNs) using densely slid kernels.

Instance Segmentation object-detection +3

Tracking Instances as Queries

1 code implementation • 22 Jun 2021 • Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Ying Shan, Bin Feng, Wenyu Liu

Recently, query-based deep networks have attracted considerable attention owing to their end-to-end pipeline and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation.

Instance Segmentation object-detection +4

400

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

2 code implementations • NeurIPS 2021 • Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu

Can Transformer perform 2D object- and region-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure?

Ranked #30 on Object Detection on COCO-O

Object object-detection +1

124,889

Instances as Queries

5 code implementations • ICCV 2021 • Yuxin Fang, Shusheng Yang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.

Ranked #13 on Object Detection on COCO-O (using extra training data)

Instance Segmentation Object +4

27,765

Crossover Learning for Fast Online Video Instance Segmentation

1 code implementation • ICCV 2021 • Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to pixel-wisely localize the same instance in other frames.

Ranked #34 on Video Instance Segmentation on OVIS validation

Instance Segmentation Semantic Segmentation +2

Diversity Transfer Network for Few-Shot Learning

1 code implementation • 31 Dec 2019 • Mengting Chen, Yuxin Fang, Xinggang Wang, Heng Luo, Yifeng Geng, Xin-Yu Zhang, Chang Huang, Wenyu Liu, Bo Wang

The learning problem of the sample generation (i.e., diversity transfer) is solved via minimizing an effective meta-classification loss in a single-stage network, instead of the generative loss in previous works.

Few-Shot Learning
