Search Results for author: Yuxin Fang

Found 12 papers, 10 papers with code

EVA-CLIP: Improved Training Techniques for CLIP at Scale

3 code implementations 27 Mar 2023 Quan Sun, Yuxin Fang, Ledell Wu, Xinlong Wang, Yue Cao

Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly smaller training costs.

Image Classification Representation Learning +2

EVA-02: A Visual Representation for Neon Genesis

6 code implementations 20 Mar 2023 Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao

We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling.

Temporally Efficient Vision Transformer for Video Instance Segmentation

3 code implementations CVPR 2022 Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan

To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).

Instance Segmentation Semantic Segmentation +1

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

2 code implementations ICCV 2023 Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan, Xinggang Wang

We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla Vision Transformer (ViT) for object detection, which is based on our two novel observations: (i) A MIM pre-trained vanilla ViT encoder can work surprisingly well in the challenging object-level recognition scenario even with randomly sampled partial observations, e.g., only 25% to 50% of the input embeddings.

Instance Segmentation Object +2

Corrupted Image Modeling for Self-Supervised Visual Pre-Training

no code implementations 7 Feb 2022 Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei

Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not.

Image Classification Semantic Segmentation

What Makes for Hierarchical Vision Transformer?

no code implementations 5 Jul 2021 Yuxin Fang, Xinggang Wang, Rui Wu, Wenyu Liu

Recent studies indicate that hierarchical Vision Transformer with a macro architecture of interleaved non-overlapped window-based self-attention and shifted-window operation is able to achieve state-of-the-art performance in various visual recognition tasks, and challenges the ubiquitous convolutional neural networks (CNNs) using densely slid kernels.

Instance Segmentation object-detection +3

Tracking Instances as Queries

1 code implementation 22 Jun 2021 Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Ying Shan, Bin Feng, Wenyu Liu

Recently, query-based deep networks have attracted much attention owing to their end-to-end pipeline and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation.

Instance Segmentation object-detection +4

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

2 code implementations NeurIPS 2021 Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu

Can Transformer perform 2D object- and region-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure?

Object object-detection +1

Instances as Queries

5 code implementations ICCV 2021 Yuxin Fang, Shusheng Yang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.

Ranked #13 on Object Detection on COCO-O (using extra training data)

Instance Segmentation Object +4

Crossover Learning for Fast Online Video Instance Segmentation

1 code implementation ICCV 2021 Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to pixel-wisely localize the same instance in other frames.

Instance Segmentation Semantic Segmentation +2

Diversity Transfer Network for Few-Shot Learning

1 code implementation 31 Dec 2019 Mengting Chen, Yuxin Fang, Xinggang Wang, Heng Luo, Yifeng Geng, Xin-Yu Zhang, Chang Huang, Wenyu Liu, Bo Wang

The learning problem of the sample generation (i.e., diversity transfer) is solved via minimizing an effective meta-classification loss in a single-stage network, instead of the generative loss in previous works.

Few-Shot Learning
