Search Results for author: Shusheng Yang

Found 13 papers, 12 papers with code

Qwen Technical Report

2 code implementations • 28 Sep 2023 • Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.

Ranked #3 on Multi-Label Text Classification on CC3M-TagMask

Language Modelling Large Language Model +2

11,708

Paper
Code

TouchStone: Evaluating Vision-Language Models by Language Models

1 code implementation • 31 Aug 2023 • Shuai Bai, Shusheng Yang, Jinze Bai, Peng Wang, Xingxuan Zhang, Junyang Lin, Xinggang Wang, Chang Zhou, Jingren Zhou

Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting visual receptor with large language models (LLMs).

Visual Storytelling

Paper
Code

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

1 code implementation • 24 Aug 2023 • Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou

In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images.

Ranked #3 on Visual Question Answering on MM-Vet

Chart Question Answering Image Captioning +6

4,011

Paper
Code

ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers

4 code implementations • 24 May 2023 • Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang

Recently, plain vision Transformers (ViTs) have shown impressive performance on various computer vision tasks, thanks to their strong modeling capacity and large-scale pretraining.

Ranked #2 on Image Matting on Distinctions-646

Image Matting

126,923

Paper
Code

MobileInst: Video Instance Segmentation on the Mobile

no code implementations • 30 Mar 2023 • Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang

To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices.

Decoder Instance Segmentation +3

Paper
Add Code

RILS: Masked Visual Reconstruction in Language Semantic Space

1 code implementation • CVPR 2023 • Shusheng Yang, Yixiao Ge, Kun Yi, Dian Li, Ying Shan, XiaoHu Qie, Xinggang Wang

Both masked image modeling (MIM) and natural language supervision have facilitated the progress of transferable visual pre-training.

Sentence

Paper
Code

Masked Image Modeling with Denoising Contrast

1 code implementation • 19 May 2022 • Kun Yi, Yixiao Ge, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, XiaoHu Qie

Since the development of self-supervised visual representation learning from contrastive learning to masked image modeling (MIM), there is no significant difference in essence, that is, how to design proper pretext tasks for vision dictionary look-up.

Contrastive Learning Denoising +6

Paper
Code

Temporally Efficient Vision Transformer for Video Instance Segmentation

3 code implementations • CVPR 2022 • Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan

To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).

Ranked #35 on Video Instance Segmentation on OVIS validation

Instance Segmentation Semantic Segmentation +1

401

Paper
Code

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

2 code implementations • ICCV 2023 • Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan, Xinggang Wang

We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla Vision Transformer (ViT) for object detection, which is based on our two novel observations: (i) A MIM pre-trained vanilla ViT encoder can work surprisingly well in the challenging object-level recognition scenario even with randomly sampled partial observations, e. g., only 25% $\sim$ 50% of the input embeddings.

Instance Segmentation Object +2

2,016

Paper
Code

Relational Surrogate Loss Learning

1 code implementation • ICLR 2022 • Tao Huang, Zekang Li, Hua Lu, Yong Shan, Shusheng Yang, Yang Feng, Fei Wang, Shan You, Chang Xu

Evaluation metrics in machine learning are often hardly taken as loss functions, as they could be non-differentiable and non-decomposable, e. g., average precision and F1 score.

Image Classification Machine Reading Comprehension +3

Paper
Code

Tracking Instances as Queries

1 code implementation • 22 Jun 2021 • Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Ying Shan, Bin Feng, Wenyu Liu

Recently, query based deep networks catch lots of attention owing to their end-to-end pipeline and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation.

Instance Segmentation object-detection +4

401

Paper
Code

Instances as Queries

5 code implementations • ICCV 2021 • Yuxin Fang, Shusheng Yang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.

Ranked #13 on Object Detection on COCO-O

Instance Segmentation Object +4

28,164

Paper
Code

Crossover Learning for Fast Online Video Instance Segmentation

1 code implementation • ICCV 2021 • Shusheng Yang, Yuxin Fang, Xinggang Wang, Yu Li, Chen Fang, Ying Shan, Bin Feng, Wenyu Liu

For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to pixel-wisely localize the same instance in other frames.

Ranked #34 on Video Instance Segmentation on OVIS validation

Instance Segmentation Semantic Segmentation +2

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.