Search Results for author: Xiaosong Zhang

Found 14 papers, 10 papers with code

Generative Multimodal Models are In-Context Learners

1 code implementation CVPR 2024 Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang

The human ability to easily solve multimodal tasks in context (i. e., with only a few demonstrations or simple instructions), is what current multimodal systems have largely struggled to imitate.

In-Context Learning Question Answering +2

CapsFusion: Rethinking Image-Text Data at Scale

1 code implementation CVPR 2024 Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu

To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions.

World Knowledge

Emu: Generative Pretraining in Multimodality

2 code implementations11 Jul 2023 Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context.

Image Captioning Temporal/Casual QA +4

SegGPT: Segmenting Everything In Context

1 code implementation6 Apr 2023 Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

 Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)

Few-Shot Semantic Segmentation In-Context Learning +5

SegGPT: Towards Segmenting Everything in Context

no code implementations ICCV 2023 Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

Few-Shot Semantic Segmentation In-Context Learning +4

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling

1 code implementation30 May 2022 Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian

A key idea of efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e. g., ViT), albeit hierarchical vision transformers (e. g., Swin Transformer) have potentially better properties in formulating vision inputs.

Transfer Learning

Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

3 code implementations ICCV 2023 Feng Liu, Xiaosong Zhang, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye

Except for the backbone networks, however, other components such as the detector head and the feature pyramid network (FPN) remain trained from scratch, which hinders fully tapping the potential of representation models.

Decoder Few-Shot Object Detection +3

Long-tailed Distribution Adaptation

1 code implementation6 Oct 2021 Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye

We propose to jointly optimize empirical risks of the unbalanced and balanced domains and approximate their domain divergence by intra-class and inter-class distances, with the aim to adapt models trained on the long-tailed distribution to general distributions in an interpretable way.

Domain Adaptation Instance Segmentation +3

Beyond Bounding-Box: Convex-Hull Feature Adaptation for Oriented and Densely Packed Object Detection

2 code implementations CVPR 2021 Zonghao Guo, Chang Liu, Xiaosong Zhang, Jianbin Jiao, Xiangyang Ji, Qixiang Ye

Detecting oriented and densely packed objects remains challenging for spatial feature aliasing caused by the intersection of reception fields between objects.

Ranked #34 on Object Detection In Aerial Images on DOTA (using extra training data)

object-detection Object Detection In Aerial Images

FreeAnchor: Learning to Match Anchors for Visual Object Detection

4 code implementations NeurIPS 2019 Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye

In this study, we propose a learning-to-match approach to break IoU restriction, allowing objects to match anchors in a flexible manner.

Object object-detection +1

Adversarial Samples on Android Malware Detection Systems for IoT Systems

no code implementations12 Feb 2019 Xiaolei Liu, Xiaojiang Du, Xiaosong Zhang, Qingxin Zhu, Mohsen Guizani

An automated testing framework is needed to help these learning-based malware detection systems for IoT devices perform security analysis.

Android Malware Detection Malware Detection

A Black-box Attack on Neural Networks Based on Swarm Evolutionary Algorithm

no code implementations26 Jan 2019 Xiaolei Liu, Yuheng Luo, Xiaosong Zhang, Qingxin Zhu

Our experimental results show that both the MNIST images and the CIFAR-10 images can be perturbed to successful generate a black-box attack with 100\% probability on average.

Evolutionary Algorithms

Weighted-Sampling Audio Adversarial Example Attack

no code implementations26 Jan 2019 Xiaolei Liu, Xiaosong Zhang, Kun Wan, Qingxin Zhu, Yufei Ding

In this paper, we propose~\textit{weighted-sampling audio adversarial examples}, focusing on the numbers and the weights of distortion to reinforce the attack.

Adversarial Attack Automatic Speech Recognition +3

Cannot find the paper you are looking for? You can Submit a new open access paper.