Search Results for author: Zhiliang Peng

Found 13 papers, 11 papers with code

iPREFER: An Intelligent Parameter Extractor based on Features for BSIM-CMG Models

no code implementations • 11 Apr 2024 • Zhiliang Peng, Yicheng Wang, Zhengwu Yuan, Xingsheng Wang

This paper introduces an innovative parameter extraction method for BSIM-CMG compact models, seamlessly integrating curve feature extraction and machine learning techniques.

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."

Image Generation

Kosmos-2: Grounding Multimodal Large Language Models to the World

2 code implementations • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.

Ranked #11 on Visual Question Answering on ViP-Bench

Image Captioning • In-Context Learning • +8

Generic-to-Specific Distillation of Masked Autoencoders

1 code implementation • CVPR 2023 • Wei Huang, Zhiliang Peng, Li Dong, Furu Wei, Jianbin Jiao, Qixiang Ye

Lightweight ViT models limited by the model capacity, however, benefit little from those pre-training mechanisms.

Image Classification • Knowledge Distillation • +3

Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks

no code implementations • CVPR 2023 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

A big convergence of language, vision, and multimodal pretraining is emerging.

Cross-Modal Retrieval • Image Captioning • +10

A Unified View of Masked Image Modeling

1 code implementation • 19 Oct 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei

Masked image modeling has demonstrated great potential to eliminate the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks.

Image Classification • Segmentation • +1

Foundation Transformers

4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

A big convergence of model architectures across language, vision, speech, and multimodal is emerging.

Language Modelling • Machine Translation • +1

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

2 code implementations • 22 Aug 2022 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

A big convergence of language, vision, and multimodal pretraining is emerging.

Ranked #1 on Visual Reasoning on NLVR2 Test

Cross-Modal Retrieval • Image Captioning • +11

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

2 code implementations • 12 Aug 2022 • Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei

The large-size BEiT v2 obtains 87.3% top-1 accuracy for ImageNet-1K (224 size) fine-tuning, and 56.7% mIoU on ADE20K for semantic segmentation.

Ranked #27 on Self-Supervised Image Classification on ImageNet

Knowledge Distillation • Representation Learning • +2

Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

3 code implementations • ICCV 2023 • Feng Liu, Xiaosong Zhang, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye

Except for the backbone networks, however, other components such as the detector head and the feature pyramid network (FPN) remain trained from scratch, which hinders fully tapping the potential of representation models.

Ranked #3 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Object Detection • Object • +2

Long-tailed Distribution Adaptation

1 code implementation • 6 Oct 2021 • Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye

We propose to jointly optimize empirical risks of the unbalanced and balanced domains and approximate their domain divergence by intra-class and inter-class distances, with the aim to adapt models trained on the long-tailed distribution to general distributions in an interpretable way.

Domain Adaptation • Instance Segmentation • +3

Conformer: Local Features Coupling Global Representations for Visual Recognition

4 code implementations • ICCV 2021 • Zhiliang Peng, Wei Huang, Shanzhi Gu, Lingxi Xie, YaoWei Wang, Jianbin Jiao, Qixiang Ye

Within a Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but have difficulty capturing global representations.

Ranked #325 on Image Classification on ImageNet

Image Classification • Instance Segmentation • +4

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

2 code implementations • ICCV 2021 • Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, Qixiang Ye

TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization.

Object • Weakly-Supervised Object Localization
