Search Results for author: Weifeng Lin

Found 6 papers, 5 papers with code

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

1 code implementation · 23 Sep 2024 · Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xie, Yu Qiao, Peng Gao, Hongsheng Li

Furthermore, we adopt Diffusion Transformers (DiT) as our foundation model and extend its capabilities with a flexible any-resolution mechanism, enabling the model to dynamically process images based on the aspect ratio of the input, closely aligning with human perceptual processes.

Image Restoration · Text-to-Image Generation

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

2 code implementations · 5 Aug 2024 · Dongyang Liu, Shitian Zhao, Le Zhuo, Weifeng Lin, Yu Qiao, Hongsheng Li, Peng Gao

We present Lumina-mGPT, a family of multimodal autoregressive models capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images from text descriptions.

Decoder · Depth Estimation +3

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

1 code implementation · 29 Mar 2024 · Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li

In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.

Instruction Following · Language Modelling +5

Hierarchical Side-Tuning for Vision Transformers

no code implementations · 9 Oct 2023 · Weifeng Lin, Ziheng Wu, Wentao Yang, Mingxin Huang, Jun Huang, Lianwen Jin

In this paper, we introduce Hierarchical Side-Tuning (HST), an innovative parameter-efficient transfer learning (PETL) method facilitating the transfer of ViT models to diverse downstream tasks.

Image Classification · Instance Segmentation +5

Scale-Aware Modulation Meet Transformer

1 code implementation · ICCV 2023 · Weifeng Lin, Ziheng Wu, Jiayu Chen, Jun Huang, Lianwen Jin

Specifically, SMT with 11.5M / 2.4 GFLOPs and 32M / 7.7 GFLOPs can achieve 82.2% and 84.3% top-1 accuracy on ImageNet-1K, respectively.

Object Detection +1
