Search Results for author: Qinglong Zhang

Found 9 papers, 4 papers with code

Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

no code implementations 22 Jul 2024 Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang

This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs).

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

no code implementations NeurIPS 2023 Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.

Image Captioning Language Modelling +3

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations 9 May 2023 Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

FedKNOW: Federated Continual Learning with Signature Task Knowledge Integration at Edge

no code implementations 4 Dec 2022 Yaxin Luopan, Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang

Upon training for a new task, the gradient integrator ensures the prevention of catastrophic forgetting and mitigation of negative knowledge transfer by effectively combining signature tasks identified from the past local tasks and other clients' current tasks through the global model.

Continual Learning Transfer Learning

LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision

no code implementations 18 Dec 2021 Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang, Jian Tang, Lydia Y. Chen

The prior art sheds light on exploring the accuracy-resource tradeoff by scaling the model sizes in accordance with resource dynamics.

Knowledge Distillation Model Compression +1

ResT: An Efficient Transformer for Visual Recognition

5 code implementations NeurIPS 2021 Qinglong Zhang, YuBin Yang

This paper presents an efficient multi-scale vision Transformer, called ResT, that can serve as a general-purpose backbone for image recognition.

Diversity Image Classification

Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

1 code implementation 25 Mar 2021 Qinglong Zhang, Lu Rao, YuBin Yang

In each group, the sub-activations are summed and de-noised as an initial mask.
