Search Results for author: Qinglong Zhang

Found 8 papers, 4 papers with code

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

2 code implementations21 Dec 2023 Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

 Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT-full (using extra training data)

Image Retrieval Image-to-Text Retrieval +10

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

no code implementations NeurIPS 2023 Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.

Image Captioning Language Modelling +3

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations9 May 2023 Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

FedKNOW: Federated Continual Learning with Signature Task Knowledge Integration at Edge

no code implementations4 Dec 2022 Yaxin Luopan, Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang

Upon training for a new task, the gradient integrator ensures the prevention of catastrophic forgetting and mitigation of negative knowledge transfer by effectively combining signature tasks identified from the past local tasks and other clients' current tasks through the global model.

Continual Learning Transfer Learning

LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision

no code implementations18 Dec 2021 Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang, Jian Tang, Lydia Y. Chen

The prior art sheds light on exploring the accuracy-resource tradeoff by scaling the model sizes in accordance to resource dynamics.

Knowledge Distillation Model Compression +1

ResT: An Efficient Transformer for Visual Recognition

5 code implementations NeurIPS 2021 Qinglong Zhang, YuBin Yang

This paper presents an efficient multi-scale vision Transformer, called ResT, that capably served as a general-purpose backbone for image recognition.

Image Classification

Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

1 code implementation25 Mar 2021 Qinglong Zhang, Lu Rao, YuBin Yang

In each group, the sub-activations are summed and de-noised as an initial mask.

Cannot find the paper you are looking for? You can Submit a new open access paper.