Search Results for author: Qinglong Zhang

Found 8 papers, 4 papers with code

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

no code implementations • 25 Feb 2024 • Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI.

Ranked #72 on Visual Question Answering on MM-Vet

Code Generation Multimodal Reasoning +1

Paper
Add Code

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

2 code implementations • 21 Dec 2023 • Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT-full (using extra training data)

Image Retrieval Image-to-Text Retrieval +10

827

Paper
Code

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

no code implementations • NeurIPS 2023 • Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.

Image Captioning Language Modelling +3

Paper
Add Code

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations • 9 May 2023 • Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

3,121

Paper
Code

FedKNOW: Federated Continual Learning with Signature Task Knowledge Integration at Edge

no code implementations • 4 Dec 2022 • Yaxin Luopan, Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang

Upon training for a new task, the gradient integrator ensures the prevention of catastrophic forgetting and mitigation of negative knowledge transfer by effectively combining signature tasks identified from the past local tasks and other clients' current tasks through the global model.

Continual Learning Transfer Learning

Paper
Add Code

LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision

no code implementations • 18 Dec 2021 • Rui Han, Qinglong Zhang, Chi Harold Liu, Guoren Wang, Jian Tang, Lydia Y. Chen

The prior art sheds light on exploring the accuracy-resource tradeoff by scaling the model sizes in accordance to resource dynamics.

Knowledge Distillation Model Compression +1

Paper
Add Code

ResT: An Efficient Transformer for Visual Recognition

5 code implementations • NeurIPS 2021 • Qinglong Zhang, YuBin Yang

This paper presents an efficient multi-scale vision Transformer, called ResT, that capably served as a general-purpose backbone for image recognition.

Ranked #375 on Image Classification on ImageNet

Image Classification

10,834

Paper
Code

Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks

1 code implementation • 25 Mar 2021 • Qinglong Zhang, Lu Rao, YuBin Yang

In each group, the sub-activations are summed and de-noised as an initial mask.

105

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.