Search Results for author: Deyao Zhu

Found 14 papers, 9 papers with code

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

1 code implementation14 Oct 2023 Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we target to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.

Language Modelling Large Language Model +4

Exploring Open-Vocabulary Semantic Segmentation without Human Labels

no code implementations1 Jun 2023 Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana

Although acquired extensive knowledge of visual concepts, it is non-trivial to exploit knowledge from these VL models to the task of semantic segmentation, as they are usually trained at an image level.

Open Vocabulary Semantic Segmentation Segmentation +3

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

5 code implementations20 Apr 2023 Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Our work, for the first time, uncovers that properly aligning the visual features with an advanced large language model can possess numerous advanced multi-modal abilities demonstrated by GPT-4, such as detailed image description generation and website creation from hand-drawn drafts.

Language Modelling Large Language Model +3

Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

1 code implementation9 Apr 2023 Jun Chen, Deyao Zhu, Kilichbek Haydarov, Xiang Li, Mohamed Elhoseiny

Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment.

Video Captioning

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

1 code implementation12 Mar 2023 Deyao Zhu, Jun Chen, Kilichbek Haydarov, Xiaoqian Shen, Wenxuan Zhang, Mohamed Elhoseiny

By keeping acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions.

Image Captioning Question Answering +1

Guiding Online Reinforcement Learning with Action-Free Offline Pretraining

1 code implementation30 Jan 2023 Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny

In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, name this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL).

Offline RL reinforcement-learning +1

Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning

1 code implementation9 Jun 2022 Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult.

D4RL Model-based Reinforcement Learning +3

CausalDyna: Improving Generalization of Dyna-style Reinforcement Learning via Counterfactual-Based Data Augmentation

no code implementations29 Sep 2021 Deyao Zhu, Li Erran Li, Mohamed Elhoseiny

Deep reinforcement learning agents trained in real-world environments with a limited diversity of object properties to learn manipulation tasks tend to suffer overfitting and fail to generalize to unseen testing environments.

counterfactual Data Augmentation +3

RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition

1 code implementation CVPR 2022 Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny

This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR.

Image Captioning Object Recognition +5

Motion Forecasting with Unlikelihood Training

no code implementations1 Jan 2021 Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny

We propose a new objective, unlikelihood training, which forces generated trajectories that conflicts with contextual information to be assigned a lower probability by our model.

Decoder Motion Forecasting +1

Cannot find the paper you are looking for? You can Submit a new open access paper.