Search Results for author: Xiaoqian Shen

Found 9 papers, 7 papers with code

StoryGPT-V: Large Language Models as Consistent Story Visualizers

1 code implementation • 4 Dec 2023 • Xiaoqian Shen, Mohamed Elhoseiny

Therefore, we introduce StoryGPT-V, which leverages the merits of the latent diffusion model (LDM) and the LLM to produce images with consistent, high-quality characters grounded in the given story descriptions.

Language Modelling • Large Language Model • +2

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

1 code implementation • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we aim to build a unified interface for completing many vision-language tasks, including image description, visual question answering, and visual grounding, among others.

Language Modelling • Large Language Model • +4

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

no code implementations • 30 Aug 2023 • Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin Elsayed, Mohamed Elhoseiny

We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations.

Explanation Generation • Question Answering • +1

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

5 code implementations • 20 Apr 2023 • Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Our work, for the first time, uncovers that properly aligning visual features with an advanced large language model can yield numerous advanced multi-modal abilities demonstrated by GPT-4, such as detailed image description generation and website creation from hand-drawn drafts.

Language Modelling • Large Language Model • +3

MoStGAN-V: Video Generation with Temporal Motion Styles

1 code implementation • CVPR 2023 • Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency.

Video Generation

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

1 code implementation • 12 Mar 2023 • Deyao Zhu, Jun Chen, Kilichbek Haydarov, Xiaoqian Shen, Wenxuan Zhang, Mohamed Elhoseiny

By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions.

Image Captioning • Question Answering • +1

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification

1 code implementation • 2 Mar 2022 • Kai Yi, Xiaoqian Shen, Yunhao Gou, Mohamed Elhoseiny

The main question we address in this paper is how to scale up visual recognition of unseen classes, also known as zero-shot learning, to tens of thousands of categories as in the ImageNet-21K benchmark.

Image Classification • Zero-Shot Image Classification • +1
