Search Results for author: Xiaoqian Shen

Found 9 papers, 7 papers with code

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

no code implementations • 4 Apr 2024 • Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Deyao Zhu, Jian Ding, Mohamed Elhoseiny

This paper introduces MiniGPT4-Video, a multimodal Large Language Model (LLM) designed specifically for video understanding.

Language Modelling, Large Language Model, +1


StoryGPT-V: Large Language Models as Consistent Story Visualizers

1 code implementation • 4 Dec 2023 • Xiaoqian Shen, Mohamed Elhoseiny

Therefore, we introduce StoryGPT-V, which leverages the merits of the latent diffusion (LDM) and LLM to produce images with consistent and high-quality characters grounded on given story descriptions.

Language Modelling, Large Language Model, +2


MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

1 code implementation • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we aim to build a unified interface for completing many vision-language tasks, including image description, visual question answering, and visual grounding, among others.

Ranked #10 on Visual Question Answering on BenchLMM

Language Modelling, Large Language Model, +4

24,857


Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

no code implementations • 30 Aug 2023 • Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin Elsayed, Mohamed Elhoseiny

We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding the formation of emotions in visually grounded conversations.

Explanation Generation, Question Answering, +1


MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

5 code implementations • 20 Apr 2023 • Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Our work, for the first time, uncovers that properly aligning the visual features with an advanced large language model can possess numerous advanced multi-modal abilities demonstrated by GPT-4, such as detailed image description generation and website creation from hand-drawn drafts.

Ranked #9 on Visual Question Answering on BenchLMM

Language Modelling, Large Language Model, +3

24,857


HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

1 code implementation • ICCV 2023 • Eslam Mohamed Bakr, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, Mohamed Elhoseiny

A human evaluation aligned with 95% of our evaluations on average was conducted to probe the effectiveness of HRS-Bench.

Fairness, Text-to-Image Generation


MoStGAN-V: Video Generation with Temporal Motion Styles

1 code implementation • CVPR 2023 • Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny

Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency.

Video Generation


ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

1 code implementation • 12 Mar 2023 • Deyao Zhu, Jun Chen, Kilichbek Haydarov, Xiaoqian Shen, Wenxuan Zhang, Mohamed Elhoseiny

By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions.

Image Captioning, Question Answering, +1

431


Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification

1 code implementation • 2 Mar 2022 • Kai Yi, Xiaoqian Shen, Yunhao Gou, Mohamed Elhoseiny

The main question we address in this paper is how to scale up visual recognition of unseen classes, also known as zero-shot learning, to tens of thousands of categories as in the ImageNet-21K benchmark.

Image Classification, Zero-Shot Image Classification, +1

