Search Results for author: Xuehai He

Found 20 papers, 14 papers with code

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

1 code implementation 3 Oct 2023 Kaizhi Zheng, Xuehai He, Xin Eric Wang

The effectiveness of Multimodal Large Language Models (MLLMs) demonstrates a profound capability in multimodal understanding.

Image Generation multimodal generation +2

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation NeurIPS 2023 Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves performance comparable to that of human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis Text-to-Image Generation

Multimodal Graph Transformer for Multimodal Question Answering

no code implementations 30 Apr 2023 Xuehai He, Xin Eric Wang

Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly.

Question Answering

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

1 code implementation 9 Dec 2022 Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.

Attribute Image Generation

ComCLIP: Training-Free Compositional Image and Text Matching

1 code implementation 25 Nov 2022 Kenan Jiang, Xuehai He, Ruize Xu, Xin Eric Wang

Contrastive Language-Image Pretraining (CLIP) has demonstrated great zero-shot performance for matching images and text.

Image-text matching Retrieval +2

CPL: Counterfactual Prompt Learning for Vision and Language Models

no code implementations 19 Oct 2022 Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.

counterfactual Visual Question Answering

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

no code implementations 28 Aug 2022 Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang

Building a conversational embodied agent to execute real-life tasks has been a long-standing yet quite challenging research goal, as it requires effective human-agent communication, multi-modal understanding, long-range sequential decision making, etc.

Action Generation Common Sense Reasoning +1

Parameter-efficient Model Adaptation for Vision Transformers

2 code implementations 29 Mar 2022 Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang

In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task.

Benchmarking Classification +2

Towards Visual Question Answering on Pathology Images

1 code implementation ACL 2021 Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie

In this paper, we aim to develop a pathological visual question answering framework to analyze pathology images and answer medical questions related to these images.

Decision Making Question Answering +1

Pathological Visual Question Answering

no code implementations 6 Oct 2020 Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie

To address the lack of a publicly available pathology VQA dataset, we create the PathVQA dataset.

Question Answering Self-Supervised Learning +1

Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms

1 code implementation 19 Jun 2020 Xingyi Yang, Xuehai He, Yuxiao Liang, Yue Yang, Shanghang Zhang, Pengtao Xie

There is not yet a clear understanding of which properties of data and tasks make one approach outperform the other.

Self-Supervised Learning Transfer Learning

MedDialog: Two Large-scale Medical Dialogue Datasets

1 code implementation arXiv 2020 Xuehai He, Shu Chen, Zeqian Ju, Xiangyu Dong, Hongchao Fang, Sicheng Wang, Yue Yang, Jiaqi Zeng, Ruisi Zhang, Ruoyu Zhang, Meng Zhou, Penghui Zhu, Pengtao Xie

Medical dialogue systems are promising in assisting in telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs.

Vocal Bursts Valence Prediction

COVID-CT-Dataset: A CT Scan Dataset about COVID-19

19 code implementations 30 Mar 2020 Xingyi Yang, Xuehai He, Jinyu Zhao, Yichen Zhang, Shanghang Zhang, Pengtao Xie

Using this dataset, we develop diagnosis methods based on multi-task learning and self-supervised learning that achieve an F1 of 0.90, an AUC of 0.98, and an accuracy of 0.89.

Computed Tomography (CT) COVID-19 Diagnosis +2

PathVQA: 30000+ Questions for Medical Visual Question Answering

6 code implementations 7 Mar 2020 Xuehai He, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie

To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a pathology image together with a question and is asked to give the correct answer.

Medical Visual Question Answering Question Answering +1
