Search Results for author: Xuehai He

Found 20 papers, 14 papers with code

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

no code implementations • 14 Oct 2023 • Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang

Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction following and task planning.

In-Context Learning Instruction Following +1

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

1 code implementation • 3 Oct 2023 • Kaizhi Zheng, Xuehai He, Xin Eric Wang

The effectiveness of Multimodal Large Language Models (MLLMs) demonstrates a profound capability in multimodal understanding.

Image Generation multimodal generation +2

815

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation • NeurIPS 2023 • Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves comparable performance as human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis Text-to-Image Generation

241

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

1 code implementation • 18 May 2023 • Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation.

Image-text matching Text Matching +1

Multimodal Graph Transformer for Multimodal Question Answering

no code implementations • 30 Apr 2023 • Xuehai He, Xin Eric Wang

Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly.

Question Answering

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

1 code implementation • 9 Dec 2022 • Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.

Attribute Image Generation

293

ComCLIP: Training-Free Compositional Image and Text Matching

1 code implementation • 25 Nov 2022 • Kenan Jiang, Xuehai He, Ruize Xu, Xin Eric Wang

Contrastive Language-Image Pretraining (CLIP) has demonstrated great zero-shot performance for matching images and text.

Image-text matching Retrieval +2

CPL: Counterfactual Prompt Learning for Vision and Language Models

no code implementations • 19 Oct 2022 • Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.

counterfactual Visual Question Answering

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

no code implementations • 28 Aug 2022 • Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang

Building a conversational embodied agent to execute real-life tasks has been a long-standing yet quite challenging research goal, as it requires effective human-agent communication, multi-modal understanding, long-range sequential decision making, etc.

Action Generation Common Sense Reasoning +1

Parameter-efficient Model Adaptation for Vision Transformers

2 code implementations • 29 Mar 2022 • Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang

In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task.

Benchmarking Classification +2

On the Generation of Medical Dialogs for COVID-19

1 code implementation • ACL 2021 • Meng Zhou, Zechen Li, Bowen Tan, Guangtao Zeng, Wenmian Yang, Xuehai He, Zeqian Ju, Subrato Chakravorty, Shu Chen, Xingyi Yang, Yichen Zhang, Qingyang Wu, Zhou Yu, Kun Xu, Eric Xing, Pengtao Xie

Training complex dialog generation models on small datasets bears high risk of overfitting.

Multi-Task Learning

153

Towards Visual Question Answering on Pathology Images

1 code implementation • ACL 2021 • Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie

In this paper, we aim to develop a pathological visual question answering framework to analyze pathology images and answer medical questions related to these images.

Decision Making Question Answering +1

126

Learning by Ignoring, with Application to Domain Adaptation

no code implementations • 28 Dec 2020 • Xingchen Zhao, Xuehai He, Pengtao Xie

We propose a novel machine learning framework referred to as learning by ignoring (LBI).

BIG-bench Machine Learning Domain Adaptation

Pathological Visual Question Answering

no code implementations • 6 Oct 2020 • Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie

To deal with the issue that a publicly available pathology VQA dataset is lacking, we create PathVQA dataset.

Question Answering Self-Supervised Learning +1

Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms

1 code implementation • 19 Jun 2020 • Xingyi Yang, Xuehai He, Yuxiao Liang, Yue Yang, Shanghang Zhang, Pengtao Xie

There has not been a clear understanding on what properties of data and tasks render one approach outperforms the other.

Self-Supervised Learning Transfer Learning

On the Generation of Medical Dialogues for COVID-19

1 code implementation • 11 May 2020 • Wenmian Yang, Guangtao Zeng, Bowen Tan, Zeqian Ju, Subrato Chakravorty, Xuehai He, Shu Chen, Xingyi Yang, Qingyang Wu, Zhou Yu, Eric Xing, Pengtao Xie

On these two datasets, we train several dialogue generation models based on Transformer, GPT, and BERT-GPT.

Dialogue Generation Transfer Learning

153

Sample-Efficient Deep Learning for COVID-19 Diagnosis Based on CT Scans

1 code implementation • medRxiv 2020 • Xuehai He, Xingyi Yang, Shanghang Zhang, Jinyu Zhao, Yichen Zhang, Eric Xing, Pengtao Xie

Besides, these works require a large number of CTs to train accurate diagnosis models, which are difficult to obtain.

COVID-19 Diagnosis Self-Supervised Learning +1

1,078

MedDialog: Two Large-scale Medical Dialogue Datasets

1 code implementation • arXiv 2020 • Xuehai He, Shu Chen, Zeqian Ju, Xiangyu Dong, Hongchao Fang, Sicheng Wang, Yue Yang, Jiaqi Zeng, Ruisi Zhang, Ruoyu Zhang, Meng Zhou, Penghui Zhu, Pengtao Xie

Medical dialogue systems are promising in assisting in telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs.

Vocal Bursts Valence Prediction

511

COVID-CT-Dataset: A CT Scan Dataset about COVID-19

19 code implementations • 30 Mar 2020 • Xingyi Yang, Xuehai He, Jinyu Zhao, Yichen Zhang, Shanghang Zhang, Pengtao Xie

Using this dataset, we develop diagnosis methods based on multi-task learning and self-supervised learning, that achieve an F1 of 0.90, an AUC of 0.98, and an accuracy of 0.89.

Computed Tomography (CT) COVID-19 Diagnosis +2

1,615

PathVQA: 30000+ Questions for Medical Visual Question Answering

6 code implementations • 7 Mar 2020 • Xuehai He, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie

To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a pathology image together with a question and is asked to give the correct answer.

Medical Visual Question Answering Question Answering +1

126
