no code implementations • 14 Oct 2023 • Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang
Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction following and task planning.
1 code implementation • 3 Oct 2023 • Kaizhi Zheng, Xuehai He, Xin Eric Wang
The effectiveness of Multimodal Large Language Models (MLLMs) demonstrates a profound capability in multimodal understanding.
1 code implementation • NeurIPS 2023 • Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang
When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves comparable performance as human users in designing visual layouts for numerical and spatial correctness.
1 code implementation • 18 May 2023 • Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang
Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation.
no code implementations • 30 Apr 2023 • Xuehai He, Xin Eric Wang
Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly.
1 code implementation • 9 Dec 2022 • Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang
In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.
1 code implementation • 25 Nov 2022 • Kenan Jiang, Xuehai He, Ruize Xu, Xin Eric Wang
Contrastive Language-Image Pretraining (CLIP) has demonstrated great zero-shot performance for matching images and text.
no code implementations • 19 Oct 2022 • Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang
Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.
no code implementations • 28 Aug 2022 • Kaizhi Zheng, Kaiwen Zhou, Jing Gu, Yue Fan, Jialu Wang, Zonglin Di, Xuehai He, Xin Eric Wang
Building a conversational embodied agent to execute real-life tasks has been a long-standing yet quite challenging research goal, as it requires effective human-agent communication, multi-modal understanding, long-range sequential decision making, etc.
2 code implementations • 29 Mar 2022 • Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang
In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task.
1 code implementation • ACL 2021 • Meng Zhou, Zechen Li, Bowen Tan, Guangtao Zeng, Wenmian Yang, Xuehai He, Zeqian Ju, Subrato Chakravorty, Shu Chen, Xingyi Yang, Yichen Zhang, Qingyang Wu, Zhou Yu, Kun Xu, Eric Xing, Pengtao Xie
Training complex dialog generation models on small datasets bears high risk of overfitting.
1 code implementation • ACL 2021 • Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie
In this paper, we aim to develop a pathological visual question answering framework to analyze pathology images and answer medical questions related to these images.
no code implementations • 28 Dec 2020 • Xingchen Zhao, Xuehai He, Pengtao Xie
We propose a novel machine learning framework referred to as learning by ignoring (LBI).
no code implementations • 6 Oct 2020 • Xuehai He, Zhuo Cai, Wenlan Wei, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie
To deal with the issue that a publicly available pathology VQA dataset is lacking, we create PathVQA dataset.
1 code implementation • 19 Jun 2020 • Xingyi Yang, Xuehai He, Yuxiao Liang, Yue Yang, Shanghang Zhang, Pengtao Xie
There has not been a clear understanding on what properties of data and tasks render one approach outperforms the other.
1 code implementation • 11 May 2020 • Wenmian Yang, Guangtao Zeng, Bowen Tan, Zeqian Ju, Subrato Chakravorty, Xuehai He, Shu Chen, Xingyi Yang, Qingyang Wu, Zhou Yu, Eric Xing, Pengtao Xie
On these two datasets, we train several dialogue generation models based on Transformer, GPT, and BERT-GPT.
1 code implementation • medRxiv 2020 • Xuehai He, Xingyi Yang, Shanghang Zhang, Jinyu Zhao, Yichen Zhang, Eric Xing, Pengtao Xie
Besides, these works require a large number of CTs to train accurate diagnosis models, which are difficult to obtain.
1 code implementation • arXiv 2020 • Xuehai He, Shu Chen, Zeqian Ju, Xiangyu Dong, Hongchao Fang, Sicheng Wang, Yue Yang, Jiaqi Zeng, Ruisi Zhang, Ruoyu Zhang, Meng Zhou, Penghui Zhu, Pengtao Xie
Medical dialogue systems are promising in assisting in telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs.
19 code implementations • 30 Mar 2020 • Xingyi Yang, Xuehai He, Jinyu Zhao, Yichen Zhang, Shanghang Zhang, Pengtao Xie
Using this dataset, we develop diagnosis methods based on multi-task learning and self-supervised learning, that achieve an F1 of 0. 90, an AUC of 0. 98, and an accuracy of 0. 89.
6 code implementations • 7 Mar 2020 • Xuehai He, Yichen Zhang, Luntian Mou, Eric Xing, Pengtao Xie
To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a pathology image together with a question and is asked to give the correct answer.