Contrastive Language-Image Pretraining (CLIP) has demonstrated strong zero-shot performance on image-text matching because of its holistic use of natural language supervision, which covers large-scale, open-world visual concepts.
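Zero-shot image-text matching reduces to comparing normalized embeddings. A minimal sketch, assuming we already have an image embedding and candidate caption embeddings (in practice these would come from CLIP's image and text encoders; the 4-dimensional vectors below are made up for illustration):

```python
import numpy as np

def normalize(v):
    """L2-normalize vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical 4-dim embeddings; real CLIP models use 512+ dims.
image_emb = normalize(np.array([1.0, 0.0, 0.5, 0.2]))
text_embs = normalize(np.array([
    [0.9, 0.1, 0.4, 0.3],   # e.g. "a photo of a cat" (close to the image)
    [-0.5, 0.8, 0.0, 0.1],  # e.g. "a photo of a dog"
]))

# Zero-shot matching: pick the caption with the highest cosine similarity.
scores = text_embs @ image_emb
best = int(np.argmax(scores))
print(best)  # index of the best-matching caption
```

Because both sides are L2-normalized, the dot product equals the cosine similarity, so no further scaling is needed to rank candidates.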
Prompt tuning is a recent few-shot transfer learning technique that tunes only a small set of learnable prompt parameters of pre-trained vision-language models such as CLIP.
Building a conversational embodied agent to execute real-life tasks has been a long-standing and challenging research goal, as it requires effective human-agent communication, multi-modal understanding, and long-range sequential decision making, among other capabilities.
In this paper, we aim to study parameter-efficient fine-tuning strategies for Vision Transformers on vision tasks.
Meng Zhou, Zechen Li, Bowen Tan, Guangtao Zeng, Wenmian Yang, Xuehai He, Zeqian Ju, Subrato Chakravorty, Shu Chen, Xingyi Yang, Yichen Zhang, Qingyang Wu, Zhou Yu, Kun Xu, Eric Xing, Pengtao Xie
Training complex dialog generation models on small datasets bears a high risk of overfitting.
In this paper, we aim to develop a pathological visual question answering framework to analyze pathology images and answer medical questions related to these images.
To address the lack of a publicly available pathology VQA dataset, we create the PathVQA dataset.
There has not been a clear understanding of what properties of data and tasks make one approach outperform the other.
On these two datasets, we train several dialogue generation models based on Transformer, GPT, and BERT-GPT.
Moreover, these works require a large number of CT scans, which are difficult to obtain, to train accurate diagnosis models.
Medical dialogue systems are promising for assisting telemedicine: they can increase access to healthcare services, improve the quality of patient care, and reduce medical costs.
Using this dataset, we develop diagnosis methods based on multi-task learning and self-supervised learning that achieve an F1 of 0.90, an AUC of 0.98, and an accuracy of 0.89.
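For reference, the reported classification metrics are derived from confusion-matrix counts. A small sketch with hypothetical counts (not the paper's actual numbers):

```python
# Hypothetical binary-classification confusion-matrix counts.
tp, fp, fn, tn = 45, 5, 5, 45

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(round(f1, 2), round(accuracy, 2))
```

AUC, by contrast, is computed from the model's ranked prediction scores rather than from a single thresholded confusion matrix, which is why it can differ from F1 and accuracy.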
To achieve this goal, the first step is to create a visual question answering (VQA) dataset where the AI agent is presented with a pathology image together with a question and is asked to give the correct answer.
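Concretely, such a VQA dataset pairs each image with a question and a ground-truth answer, and a model is scored by comparing its predicted answers against the references. A minimal sketch with a hypothetical record layout and an exact-match scorer (the field names and examples below are illustrative assumptions, not the dataset's actual schema):

```python
# Hypothetical VQA-style records; field names are illustrative.
samples = [
    {"image": "img_0001.jpg", "question": "Is a tumor present?", "answer": "yes"},
    {"image": "img_0002.jpg", "question": "What organ is shown?", "answer": "liver"},
]

def exact_match_accuracy(predictions, samples):
    """Fraction of questions whose predicted answer matches the reference exactly
    (after lowercasing and stripping whitespace)."""
    correct = sum(
        p.strip().lower() == s["answer"] for p, s in zip(predictions, samples)
    )
    return correct / len(samples)

score = exact_match_accuracy(["Yes", "kidney"], samples)
print(score)  # 0.5: first answer matches, second does not
```

Exact match is a common, simple metric for closed-form answers; free-form answers usually call for softer measures such as token-overlap F1.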