1 code implementation • 25 Apr 2024 • An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image.
Ranked #47 on Visual Question Answering on MM-Vet
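The core idea of Set-of-Mark prompting is to overlay numbered tags on image regions so the model can refer to objects by tag. A minimal sketch of the tag-overlay step using Pillow (the `tag_regions` helper and the region boxes are illustrative assumptions, not the paper's actual code):

```python
from PIL import Image, ImageDraw

def tag_regions(image, boxes):
    """Draw a box and a numeric tag (1, 2, ...) for each region so a
    vision-language model can reference objects by their tag numbers."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    for idx, (x, y, w, h) in enumerate(boxes, start=1):
        draw.rectangle([x, y, x + w, y + h], outline="red", width=2)
        draw.text((x + 2, y + 2), str(idx), fill="red")
    return out

# Example: tag two regions on a blank 64x64 canvas.
marked = tag_regions(Image.new("RGB", (64, 64), "white"),
                     [(4, 4, 24, 24), (32, 32, 24, 24)])
```

The marked image would then be sent to the model together with a prompt that refers to objects by their tag numbers.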
1 code implementation • 6 Mar 2024 • Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley
This paper introduces BLaIR, a series of pretrained sentence embedding models specialized for recommendation scenarios.
2 code implementations • 13 Nov 2023 • An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang
We first benchmark MM-Navigator on our collected iOS screen dataset.
no code implementations • 2 Nov 2023 • Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold
Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments, due to limitations in accounting for fine-grained details.
1 code implementation • 25 Oct 2023 • Jessica Echterhoff, An Yan, Kyungtae Han, Amr Abdelraouf, Rohit Gupta, Julian McAuley
In the context of human-assisted or autonomous driving, explainability models can improve user acceptance and understanding of the decisions made by an autonomous vehicle, and can be used to rationalize and explain driver or vehicle behavior.
no code implementations • 21 Oct 2023 • Zexue He, Yu Wang, An Yan, Yao Liu, Eric Y. Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
Curated datasets for healthcare are often limited by the need for annotation by human experts.
no code implementations • 4 Oct 2023 • An Yan, Yu Wang, Yiwu Zhong, Zexue He, Petros Karypis, Zihan Wang, Chengyu Dong, Amilcare Gentili, Chun-Nan Hsu, Jingbo Shang, Julian McAuley
Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients.
1 code implementation • ICCV 2023 • An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Wang, Jingbo Shang, Julian McAuley
Recent advances in foundation models present new opportunities for interpretable visual recognition -- one can first query Large Language Models (LLMs) to obtain a set of attributes that describe each class, then apply vision-language models to classify images via these attributes.
no code implementations • 5 Jul 2023 • Jessica Echterhoff, An Yan, Julian McAuley
It is time-consuming to find the best product among many similar alternatives.
no code implementations • 15 May 2023 • Zexue He, An Yan, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
Based on our analysis, we define a disambiguation rewriting task to regenerate an input to be unambiguous while preserving information about the original content.
no code implementations • 11 Oct 2022 • An Yan, Jiacheng Li, Wanrong Zhu, Yujie Lu, William Yang Wang, Julian McAuley
However, the application of its text encoder solely for text understanding has been less explored.
1 code implementation • 7 Oct 2022 • Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
Recent advances in text-to-image synthesis make it possible to visualize machine imaginations for a given context.
no code implementations • 30 Jun 2022 • An Yan, Zhankui He, Jiacheng Li, Tianyang Zhang, Julian McAuley
In this paper, to further enrich explanations, we propose a new task named personalized showcases, in which we provide both textual and visual information to explain our recommendations.
1 code implementation • Findings (EMNLP) 2021 • An Yan, Zexue He, Xing Lu, Jiang Du, Eric Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
Radiology report generation aims to automatically generate descriptive text from radiology images, which may present an opportunity to improve radiology reporting and interpretation.
no code implementations • 10 Jun 2021 • Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang
Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with text references.
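The token-level comparisons this line refers to can be illustrated with a simple clipped unigram-precision score in the style of BLEU-1 (a generic sketch of conventional token-overlap metrics, not the metric this paper proposes):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens also present in the reference,
    with counts clipped to the reference counts (as in BLEU-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(n, ref[tok]) for tok, n in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

score = unigram_precision("a dog runs in the park",
                          "the dog runs in a park")  # 1.0: same tokens
```

Such surface-level metrics reward word overlap with the reference, which is exactly why they can diverge from human judgments of meaning.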
no code implementations • EACL 2021 • An Yan, Xin Eric Wang, Tsu-Jui Fu, William Yang Wang
Recent advances in language and vision have pushed research forward from captioning a single image to describing the visual differences between image pairs.
1 code implementation • EACL 2021 • Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang
Outdoor vision-and-language navigation (VLN) is a task in which an agent follows natural language instructions to navigate a real-life urban environment.
Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)
2 code implementations • 24 Oct 2019 • An Yan, Xin Eric Wang, Jiangtao Feng, Lei Li, William Yang Wang
Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics.
2 code implementations • 27 Aug 2019 • An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley
Sequential patterns play an important role in building modern recommender systems.
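A standard way to exploit sequential patterns is to turn each user's interaction history into (history, next-item) training pairs. A minimal sketch of this common preprocessing step (the `sliding_windows` helper is a hypothetical illustration, not this paper's code):

```python
def sliding_windows(seq, window):
    """Turn one user's item sequence into (history, next-item) pairs,
    keeping at most `window` preceding items as the history."""
    return [(seq[max(0, i - window):i], seq[i]) for i in range(1, len(seq))]

# Item IDs from one user's interaction history.
pairs = sliding_windows([101, 102, 103, 104], window=2)
# Each pair trains a model to predict the next item from recent context.
```

A sequential recommender is then trained to rank the true next item highly given each history window.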
no code implementations • 21 Jun 2019 • An Yan, Bill Howe
Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility but have been shown to reinforce socioeconomic inequities.
no code implementations • CVPR 2019 • An Yan, Yali Wang, Zhifeng Li, Yu Qiao
Recent studies have demonstrated the success of 3D CNNs for video action recognition.
Ranked #2 on Skeleton Based Action Recognition on J-HMDB
no code implementations • 13 Jul 2017 • An Yan, Michael J. Lee, Andrew J. Ko
Learners regularly abandon online coding tutorials when they get bored or frustrated, but there are few techniques for anticipating this abandonment in order to intervene.