1 code implementation • 25 Apr 2024 • An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang
Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image.
Ranked #47 on Visual Question Answering on MM-Vet
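The core idea of Set-of-Mark prompting is to overlay numbered tags on image regions so the model can refer to objects by tag. A minimal sketch of the tag-overlay step using Pillow (the `tag_regions` helper and the region boxes are illustrative assumptions, not the paper's actual code):

```python
from PIL import Image, ImageDraw

def tag_regions(image, boxes):
    """Draw a box and a numeric tag (1, 2, ...) for each region so a
    vision-language model can reference objects by their tag numbers."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    for idx, (x, y, w, h) in enumerate(boxes, start=1):
        draw.rectangle([x, y, x + w, y + h], outline="red", width=2)
        draw.text((x + 2, y + 2), str(idx), fill="red")
    return out

# Example: tag two regions on a blank 64x64 canvas.
marked = tag_regions(Image.new("RGB", (64, 64), "white"),
                     [(4, 4, 24, 24), (32, 32, 24, 24)])
```

The marked image would then be sent to the model together with a prompt that refers to objects by their tag numbers.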
1 code implementation • 6 Mar 2024 • Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley
This paper introduces BLaIR, a series of pretrained sentence embedding models specialized for recommendation scenarios.
2 code implementations • 13 Nov 2023 • An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang
We first benchmark MM-Navigator on our collected iOS screen dataset.
no code implementations • 2 Nov 2023 • Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold
Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments, due to limitations in accounting for fine-grained details.
1 code implementation • 25 Oct 2023 • Jessica Echterhoff, An Yan, Kyungtae Han, Amr Abdelraouf, Rohit Gupta, Julian McAuley
In the context of human-assisted or autonomous driving, explainability models can improve user acceptance and understanding of the decisions made by an autonomous vehicle, and can be used to rationalize and explain driver or vehicle behavior.
no code implementations • 21 Oct 2023 • Zexue He, Yu Wang, An Yan, Yao Liu, Eric Y. Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
Curated datasets for healthcare are often limited by the need for annotation by human experts.
no code implementations • 4 Oct 2023 • An Yan, Yu Wang, Yiwu Zhong, Zexue He, Petros Karypis, Zihan Wang, Chengyu Dong, Amilcare Gentili, Chun-Nan Hsu, Jingbo Shang, Julian McAuley
Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients.
1 code implementation • ICCV 2023 • An Yan, Yu Wang, Yiwu Zhong, Chengyu Dong, Zexue He, Yujie Lu, William Wang, Jingbo Shang, Julian McAuley
Recent advances in foundation models present new opportunities for interpretable visual recognition -- one can first query Large Language Models (LLMs) to obtain a set of attributes that describe each class, then apply vision-language models to classify images via these attributes.
no code implementations • 5 Jul 2023 • Jessica Echterhoff, An Yan, Julian McAuley
It is time-consuming to find the best product among many similar alternatives.
no code implementations • 15 May 2023 • Zexue He, An Yan, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
Based on our analysis, we define a disambiguation rewriting task to regenerate an input to be unambiguous while preserving information about the original content.
no code implementations • 11 Oct 2022 • An Yan, Jiacheng Li, Wanrong Zhu, Yujie Lu, William Yang Wang, Julian McAuley
However, the application of its text encoder solely for text understanding has been less explored.
1 code implementation • 7 Oct 2022 • Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang
Recent advances in text-to-image synthesis make it possible to visualize machine imaginations for a given context.
no code implementations • 30 Jun 2022 • An Yan, Zhankui He, Jiacheng Li, Tianyang Zhang, Julian McAuley
In this paper, to further enrich explanations, we propose a new task named personalized showcases, in which we provide both textual and visual information to explain our recommendations.
1 code implementation • Findings (EMNLP) 2021 • An Yan, Zexue He, Xing Lu, Jiang Du, Eric Chang, Amilcare Gentili, Julian McAuley, Chun-Nan Hsu
Radiology report generation aims to automatically generate descriptive text from radiology images, which may present an opportunity to improve radiology reporting and interpretation.
no code implementations • 10 Jun 2021 • Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang
Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with text references.
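The token-level comparisons this line refers to can be illustrated with a simple clipped unigram-precision score in the style of BLEU-1 (a generic sketch of conventional token-overlap metrics, not the metric this paper proposes):

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens also present in the reference,
    with counts clipped to the reference counts (as in BLEU-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(n, ref[tok]) for tok, n in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

score = unigram_precision("a dog runs in the park",
                          "the dog runs in a park")  # 1.0: same tokens
```

Such surface-level metrics reward word overlap with the reference, which is exactly why they can diverge from human judgments of meaning.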
no code implementations • EACL 2021 • An Yan, Xin Eric Wang, Tsu-Jui Fu, William Yang Wang
Recent advances in language and vision have pushed research forward from captioning a single image to describing the visual differences between image pairs.
1 code implementation • EACL 2021 • Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang
Outdoor vision-and-language navigation (VLN) is a task in which an agent follows natural language instructions to navigate a real-life urban environment.
Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)
2 code implementations • 24 Oct 2019 • An Yan, Xin Eric Wang, Jiangtao Feng, Lei Li, William Yang Wang
Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics.
2 code implementations • 27 Aug 2019 • An Yan, Shuo Cheng, Wang-Cheng Kang, Mengting Wan, Julian McAuley
Sequential patterns play an important role in building modern recommender systems.
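A standard way to exploit sequential patterns is to turn each user's interaction history into (history, next-item) training pairs. A minimal sketch of this common preprocessing step (the `sliding_windows` helper is a hypothetical illustration, not this paper's code):

```python
def sliding_windows(seq, window):
    """Turn one user's item sequence into (history, next-item) pairs,
    keeping at most `window` preceding items as the history."""
    return [(seq[max(0, i - window):i], seq[i]) for i in range(1, len(seq))]

# Item IDs from one user's interaction history.
pairs = sliding_windows([101, 102, 103, 104], window=2)
# Each pair trains a model to predict the next item from recent context.
```

A sequential recommender is then trained to rank the true next item highly given each history window.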
no code implementations • 21 Jun 2019 • An Yan, Bill Howe
Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility but have been shown to reinforce socioeconomic inequities.
no code implementations • CVPR 2019 • An Yan, Yali Wang, Zhifeng Li, Yu Qiao
Recent studies have demonstrated the success of 3D CNNs for video action recognition.
Ranked #2 on Skeleton Based Action Recognition on J-HMDB
no code implementations • 13 Jul 2017 • An Yan, Michael J. Lee, Andrew J. Ko
Learners regularly abandon online coding tutorials when they get bored or frustrated, but there are few techniques for anticipating this abandonment in order to intervene.