Search Results for author: Yanzhe Zhang

Found 14 papers, 9 papers with code

Attacking Vision-Language Computer Agents via Pop-ups

1 code implementation · 4 Nov 2024 · Yanzhe Zhang, Tao Yu, Diyi Yang

Autonomous agents powered by large vision and language models (VLMs) have demonstrated significant potential in completing daily computer tasks, such as browsing the web to book travel and operating desktop software, which requires agents to understand these interfaces.

Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping

no code implementations · 21 Oct 2024 · Ryan Li, Yanzhe Zhang, Diyi Yang

Sketches are a natural and accessible medium for UI designers to conceptualize early-stage ideas.

Benchmarking

Distilling an End-to-End Voice Assistant Without Instruction Training Data

no code implementations · 3 Oct 2024 · William Held, Ella Li, Michael Ryan, Weiyan Shi, Yanzhe Zhang, Diyi Yang

We show that our Distilled Voice Assistant (DiVA) generalizes to Spoken Question Answering, Classification, and Translation.

Question Answering

TRINS: Towards Multimodal Language Models that Can Read

no code implementations · CVPR 2024 · Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun

In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of the multimodal large language model.

Language Modelling, Large Language Model, +1

Best Practices and Lessons Learned on Synthetic Data

no code implementations · 11 Apr 2024 · Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs.

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

no code implementations · 5 Mar 2024 · Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang

Specifically, we manually curate 484 diverse real-world webpages as test cases and develop a set of automatic evaluation metrics to assess how well current multimodal LLMs can generate the code implementations that directly render into the given reference webpages, given the screenshots as input.

Benchmarking, Code Generation

A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration

1 code implementation · 3 Oct 2023 · Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang

On specific subjects in MMLU, selecting a team of agents in the team optimization stage improves accuracy by up to 25.0% in DyLAN.

Arithmetic Reasoning, Code Generation, +5

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

1 code implementation · 29 Jun 2023 · Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun

Instruction tuning unlocks the superior capability of Large Language Models (LLM) to interact with humans.

16k, Image Captioning, +3

Auditing Gender Presentation Differences in Text-to-Image Models

1 code implementation · 7 Feb 2023 · Yanzhe Zhang, Lu Jiang, Greg Turk, Diyi Yang

Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools.

Robustness of Demonstration-based Learning Under Limited Data Scenario

1 code implementation · 19 Oct 2022 · Hongxin Zhang, Yanzhe Zhang, Ruiyi Zhang, Diyi Yang

Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited data scenario.

Few-shot NER

Continual Sequence Generation with Adaptive Compositional Modules

2 code implementations · ACL 2022 · Yanzhe Zhang, Xuezhi Wang, Diyi Yang

Continual learning is essential for real-world deployment when there is a need to quickly adapt the model to new tasks without forgetting knowledge of old tasks.

Continual Learning, Transfer Learning
