1 code implementation • 4 Nov 2024 • Yanzhe Zhang, Tao Yu, Diyi Yang
Autonomous agents powered by large vision and language models (VLMs) have demonstrated significant potential in completing daily computer tasks, such as browsing the web to book travel and operating desktop software, tasks that require agents to understand these interfaces.
no code implementations • 21 Oct 2024 • Ryan Li, Yanzhe Zhang, Diyi Yang
Sketches are a natural and accessible medium for UI designers to conceptualize early-stage ideas.
no code implementations • 3 Oct 2024 • William Held, Ella Li, Michael Ryan, Weiyan Shi, Yanzhe Zhang, Diyi Yang
We show that our Distilled Voice Assistant (DiVA) generalizes to Spoken Question Answering, Classification, and Translation.
no code implementations • CVPR 2024 • Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun
In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of the multimodal large language model.
no code implementations • 11 Apr 2024 • Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs.
no code implementations • 5 Mar 2024 • Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang
Specifically, we manually curate 484 diverse real-world webpages as test cases and develop a set of automatic evaluation metrics to assess how well current multimodal LLMs can generate code implementations that render directly into the reference webpages, given the screenshots as input.
1 code implementation • 3 Oct 2023 • Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang
On specific subjects in MMLU, selecting a team of agents in the team optimization stage improves accuracy by up to 25.0% in DyLAN.
1 code implementation • 29 Jun 2023 • Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun
Instruction tuning unlocks the superior capability of Large Language Models (LLMs) to interact with humans.
1 code implementation • 17 Feb 2023 • Albert Lu, Hongxin Zhang, Yanzhe Zhang, Xuezhi Wang, Diyi Yang
The limits of open-ended generative models are unclear, yet increasingly important.
1 code implementation • 7 Feb 2023 • Yanzhe Zhang, Lu Jiang, Greg Turk, Diyi Yang
Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools.
1 code implementation • 19 Oct 2022 • Hongxin Zhang, Yanzhe Zhang, Ruiyi Zhang, Diyi Yang
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability in limited-data scenarios.
1 code implementation • Findings (ACL) 2022 • Aaron Reich, Jiaao Chen, Aastha Agrawal, Yanzhe Zhang, Diyi Yang
We found that state-of-the-art NER systems trained on CoNLL 2003 training data drop performance dramatically on our challenging set.
2 code implementations • ACL 2022 • Yanzhe Zhang, Xuezhi Wang, Diyi Yang
Continual learning is essential for real-world deployment when there is a need to quickly adapt the model to new tasks without forgetting knowledge of old tasks.
1 code implementation • NAACL 2021 • Yufan Huang, Yanzhe Zhang, Jiaao Chen, Xuezhi Wang, Diyi Yang
Continual learning has become increasingly important as it enables NLP models to constantly learn and gain knowledge over time.