no code implementations • 6 Sep 2024 • Maria Wang, Srinivas Sunkara, Gilles Baechler, Jason Lin, Yun Zhu, Fedir Zubach, Lei Shu, Jindong Chen
In contrast to existing UI benchmarks that focus on multi-step web navigation and task completion, our dataset evaluates information extraction, multimodal retrieval, and composition of information across many web pages.
2 code implementations • 7 Feb 2024 • Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, Abhanshu Sharma
At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements.
Ranked #3 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)
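The screen annotation task described above asks the model to output the type and location of each UI element. A minimal sketch of what such an annotation might look like, assuming a made-up schema of typed, normalized bounding boxes (the actual output format used in the paper may differ):

```python
# Illustrative sketch only: the UIElement schema and type names below are
# assumed for this example, not the paper's actual annotation format.
from dataclasses import dataclass

@dataclass
class UIElement:
    kind: str  # hypothetical element type, e.g. "BUTTON", "TEXT", "IMAGE"
    box: tuple  # normalized (x0, y0, x1, y1) coordinates on the screen

def describe(elements):
    """Render each annotated element as a '<type> at <location>' string."""
    return [f"{e.kind} at ({e.box[0]:.2f}, {e.box[1]:.2f})" for e in elements]

annotations = [
    UIElement("BUTTON", (0.10, 0.80, 0.90, 0.90)),
    UIElement("TEXT", (0.05, 0.10, 0.95, 0.20)),
]
print(describe(annotations))  # → ['BUTTON at (0.10, 0.80)', 'TEXT at (0.05, 0.10)']
```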
1 code implementation • COLING 2022 • Srinivas Sunkara, Maria Wang, Lijuan Liu, Gilles Baechler, Yu-Chung Hsiao, Jindong Chen, Abhanshu Sharma, James Stout
Improving the accessibility and automation capabilities of mobile devices can have a significant positive impact on the daily lives of countless users.
1 code implementation • 16 Sep 2022 • Yu-Chung Hsiao, Fedir Zubach, Gilles Baechler, Victor Cărbune, Jason Lin, Maria Wang, Srinivas Sunkara, Yun Zhu, Jindong Chen
We present a new benchmark and dataset, ScreenQA, for screen content understanding via question answering.
no code implementations • ACL 2021 • Xiaoxue Zang, Lijuan Liu, Maria Wang, Yang Song, Hao Zhang, Jindong Chen
Based on this dataset, we propose two tasks to facilitate research on image-text modeling: a photo-sharing intent prediction task that predicts whether one intends to share a photo in the next conversation turn, and a photo retrieval task that retrieves the most relevant photo according to the dialogue context.
Ranked #5 on Image Retrieval on PhotoChat
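The photo retrieval task described above can be framed as nearest-neighbour search over embeddings: score each candidate photo against the dialogue context and return the best match. A minimal sketch using cosine similarity; the embedding vectors and dimensions here are invented for illustration, and the actual PhotoChat retrieval models are more sophisticated:

```python
# Hedged sketch of retrieval as cosine-similarity ranking; all vectors
# below are made up and do not come from any real encoder.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(dialogue_vec, photo_vecs):
    """Return the index of the photo most similar to the dialogue context."""
    scores = [cosine(dialogue_vec, p) for p in photo_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

dialogue = [0.2, 0.9, 0.1]                     # hypothetical dialogue embedding
photos = [[1.0, 0.0, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(retrieve(dialogue, photos))  # → 1
```

The intent-prediction task from the same paper would instead be a binary classifier over the dialogue embedding alone.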