no code implementations • 6 Sep 2024 • Maria Wang, Srinivas Sunkara, Gilles Baechler, Jason Lin, Yun Zhu, Fedir Zubach, Lei Shu, Jindong Chen
In contrast to existing UI benchmarks that focus on multi-step web navigation and task completion, our dataset evaluates information extraction, multimodal retrieval and composition of information from many web pages.
no code implementations • 19 Mar 2024 • Victor Carbune, Hassan Mansoor, Fangyu Liu, Rahul Aralikatte, Gilles Baechler, Jindong Chen, Abhanshu Sharma
We propose a technique to transfer capabilities from LLMs to VLMs.
Ranked #1 on Chart Question Answering on ChartQA (using extra training data)
Chart Question Answering • Optical Character Recognition (OCR)
2 code implementations • 7 Feb 2024 • Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, Abhanshu Sharma
At the heart of this mixture is a novel screen annotation task in which the model has to identify the type and location of UI elements.
Ranked #3 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)
1 code implementation • COLING 2022 • Srinivas Sunkara, Maria Wang, Lijuan Liu, Gilles Baechler, Yu-Chung Hsiao, Jindong Chen, Abhanshu Sharma, James Stout
Improving the accessibility and automation capabilities of mobile devices can have a significant positive impact on the daily lives of countless users.
1 code implementation • 16 Sep 2022 • Yu-Chung Hsiao, Fedir Zubach, Gilles Baechler, Victor Carbune, Jason Lin, Maria Wang, Srinivas Sunkara, Yun Zhu, Jindong Chen
We present a new benchmark and dataset, ScreenQA, for screen content understanding via question answering.
no code implementations • 13 Jul 2022 • Gilles Baechler, Michalina Pacholska, Arnaud Latty, Adam Scholefield, Martin Vetterli
Lippmann photography provides a great opportunity to demonstrate several fundamental concepts in signal processing.
1 code implementation • 11 Jan 2022 • Gang Li, Gilles Baechler, Manuel Tragut, Yang Li
The layout of a mobile screen is a critical data source for UI design research and semantic understanding of the screen.
no code implementations • 6 Nov 2018 • Laurent Valentin Jospin, Gilles Baechler, Adam Scholefield
Polarizing filters provide a powerful way to separate diffuse and specular reflection; however, traditional methods rely on several captures and require proper alignment of the filters.