no code implementations • 31 May 2025 • Mina Huh, Zihui Xue, Ujjaini Das, Kumar Ashutosh, Kristen Grauman, Amy Pavel
People use videos to learn new recipes, exercises, and crafts.
no code implementations • 10 Feb 2025 • Aadit Barua, Karim Benharrak, Meng Chen, Mina Huh, Amy Pavel
Lotus first creates an abstractive short-form video by generating both a short-form script and its corresponding speech, then matching long-form video clips to the generated narration.
1 code implementation • 30 Sep 2024 • Yi-Hao Peng, Faria Huq, Yue Jiang, Jason Wu, Amanda Xin Yue Li, Jeffrey Bigham, Amy Pavel
Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities.
no code implementations • 12 Aug 2024 • Mina Huh, Fangyuan Xu, Yi-Hao Peng, Chongyan Chen, Hansika Murugu, Danna Gurari, Eunsol Choi, Amy Pavel
Vision language models can now generate long-form answers to questions about images, known as long-form visual question answers (LFVQA).
1 code implementation • NAACL 2021 • Prakhar Gupta, Jeffrey P. Bigham, Yulia Tsvetkov, Amy Pavel
Dialogue systems pretrained with large language models generate locally coherent responses, but lack the fine-grained control over responses necessary to achieve specific goals.
no code implementations • 14 Jul 2020 • Kundan Krishna, Amy Pavel, Benjamin Schloss, Jeffrey P. Bigham, Zachary C. Lipton
In this exploratory study, we describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels.
2 code implementations • WS 2019 • Prakhar Gupta, Shikib Mehri, Tiancheng Zhao, Amy Pavel, Maxine Eskenazi, Jeffrey P. Bigham
The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation.
no code implementations • 13 Dec 2016 • Vincent Sitzmann, Ana Serrano, Amy Pavel, Maneesh Agrawala, Diego Gutierrez, Belen Masia, Gordon Wetzstein
Understanding how people explore immersive virtual environments is crucial for many applications, such as designing virtual reality (VR) content, developing new compression algorithms, or learning computational models of saliency or visual attention.