no code implementations • 25 Dec 2024 • Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel
Long-range tasks require reasoning over long inputs.
no code implementations • 21 Jul 2024 • EunJeong Hwang, Yichao Zhou, James Bradley Wendt, Beliz Gunel, Nguyen Vo, Jing Xie, Sandeep Tata
Large language models (LLMs) often struggle with processing extensive input contexts, which can lead to redundant, inaccurate, or incoherent summaries.
no code implementations • 7 Jun 2024 • EunJeong Hwang, Yichao Zhou, Beliz Gunel, James Bradley Wendt, Sandeep Tata
No existing dataset adequately tests how well language models can incrementally update entity summaries, a crucial ability as these models rapidly advance.
1 code implementation • 23 Apr 2024 • Nirupan Ananthamurugan, Dat Duong, Philip George, Ankita Gupta, Sandeep Tata, Beliz Gunel
Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making.
no code implementations • 25 Mar 2024 • Beliz Gunel, James B. Wendt, Jing Xie, Yichao Zhou, Nguyen Vo, Zachary Fisher, Sandeep Tata
Users often struggle to decide between two options (A vs. B), as doing so usually requires time-consuming research across multiple web pages.
no code implementations • 20 Dec 2022 • Jing Xie, James B. Wendt, Yichao Zhou, Seth Ebner, Sandeep Tata
Many business workflows require extracting important fields from form-like documents (e.g., bank statements, bills of lading, purchase orders).
no code implementations • 15 Nov 2022 • Zilong Wang, Yichao Zhou, Wei Wei, Chen-Yu Lee, Sandeep Tata
Understanding visually rich business documents to extract structured data and automate business workflows has been receiving attention in both academia and industry.
no code implementations • 28 Oct 2022 • Yichao Zhou, James B. Wendt, Navneet Potti, Jing Xie, Sandeep Tata
A key bottleneck in building automatic extraction models for visually rich documents like invoices is the cost of acquiring the several thousand high-quality labeled documents that are needed to train a model with acceptable accuracy.
no code implementations • 7 Jan 2022 • Beliz Gunel, Navneet Potti, Sandeep Tata, James B. Wendt, Marc Najork, Jing Xie
Automating information extraction from form-like documents at scale is a pressing need due to its potential impact on automating business workflows across many industries like financial services, insurance, and healthcare.
2 code implementations • 7 Jan 2021 • Yichao Zhou, Ying Sheng, Nguyen Vo, Nick Edmonds, Sandeep Tata
There has been a steady need to precisely extract structured knowledge from the web (i.e., HTML documents).
no code implementations • 21 Oct 2020 • Bill Yuchen Lin, Ying Sheng, Nguyen Vo, Sandeep Tata
By combining these stages, FreeDOM is able to generalize to unseen sites after training on a small number of seed sites from that vertical without requiring expensive hand-crafted features over visual renderings of the page.
1 code implementation • ACL 2020 • Bodhisattwa Majumder, Navneet Potti, Sandeep Tata, James B. Wendt, Qi Zhao, Marc Najork
We propose a novel approach using representation learning for tackling the problem of extracting structured information from form-like document images.
no code implementations • 23 May 2020 • Abbas Kazerouni, Qi Zhao, Jing Xie, Sandeep Tata, Marc Najork
Furthermore, there is usually only a small amount of initial training data available when building machine-learned models to solve such problems.
no code implementations • 12 Mar 2011 • Jun Rao, Eugene J. Shekita, Sandeep Tata
Compared to an eventually consistent datastore, we show that Spinnaker can be as fast or even faster on reads and only 5% to 10% slower on writes.
Databases; Distributed, Parallel, and Cluster Computing