Search Results for author: Sandeep Tata

Found 11 papers, 3 papers with code

CASPR: Automated Evaluation Metric for Contrastive Summarization

1 code implementation · 23 Apr 2024 · Nirupan Ananthamurugan, Dat Duong, Philip George, Ankita Gupta, Sandeep Tata, Beliz Gunel

Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making.

Decision Making; Natural Language Inference

STRUM-LLM: Attributed and Structured Contrastive Summarization

No code implementations · 25 Mar 2024 · Beliz Gunel, James B. Wendt, Jing Xie, Yichao Zhou, Nguyen Vo, Zachary Fisher, Sandeep Tata

Users often struggle with decision-making between two options (A vs B), as it usually requires time-consuming research across multiple web pages.

Attribute; Decision Making

An Augmentation Strategy for Visually Rich Documents

No code implementations · 20 Dec 2022 · Jing Xie, James B. Wendt, Yichao Zhou, Seth Ebner, Sandeep Tata

Many business workflows require extracting important fields from form-like documents (e.g., bank statements, bills of lading, purchase orders).

Data Augmentation

VRDU: A Benchmark for Visually-rich Document Understanding

No code implementations · 15 Nov 2022 · Zilong Wang, Yichao Zhou, Wei Wei, Chen-Yu Lee, Sandeep Tata

Understanding visually-rich business documents to extract structured data and automate business workflows has been receiving attention both in academia and industry.

Document Understanding

Radically Lower Data-Labeling Costs for Visually Rich Document Extraction Models

No code implementations · 28 Oct 2022 · Yichao Zhou, James B. Wendt, Navneet Potti, Jing Xie, Sandeep Tata

A key bottleneck in building automatic extraction models for visually rich documents like invoices is the cost of acquiring the several thousand high-quality labeled documents that are needed to train a model with acceptable accuracy.

Active Learning

Data-Efficient Information Extraction from Form-Like Documents

No code implementations · 7 Jan 2022 · Beliz Gunel, Navneet Potti, Sandeep Tata, James B. Wendt, Marc Najork, Jing Xie

Automating information extraction from form-like documents at scale is a pressing need due to its potential impact on automating business workflows across many industries like financial services, insurance, and healthcare.

Transfer Learning

Simplified DOM Trees for Transferable Attribute Extraction from the Web

2 code implementations · 7 Jan 2021 · Yichao Zhou, Ying Sheng, Nguyen Vo, Nick Edmonds, Sandeep Tata

There has been a steady need to precisely extract structured knowledge from the web (i.e., HTML documents).

Attribute; Attribute Extraction (+1 more)

FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents

No code implementations · 21 Oct 2020 · Bill Yuchen Lin, Ying Sheng, Nguyen Vo, Sandeep Tata

By combining these stages, FreeDOM is able to generalize to unseen sites after training on a small number of seed sites from that vertical without requiring expensive hand-crafted features over visual renderings of the page.

Representation Learning for Information Extraction from Form-like Documents

1 code implementation · ACL 2020 · Bodhisattwa Majumder, Navneet Potti, Sandeep Tata, James B. Wendt, Qi Zhao, Marc Najork

We propose a novel approach using representation learning for tackling the problem of extracting structured information from form-like document images.

Representation Learning

Active Learning for Skewed Data Sets

No code implementations · 23 May 2020 · Abbas Kazerouni, Qi Zhao, Jing Xie, Sandeep Tata, Marc Najork

Furthermore, there is usually only a small amount of initial training data available when building machine-learned models to solve such problems.

Active Learning

Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore

No code implementations · 12 Mar 2011 · Jun Rao, Eugene J. Shekita, Sandeep Tata

Compared to an eventually consistent datastore, we show that Spinnaker can be as fast or even faster on reads and only 5% to 10% slower on writes.

Databases; Distributed, Parallel, and Cluster Computing
