Search Results for author: Colin Lockard

Found 11 papers, 3 papers with code

Extracting Shopping Interest-Related Product Types from the Web

no code implementations 23 May 2023 Yinghao Li, Colin Lockard, Prashant Shiralkar, Chao Zhang

To establish such connections, we propose to extract PTs from the Web pages containing hand-crafted PT recommendations for SIs.

Node Classification

Label-Efficient Self-Training for Attribute Extraction from Semi-Structured Web Documents

no code implementations 27 Aug 2022 Ritesh Sarkhel, Binxuan Huang, Colin Lockard, Prashant Shiralkar

Prior works rely on a few human-labeled web pages from each target website or thousands of human-labeled web pages from some seed websites to train a transferable extraction model that generalizes on unseen target websites.

Attribute Attribute Extraction

PLAtE: A Large-scale Dataset for List Page Web Extraction

no code implementations 24 May 2022 Aidan San, Yuan Zhuang, Jan Bakus, Colin Lockard, David Ciemiewicz, Sandeep Atluri, Yangfeng Ji, Kevin Small, Heba Elfardy

Recently, neural models have been leveraged to significantly improve the performance of information extraction from semi-structured websites.

Attribute Attribute Extraction

DOM-LM: Learning Generalizable Representations for HTML Documents

1 code implementation 25 Jan 2022 Xiang Deng, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Huan Sun

We argue that the text and HTML structure together convey important semantics of the content and therefore warrant a special treatment for their representation learning.

Attribute Attribute Extraction +3

TCN: Table Convolutional Network for Web Table Interpretation

1 code implementation 17 Feb 2021 Daheng Wang, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Xin Luna Dong, Meng Jiang

Existing work linearize table cells and heavily rely on modifying deep language models such as BERT which only captures related cells information in the same table.

Representation Learning Table annotation +1

Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

no code implementations ACL 2020 Xin Luna Dong, Hannaneh Hajishirzi, Colin Lockard, Prashant Shiralkar

In this tutorial we take a holistic view toward information extraction, exploring the commonalities in the challenges and solutions developed to address these different forms of text.

document understanding Entity Linking

ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages

no code implementations 14 May 2020 Colin Lockard, Prashant Shiralkar, Xin Luna Dong, Hannaneh Hajishirzi

In this work, we propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously unseen template, including from websites with little overlap with existing sources of knowledge for distant supervision and websites in entirely new subject verticals.

Relation Relation Extraction

OpenCeres: When Open Information Extraction Meets the Semi-Structured Web

no code implementations NAACL 2019 Colin Lockard, Prashant Shiralkar, Xin Luna Dong

In this paper, we define the problem of OpenIE from semi-structured websites to extract such facts, and present an approach for solving it.

Open Information Extraction Relation Extraction

OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference

1 code implementation NAACL 2019 Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Xin Luna Dong, Andrew McCallum

In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB).

Open Information Extraction Relation

Semi-Supervised Event Extraction with Paraphrase Clusters

no code implementations NAACL 2018 James Ferguson, Colin Lockard, Daniel S. Weld, Hannaneh Hajishirzi

Supervised event extraction systems are limited in their accuracy due to the lack of available training data.

Event Extraction

CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web

no code implementations 12 Apr 2018 Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar

In this paper we present a new method for automatic extraction from semi-structured websites based on distant supervision.

Relation Relation Extraction
