no code implementations • 23 May 2023 • Yinghao Li, Colin Lockard, Prashant Shiralkar, Chao Zhang
To establish such connections, we propose extracting PTs from web pages that contain hand-crafted PT recommendations for SIs.
no code implementations • 27 Aug 2022 • Ritesh Sarkhel, Binxuan Huang, Colin Lockard, Prashant Shiralkar
Prior works rely on a few human-labeled web pages from each target website or thousands of human-labeled web pages from some seed websites to train a transferable extraction model that generalizes on unseen target websites.
no code implementations • 24 May 2022 • Aidan San, Yuan Zhuang, Jan Bakus, Colin Lockard, David Ciemiewicz, Sandeep Atluri, Yangfeng Ji, Kevin Small, Heba Elfardy
Recently, neural models have been leveraged to significantly improve the performance of information extraction from semi-structured websites.
1 code implementation • 25 Jan 2022 • Xiang Deng, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Huan Sun
We argue that the text and HTML structure together convey important semantics of the content and therefore warrant a special treatment for their representation learning.
Ranked #2 on Attribute Extraction on SWDE
1 code implementation • 17 Feb 2021 • Daheng Wang, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Xin Luna Dong, Meng Jiang
Existing work linearizes table cells and relies heavily on modifying deep language models such as BERT, which captures related-cell information only within the same table.
no code implementations • ACL 2020 • Xin Luna Dong, Hannaneh Hajishirzi, Colin Lockard, Prashant Shiralkar
In this tutorial we take a holistic view toward information extraction, exploring the commonalities in the challenges and solutions developed to address these different forms of text.
no code implementations • 14 May 2020 • Colin Lockard, Prashant Shiralkar, Xin Luna Dong, Hannaneh Hajishirzi
In this work, we propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously unseen template, including from websites with little overlap with existing sources of knowledge for distant supervision and websites in entirely new subject verticals.
no code implementations • NAACL 2019 • Colin Lockard, Prashant Shiralkar, Xin Luna Dong
In this paper, we define the problem of OpenIE from semi-structured websites to extract such facts, and present an approach for solving it.
1 code implementation • NAACL 2019 • Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Xin Luna Dong, Andrew McCallum
In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB).
no code implementations • NAACL 2018 • James Ferguson, Colin Lockard, Daniel S. Weld, Hannaneh Hajishirzi
Supervised event extraction systems are limited in their accuracy due to the lack of available training data.
no code implementations • 12 Apr 2018 • Colin Lockard, Xin Luna Dong, Arash Einolghozati, Prashant Shiralkar
In this paper we present a new method for automatic extraction from semi-structured websites based on distant supervision.