Context-NER: Contextual Phrase Generation at Scale

16 Sep 2021 · Himanshu Gupta, Shreyas Verma, Santosh Mashetty, Swaroop Mishra

Named Entity Recognition (NER) has seen significant progress in recent years, with numerous state-of-the-art (SOTA) models achieving high performance. However, very few studies have focused on generating the context of entities. In this paper, we introduce CONTEXT-NER, a task that aims to generate the relevant context for entities in a sentence, where the context is a phrase that describes the entity but is not necessarily present in the sentence. To facilitate research in this task, we also present the EDGAR10-Q dataset, which consists of annual and quarterly reports from the top 1500 publicly traded companies. The dataset is the largest of its kind, containing 1M sentences, 2.8M entities, and an average of 35 tokens per sentence, making it a challenging dataset. We propose a baseline approach that combines a phrase generation algorithm with inference using a 220M-parameter language model, achieving a ROUGE-L score of 27% on the test split. Additionally, we perform one-shot inference with ChatGPT, which obtains a ROUGE-L of 30%, highlighting the difficulty of the dataset. We also evaluate models such as T5 and BART, which achieve a maximum ROUGE-L of 49% after supervised finetuning on EDGAR10-Q. We further find that T5-large, when pre-finetuned on EDGAR10-Q, achieves SOTA results on downstream finance tasks such as Headline, FPB, and FiQA SA, outperforming the vanilla version by 10.81 points. To our surprise, this 66x smaller pre-finetuned model also surpasses the finance-specific LLM BloombergGPT-50B by 15 points. We hope that our dataset and generated artifacts will encourage further research in this direction, leading to the development of more sophisticated language models for financial text analysis.
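
To make the task setup concrete, the sketch below frames CONTEXT-NER as a sequence-to-sequence problem: given a sentence and a tagged entity, generate the context phrase, then score the prediction with ROUGE-L F1. It uses an off-the-shelf T5-base checkpoint (roughly 220M parameters) as a stand-in for the baseline language model; the prompt template, example sentence, and reference phrase are illustrative assumptions, not the exact format used in the paper or in EDGAR10-Q.

```python
# Minimal sketch of the CONTEXT-NER setup, assuming a seq2seq framing:
# (entity + sentence) -> context phrase, scored with ROUGE-L F1.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from rouge_score import rouge_scorer

tokenizer = AutoTokenizer.from_pretrained("t5-base")        # ~220M-parameter model
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

sentence = ("Revenue for the quarter ended March 31, 2020 was $4.2 million, "
            "an increase of 12% year over year.")
entity = "$4.2 million"
reference_context = "revenue for the quarter ended March 31, 2020"  # hypothetical gold phrase

# Hypothetical prompt format asking the model to describe the entity in context.
prompt = f"generate context: entity: {entity} sentence: {sentence}"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
prediction = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# ROUGE-L F1 against the reference phrase, the metric reported on EDGAR10-Q.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
print(prediction)
print(scorer.score(reference_context, prediction)["rougeL"].fmeasure)
```

Without finetuning, a vanilla checkpoint is not expected to approach the reported scores; the point of the sketch is only the input/output framing and the evaluation metric.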

Datasets

Introduced in the Paper:

EDGAR10-Q Dataset

Used in the Paper:

WikiCoref

Results from the Paper

Task        Dataset            Model                         Metric       Value   Rank
ContextNER  EDGAR10-Q Dataset  EDGAR T5 Large                ROUGE-L F1   49.23   #1
ContextNER  EDGAR10-Q Dataset  ChatGPT                       ROUGE-L F1   30.31   #2
ContextNER  EDGAR10-Q Dataset  Rule Based Phrase Generation  ROUGE-L F1   27.59   #3
