Token Classification
29 papers with code • 10 benchmarks • 9 datasets
Benchmarks
These leaderboards are used to track progress in Token Classification
Most implemented papers
WangchanBERTa: Pretraining transformer-based Thai Language Models
However, for a relatively low-resource language such as Thai, the choices of models are limited to training a BERT-based model based on a much smaller dataset or finetuning multi-lingual models, both of which yield suboptimal downstream performance.
Detecting Label Errors in Token Classification Data
Mislabeled examples are a common issue in real-world data, particularly for tasks like token classification where many labels must be chosen on a fine-grained basis.
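One common approach to finding such mislabeled tokens is to flag tokens whose annotated label receives low predicted probability from a trained model. The following is a minimal sketch of that idea (the function name and threshold are illustrative, not the paper's actual method):

```python
# Hedged sketch: flag tokens whose given label looks unlikely under a
# trained model's predictions. `threshold` is an illustrative choice.
def flag_label_errors(pred_probs, labels, threshold=0.1):
    """pred_probs: per-token dicts mapping label -> predicted probability.
    labels: the annotated label for each token.
    Returns indices of tokens whose given label is suspicious."""
    suspects = []
    for i, (probs, label) in enumerate(zip(pred_probs, labels)):
        if probs.get(label, 0.0) < threshold:
            suspects.append(i)
    return suspects

pred_probs = [
    {"O": 0.95, "B-PER": 0.05},  # token 0: model agrees with label "O"
    {"O": 0.02, "B-PER": 0.98},  # token 1: model strongly disagrees
]
labels = ["O", "O"]
print(flag_label_errors(pred_probs, labels))  # -> [1]
```

Ranking tokens by this probability (rather than hard-thresholding) gives annotators a prioritized review queue.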
Label Supervised LLaMA Finetuning
We evaluate this approach with Label Supervised LLaMA (LS-LLaMA), which is based on LLaMA-2-7B, a relatively small-scale LLM that can be finetuned on a single GeForce RTX 4090 GPU.
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems.
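The order sensitivity of BIO tagging can be seen in a minimal decoder sketch (illustrative code, not the paper's method): entity spans are recovered by scanning B-/I- tags left to right, so a scrambled OCR reading order splits or corrupts the spans.

```python
# Hedged sketch: decode BIO tags into entity spans. Correct output
# depends on token order, which OCR on scanned documents may scramble.
def decode_bio(tokens, tags):
    entities, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # start of a new entity
            if current:
                entities.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)        # continuation of current entity
        else:                             # "O" or an invalid continuation
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(label, " ".join(toks)) for label, toks in entities]

tokens = ["John", "Smith", "visited", "New", "York"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(decode_bio(tokens, tags))
# -> [('PER', 'John Smith'), ('LOC', 'New York')]
# If OCR reorders the tokens, the same tags no longer yield these spans.
```

This is why approaches like Token Path Prediction avoid tying the tagging scheme to a fixed reading order.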
Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking
We show on an entity linking benchmark that (i) this model improves the entity representations over plain BERT, (ii) that it outperforms entity linking architectures that optimize the tasks separately and (iii) that it only comes second to the current state-of-the-art that does mention detection and entity disambiguation jointly.
Common-Knowledge Concept Recognition for SEVA
We build a common-knowledge concept recognition system for a Systems Engineer's Virtual Assistant (SEVA) which can be used for downstream tasks such as relation extraction, knowledge graph construction, and question-answering.
Counterfactual Detection meets Transfer Learning
Counterfactuals can be considered part of the domain of discourse structure and semantics, a core area of Natural Language Understanding. In this paper, we introduce an approach to counterfactual detection as well as to indexing the antecedents and consequents of counterfactual statements.
On Long-Tailed Phenomena in Neural Machine Translation
State-of-the-art Neural Machine Translation (NMT) models struggle to generate low-frequency tokens, and tackling this remains a major challenge.
Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages
Language models based on the Transformer architecture have achieved state-of-the-art performance on a wide range of NLP tasks such as text classification, question-answering, and token classification.
NLRG at SemEval-2021 Task 5: Toxic Spans Detection Leveraging BERT-based Token Classification and Span Prediction Techniques
In our paper, we explore simple versions of both of these approaches and their performance on the task.