Part-Of-Speech Tagging
214 papers with code • 17 benchmarks • 26 datasets
Part-of-speech tagging (POS tagging) is the task of tagging a word in a text with its part of speech. A part of speech is a category of words with similar grammatical properties. Common English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, etc.
Example:
| Vinken | , | 61 | years | old |
|---|---|---|---|---|
| NNP | , | CD | NNS | JJ |
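The example above can be sketched in code. This is a minimal, purely illustrative lookup-based tagger that reproduces the Penn Treebank tags shown; the `TAG_LOOKUP` table and `tag` function are hypothetical, and real POS taggers use statistical or neural models rather than a dictionary.

```python
# Illustrative lookup-based POS tagger (hypothetical; real taggers
# are learned models). Tags follow the Penn Treebank tagset.
TAG_LOOKUP = {
    "Vinken": "NNP",  # proper noun, singular
    ",": ",",         # punctuation keeps its own tag
    "61": "CD",       # cardinal number
    "years": "NNS",   # noun, plural
    "old": "JJ",      # adjective
}

def tag(tokens):
    """Assign each token a tag, falling back to 'NN' for unknown words."""
    return [(tok, TAG_LOOKUP.get(tok, "NN")) for tok in tokens]

print(tag(["Vinken", ",", "61", "years", "old"]))
```

Production systems (e.g. spaCy or NLTK) expose a similar interface: a sequence of tokens in, a sequence of (token, tag) pairs out.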
Libraries
Use these libraries to find Part-Of-Speech Tagging models and implementations.
Latest papers with no code
A Morphology-Based Investigation of Positional Encodings
How does the importance of positional encoding in pre-trained language models (PLMs) vary across languages with different morphological complexity?
ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus
We present ZAEBUC-Spoken, a multilingual multidialectal Arabic-English speech corpus.
The Comparison of Translationese in Machine Translation and Human Translation in terms of Translation Relations
This study explores the distinctions between neural machine translation (NMT) and human translation (HT) through the lens of translation relations.
NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems
Aware of the shortcomings of existing NLPre evaluation approaches, we investigate a novel method of reliable and fair evaluation and performance reporting.
Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5
The VocaTT (vocabulary teaching and training) engine is written in Python and comprises three basic steps: pre-processing target word lists, generating sentences and candidate word options using GPT, and finally selecting suitable word options.
VNLP: Turkish NLP Package
In this work, we present VNLP: the first dedicated, complete, open-source, well-documented, lightweight, production-ready, state-of-the-art Natural Language Processing (NLP) package for the Turkish language.
Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models
Despite the predominance of English in their training data, English-centric Large Language Models (LLMs) like GPT-3 and LLaMA display a remarkable ability to perform multilingual tasks, raising questions about the depth and nature of their cross-lingual capabilities.
An Effective Incorporating Heterogeneous Knowledge Curriculum Learning for Sequence Labeling
To address this challenge, we propose a two-stage curriculum learning (TCL) framework specifically designed for sequence labeling tasks.
Punctuation Restoration Improves Structure Understanding without Supervision
Unsupervised learning objectives like language modeling and de-noising constitute a significant part in producing pre-trained models that perform various downstream applications from natural language understanding to conversational tasks.
A Comprehensive View of the Biases of Toxicity and Sentiment Analysis Methods Towards Utterances with African American English Expressions
One explanation for this bias is that AI models are trained on limited datasets, and using such a term in training data is more likely to appear in a toxic utterance.