Topic Classification
54 papers with code • 2 benchmarks • 8 datasets
Most implemented papers
Active learning in annotating micro-blogs dealing with e-reputation
This paper intends to develop a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians.
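The excerpt does not spell out the authors' exact procedure. A common form of active learning is pool-based uncertainty sampling: retrain a classifier on the tweets labeled so far, then ask a human to annotate the unlabeled tweets the model is least sure about. The sketch below assumes that generic scheme with entropy as the uncertainty measure. It is not the paper's method. The names `select_for_annotation`, `active_learning_loop`, and the `train`/`oracle` callbacks are hypothetical, and the tweet scores are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical type alias: a trained model maps a text to one probability per label.
PredictProba = Callable[[str], Sequence[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted label distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(pool: List[str], predict_proba: PredictProba, batch_size: int) -> List[str]:
    """Return the batch_size unlabeled texts whose predictions have the highest entropy."""
    ranked = sorted(pool, key=lambda text: entropy(predict_proba(text)), reverse=True)
    return ranked[:batch_size]


def active_learning_loop(
    pool: List[str],
    labeled: Dict[str, str],
    train: Callable[[Dict[str, str]], PredictProba],
    oracle: Callable[[str], str],
    rounds: int,
    batch_size: int,
) -> Dict[str, str]:
    """Alternate between retraining and asking a human to label the most uncertain texts."""
    pool = list(pool)        # copy so the caller's pool is not mutated
    labeled = dict(labeled)  # copy of the seed annotations
    for _ in range(rounds):
        if not pool:
            break  # nothing left to annotate
        predict_proba = train(labeled)  # retrain on everything labeled so far
        for text in select_for_annotation(pool, predict_proba, batch_size):
            labeled[text] = oracle(text)  # human annotator supplies the label
            pool.remove(text)
    return labeled


# Usage with made-up model outputs for three tweets (classes: positive, negative).
scores = {"tweet A": [0.95, 0.05], "tweet B": [0.55, 0.45], "tweet C": [0.70, 0.30]}
print(select_for_annotation(list(scores), scores.__getitem__, batch_size=2))
# prints ['tweet B', 'tweet C']
```

Entropy is only one choice of uncertainty measure; least-confidence or margin sampling would slot into `select_for_annotation` the same way.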
Hierarchical Transformers for Long Document Classification
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.
Entailment as Few-Shot Learner
Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.
KLUE: Korean Language Understanding Evaluation
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark.
Cross-Lingual Adaptation using Structural Correspondence Learning
From these correspondences, a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language.
Controlling the Interaction Between Generation and Inference in Semi-Supervised Variational Autoencoders Using Importance Weighting
Even though Variational Autoencoders (VAEs) are widely used for semi-supervised learning, the reason why they work remains unclear.
Leveraging QA Datasets to Improve Generative Data Augmentation
The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation.
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
Despite the progress we have recorded in the last few years in multilingual natural language processing, evaluation is typically limited to a small set of languages with available datasets, which excludes a large number of low-resource languages.
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
We show that conditioning on bilingual lexicons is the key component of LexC-Gen. LexC-Gen is also practical: it needs only a single GPU to generate data at scale.
Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish
A common method for zero-shot classification (ZSC) is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels.
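The NLI-based approach described above (the baseline that this paper's title argues against, not its dictionary-based alternative) can be sketched as follows: treat the document as the premise, turn each candidate label into a hypothesis such as "This text is about {label}.", and pick the label whose hypothesis receives the highest entailment probability. This is a minimal sketch under stated assumptions: the template string is an assumption, and `toy_entail_prob` is a hypothetical word-overlap stand-in for a fine-tuned NLI model so that the example runs without downloading anything.

```python
from typing import Callable, Dict, List

# An entailment scorer maps (premise, hypothesis) to P(premise entails hypothesis).
EntailmentScorer = Callable[[str, str], float]


def zero_shot_classify(
    document: str,
    labels: List[str],
    entail_prob: EntailmentScorer,
    template: str = "This text is about {}.",  # assumed hypothesis template
) -> Dict[str, float]:
    """Score each candidate label by how strongly the document entails its hypothesis."""
    return {label: entail_prob(document, template.format(label)) for label in labels}


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: fraction of hypothesis words found in the premise.

    A real system would run an NLI classifier on the (premise, hypothesis) pair
    and read off the probability of the 'entailment' class.
    """
    premise_words = set(premise.lower().replace(".", "").split())
    hyp_words = hypothesis.lower().rstrip(".").split()
    return sum(w in premise_words for w in hyp_words) / len(hyp_words)


doc = "The national team won the football match on Sunday."
scores = zero_shot_classify(doc, ["football", "politics", "economy"], toy_entail_prob)
print(max(scores, key=scores.get))  # prints football
```

Because the labels enter only through the hypothesis text, new topics can be added at inference time without retraining; the cost is one NLI forward pass per label.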