Topic Classification
73 papers with code • 2 benchmarks • 10 datasets
Most implemented papers
Active learning in annotating micro-blogs dealing with e-reputation
This paper intends to develop a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians.
KLUE: Korean Language Understanding Evaluation
We introduce the Korean Language Understanding Evaluation (KLUE) benchmark.
Hierarchical Transformers for Long Document Classification
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.
Entailment as Few-Shot Learner
Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.
Cross-Lingual Adaptation using Structural Correspondence Learning
From these correspondences, a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language.
Controlling the Interaction Between Generation and Inference in Semi-Supervised Variational Autoencoders Using Importance Weighting
Even though Variational Autoencoders (VAEs) are widely used for semi-supervised learning, the reason why they work remains unclear.
Leveraging QA Datasets to Improve Generative Data Augmentation
The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation.
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
Despite the progress we have recorded in the last few years in multilingual natural language processing, evaluation is typically limited to a small set of languages with available datasets, which excludes a large number of low-resource languages.
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
Data scarcity in low-resource languages can be addressed with word-to-word translations from labeled task data in high-resource languages using bilingual lexicons.
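As a rough illustration of the word-to-word translation idea described above, the sketch below maps each token of a labeled high-resource example through a bilingual lexicon and carries the label over unchanged. It is a minimal sketch under stated assumptions, not the paper's method: the function names, the toy English-to-French lexicon, and the example sentence are all hypothetical, and tokens missing from the lexicon are simply kept as they are.

```python
def translate_word_by_word(tokens, lexicon):
    """Replace each token with its lexicon entry; keep unknown tokens unchanged."""
    return [lexicon.get(tok.lower(), tok) for tok in tokens]


def translate_dataset(examples, lexicon):
    """Translate the text of each (text, label) pair; the label is carried over as-is."""
    translated = []
    for text, label in examples:
        tokens = text.split()  # naive whitespace tokenization (an assumption)
        new_text = " ".join(translate_word_by_word(tokens, lexicon))
        translated.append((new_text, label))
    return translated


# Hypothetical toy lexicon and labeled data, for illustration only.
toy_lexicon = {"team": "équipe", "wins": "gagne"}
labeled_source = [("team wins cup", "sports")]

labeled_target = translate_dataset(labeled_source, toy_lexicon)
print(labeled_target)
```

Because the mapping is token by token, the output ignores word order and morphology in the target language; the sketch only shows how labels can be reused once the text has been mapped through a lexicon.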
Topic-based Evaluation for Conversational Bots
Dialog evaluation is a challenging problem, especially for non-task-oriented dialogs where conversational success is not well-defined.