Topic Classification

58 papers with code • 2 benchmarks • 8 datasets

Most implemented papers

Active learning in annotating micro-blogs dealing with e-reputation

ungeimer/FLAT-TextTagger 16 Jun 2017

This paper intends to develop a so-called active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians.

Hierarchical Transformers for Long Document Classification

helmy-elrais/RoBERT_Recurrence_over_BERT 23 Oct 2019

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.

Entailment as Few-Shot Learner

PaddlePaddle/PaddleNLP 29 Apr 2021

Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.

KLUE: Korean Language Understanding Evaluation

KLUE-benchmark/KLUE 20 May 2021

We introduce the Korean Language Understanding Evaluation (KLUE) benchmark.

Cross-Lingual Adaptation using Structural Correspondence Learning

pprett/bolt 4 Aug 2010

From these correspondences a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language.

Controlling the Interaction Between Generation and Inference in Semi-Supervised Variational Autoencoders Using Importance Weighting

ghazi-f/SSPIWO 13 Oct 2020

Even though Variational Autoencoders (VAEs) are widely used for semi-supervised learning, the reason why they work remains unclear.

Leveraging QA Datasets to Improve Generative Data Augmentation

dheeraj7596/conda 25 May 2022

The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation.

SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects

dadelani/sib-200 14 Sep 2023

Despite the progress we have recorded in the last few years in multilingual natural language processing, evaluation is typically limited to a small set of languages with available datasets, which excludes a large number of low-resource languages.

LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons

BatsResearch/LexC-Gen 21 Feb 2024

We show that conditioning on bilingual lexicons is the key component of LexC-Gen. LexC-Gen is also practical: it needs only a single GPU to generate data at scale.

Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish

fredxlpy/letz 5 Apr 2024

A common method for zero-shot classification (ZSC) is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels.
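
The NLI-based approach described in this snippet can be sketched roughly as follows: each candidate label is turned into a hypothesis sentence, the document serves as the premise, and the label whose hypothesis scores highest for entailment is returned. This is a minimal illustration, not the paper's implementation. The names `toy_entailment_score` and `zero_shot_classify` and the hypothesis template are assumptions made for this example, and the word-overlap scorer is only a stand-in for a language model fine-tuned on NLI.

```python
from typing import Callable, Dict, List, Tuple


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model.

    Returns the fraction of hypothesis words that also occur in the premise.
    A real system would return the entailment probability from a fine-tuned model.
    """
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    if not hypothesis_words:
        return 0.0
    hits = sum(1 for word in hypothesis_words if word in premise_words)
    return hits / len(hypothesis_words)


def zero_shot_classify(
    document: str,
    labels: List[str],
    score_fn: Callable[[str, str], float] = toy_entailment_score,
    template: str = "this text is about {}",  # assumed template, not from the paper
) -> Tuple[str, Dict[str, float]]:
    """Pick the label whose hypothesis is most strongly entailed by the document."""
    scores = {
        label: score_fn(document, template.format(label)) for label in labels
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


if __name__ == "__main__":
    doc = "the striker scored twice and the sports crowd cheered"
    label, scores = zero_shot_classify(doc, ["sports", "politics"])
    print(label, scores)
```

Because the scorer is passed in as `score_fn`, the toy function could be swapped for a call into an actual NLI model without changing the classification logic.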