Topic Classification

54 papers with code • 2 benchmarks • 8 datasets

Benchmarks

These leaderboards track progress in topic classification.

Dataset	Best Model
OS	RoBERTa-large 355M + Entailment as Few-shot Learner
Amazon Product Data	Multinomial Naive Bayes

Most implemented papers

Active learning in annotating micro-blogs dealing with e-reputation

ungeimer/FLAT-TextTagger • 16 Jun 2017

This paper develops an active learning process for automatically annotating French-language tweets that deal with the image (i.e., representation, web reputation) of politicians.

Hierarchical Transformers for Long Document Classification

helmy-elrais/RoBERT_Recurrence_over_BERT • 23 Oct 2019

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.

Entailment as Few-Shot Learner

PaddlePaddle/PaddleNLP • 29 Apr 2021

Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.

KLUE: Korean Language Understanding Evaluation

KLUE-benchmark/KLUE • 20 May 2021

We introduce the Korean Language Understanding Evaluation (KLUE) benchmark.

Cross-Lingual Adaptation using Structural Correspondence Learning

pprett/bolt • 4 Aug 2010

From these correspondences a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language.

Controlling the Interaction Between Generation and Inference in Semi-Supervised Variational Autoencoders Using Importance Weighting

ghazi-f/SSPIWO • 13 Oct 2020

Even though Variational Autoencoders (VAEs) are widely used for semi-supervised learning, the reason why they work remains unclear.

Leveraging QA Datasets to Improve Generative Data Augmentation

dheeraj7596/conda • 25 May 2022

The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation.

SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects

dadelani/sib-200 • 14 Sep 2023

Despite the progress recorded in the last few years in multilingual natural language processing, evaluation is typically limited to a small set of languages with available datasets, which excludes a large number of low-resource languages.

LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons

BatsResearch/LexC-Gen • 21 Feb 2024

We show that conditioning on bilingual lexicons is the key component of LexC-Gen. LexC-Gen is also practical -- it only needs a single GPU to generate data at scale.

Forget NLI, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to Luxembourgish

fredxlpy/letz • 5 Apr 2024

A common method for zero-shot classification (ZSC) is to fine-tune a language model on a Natural Language Inference (NLI) dataset and then use it to infer the entailment between the input document and the target labels.
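
The entailment-based approach described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the hypothesis template "this text is about {}", and the word-overlap scorer are all assumptions made for the example. In practice the scorer would be a model fine-tuned on an NLI dataset that returns an entailment probability.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: maps (premise, hypothesis) to an entailment score.
EntailmentScorer = Callable[[str, str], float]


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI model (assumption, not a real model):
    the fraction of hypothesis words that also appear in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    if not hypothesis_words:
        return 0.0
    hits = sum(1 for word in hypothesis_words if word in premise_words)
    return hits / len(hypothesis_words)


def zero_shot_classify(
    document: str,
    labels: List[str],
    scorer: EntailmentScorer,
    template: str = "this text is about {}",  # assumed template wording
) -> Tuple[str, Dict[str, float]]:
    """Score one hypothesis per candidate label and return the best label."""
    # The document is the premise; each label is turned into a hypothesis.
    scores = {
        label: scorer(document, template.format(label)) for label in labels
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


best, scores = zero_shot_classify(
    "the team won the football match",
    ["football", "politics"],
    toy_entailment_score,
)
print(best, scores)
```

Keeping the scorer as a plain callable means the toy function can be swapped for a real NLI model without changing the classification logic.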
