Document Classification

45 papers with code · Natural Language Processing
Subtask of Text Classification

Greatest papers with code

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

26 Dec 2018 · facebookresearch/LASER

We introduce an architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different language families and written in 28 different scripts. We also introduce a new test set of aligned sentences in 122 languages based on the Tatoeba corpus, and show that our sentence embeddings obtain strong results in multilingual similarity search even for low-resource languages.

CROSS-LINGUAL BITEXT MINING · CROSS-LINGUAL DOCUMENT CLASSIFICATION · CROSS-LINGUAL NATURAL LANGUAGE INFERENCE · CROSS-LINGUAL TRANSFER · DOCUMENT CLASSIFICATION · JOINT MULTILINGUAL SENTENCE REPRESENTATIONS · PARALLEL CORPUS MINING

Improving Language Understanding by Generative Pre-Training

Preprint 2018 openai/finetune-transformer-lm

We demonstrate that large gains on natural language understanding tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding.

DOCUMENT CLASSIFICATION · LANGUAGE MODELLING · NATURAL LANGUAGE INFERENCE · QUESTION ANSWERING · SEMANTIC TEXTUAL SIMILARITY

Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

ACL 2018 dinghanshen/SWEM

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions.

DOCUMENT CLASSIFICATION · NAMED ENTITY RECOGNITION · SENTIMENT ANALYSIS · SUBJECTIVITY ANALYSIS · WORD EMBEDDINGS

RMDL: Random Multimodel Deep Learning for Classification

3 May 2018 · kk7nc/RMDL

This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble, deep learning approach for classification. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of deep learning architectures.

DOCUMENT CLASSIFICATION · IMAGE CLASSIFICATION
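
The RMDL summary above describes combining several deep learning architectures into an ensemble. The excerpt does not say how the members' outputs are merged, so the following is only a minimal sketch of one common combination rule, majority voting over predicted labels. The function name `majority_vote` and the toy labels are hypothetical and are not taken from the kk7nc/RMDL code.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label predictions by majority vote.

    predictions[m][i] is the label that model m assigns to example i.
    Ties resolve to the tied label that appears first among the models.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must predict the same number of examples")

    voted = []
    for i in range(n_examples):
        counts = Counter(model_preds[i] for model_preds in predictions)
        voted.append(counts.most_common(1)[0][0])
    return voted


# Hypothetical outputs of three ensemble members on three documents.
model_a = ["sports", "politics", "tech"]
model_b = ["sports", "tech", "tech"]
model_c = ["politics", "politics", "tech"]

ensemble_labels = majority_vote([model_a, model_b, model_c])
print(ensemble_labels)  # ['sports', 'politics', 'tech']
```

Voting on hard labels keeps the sketch independent of any particular model type. An ensemble could instead average class probabilities, which is a different design choice.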

Efficient Vector Representation for Documents through Corruption

8 Jul 2017 · mchen24/iclr2017

Doc2VecC represents each document as a simple average of word embeddings. The simple model architecture introduced by Doc2VecC matches or outperforms the state of the art in generating high-quality document representations for sentiment analysis, document classification, and semantic relatedness tasks.

DOCUMENT CLASSIFICATION · SENTIMENT ANALYSIS · WORD EMBEDDINGS
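
The Doc2VecC summary above says a document is represented as a simple average of its word embeddings. The sketch below illustrates that averaging step only. It does not cover how Doc2VecC trains the embeddings (the corruption mechanism named in the title). The function name `average_embedding` and the toy 2-dimensional vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str],
                      vectors: Dict[str, List[float]]) -> List[float]:
    """Return the element-wise mean of the vectors of all known tokens.

    Tokens without an embedding are skipped (an assumption of this sketch).
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token in the document has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_vectors = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "bad": [-1.0, 0.0],
}

doc_vector = average_embedding(["good", "movie", "unknown"], toy_vectors)
print(doc_vector)  # [0.5, 0.5]
```

The resulting fixed-length vector can be passed to any standard classifier, whatever the document length.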

On Calibration of Modern Neural Networks

ICML 2017 gpleiss/temperature_scaling

Confidence calibration -- the problem of predicting probability estimates representative of the true correctness likelihood -- is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated.

DOCUMENT CLASSIFICATION
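
The calibration summary above defines a calibrated model as one whose confidence matches its true likelihood of being correct. One way to check this is to group predictions by confidence and compare each group's average confidence with its accuracy, a quantity usually called expected calibration error. The sketch below is a minimal version under that assumption. The function name, the equal-width binning and the toy inputs are illustrative choices, not code from gpleiss/temperature_scaling.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted average gap between confidence and accuracy per bin.

    confidences[i] is the model's probability for its predicted class and
    correct[i] says whether that prediction was right. Bins are equal-width
    intervals over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Toy case: the model claims 90% confidence but is right only half the time.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, False, False]
)
print(round(overconfident, 3))
```

A value near zero means confidence tracks accuracy closely. The toy case gives a large gap because stated confidence far exceeds observed accuracy.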

Multi-layer Representation Learning for Medical Concepts

17 Feb 2016 · mp2893/med2vec

Learning efficient representations for concepts has been proven to be an important basis for many applications such as machine translation or document classification. Proper representations of medical concepts such as diagnosis, medication, procedure codes and visits will have broad applications in healthcare analytics.

DOCUMENT CLASSIFICATION · MACHINE TRANSLATION · MEDICAL DIAGNOSIS · REPRESENTATION LEARNING

HDLTex: Hierarchical Deep Learning for Text Classification

24 Sep 2017 · kk7nc/HDLTex

The continually increasing number of documents produced each year necessitates ever-improving information processing methods for searching, retrieving, and organizing text. Along with this growth in the number of documents has come an increase in the number of categories.

DOCUMENT CLASSIFICATION

BilBOWA: Fast Bilingual Distributed Representations without Word Alignments

9 Oct 2014 · eske/multivec

We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally efficient model for learning bilingual distributed representations of words that can scale to large monolingual datasets and does not require word-aligned parallel training data. Instead, it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-aligned data.

CROSS-LINGUAL DOCUMENT CLASSIFICATION · DOCUMENT CLASSIFICATION

KATE: K-Competitive Autoencoder for Text

4 May 2017 · hugochan/KATE

Autoencoders have been successful in learning meaningful representations from image datasets. However, their performance on text datasets has not been widely studied.

DOCUMENT CLASSIFICATION · TOPIC MODELS