In this work, we present DiSMEC, which is a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control to control model size.
EXTREME MULTI-LABEL CLASSIFICATION MULTI-LABEL CLASSIFICATION MULTI-LABEL LEARNING
Extreme multi-label classification (XMLC) is a problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels.
EXTREME MULTI-LABEL CLASSIFICATION MULTI-LABEL CLASSIFICATION
However, naively applying deep transformer models to the XMC problem leads to sub-optimal performance due to the large output space and the label sparsity issue.
EXTREME MULTI-LABEL CLASSIFICATION MULTI-LABEL CLASSIFICATION MULTI-LABEL TEXT CLASSIFICATION PRODUCT CATEGORIZATION SENTENCE CLASSIFICATION
The lower the HS level, the less the categorization performance.
EXTREME MULTI-LABEL CLASSIFICATION HIERARCHICAL STRUCTURE MULTI-LABEL CLASSIFICATION TEXT CATEGORIZATION
In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees.
EXTREME MULTI-LABEL CLASSIFICATION MULTI-LABEL CLASSIFICATION MULTI-LABEL LEARNING
We first introduce the PLT model and discuss training and inference procedures and their computational costs.
EXTREME MULTI-LABEL CLASSIFICATION MULTI-LABEL CLASSIFICATION
Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1. 1-3x) compared to the previous state-of-the-art methods.
EXTREME MULTI-LABEL CLASSIFICATION MULTI-LABEL CLASSIFICATION MULTI-LABEL LEARNING
The goal in extreme multi-label classification is to learn a classifier which can assign a small subset of relevant labels to an instance from an extremely large set of target labels.
EXTREME MULTI-LABEL CLASSIFICATION MULTI-LABEL CLASSIFICATION
Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with equivalent performance to SOTA models, with a single thread and smaller storage footprint.
EXTREME MULTI-LABEL CLASSIFICATION INFORMATION RETRIEVAL MULTI-LABEL CLASSIFICATION