no code implementations • LChange (ACL) 2022 • Iiro Rastas, Yann Ciarán Ryan, Iiro Tiihonen, Mohammadreza Qaraei, Liina Repo, Rohit Babbar, Eetu Mäkelä, Mikko Tolonen, Filip Ginter
In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents.
Optical Character Recognition (OCR)
no code implementations • 6 Nov 2024 • Nasib Ullah, Erik Schultheis, Jinbin Zhang, Rohit Babbar
To judge whether an extreme classifier is indeed suited to this task, one can check, for example, whether it returns calibrated probabilities, which has hitherto not been done in this field.
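The calibration check the abstract alludes to can be illustrated with a small sketch: bin predicted probabilities and compare each bin's mean confidence against its empirical positive rate (the function name, binning scheme, and use of expected calibration error are our illustrative choices, not the paper's method).

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Bin predictions by confidence and average the per-bin gap
    between mean confidence and empirical positive rate
    (hypothetical helper, for illustration only)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if not mask.any():
            continue
        gap = abs(probs[mask].mean() - labels[mask].mean())
        ece += mask.mean() * gap  # weight the gap by the bin's share of points
    return ece
```

A perfectly calibrated classifier would score near zero; in extreme classification the same check would be run over the scores assigned to the top-ranked labels.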
1 code implementation • 5 Nov 2024 • Nasib Ullah, Erik Schultheis, Mike Lasby, Yani Ioannou, Rohit Babbar
In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classification with large output spaces, where memory-efficiency is paramount.
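The semi-structured (N:M) sparsity pattern that such training targets can be sketched in a few lines: in every group of four consecutive weights, only the two largest-magnitude entries survive (a toy magnitude-pruning illustration of the 2:4 pattern, not the paper's training procedure).

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero out the two smallest-magnitude entries in every group of
    four consecutive weights, yielding the 2:4 pattern that
    semi-structured sparse kernels accelerate (illustrative only)."""
    w = np.asarray(weights, dtype=float).copy()
    flat = w.reshape(-1, 4)  # requires size divisible by 4
    # indices of the two smallest |w| per group of four
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    rows = np.arange(flat.shape[0])[:, None]
    flat[rows, drop] = 0.0
    return flat.reshape(w.shape)
```

For a last layer with millions of labels, halving the stored weights this way directly addresses the memory bottleneck the abstract names.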
no code implementations • 20 Jun 2024 • Wojciech Kotłowski, Marek Wydmuch, Erik Schultheis, Rohit Babbar, Krzysztof Dembczyński
We consider sequential maximization of performance metrics that are general functions of a confusion matrix of a classifier (such as precision, F-measure, or G-mean).
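The metrics in question are plain functions of the four confusion-matrix counts, which a short sketch makes concrete (function and key names are ours; the paper studies maximizing such metrics sequentially, not this computation itself).

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Precision, F1, and G-mean as functions of confusion-matrix
    counts (illustrative helper)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # G-mean: geometric mean of recall and specificity
    g_mean = (recall * specificity) ** 0.5
    return {"precision": precision, "f1": f1, "g_mean": g_mean}
```

Because these metrics are non-linear in the counts, they cannot be optimized by summing per-example losses, which is what motivates the sequential-maximization setting.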
no code implementations • 13 Jun 2024 • Jinbin Zhang, Nasib Ullah, Rohit Babbar
Extreme Multi-label Learning (XMC) is a task that assigns the most relevant labels to an instance from a predefined label set.
no code implementations • 4 May 2024 • Siddhant Kharbanda, Devaansh Gupta, Gururaj K, Pankaj Malhotra, Cho-Jui Hsieh, Rohit Babbar
While such methods have shown empirical success, we observe two key uncharted aspects: (i) DE training typically uses only a single positive relation, even for datasets which offer more, and (ii) existing approaches fixate on using only the OvA reduction of the multi-label problem.
Extreme Multi-Label Classification • Multi-Label Classification
no code implementations • 3 May 2024 • Siddhant Kharbanda, Devaansh Gupta, Erik Schultheis, Atmadeep Banerjee, Cho-Jui Hsieh, Rohit Babbar
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input a subset of the most relevant labels from millions of label choices.
Extreme Multi-Label Classification • Multi-Label Classification +4
2 code implementations • 29 Jan 2024 • Erik Schultheis, Wojciech Kotłowski, Marek Wydmuch, Rohit Babbar, Strom Borman, Krzysztof Dembczyński
We consider the optimization of complex performance metrics in multi-label classification under the population utility framework.
2 code implementations • NeurIPS 2023 • Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński
As such, it is characterized by long-tail labels, i.e., most labels have very few positive instances.
no code implementations • 6 Jun 2023 • Erik Schultheis, Rohit Babbar
In classification problems with large output spaces (up to millions of labels), the last layer can require an enormous amount of memory.
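The scale of that memory requirement follows from simple arithmetic on the dense weight matrix W ∈ R^(d×L); the sketch below just multiplies out the dimensions (the example sizes are illustrative, not from the paper).

```python
def last_layer_bytes(n_features, n_labels, bytes_per_param=4):
    """Rough memory footprint of a dense last layer with one weight
    per (feature, label) pair, stored in fp32 by default."""
    return n_features * n_labels * bytes_per_param

# e.g. a 768-dimensional encoder with 3 million labels:
# 768 * 3_000_000 * 4 bytes ≈ 9.2 GB for the weights alone,
# before optimizer state and gradients are counted
```

Optimizer state (e.g. two Adam moments per parameter) roughly triples this, which is why the last layer dominates memory in such models.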
no code implementations • 29 Oct 2022 • Siddhant Kharbanda, Atmadeep Banerjee, Erik Schultheis, Rohit Babbar
We thus propose CascadeXML, an end-to-end multi-resolution learning pipeline, which can harness the multi-layered architecture of a transformer model for attending to different label resolutions with separate feature representations.
Extreme Multi-Label Classification • Multi-Label Classification +3
no code implementations • 26 Jul 2022 • Erik Schultheis, Marek Wydmuch, Rohit Babbar, Krzysztof Dembczyński
The propensity model introduced by Jain et al. 2016 has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC).
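The Jain et al. (2016) propensity model assigns each label l an observation probability p_l = 1 / (1 + C · (n_l + B)^(−A)) with C = (log N − 1)(B + 1)^A, where n_l is the label's frequency and N the number of data points; a direct transcription follows (the default A and B values are common dataset-dependent settings, not universal constants).

```python
import math

def jain_propensity(n_l, n_points, a=0.55, b=1.5):
    """Empirical label propensity in the style of Jain et al. (2016):
    p_l = 1 / (1 + C * (n_l + B)^(-A)),  C = (log N - 1) * (B + 1)^A.
    A and B are dataset-dependent hyperparameters."""
    c = (math.log(n_points) - 1.0) * (1.0 + b) ** a
    return 1.0 / (1.0 + c * (n_l + b) ** (-a))
```

Rare (tail) labels get low propensities, so inverse-propensity weighting boosts their contribution to losses and metrics; the paper's critique concerns how this model is used, not the formula itself.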
1 code implementation • 14 Dec 2021 • Mohammadreza Qaraei, Rohit Babbar
Extreme Multilabel Text Classification (XMTC) is a text classification problem in which (i) the output space is extremely large, (ii) each data point may have multiple positive labels, and (iii) the data follows a strongly imbalanced distribution.
1 code implementation • 20 Oct 2021 • Marek Wydmuch, Kalina Jasinska-Kobus, Rohit Babbar, Krzysztof Dembczyński
Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels.
Extreme Multi-Label Classification • Multi-Label Classification +1
no code implementations • 27 Sep 2021 • Erik Schultheis, Rohit Babbar
We want to start in a region of weight space a) with low loss value, b) that is favourable for second-order optimization, and c) where the conjugate-gradient (CG) calculations can be performed quickly.
Extreme Multi-Label Classification • Multi-Label Classification
no code implementations • 23 Sep 2021 • Erik Schultheis, Rohit Babbar
This paper considers binary and multilabel classification problems in a setting where labels are missing independently and with a known rate.
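When each positive label survives independently with a known keep rate, a standard inverse-propensity construction corrects any per-label loss so that its expectation under the noise matches the clean-label loss; the sketch below shows that construction for a single binary label (it illustrates the setting, not the paper's exact estimators).

```python
def unbiased_loss(loss_fn, score, y_obs, keep_rate):
    """Unbiased estimate of the clean-label loss when a true positive
    is observed as positive only with probability `keep_rate`
    (negatives are always observed correctly). Illustrative sketch."""
    if y_obs == 1:
        # Reweight so that, averaged over the random label drop,
        # the expectation equals loss_fn(score, 1).
        return (loss_fn(score, 1)
                - (1.0 - keep_rate) * loss_fn(score, 0)) / keep_rate
    return loss_fn(score, 0)
```

For a true positive, the estimator returns the corrected value with probability `keep_rate` and the plain negative loss otherwise, and the two average out exactly to the clean positive loss.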
1 code implementation • 13 Sep 2021 • Siddhant Kharbanda, Atmadeep Banerjee, Devaansh Gupta, Akash Palrecha, Rohit Babbar
Automatic annotation of short-text data to a large number of target labels, referred to as Short Text Extreme Classification, has found numerous applications including prediction of related searches and product recommendation.
no code implementations • 1 Jul 2020 • Erik Schultheis, Mohammadreza Qaraei, Priyanshu Gupta, Rohit Babbar
In addition to the computational burden arising from the large number of training instances, features, and labels, problems in XMC face two statistical challenges: (i) a large number of 'tail labels', those which occur very infrequently, and (ii) missing labels, as it is virtually impossible to manually assign every relevant label to an instance.
3 code implementations • 17 Apr 2019 • Sujay Khandagale, Han Xiao, Rohit Babbar
In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees.
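Partitioning labels in a representation space can be illustrated with a toy k-means split of the kind such label trees are built from (a simplified sketch under our own assumptions, not the actual Bonsai procedure).

```python
import numpy as np

def partition_labels(label_vecs, k, iters=10, seed=0):
    """Toy k-means over label representations: each label vector is
    assigned to one of k clusters, which would become the children of
    a shallow tree node (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(label_vecs, dtype=float)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # squared distances from every label vector to every center
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return assign
```

Keeping k large and the recursion shallow gives wide, shallow trees rather than the deep binary hierarchies of earlier tree-based XMC methods.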
1 code implementation • 5 Mar 2018 • Rohit Babbar, Bernhard Schölkopf
The goal in extreme multi-label classification is to learn a classifier which can assign a small subset of relevant labels to an instance from an extremely large set of target labels.
2 code implementations • 8 Sep 2016 • Rohit Babbar, Bernhard Schölkopf
In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers, coupled with explicit capacity control to limit model size.
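The explicit capacity control can be pictured as simple magnitude pruning: after each one-vs-rest classifier is trained, near-zero weights are discarded so the stored model stays small (the threshold value here is illustrative, not DiSMEC's actual setting).

```python
import numpy as np

def prune_weights(w, threshold=0.01):
    """Zero out weights whose magnitude falls below a threshold,
    shrinking the stored per-label weight vector (sketch of the
    capacity-control idea; threshold is illustrative)."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) < threshold, 0.0, w)
```

With millions of labels, each pruned per-label vector can then be stored in sparse form, which is what makes a one-vs-rest model of this scale tractable on disk.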
no code implementations • NeurIPS 2013 • Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih R. Amini
In this paper, we study flat and hierarchical classification strategies in the context of large-scale taxonomies.