no code implementations • LChange (ACL) 2022 • Iiro Rastas, Yann Ciarán Ryan, Iiro Tiihonen, Mohammadreza Qaraei, Liina Repo, Rohit Babbar, Eetu Mäkelä, Mikko Tolonen, Filip Ginter
In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents.
Optical Character Recognition (OCR)
no code implementations • 29 Jan 2024 • Erik Schultheis, Wojciech Kotłowski, Marek Wydmuch, Rohit Babbar, Strom Borman, Krzysztof Dembczyński
We consider the optimization of complex performance metrics in multi-label classification under the population utility framework.
2 code implementations • NeurIPS 2023 • Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński
As such, it is characterized by long-tail labels, i.e., most labels have very few positive instances.
no code implementations • 6 Jun 2023 • Erik Schultheis, Rohit Babbar
In classification problems with large output spaces (up to millions of labels), the last layer can require an enormous amount of memory.
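A quick back-of-the-envelope calculation illustrates why the last layer becomes a memory bottleneck at this scale (the dimensions below are illustrative, not taken from the paper):

```python
# Memory cost of a dense last layer in a classifier with a huge output
# space: a weight matrix of shape (num_features, num_labels).
def last_layer_bytes(num_features, num_labels, bytes_per_weight=4):
    """Bytes needed to store a dense float32 last-layer weight matrix."""
    return num_features * num_labels * bytes_per_weight

# Example: a 768-dimensional representation with 3 million labels.
gib = last_layer_bytes(768, 3_000_000) / 2**30
print(f"{gib:.1f} GiB")  # roughly 8.6 GiB for the weights alone
```

Gradients and optimizer state (e.g., Adam moments) multiply this figure further, which is why reducing last-layer memory matters.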
no code implementations • 29 Oct 2022 • Siddhant Kharbanda, Atmadeep Banerjee, Erik Schultheis, Rohit Babbar
We thus propose CascadeXML, an end-to-end multi-resolution learning pipeline, which can harness the multi-layered architecture of a transformer model for attending to different label resolutions with separate feature representations.
Extreme Multi-Label Classification • Multi-Label Text Classification +2
no code implementations • 26 Jul 2022 • Erik Schultheis, Marek Wydmuch, Rohit Babbar, Krzysztof Dembczyński
The propensity model introduced by Jain et al. (2016) has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC).
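The Jain et al. (2016) propensity model estimates the probability that a relevant label was actually observed from the label's empirical frequency. A minimal sketch of the commonly stated form, with the usual default hyperparameters (A and B are in practice tuned per dataset):

```python
import math

def jain_propensity(n_l, N, A=0.55, B=1.5):
    """Propensity of label l being observed, following the Jain et al.
    (2016) model: p_l = 1 / (1 + C * (n_l + B)^(-A)), where n_l is the
    number of training points positive for label l, N is the total
    number of training points, and C = (log N - 1) * (B + 1)^A.
    A=0.55, B=1.5 are the commonly used defaults."""
    C = (math.log(N) - 1.0) * (B + 1.0) ** A
    return 1.0 / (1.0 + C * (n_l + B) ** (-A))

# Head labels are observed almost surely; tail labels much less so.
print(jain_propensity(10_000, 1_000_000))  # close to 1
print(jain_propensity(1, 1_000_000))       # close to 0
```

Propensity-scored metrics then reweight each observed positive by 1/p_l, upweighting tail labels during evaluation.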
1 code implementation • 14 Dec 2021 • Mohammadreza Qaraei, Rohit Babbar
Extreme Multilabel Text Classification (XMTC) is a text classification problem in which (i) the output space is extremely large, (ii) each data point may have multiple positive labels, and (iii) the data follows a strongly imbalanced distribution.
1 code implementation • 20 Oct 2021 • Marek Wydmuch, Kalina Jasinska-Kobus, Rohit Babbar, Krzysztof Dembczyński
Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels.
no code implementations • 27 Sep 2021 • Erik Schultheis, Rohit Babbar
We want to start in a region of weight space that (a) has a low loss value, (b) is favourable for second-order optimization, and (c) allows the conjugate-gradient (CG) calculations to be performed quickly.
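For context, the CG computation referred to here solves a linear system with the (Gauss-)Newton matrix using only matrix-vector products. A minimal, generic CG solver (a textbook sketch, not the paper's implementation):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=None):
    """Solve Ax = b for symmetric positive-definite A via conjugate
    gradients. In second-order optimization, A is a Hessian or
    Gauss-Newton matrix accessed only through products A @ p."""
    n = b.shape[0]
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Its convergence speed depends on the conditioning of A, which is why a favourable starting region (point (b) above) matters.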
no code implementations • 23 Sep 2021 • Erik Schultheis, Rohit Babbar
This paper considers binary and multilabel classification problems in a setting where labels are missing independently and with a known rate.
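When positives go missing independently with a known rate, a standard trick is to build an unbiased estimate of the clean loss from the observed label. A generic sketch of this construction (not necessarily the paper's exact estimator): a true positive is observed with known probability q, a true negative is never flipped to positive.

```python
def unbiased_loss(observed_label, loss_pos, loss_neg, q):
    """Unbiased loss estimate under one-sided label missingness.
    loss_pos / loss_neg: loss of the current prediction if the true
    label were 1 / 0. q: known probability that a true positive is
    observed. Solving E[g(observed)] = true loss gives the weights."""
    if observed_label == 1:
        # Observed positives are certain positives; reweight so the
        # expectation over the missingness process recovers loss_pos.
        return (loss_pos - (1.0 - q) * loss_neg) / q
    # Observed negatives are a mix of true negatives and hidden
    # positives; they keep the plain negative loss.
    return loss_neg
```

Taking expectations checks the construction: for a true positive, q * g(1) + (1 - q) * g(0) = loss_pos, and for a true negative the estimate is exactly loss_neg.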
1 code implementation • 13 Sep 2021 • Siddhant Kharbanda, Atmadeep Banerjee, Akash Palrecha, Devaansh Gupta, Rohit Babbar
Automatic annotation of short-text data with a large number of target labels, referred to as Short Text Extreme Classification, has found numerous applications, including the prediction of related searches and product recommendation.
no code implementations • 1 Jul 2020 • Erik Schultheis, Mohammadreza Qaraei, Priyanshu Gupta, Rohit Babbar
In addition to the computational burden arising from the large number of training instances, features, and labels, problems in XMC face two statistical challenges: (i) a large number of 'tail labels' -- those which occur very infrequently -- and (ii) missing labels, as it is virtually impossible to manually assign every relevant label to an instance.
3 code implementations • 17 Apr 2019 • Sujay Khandagale, Han Xiao, Rohit Babbar
In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees.
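The idea of partitioning labels in a representation space can be sketched concretely. Assuming one common choice of label representation (the mean feature vector of a label's positive instances) and plain k-means for the partitioning, one node split looks like this (an illustration of the idea, not the exact Bonsai procedure):

```python
import numpy as np

def label_representations(X, Y):
    """Represent each label by the mean feature vector of its positive
    instances. X: (n, d) instance features; Y: (n, L) binary labels."""
    counts = Y.sum(axis=0).clip(min=1)
    return (Y.T @ X) / counts[:, None]

def partition_labels(V, k, n_iter=20):
    """Partition label vectors V (L, d) into k groups with k-means,
    giving the children of one tree node. Farthest-point init keeps
    the sketch deterministic."""
    centers = [V[0]]
    for _ in range(k - 1):
        d = np.min([((V - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(V[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        assign = np.argmin(((V[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = V[assign == j].mean(axis=0)
    return assign
```

Applying the split recursively with a large branching factor k yields the shallow, wide trees the abstract describes, in contrast to deep binary label trees.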
1 code implementation • 5 Mar 2018 • Rohit Babbar, Bernhard Schölkopf
The goal in extreme multi-label classification is to learn a classifier which can assign a small subset of relevant labels to an instance from an extremely large set of target labels.
2 code implementations • 8 Sep 2016 • Rohit Babbar, Bernhard Schölkopf
In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers, coupled with explicit capacity control to limit model size.
no code implementations • NeurIPS 2013 • Rohit Babbar, Ioannis Partalas, Eric Gaussier, Massih R. Amini
In this paper, we study flat and hierarchical classification strategies in the context of large-scale taxonomies.