no code implementations • 3 Aug 2022 • Samarth Gupta, Daniel N. Hill, Lexing Ying, Inderjit Dhillon
Due to noise, the policy learnedfrom the estimated model is often far from the optimal policy of the underlying model.
no code implementations • 1 Jun 2022 • Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Dhruv Choudhary, Michael Rabbat, Inderjit Dhillon
Self-supervised pretraining on unlabeled data followed by supervised finetuning on labeled data is a popular paradigm for learning from limited labeled examples.
no code implementations • 21 Feb 2022 • Haoya Li, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon
Entropy regularized Markov decision processes have been widely used in reinforcement learning.
1 code implementation • NAACL 2022 • Yuanhao Xiong, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Inderjit Dhillon
To learn the semantic embeddings of instances and labels with raw text, we propose to pre-train Transformer-based encoders with self-supervised contrastive losses.
Multi Label Text Classification
Multi-Label Text Classification
+2
no code implementations • NeurIPS 2021 • Pei-Hung Chen, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
In addition to compressing standard models, out method can also be used on distilled BERT models to further improve compression rate.
no code implementations • 5 Oct 2021 • Haoya Li, Samarth Gupta, HsiangFu Yu, Lexing Ying, Inderjit Dhillon
This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization.
1 code implementation • 4 Jun 2021 • Philip A. Etter, Kai Zhong, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon
In industrial applications, these models operate at extreme scales, where every bit of performance is critical.
1 code implementation • 15 Feb 2021 • Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel Hill, Inderjit Dhillon
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$, where $A$ is the total number of arms and $\mathcal{F}$ is the class containing the regression function, while only requiring $\tilde{O}(A)$ computation per time step.
no code implementations • 28 Nov 2020 • Devvrit, Minhao Cheng, Cho-Jui Hsieh, Inderjit Dhillon
Several previous attempts tackled this problem by ensembling the soft-label prediction and have been proved vulnerable based on the latest attack methods.
1 code implementation • 20 Nov 2020 • Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon
In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having {\em multiple} gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for e. g. by means of reducing the precision of compressed information.
no code implementations • ICML 2020 • Yanyao Shen, Hsiang-Fu Yu, Sujay Sanghavi, Inderjit Dhillon
Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes.
1 code implementation • ICML 2020 • Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
The main reason is that position information among input units is not inherently encoded, i. e., the models are permutation equivalent; this problem justifies why all of the existing models are accompanied by a sinusoidal encoding/embedding layer at the input.
Ranked #5 on
Semantic Textual Similarity
on MRPC
no code implementations • 17 Feb 2020 • Minhao Cheng, Qi Lei, Pin-Yu Chen, Inderjit Dhillon, Cho-Jui Hsieh
Adversarial training has become one of the most effective methods for improving robustness of neural networks.
1 code implementation • NeurIPS 2019 • Rajat Sen, Hsiang-Fu Yu, Inderjit Dhillon
Forecasting high-dimensional time series plays a crucial role in many applications such as demand forecasting and financial predictions.
2 code implementations • 7 May 2019 • Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit Dhillon
However, naively applying deep transformer models to the XMC problem leads to sub-optimal performance due to the large output space and the label sparsity issue.
Extreme Multi-Label Classification
General Classification
+3
no code implementations • 1 Nov 2018 • Anish Acharya, Rahul Goel, Angeliki Metallinou, Inderjit Dhillon
Empirically, we show that the proposed method can achieve 90% compression with minimal impact in accuracy for sentence classification tasks, and outperforms alternative methods like fixed-point quantization or offline word embedding compression.
no code implementations • 5 Aug 2016 • Rashish Tandon, Si Si, Pradeep Ravikumar, Inderjit Dhillon
In this paper, we investigate a divide and conquer approach to Kernel Ridge Regression (KRR).
no code implementations • 19 Feb 2016 • Prateek Jain, Nikhil Rao, Inderjit Dhillon
Several learning applications require solving high-dimensional regression problems where the relevant features belong to a small number of (overlapping) groups.
no code implementations • 4 Sep 2015 • Arnaud Vandaele, Nicolas Gillis, Qi Lei, Kai Zhong, Inderjit Dhillon
Given a symmetric nonnegative matrix $A$, symmetric nonnegative matrix factorization (symNMF) is the problem of finding a nonnegative matrix $H$, usually with much fewer columns than $A$, such that $A \approx HH^T$.
1 code implementation • 1 Dec 2013 • Hyokun Yun, Hsiang-Fu Yu, Cho-Jui Hsieh, S. V. N. Vishwanathan, Inderjit Dhillon
One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred between processors in a decentralized fashion.
Distributed, Parallel, and Cluster Computing