1 code implementation • 19 Nov 2022 • Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh
Among recently proposed methods, Matching Training Trajectories (MTT) achieves state-of-the-art performance on CIFAR-10/100, but it struggles to scale to the ImageNet-1K dataset because of the large memory required to unroll gradient computation through back-propagation.
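The memory bottleneck comes from differentiating through the unrolled inner-loop updates. A minimal sketch of that computation, assuming a single linear classifier and two given expert checkpoints (both illustrative, not the authors' setup):

```python
# Sketch of matching a training trajectory on synthetic data (not the authors' code).
# theta0 / theta_target play the role of two checkpoints from an expert trajectory.
import torch

def unrolled_match_loss(theta0, theta_target, x_syn, y_syn, lr=0.1, steps=5):
    """Run `steps` SGD updates on the synthetic set starting from theta0, then
    measure the distance to the expert's target parameters. Gradients flow
    through the whole unrolled computation, which is what makes this
    memory-hungry for large models and many inner steps."""
    theta = theta0.clone().requires_grad_(True)
    for _ in range(steps):
        loss = torch.nn.functional.cross_entropy(x_syn @ theta, y_syn)
        (grad,) = torch.autograd.grad(loss, theta, create_graph=True)
        theta = theta - lr * grad                      # differentiable SGD step
    return ((theta - theta_target) ** 2).sum() / ((theta0 - theta_target) ** 2).sum()

# Toy usage: 10 synthetic examples with 32 features, 4 classes.
x_syn = torch.randn(10, 32, requires_grad=True)
y_syn = torch.randint(0, 4, (10,))
match = unrolled_match_loss(torch.randn(32, 4), torch.randn(32, 4), x_syn, y_syn)
match.backward()                                       # gradient w.r.t. the synthetic data
print(x_syn.grad.shape)
```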
no code implementations • 1 Nov 2022 • Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar
More importantly, ProMoT shows remarkable generalization ability on tasks with different formats, e.g., fine-tuning on an NLI binary classification task improves the model's in-context ability to do summarization (+0.53 ROUGE-2 score compared to the pretrained model), making ProMoT a promising method to build general-purpose capabilities such as grounding and reasoning into LLMs with small but high-quality datasets.
2 code implementations • 20 Jul 2022 • Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh
Dataset Condensation is a newly emerging technique aiming at learning a tiny dataset that captures the rich information encoded in the original dataset.
no code implementations • NeurIPS 2021 • Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio
Attentional mechanisms are order-invariant.
no code implementations • 19 Oct 2020 • Yuanhao Xiong, Xuanqing Liu, Li-Cheng Lan, Yang You, Si Si, Cho-Jui Hsieh
For end-to-end efficiency, unlike previous work that assumes random hyperparameter search, which over-emphasizes tuning time, we propose to evaluate with a bandit-based hyperparameter tuning strategy.
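The abstract only states that a bandit tuner replaces random search; successive halving is one such strategy, sketched here as an assumed stand-in rather than the paper's exact choice:

```python
# Successive halving as one possible bandit-style tuner (an illustrative assumption).
import random

def successive_halving(configs, train_for, initial_budget=1):
    """configs: list of hyperparameter dicts.
    train_for(cfg, budget) -> validation loss after training with `budget` epochs.
    Keep halving the pool, giving the survivors twice the budget each round."""
    budget, survivors = initial_budget, list(configs)
    while len(survivors) > 1:
        survivors = sorted(survivors, key=lambda cfg: train_for(cfg, budget))
        survivors = survivors[: max(1, len(survivors) // 2)]   # keep the better half
        budget *= 2
    return survivors[0]

# Toy usage with a fake objective that improves with budget.
configs = [{"lr": 10 ** random.uniform(-4, -1)} for _ in range(8)]
print(successive_halving(configs, lambda cfg, b: (cfg["lr"] - 0.01) ** 2 / b))
```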
no code implementations • NeurIPS 2020 • Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh
With this score, we can identify the pretraining examples that contribute most to a prediction in the fine-tuning task.
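As a rough illustration of what such a score can look like, here is a simplified first-order (TracIn-style) proxy that dots the pretraining-example gradient with the test-example gradient; the paper's actual definition may differ:

```python
# Simplified first-order influence proxy, NOT the paper's exact score.
import torch

def influence_score(model, loss_fn, pretrain_example, test_example):
    def flat_grad(example):
        x, y = example
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])
    # Large positive values mean the pretraining example pushes parameters in the
    # same direction the test prediction's loss gradient points.
    return torch.dot(flat_grad(pretrain_example), flat_grad(test_example)).item()

# Toy usage with a linear model.
model = torch.nn.Linear(16, 3)
loss_fn = torch.nn.functional.cross_entropy
pre = (torch.randn(1, 16), torch.tensor([0]))
test = (torch.randn(1, 16), torch.tensor([2]))
print(influence_score(model, loss_fn, pre, test))
```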
no code implementations • 14 Jan 2020 • Yang Li, Julien Amelot, Xin Zhou, Samy Bengio, Si Si
While we focus on interface layout prediction, our model is generally applicable to other layout prediction problems that involve tree structures and 2-dimensional placements.
no code implementations • NeurIPS 2019 • Xuanqing Liu, Si Si, Xiaojin Zhu, Yang Li, Cho-Jui Hsieh
In this paper, we propose a general framework for data poisoning attacks on graph-based semi-supervised learning (G-SSL).
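As a toy illustration of why G-SSL is attackable, label propagation makes unlabeled predictions a linear function of the labeled values, so even a greedy single-label flip can be chosen to maximize the change in predictions. This simplified attack is an assumption for illustration, not the paper's full framework:

```python
# Greedy poisoning of label propagation (illustrative simplification).
import numpy as np

def label_propagation(L, labeled, y_l, unlabeled):
    # Harmonic solution: solve L_uu f_u = -L_ul y_l (small ridge for numerical stability).
    L_uu = L[np.ix_(unlabeled, unlabeled)] + 1e-6 * np.eye(len(unlabeled))
    L_ul = L[np.ix_(unlabeled, labeled)]
    return np.linalg.solve(L_uu, -L_ul @ y_l)

def greedy_poison(L, labeled, y_l, unlabeled):
    base = label_propagation(L, labeled, y_l, unlabeled)
    best, best_change = None, -1.0
    for i in range(len(labeled)):                      # try flipping each labeled node
        y_flip = y_l.copy(); y_flip[i] *= -1
        change = np.abs(label_propagation(L, labeled, y_flip, unlabeled) - base).sum()
        if change > best_change:
            best, best_change = i, change
    return best

# Toy usage on a random graph with 4 labeled and 8 unlabeled nodes.
A = (np.random.rand(12, 12) < 0.3).astype(float); A = np.maximum(A, A.T); np.fill_diagonal(A, 0)
L = np.diag(A.sum(1)) - A
print(greedy_poison(L, np.arange(4), np.array([1.0, 1.0, -1.0, -1.0]), np.arange(4, 12)))
```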
2 code implementations • NeurIPS 2019 • Hongge Chen, Huan Zhang, Si Si, Yang Li, Duane Boning, Cho-Jui Hsieh
We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles, the verification problem can be cast as a max-clique problem on a multi-partite graph with bounded boxicity.
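The single-tree case is easy to see: one pass over the tree collects every leaf still reachable when each feature may move within an L-infinity ball of radius eps. A minimal sketch, with a hypothetical dict-based node layout:

```python
# Collect every leaf whose root-to-leaf path is still satisfiable under the
# perturbation box; each node is visited at most once, so the pass is linear time.
def reachable_leaf_values(node, x, eps):
    if "value" in node:                                # leaf
        return [node["value"]]
    f, t = node["feature"], node["threshold"]
    vals = []
    if x[f] - eps <= t:                                # left branch (x_f <= t) reachable
        vals += reachable_leaf_values(node["left"], x, eps)
    if x[f] + eps > t:                                 # right branch (x_f > t) reachable
        vals += reachable_leaf_values(node["right"], x, eps)
    return vals

# Toy usage: the prediction is verified robust if all reachable values keep its sign.
tree = {"feature": 0, "threshold": 0.5,
        "left": {"value": -1.0},
        "right": {"feature": 1, "threshold": 0.0,
                  "left": {"value": 0.8}, "right": {"value": 1.2}}}
print(reachable_leaf_values(tree, x=[0.45, -0.2], eps=0.1))   # [-1.0, 0.8]
```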
1 code implementation • 5 Jun 2019 • Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, Cho-Jui Hsieh
In this paper, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE) network, which naturally incorporates various commonly used regularization mechanisms based on random noise injection.
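A minimal Euler-Maruyama sketch of such a layer: a learned drift plus a diffusion term that injects scaled Gaussian noise at every integration step (layer sizes and the constant diffusion coefficient are illustrative assumptions, not the paper's exact architecture):

```python
import torch

class NeuralSDEBlock(torch.nn.Module):
    def __init__(self, dim, sigma=0.1, steps=10, t_end=1.0):
        super().__init__()
        self.drift = torch.nn.Sequential(
            torch.nn.Linear(dim, dim), torch.nn.Tanh(), torch.nn.Linear(dim, dim))
        self.sigma, self.steps, self.dt = sigma, steps, t_end / steps

    def forward(self, h):
        for _ in range(self.steps):
            noise = torch.randn_like(h) * (self.dt ** 0.5)     # Brownian increment
            h = h + self.drift(h) * self.dt + self.sigma * noise
        return h

print(NeuralSDEBlock(dim=8)(torch.randn(4, 8)).shape)          # torch.Size([4, 8])
```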
5 code implementations • KDD 2019 • Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh
Furthermore, Cluster-GCN allows us to train much deeper GCNs without much time or memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 [16].
Ranked #1 on Node Classification on Amazon2M
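A minimal sketch of the Cluster-GCN idea: restrict each update to the subgraph induced by one cluster, so neighborhood expansion stays within that block. A naive split by node id stands in below for the METIS graph clustering used in the paper:

```python
import numpy as np

def gcn_layer(A_hat, H, W):
    return np.maximum(A_hat @ H @ W, 0.0)                       # ReLU(A_hat H W)

def cluster_gcn_epoch(A, X, W1, W2, num_clusters=4):
    parts = np.array_split(np.arange(A.shape[0]), num_clusters)  # naive "clustering"
    for idx in parts:
        A_sub = A[np.ix_(idx, idx)] + np.eye(len(idx))           # drop between-cluster edges
        A_hat = A_sub / A_sub.sum(axis=1, keepdims=True)         # row-normalize
        out = A_hat @ gcn_layer(A_hat, X[idx], W1) @ W2
        # ...compute the loss on the labeled nodes in `idx` and update W1, W2...
    return W1, W2

# Toy usage: 20 nodes, 8 input features, 16 hidden units, 3 classes.
A = (np.random.rand(20, 20) < 0.2).astype(float); A = np.maximum(A, A.T)
cluster_gcn_epoch(A, np.random.randn(20, 8),
                  0.1 * np.random.randn(8, 16), 0.1 * np.random.randn(16, 3))
```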
no code implementations • CVPR 2019 • Zhourong Chen, Yang Li, Samy Bengio, Si Si
The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample it is processing.
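A minimal sketch of per-sample conditional computation, with a tiny gating network routing each sample to one of several branches so only part of the model runs per input; the architecture is an illustrative assumption, not the paper's design:

```python
import torch

class ConditionalBlock(torch.nn.Module):
    def __init__(self, dim, num_branches=4):
        super().__init__()
        self.gate = torch.nn.Linear(dim, num_branches)
        self.branches = torch.nn.ModuleList(
            [torch.nn.Linear(dim, dim) for _ in range(num_branches)])

    def forward(self, x):
        choice = self.gate(x).argmax(dim=-1)          # hard per-sample routing
        out = torch.empty_like(x)
        for b, branch in enumerate(self.branches):
            mask = choice == b
            if mask.any():
                out[mask] = branch(x[mask])           # only the chosen branch runs
        return out

print(ConditionalBlock(8)(torch.randn(5, 8)).shape)   # torch.Size([5, 8])
```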
no code implementations • ICLR 2019 • Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh
The algorithm achieves an order-of-magnitude faster inference than the original softmax layer for predicting the top-$k$ words in tasks such as beam search in machine translation or next-word prediction.
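One way such a speedup can be obtained is screening: precompute a small candidate vocabulary per region of hidden-state space offline, then score only those candidates at inference. The crude nearest-center partition below is an assumed stand-in for the paper's learned screening model:

```python
import numpy as np

def build_screen(W, H_train, num_regions=8, cands_per_region=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = H_train[rng.choice(len(H_train), num_regions, replace=False)]
    assign = np.argmin(((H_train[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    candidates = []
    for c in range(num_regions):
        mean_h = H_train[assign == c].mean(axis=0)
        candidates.append(np.argsort(-(W @ mean_h))[:cands_per_region])  # offline full scoring
    return centers, candidates

def fast_topk(W, h, centers, candidates, k=5):
    c = np.argmin(((centers - h) ** 2).sum(-1))        # nearest region
    cand = candidates[c]
    return cand[np.argsort(-(W[cand] @ h))[:k]]        # score candidates only

# Toy usage: 10k-word vocabulary, 64-dim hidden states.
W = np.random.randn(10000, 64)
H = np.random.randn(2000, 64)
centers, cands = build_screen(W, H)
print(fast_topk(W, np.random.randn(64), centers, cands))
```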
1 code implementation • ICLR 2019 • Yang Li, Lukasz Kaiser, Samy Bengio, Si Si
We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences.
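A minimal 1D sketch: enumerate contiguous spans up to a maximum width, pool a key (mean of item keys) and a value (sum of item values) per span, then attend over the pooled memory. Treat these aggregation choices as a common formulation rather than the paper's exact recipe:

```python
import numpy as np

def area_attention_1d(query, keys, values, max_width=3):
    area_keys, area_values = [], []
    for width in range(1, max_width + 1):              # width-1 areas are the items themselves
        for start in range(len(keys) - width + 1):
            area_keys.append(keys[start:start + width].mean(axis=0))
            area_values.append(values[start:start + width].sum(axis=0))
    K, V = np.stack(area_keys), np.stack(area_values)
    scores = K @ query / np.sqrt(len(query))
    weights = np.exp(scores - scores.max()); weights /= weights.sum()   # softmax
    return weights @ V

out = area_attention_1d(np.random.randn(16), np.random.randn(10, 16), np.random.randn(10, 16))
print(out.shape)   # (16,)
```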
no code implementations • NeurIPS 2018 • Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh
Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses.
no code implementations • 21 Feb 2018 • Si Si, Sanjiv Kumar, Yang Li
Use of nonlinear feature maps via kernel approximation has led to success in many online learning tasks.
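A hedged sketch of the general recipe: map inputs through an approximate kernel feature map and run an ordinary online learner on top. Random Fourier features for an RBF kernel are used purely for illustration; the paper's specific approximation may differ:

```python
import numpy as np

class RFFOnlineClassifier:
    def __init__(self, dim, num_features=256, gamma=1.0, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=np.sqrt(2 * gamma), size=(num_features, dim))
        self.b = rng.uniform(0, 2 * np.pi, num_features)
        self.w = np.zeros(num_features)                 # linear model in feature space
        self.lr = lr

    def features(self, x):
        return np.sqrt(2.0 / len(self.b)) * np.cos(self.W @ x + self.b)

    def partial_fit(self, x, y):                        # y in {-1, +1}; hinge-loss SGD step
        z = self.features(x)
        if y * (self.w @ z) < 1:
            self.w += self.lr * y * z

    def predict(self, x):
        return 1 if self.w @ self.features(x) >= 0 else -1

# Toy usage: learn the sign of x0 * x1 online.
clf = RFFOnlineClassifier(dim=2)
for _ in range(2000):
    x = np.random.randn(2)
    clf.partial_fit(x, 1 if x[0] * x[1] > 0 else -1)
print(clf.predict(np.array([1.0, 1.0])))
```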
no code implementations • ICML 2017 • Si Si, Huan Zhang, S. Sathiya Keerthi, Dhruv Mahajan, Inderjit S. Dhillon, Cho-Jui Hsieh
In this paper, we study the gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse.
3 code implementations • 26 Jun 2017 • Huan Zhang, Si Si, Cho-Jui Hsieh
In this paper, we present a novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs (Graphics Processing Units), a crucial step in training Gradient Boosted Decision Trees (GBDT) and random forests.
no code implementations • 5 Aug 2016 • Rashish Tandon, Si Si, Pradeep Ravikumar, Inderjit Dhillon
In this paper, we investigate a divide and conquer approach to Kernel Ridge Regression (KRR).
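A minimal sketch of the divide-and-conquer recipe: split the training set into blocks, solve an exact KRR on each block, and average the block predictors at test time (uniform random partitioning and an RBF kernel are illustrative choices):

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    return np.exp(-gamma * ((A[:, None] - B[None]) ** 2).sum(-1))

def dc_krr_fit(X, y, num_blocks=4, lam=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(len(X)), num_blocks)
    models = []
    for idx in blocks:
        K = rbf(X[idx], X[idx])
        alpha = np.linalg.solve(K + lam * np.eye(len(idx)), y[idx])   # exact KRR on the block
        models.append((X[idx], alpha))
    return models

def dc_krr_predict(models, X_test):
    return np.mean([rbf(X_test, Xb) @ alpha for Xb, alpha in models], axis=0)

# Toy usage.
X = np.random.randn(200, 3); y = np.sin(X[:, 0]) + 0.1 * np.random.randn(200)
print(dc_krr_predict(dc_krr_fit(X, y), X[:5]))
```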
no code implementations • 5 Aug 2016 • Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon
Kernel machines often yield superior predictive performance on various tasks; however, they suffer from severe computational challenges.
no code implementations • NeurIPS 2014 • Si Si, Donghyuk Shin, Inderjit S. Dhillon, Beresford N. Parlett
Thus, eigenvectors of the clusters serve as good initializations for a block Lanczos algorithm that is used to compute the spectral decomposition of the original graph.
no code implementations • NeurIPS 2014 • Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon
Second, we provide a new theoretical analysis bounding the error of the solution computed using the Nyström kernel approximation method, and show that the error is related to the weighted k-means objective function, where the weights are given by the model computed from the original kernel.
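Since the analysis ties the error to a (weighted) k-means objective, landmarks chosen by k-means are a natural instantiation. A minimal Nyström sketch with plain, unweighted k-means centroids as landmarks (an illustrative simplification):

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    return np.exp(-gamma * ((A[:, None] - B[None]) ** 2).sum(-1))

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                            for j in range(k)])
    return centers

def nystrom(X, landmarks, gamma=0.5):
    C = rbf(X, landmarks, gamma)                     # n x m cross-kernel
    W = rbf(landmarks, landmarks, gamma)             # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T               # rank-m approximation of K

# Toy usage: relative error of the low-rank approximation.
X = np.random.randn(300, 5)
K, K_approx = rbf(X, X), nystrom(X, kmeans(X, k=20))
print(np.linalg.norm(K - K_approx) / np.linalg.norm(K))
```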
no code implementations • 4 Nov 2013 • Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon
We show theoretically that the support vectors identified by the subproblem solution are likely to be support vectors of the entire kernel SVM problem, provided that the problem is partitioned appropriately by kernel clustering.
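A minimal sketch of the divide-and-conquer step: cluster the data, solve a smaller kernel SVM inside each cluster, and take the union of the subproblem support vectors as candidates for the full problem. k-means in input space stands in for the paper's kernel clustering, and scikit-learn's SVC solves the subproblems:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def dc_svm_support_vectors(X, y, num_clusters=4, C=1.0, gamma=0.5, seed=0):
    labels = KMeans(n_clusters=num_clusters, random_state=seed, n_init=10).fit_predict(X)
    support_idx = []
    for c in range(num_clusters):
        idx = np.where(labels == c)[0]
        if len(np.unique(y[idx])) < 2:               # skip single-class clusters
            continue
        svm = SVC(C=C, kernel="rbf", gamma=gamma).fit(X[idx], y[idx])
        support_idx.extend(idx[svm.support_])        # map back to global indices
    return np.array(support_idx)

# Toy usage: XOR-like labels in 2D.
X = np.random.randn(400, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(int)
print(len(dc_svm_support_vectors(X, y)))
```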