Search Results for author: Si Si

Found 23 papers, 7 papers with code

Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory

1 code implementation • 19 Nov 2022 • Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

Among recently proposed methods, Matching Training Trajectories (MTT) achieves state-of-the-art performance on CIFAR-10/100, but has difficulty scaling to the ImageNet-1K dataset due to the large memory required to perform unrolled gradient computation through back-propagation.
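The memory bottleneck addressed here comes from differentiating through an unrolled inner optimization. Below is a minimal sketch of the baseline MTT-style unrolling (toy shapes, a linear student model, and all variable names are illustrative assumptions); it shows why memory grows with the number of unrolled steps, and is not the constant-memory method proposed in the paper.

```python
# Minimal sketch of Matching Training Trajectories (MTT)-style unrolling.
# Every unrolled SGD step stays in the autograd graph (create_graph=True),
# so memory grows linearly with num_inner_steps; the paper removes this cost.
import torch

num_classes, dim, ipc, num_inner_steps, inner_lr = 10, 64, 1, 20, 0.1

# Learnable synthetic dataset (the object being distilled).
syn_x = torch.randn(num_classes * ipc, dim, requires_grad=True)
syn_y = torch.arange(num_classes).repeat_interleave(ipc)

# Expert trajectory endpoints (in practice loaded from teacher checkpoints).
theta_start = torch.randn(dim, num_classes)
theta_target = torch.randn(dim, num_classes)

theta = theta_start.clone().requires_grad_(True)
for _ in range(num_inner_steps):
    logits = syn_x @ theta
    inner_loss = torch.nn.functional.cross_entropy(logits, syn_y)
    # create_graph=True keeps each step differentiable w.r.t. syn_x ...
    (grad,) = torch.autograd.grad(inner_loss, theta, create_graph=True)
    theta = theta - inner_lr * grad  # ... so all intermediate graphs are retained.

# Match the unrolled student parameters to the expert's parameters.
match_loss = ((theta - theta_target) ** 2).sum() / ((theta_start - theta_target) ** 2).sum()
match_loss.backward()        # back-propagates through every unrolled step
print(syn_x.grad.shape)      # gradient used to update the synthetic data
```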

Preserving In-Context Learning ability in Large Language Model Fine-tuning

no code implementations • 1 Nov 2022 • Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S. Dhillon, Sanjiv Kumar

More importantly, ProMoT shows remarkable generalization ability on tasks with different formats, e.g., fine-tuning on an NLI binary classification task improves the model's in-context ability to do summarization (+0.53 Rouge-2 score compared to the pretrained model), making ProMoT a promising method for building general-purpose capabilities such as grounding and reasoning into LLMs with small but high-quality datasets.

Domain Generalization • Few-Shot Learning • +2

DC-BENCH: Dataset Condensation Benchmark

2 code implementations • 20 Jul 2022 • Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh

Dataset Condensation is a newly emerging technique aiming at learning a tiny dataset that captures the rich information encoded in the original dataset.

Data Augmentation • Data Compression • +2

How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers

no code implementations • 19 Oct 2020 • Yuanhao Xiong, Xuanqing Liu, Li-Cheng Lan, Yang You, Si Si, Cho-Jui Hsieh

For end-to-end efficiency, unlike previous work that assumes random hyperparameter tuning (which over-emphasizes the tuning time), we propose to evaluate optimizers with a bandit hyperparameter tuning strategy.

Benchmarking • Graph Mining
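The abstract does not spell out the tuner, so the following is a hedged sketch of one standard bandit strategy, successive halving: train many configurations on a small budget, keep the best fraction, and repeat with a larger budget. `sample_config` and `train_and_eval` are hypothetical stand-ins for a real search space and training loop, not the protocol from the paper.

```python
# Sketch of a successive-halving bandit for hyperparameter tuning.
import random

def sample_config():
    # Hypothetical two-dimensional search space.
    return {"lr": 10 ** random.uniform(-4, -1),
            "weight_decay": 10 ** random.uniform(-6, -2)}

def train_and_eval(config, budget_epochs):
    # Placeholder: would return validation accuracy after `budget_epochs` of training.
    return random.random()

def successive_halving(n_configs=27, min_budget=1, eta=3):
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        scores = [(train_and_eval(c, budget), c) for c in configs]
        scores.sort(key=lambda s: s[0], reverse=True)
        keep = max(1, len(configs) // eta)   # keep the top 1/eta arms
        configs = [c for _, c in scores[:keep]]
        budget *= eta                        # give survivors more epochs
    return configs[0]

best = successive_halving()
print(best)
```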

Multi-Stage Influence Function

no code implementations • NeurIPS 2020 • Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

With this score, we can identify the examples in the pretraining task that contribute most to a prediction in the fine-tuning task.

Transfer Learning
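For reference, the classical single-stage influence function that multi-stage influence extends scores how up-weighting a training point $z$ changes the loss at a test point $z_{\text{test}}$; a standard form (assuming a twice-differentiable loss with empirical Hessian $H_{\hat\theta}$ at the trained parameters $\hat\theta$) is

$$\mathcal{I}(z, z_{\text{test}}) = -\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top} H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta), \qquad H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta).$$

The multi-stage setting additionally tracks how influence propagates from pretraining through fine-tuning, which this single-stage formula alone does not capture.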

Auto Completion of User Interface Layout Design Using Transformer-Based Tree Decoders

no code implementations • 14 Jan 2020 • Yang Li, Julien Amelot, Xin Zhou, Samy Bengio, Si Si

While we focus on interface layout prediction, our model is generally applicable to other layout prediction problems that involve tree structures and 2-dimensional placements.

Layout Design

Robustness Verification of Tree-based Models

2 code implementations • NeurIPS 2019 • Hongge Chen, Huan Zhang, Si Si, Yang Li, Duane Boning, Cho-Jui Hsieh

We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles, the verification problem can be cast as a max-clique problem on a multi-partite graph with bounded boxicity.
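As a hedged illustration of the single-tree case, the sketch below traverses a decision tree once and collects every leaf reachable when the input may move within an l-infinity ball of radius eps; the dict-based node layout is a hypothetical format for illustration, not the authors' implementation.

```python
# Linear-time robustness check for a single decision tree under an l_inf
# perturbation of radius eps: a leaf is reachable iff the perturbed feature
# can still satisfy every threshold test on the path to it.
def reachable_leaf_values(node, x, eps):
    """Return the prediction values of all leaves reachable from x within eps."""
    if "value" in node:                      # leaf node
        return [node["value"]]
    f, t = node["feature"], node["threshold"]
    leaves = []
    if x[f] - eps <= t:                      # left branch (x[f] <= t) can still be taken
        leaves += reachable_leaf_values(node["left"], x, eps)
    if x[f] + eps > t:                       # right branch (x[f] > t) can still be taken
        leaves += reachable_leaf_values(node["right"], x, eps)
    return leaves

# Toy tree: a single stump on feature 0.
tree = {"feature": 0, "threshold": 0.5,
        "left": {"value": -1.0}, "right": {"value": +1.0}}
x = [0.45]
vals = reachable_leaf_values(tree, x, eps=0.1)
robust = min(vals) == max(vals)              # prediction cannot be flipped?
print(vals, robust)
```

Each node is visited at most once, which is what makes the single-tree check linear time; the max-clique construction in the paper handles the ensemble case.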

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise

1 code implementation • 5 Jun 2019 • Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, Cho-Jui Hsieh

In this paper, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE) network, which naturally incorporates various commonly used regularization mechanisms based on random noise injection.
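In equation form, a Neural ODE evolves the hidden state deterministically, while the Neural SDE adds a diffusion term driven by a Brownian motion $W_t$, which is where the noise-injection regularization enters (standard SDE notation, not copied from the paper):

$$\text{Neural ODE:}\quad dh_t = f(h_t, t; \theta)\,dt, \qquad \text{Neural SDE:}\quad dh_t = f(h_t, t; \theta)\,dt + g(h_t, t; \theta)\,dW_t.$$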

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

5 code implementations • KDD 2019 • Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh

Furthermore, Cluster-GCN allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 [16].

Graph Clustering • Link Prediction • +1
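A minimal sketch of the cluster-restricted mini-batch step is below; the graph partition (METIS in the paper) is assumed to be precomputed, and a plain NumPy GCN layer with a random split of nodes stands in for the full model and partitioner.

```python
# Sketch of Cluster-GCN-style mini-batching: each step runs graph convolution
# only on the subgraph induced by one precomputed cluster, so the adjacency
# and activations stay small.
import numpy as np

def normalized_adj(A):
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, X, W):
    return np.maximum(A_norm @ X @ W, 0.0)          # ReLU(Â X W)

rng = np.random.default_rng(0)
n, d, h = 100, 16, 8
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.maximum(A, A.T)                              # symmetric toy graph
X = rng.normal(size=(n, d))
W = rng.normal(size=(d, h)) * 0.1

clusters = np.array_split(rng.permutation(n), 10)   # stand-in for METIS parts
for ids in clusters:                                # one "mini-batch" per cluster
    A_sub = A[np.ix_(ids, ids)]                     # keep within-cluster edges only
    H_sub = gcn_layer(normalized_adj(A_sub), X[ids], W)
    # ...compute loss on the labeled nodes in `ids` and update W here...
print(H_sub.shape)
```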

You Look Twice: GaterNet for Dynamic Filter Selection in CNNs

no code implementations • CVPR 2019 • Zhourong Chen, Yang Li, Samy Bengio, Si Si

The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample it is processing.
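A hedged PyTorch sketch of the idea: a lightweight gater looks at the input and produces per-filter gates that switch channels of a backbone convolution on or off. GaterNet itself uses a separate global gater network across layers with near-binary gates; the single gated block, layer sizes, and the sigmoid relaxation below are illustrative assumptions.

```python
# Sketch of input-conditioned channel gating: a small gater predicts one gate
# per filter of a conv layer; the conv output is scaled channel-wise by the gates.
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gater = nn.Sequential(                 # lightweight gater network
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, out_ch), nn.Sigmoid()  # soft relaxation of 0/1 gates
        )

    def forward(self, x):
        gates = self.gater(x)                        # (B, out_ch), one gate per filter
        feats = torch.relu(self.conv(x))             # (B, out_ch, H, W)
        return feats * gates[:, :, None, None]       # switch filters on/off per sample

block = GatedConvBlock()
y = block(torch.randn(2, 3, 32, 32))
print(y.shape)  # torch.Size([2, 32, 32, 32])
```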

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

no code implementations • ICLR 2019 • Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh

The algorithm achieves an order of magnitude faster inference than the original softmax layer when predicting the top-$k$ words in tasks such as beam search in machine translation or next-word prediction.

Machine Translation • Translation
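A minimal sketch of the screening idea: cluster the decoder hidden states offline, attach a short candidate-word list to each cluster, and at inference compute logits only over that list. The k-means clustering and the naive candidate-list construction below are simplified stand-ins for the learned screening model in the paper, and all data is synthetic.

```python
# Sketch of screening for fast top-k softmax: instead of scoring the full
# vocabulary, map the hidden state to a cluster and score only that cluster's
# precomputed candidate words.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vocab, dim, n_clusters, shortlist = 50_000, 128, 64, 500
W = rng.normal(size=(vocab, dim)).astype(np.float32)   # softmax / output embedding weights
H = rng.normal(size=(10_000, dim)).astype(np.float32)  # historical hidden states

km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(H)
# Offline: candidate words per cluster = top-scoring words for the cluster center.
candidates = [np.argsort(-(W @ c))[:shortlist] for c in km.cluster_centers_]

def fast_topk(h, k=5):
    cid = km.predict(h[None])[0]                        # which cluster is this context in?
    cand = candidates[cid]
    scores = W[cand] @ h                                # score only the shortlist
    return cand[np.argsort(-scores)[:k]]

print(fast_topk(H[0]))
```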

Area Attention

1 code implementation • ICLR 2019 • Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences.

Image Captioning • Machine Translation • +1
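A minimal 1D sketch of the mechanism: keys and values for areas are formed by pooling the items inside every contiguous span up to a maximum width, and standard dot-product attention then runs over the enlarged set of areas. Only mean pooling is shown here; the paper also derives richer area features (e.g., size and variance), so treat this as an illustrative simplification.

```python
# Sketch of 1D area attention: build one key/value per contiguous span (area)
# of up to `max_area` adjacent items, then attend over areas instead of items.
import numpy as np

def area_attention(q, K, V, max_area=3):
    n, d = K.shape
    area_keys, area_vals = [], []
    for width in range(1, max_area + 1):
        for start in range(0, n - width + 1):
            area_keys.append(K[start:start + width].mean(axis=0))
            area_vals.append(V[start:start + width].mean(axis=0))
    AK, AV = np.stack(area_keys), np.stack(area_vals)
    scores = AK @ q / np.sqrt(d)                 # one score per area
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ AV                          # attention output over areas

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(10, 8)), rng.normal(size=(10, 8))
print(area_attention(q, K, V).shape)             # (8,)
```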

GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

no code implementations • NeurIPS 2018 • Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh

Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses.

Language Modelling • Model Compression • +1
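A hedged sketch of block-wise low-rank compression of an embedding/softmax matrix: rows are grouped (here simply by frequency rank) and each block gets its own truncated SVD, spending higher rank on frequent words. The grouping and the rank schedule are illustrative assumptions, not the weighted optimization procedure from the paper.

```python
# Sketch of block-wise low-rank approximation: split the vocabulary into
# frequency-ordered blocks and keep a separate truncated SVD per block.
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 10_000, 256
E = rng.normal(size=(vocab, dim)).astype(np.float32)   # word embedding matrix
# Assume rows are already sorted by word frequency (most frequent first).
blocks = [(0, 1000, 128), (1000, 4000, 64), (4000, 10_000, 16)]  # (start, end, rank)

compressed, total_params = [], 0
for start, end, rank in blocks:
    B = E[start:end]
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    U_r, Vt_r = U[:, :rank] * s[:rank], Vt[:rank]       # rank-r factors for this block
    compressed.append((U_r, Vt_r))
    total_params += U_r.size + Vt_r.size

approx = np.vstack([U_r @ Vt_r for U_r, Vt_r in compressed])
err = np.linalg.norm(E - approx) / np.linalg.norm(E)
print(f"params: {total_params} vs {E.size}, rel. error: {err:.3f}")
```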

Nonlinear Online Learning with Adaptive Nyström Approximation

no code implementations • 21 Feb 2018 • Si Si, Sanjiv Kumar, Yang Li

Use of nonlinear feature maps via kernel approximation has led to success in many online learning tasks.
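For context, a standard (static) Nyström feature map with fixed landmarks and an RBF kernel is sketched below; the paper's contribution is adapting the landmark set online, which this static sketch does not do.

```python
# Sketch of a static Nystrom approximation: pick m landmark points and map
# any x to z(x) = W^{-1/2} k(x, landmarks), so that z(x) . z(y) ~ k(x, y).
import numpy as np

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
landmarks = X[rng.choice(len(X), size=50, replace=False)]

W = rbf(landmarks, landmarks)
eigval, eigvec = np.linalg.eigh(W)
eigval = np.clip(eigval, 1e-12, None)
W_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T   # W^{-1/2}

def nystrom_features(X_new):
    return rbf(X_new, landmarks) @ W_inv_sqrt              # explicit (n, m) features

Z = nystrom_features(X)
K_approx = Z @ Z.T
print(np.abs(K_approx - rbf(X, X)).mean())                 # mean approximation error
```

With an explicit feature map like this, a linear online learner (e.g., perceptron or online ridge) can be run on z(x) instead of the implicit kernel.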

GPU-acceleration for Large-scale Tree Boosting

3 code implementations • 26 Jun 2017 • Huan Zhang, Si Si, Cho-Jui Hsieh

In this paper, we present a novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs (Graphics Processing Units), a crucial step in Gradient Boosted Decision Tree (GBDT) and random forest training.

Kernel Ridge Regression via Partitioning

no code implementations • 5 Aug 2016 • Rashish Tandon, Si Si, Pradeep Ravikumar, Inderjit Dhillon

In this paper, we investigate a divide and conquer approach to Kernel Ridge Regression (KRR).

Generalization Bounds • regression
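A minimal sketch of the divide-and-conquer pattern: partition the training points (k-means here), fit an independent kernel ridge regressor on each part, and route a test point to the partition with the nearest center. The kernel, the partitioner, and the hard routing rule are illustrative simplifications of the approach studied in the paper.

```python
# Sketch of divide-and-conquer kernel ridge regression: cluster the data,
# solve a small KRR problem per cluster, and predict with the local model of
# the cluster whose center is nearest to the test point.
import numpy as np
from sklearn.cluster import KMeans

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.05 * rng.normal(size=2000)

km = KMeans(n_clusters=8, n_init=4, random_state=0).fit(X)
models = []
for c in range(8):
    idx = np.where(km.labels_ == c)[0]
    K = rbf(X[idx], X[idx])
    alpha = np.linalg.solve(K + 1e-2 * np.eye(len(idx)), y[idx])   # local KRR solve
    models.append((X[idx], alpha))

def predict(x):
    c = km.predict(x[None])[0]                                      # route to nearest partition
    Xc, alpha = models[c]
    return rbf(x[None], Xc) @ alpha

print(predict(np.array([0.3, -0.5])))
```

Each local solve costs only O(m^3) for a partition of size m instead of O(n^3) for the full dataset, which is the computational point of the divide-and-conquer scheme.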

Communication-Efficient Parallel Block Minimization for Kernel Machines

no code implementations • 5 Aug 2016 • Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon

Kernel machines often yield superior predictive performance on various tasks; however, they suffer from severe computational challenges.

Multi-Scale Spectral Decomposition of Massive Graphs

no code implementations • NeurIPS 2014 • Si Si, Donghyuk Shin, Inderjit S. Dhillon, Beresford N. Parlett

Thus, eigenvectors of the clusters serve as good initializations to a block Lanczos algorithm that is used to compute the spectral decomposition of the original graph.

Fast Prediction for Large-Scale Kernel Machines

no code implementations • NeurIPS 2014 • Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon

Second, we provide a new theoretical analysis bounding the error of the solution computed using the Nyström kernel approximation method, and show that the error is related to a weighted k-means objective function whose weights are given by the model computed from the original kernel.

General Classification • regression
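The weighted k-means objective referenced in the bound has the standard form (with $w_i$ the per-point weights, which the paper derives from the model trained on the exact kernel, and $c_{\pi(i)}$ the center assigned to point $x_i$):

$$\sum_{i=1}^{n} w_i \,\| x_i - c_{\pi(i)} \|_2^2.$$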

A Divide-and-Conquer Solver for Kernel Support Vector Machines

no code implementations • 4 Nov 2013 • Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon

We show theoretically that the support vectors identified by the subproblem solution are likely to be support vectors of the entire kernel SVM problem, provided that the problem is partitioned appropriately by kernel clustering.
