Search Results for author: Tianle Cai

Found 28 papers, 19 papers with code

Do Transformers Really Perform Bad for Graph Representation?

4 code implementations9 Jun 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight for utilizing the Transformer on graphs is that the structural information of a graph must be effectively encoded into the model.

Graph Classification Graph Property Prediction +2
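
A minimal sketch of that insight, not the authors' implementation: graph structure can be injected into self-attention as a learnable bias indexed by the shortest-path distance between node pairs. The class name, shapes, and single-head setup below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StructuralAttention(nn.Module):
    def __init__(self, dim, max_dist=8):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        # one learnable scalar bias per shortest-path-distance bucket
        self.dist_bias = nn.Embedding(max_dist + 1, 1)

    def forward(self, x, spd):
        # x: (n_nodes, dim) node features; spd: (n_nodes, n_nodes) integer
        # shortest-path distances, assumed clipped to max_dist beforehand
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.t() / x.shape[-1] ** 0.5
        scores = scores + self.dist_bias(spd).squeeze(-1)  # structural bias on attention
        return torch.softmax(scores, dim=-1) @ v
```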

First Place Solution of KDD Cup 2021 & OGB Large-Scale Challenge Graph Prediction Track

4 code implementations15 Jun 2021 Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He

In this technical report, we present our solution of KDD Cup 2021 OGB Large-Scale Challenge - PCQM4M-LSC Track.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

1 code implementation19 Jan 2024 Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration.
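
A hedged sketch of the structure described above, not the released Medusa code: extra decoding heads sit on top of the frozen backbone's last hidden state, and head k proposes the token k+1 positions ahead, so several tokens can be drafted per forward pass and then verified by the backbone. The MLP shape and head count are assumptions.

```python
import torch.nn as nn

class MedusaHeads(nn.Module):
    def __init__(self, hidden_size, vocab_size, num_heads=4):
        super().__init__()
        # one small head per lookahead position, trained while the backbone stays frozen
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.SiLU(),
                          nn.Linear(hidden_size, vocab_size))
            for _ in range(num_heads)
        ])

    def forward(self, last_hidden):            # last_hidden: (batch, hidden_size)
        # head k returns logits for the token k + 1 steps ahead of the current position
        return [head(last_hidden) for head in self.heads]
```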

Large Language Models as Tool Makers

1 code implementation26 May 2023 Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou

Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks.
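
An illustrative sketch of the two-phase protocol, not the paper's code; `call_strong_llm` and `call_light_llm` are hypothetical stand-ins for API calls to the tool-maker and tool-user models.

```python
def call_strong_llm(prompt: str) -> str:
    raise NotImplementedError("plug in the tool-maker model here")

def call_light_llm(prompt: str) -> str:
    raise NotImplementedError("plug in the tool-user model here")

def make_tool(task_examples: list[str]) -> str:
    # Phase 1 (tool making): a capable model writes a reusable Python utility
    # for the task class demonstrated by a few examples.
    prompt = ("Write a Python function solve(task_input) for tasks like:\n"
              + "\n".join(task_examples))
    return call_strong_llm(prompt)        # returns the tool's source code

def use_tool(tool_code: str, new_instance: str) -> str:
    # Phase 2 (tool using): a lighter model calls the cached tool instead of
    # solving each new instance from scratch.
    prompt = f"Given this tool:\n{tool_code}\nUse it to solve:\n{new_instance}"
    return call_light_llm(prompt)
```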

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

1 code implementation11 Apr 2024 Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin

Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence.

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

1 code implementation29 Feb 2024 Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

To overcome this dilemma, we observe the high similarity between the input from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step.
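
A rough sketch of the reuse idea under simplifying assumptions (single array, synchronous splice, no communication hiding): each device recomputes only its own patch at the current step and fills the rest of the feature map with activations cached from the previous timestep. Function and argument names are illustrative.

```python
import numpy as np

def step_with_displaced_context(x_t, cache, my_slice, layer_fn):
    """x_t: full current-step features; cache: feature map saved from the
    previous denoising step; my_slice: the patch owned by this device."""
    fresh_local = layer_fn(x_t[my_slice])   # recompute only the local patch
    context = cache.copy()                  # stale but highly similar activations
    context[my_slice] = fresh_local         # splice the fresh patch into stale context
    return context                          # in practice exchanged asynchronously

# toy usage: 8 "patches" of 64 features, the middle two owned by this device
x = np.random.randn(8, 64).astype(np.float32)
out = step_with_displaced_context(x, cache=0.9 * x, my_slice=slice(3, 5),
                                  layer_fn=lambda p: 2.0 * p)
```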

BitDelta: Your Fine-Tune May Only Be Worth One Bit

1 code implementation15 Feb 2024 James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks.
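
The excerpt above covers the two-phase training setup; the title points at the compression idea, sketched here under assumptions: the weight delta between the fine-tuned and base model is reduced to a sign matrix plus one scale per matrix. The mean-absolute-value scale is an illustrative choice, not necessarily the paper's calibration procedure.

```python
import torch

def one_bit_delta(w_base: torch.Tensor, w_finetuned: torch.Tensor):
    delta = w_finetuned - w_base
    sign = torch.sign(delta)            # 1 bit of information per parameter
    scale = delta.abs().mean()          # illustrative per-matrix scale (assumption)
    return sign, scale

def reconstruct(w_base, sign, scale):
    # approximate fine-tuned weights from the base weights plus the 1-bit delta
    return w_base + scale * sign
```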

What Makes Convolutional Models Great on Long Sequence Modeling?

1 code implementation17 Oct 2022 Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey

We focus on the structure of the convolution kernel and identify two critical but intuitive principles enjoyed by S4 that are sufficient to make up an effective global convolutional model: 1) The parameterization of the convolutional kernel needs to be efficient in the sense that the number of parameters should scale sub-linearly with sequence length.

Long-range modeling
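
An illustrative sketch of principle 1, not the paper's exact parameterization: a global kernel built from a few fixed-size blocks, each upsampled to a successively doubled span and damped, so the parameter count grows roughly logarithmically with kernel length. The interpolation mode and decay rate are assumptions.

```python
import torch
import torch.nn.functional as F

def build_long_kernel(blocks, decay=0.5):
    """blocks: list of small learnable 1-D tensors (e.g. 16 values each)."""
    parts, span = [], blocks[0].numel()
    for i, block in enumerate(blocks):
        up = F.interpolate(block.view(1, 1, -1), size=span, mode="linear",
                           align_corners=False).view(-1)
        parts.append((decay ** i) * up)   # later, longer parts are damped
        span *= 2                         # each block covers twice the previous span
    return torch.cat(parts)

kernel = build_long_kernel([torch.randn(16) for _ in range(6)])
print(kernel.numel())  # 1008 kernel taps from only 6 * 16 = 96 parameters
```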

REST: Retrieval-Based Speculative Decoding

1 code implementation14 Nov 2023 Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He

We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.

Language Modelling Retrieval +1
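
A simplified sketch of the retrieval step, assuming a naive linear scan over a flat token datastore (the paper's implementation uses a far more efficient exact-match index): the longest matching suffix of the current context is located and the tokens that followed it are proposed as a draft for the target model to verify.

```python
def retrieve_draft(context, datastore, max_suffix=16, draft_len=8):
    """context: list of token ids generated so far; datastore: one long list of
    token ids from a corpus. Returns a draft continuation (possibly empty)."""
    for n in range(min(max_suffix, len(context)), 0, -1):
        suffix = context[-n:]
        # naive linear scan for illustration; a real datastore would use an index
        for i in range(len(datastore) - n):
            if datastore[i:i + n] == suffix:
                return datastore[i + n:i + n + draft_len]  # draft to be verified
    return []  # no match: fall back to ordinary decoding
```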

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

1 code implementation7 Sep 2020 Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Li-Wei Wang

We provide an explanation by showing that InstanceNorm serves as a preconditioner for GNNs, but such preconditioning effect is weaker with BatchNorm due to the heavy batch noise in graph datasets.

Graph Classification Graph Representation Learning
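
A minimal sketch of graph-wise normalization with a learnable mean shift, in the spirit of the preconditioning view above; shapes assume a single graph of node features, and the exact placement of the learnable shift is an assumption.

```python
import torch
import torch.nn as nn

class GraphNorm(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))   # learnable degree of mean removal
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x):                            # x: (n_nodes, dim), one graph
        mean = x.mean(dim=0, keepdim=True)
        shifted = x - self.alpha * mean
        std = shifted.std(dim=0, keepdim=True)
        return self.gamma * shifted / (std + self.eps) + self.beta
```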

SnapKV: LLM Knows What You are Looking for Before Generation

1 code implementation22 Apr 2024 Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen

Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to baseline when processing inputs of 16K tokens.

16k
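
A hedged sketch of the selection idea suggested by the title and the numbers above: score each prompt position by how much attention the final stretch of the prompt (an observation window) pays to it, then keep only the top-scoring keys and values plus the window itself. Window size, budget, and tensor shapes are illustrative assumptions.

```python
import torch

def compress_kv(keys, values, attn, window=32, budget=1024):
    """keys/values: (heads, prompt_len, head_dim); attn: (heads, window, prompt_len)
    attention paid by the last `window` prompt tokens to every prompt position."""
    scores = attn.sum(dim=1)                             # (heads, prompt_len)
    scores[:, -window:] = float("inf")                   # always keep the observation window
    budget = min(budget, scores.shape[-1])
    keep = scores.topk(budget, dim=-1).indices.sort(dim=-1).values
    idx = keep.unsqueeze(-1).expand(-1, -1, keys.shape[-1])
    return keys.gather(1, idx), values.gather(1, idx)    # compressed KV cache
```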

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot

1 code implementation NeurIPS 2020 Jingtong Su, Yihang Chen, Tianle Cai, Tianhao Wu, Ruiqi Gao, Li-Wei Wang, Jason D. Lee

In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods and surprisingly find that: (1) A set of methods which aims to find good subnetworks of the randomly-initialized network (which we call "initial tickets"), hardly exploits any information from the training data; (2) For the pruned networks obtained by these methods, randomly changing the preserved weights in each layer, while keeping the total number of preserved weights unchanged per layer, does not affect the final performance.

Network Pruning
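
An illustrative sketch of the sanity check in finding (2): re-draw which positions in each layer's pruning mask are preserved while keeping the per-layer count fixed. The mask representation is an assumption.

```python
import torch

def shuffle_mask_per_layer(masks):
    """masks: dict mapping layer name -> {0,1} tensor marking preserved weights."""
    shuffled = {}
    for name, mask in masks.items():
        flat = mask.flatten()
        perm = torch.randperm(flat.numel())
        shuffled[name] = flat[perm].reshape(mask.shape)  # same per-layer sparsity, random positions
    return shuffled
```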

Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons

2 code implementations10 Feb 2021 Bohang Zhang, Tianle Cai, Zhou Lu, Di He, LiWei Wang

This directly provides a rigorous guarantee of the certified robustness based on the margin of prediction outputs.
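
A minimal sketch of the ingredients behind that guarantee, under stated assumptions: an L-infinity-distance neuron computes a distance to its weight vector and is therefore 1-Lipschitz under the L-inf norm, so a top-two output margin larger than 2*eps certifies that no perturbation of L-inf size eps can change the prediction. Shapes and the certification helper are illustrative.

```python
import torch

def linf_dist_layer(x, w, b):
    # x: (batch, in_dim); w: (out_dim, in_dim); b: (out_dim,)
    return (x.unsqueeze(1) - w.unsqueeze(0)).abs().amax(dim=-1) + b

def certified(outputs, eps):
    # each output of an L-inf-dist net is 1-Lipschitz in the L-inf norm, so a
    # top-two margin above 2 * eps cannot be closed by any eps-bounded perturbation
    top2 = outputs.topk(2, dim=-1).values
    return (top2[:, 0] - top2[:, 1]) > 2 * eps
```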

Reward Collapse in Aligning Large Language Models

1 code implementation28 May 2023 Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su

This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic regime.

Adversarially Robust Generalization Just Requires More Unlabeled Data

1 code implementation3 Jun 2019 Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, Li-Wei Wang

Neural network robustness has recently been highlighted by the existence of adversarial examples.

Accelerating Greedy Coordinate Gradient via Probe Sampling

1 code implementation2 Mar 2024 Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh

Safety of Large Language Models (LLMs) has become a central issue given their rapid progress and wide applications.

Defective Convolutional Networks

1 code implementation19 Nov 2019 Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Li-Wei Wang

Robustness of convolutional neural networks (CNNs) has gained importance on account of adversarial examples, i.e., inputs with well-designed perturbations added that are imperceptible to humans but can cause the model to predict incorrectly.

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

no code implementations28 May 2019 Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Li-Wei Wang

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks.

regression Second-order methods

Convergence of Adversarial Training in Overparametrized Neural Networks

no code implementations NeurIPS 2019 Ruiqi Gao, Tianle Cai, Haochuan Li, Li-Wei Wang, Cho-Jui Hsieh, Jason D. Lee

Neural networks are vulnerable to adversarial examples, i.e., inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network.

A Theory of Label Propagation for Subpopulation Shift

no code implementations22 Feb 2021 Tianle Cai, Ruiqi Gao, Jason D. Lee, Qi Lei

In this work, we propose a provably effective framework for domain adaptation based on label propagation.

Domain Adaptation Generalization Bounds

Towards a Theoretical Framework of Out-of-Distribution Generalization

no code implementations NeurIPS 2021 Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, LiWei Wang

We also introduce the new concept of an expansion function, which characterizes the extent to which variance is amplified in the test domains over the training domains, and thereby gives a quantitative meaning to invariant features.

Domain Generalization Model Selection +1

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

no code implementations NeurIPS 2021 Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu

Since relative positional encoding (RPE) is used by default in many state-of-the-art models, designing efficient Transformers that can incorporate RPE is appealing.

Do Transformers Really Perform Badly for Graph Representation?

no code implementations NeurIPS 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight for utilizing the Transformer on graphs is that the structural information of a graph must be effectively encoded into the model.

Graph Representation Learning

Defective Convolutional Layers Learn Robust CNNs

no code implementations25 Sep 2019 Tiange Luo, Tianle Cai, Xiaomeng Zhang, Siyu Chen, Di He, LiWei Wang

We first show that predictions made by the defective CNN are less dependent on textural information, but more on shape information, and further find that adversarial examples generated by the defective CNN appear to have semantic shapes.

Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond

no code implementations19 Jul 2022 Yuzheng Hu, Tianle Cai, Jinyong Shan, Shange Tang, Chaochao Cai, Ethan Song, Bo Li, Dawn Song

We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks, where the protocols may differ from one another, yet a procedure for obtaining local gradients is implicitly shared.

Philosophy Privacy Preserving +2

Scaling In-Context Demonstrations with Structured Attention

no code implementations5 Jul 2023 Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang

However, their in-context learning capabilities are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations.

In-Context Learning Sentence
