Search Results for author: Guangxuan Xiao

Found 12 papers, 7 papers with code

Retrieval Head Mechanistically Explains Long-Context Factuality

no code implementations • 24 Apr 2024 • Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu

Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context.

Continual Pretraining Hallucination +2

Paper
Add Code

BitDelta: Your Fine-Tune May Only Be Worth One Bit

1 code implementation • 15 Feb 2024 • James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks.

159

Paper
Code

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

no code implementations • 7 Feb 2024 • Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Song Han, Maosong Sun

To alleviate these issues, existing efforts employ sliding attention windows and discard distant tokens to achieve the processing of extremely long sequences.

Paper
Add Code

Efficient Streaming Language Models with Attention Sinks

5 code implementations • 29 Sep 2023 • Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis

In this paper, we first demonstrate that the emergence of attention sink is due to the strong attention scores towards initial tokens as a "sink" even if they are not semantically important.

Language Modelling

6,196

Paper
Code

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

5 code implementations • 1 Jun 2023 • Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han

We then propose to search for the optimal per-channel scaling that protects the salient weights by observing the activation, not weights.

Autonomous Driving Common Sense Reasoning +3

18,403

Paper
Code

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

1 code implementation • 17 May 2023 • Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han

FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation.

Ranked #7 on Diffusion Personalization Tuning Free on AgeDB

Denoising Diffusion Personalization Tuning Free +1

587

Paper
Code

Sparse and Local Networks for Hypergraph Reasoning

no code implementations • 9 Mar 2023 • Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao

Reasoning about the relationships between entities from input facts (e. g., whether Ari is a grandparent of Charlie) generally requires explicit consideration of other entities that are not mentioned in the query (e. g., the parents of Charlie).

Knowledge Graphs World Knowledge

Paper
Add Code

Offsite-Tuning: Transfer Learning without Full Model

1 code implementation • 9 Feb 2023 • Guangxuan Xiao, Ji Lin, Song Han

In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model.

Privacy Preserving Transfer Learning

359

Paper
Code

FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training

no code implementations • 18 Jan 2023 • Kezhao Huang, Haitian Jiang, Minjie Wang, Guangxuan Xiao, David Wipf, Xiang Song, Quan Gan, Zengfeng Huang, Jidong Zhai, Zheng Zhang

A key performance bottleneck when training graph neural network (GNN) models on large, real-world graphs is loading node features onto a GPU.

Paper
Add Code

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

3 code implementations • 18 Nov 2022 • Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han

We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.

Quantization

11,631

Paper
Code

Efficient Training and Inference of Hypergraph Reasoning Networks

no code implementations • 29 Sep 2021 • Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao

To leverage the sparsity in hypergraph neural networks, SpaLoc represents the grounding of relationships such as parent and grandparent as sparse tensors and uses neural networks and finite-domain quantification operations to infer new facts based on the input.

Knowledge Graphs Logical Reasoning +1

Paper
Add Code

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks

1 code implementation • ICML Workshop AML 2021 • Zhengyan Zhang, Guangxuan Xiao, Yongwei Li, Tian Lv, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Xin Jiang, Maosong Sun

In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks in arbitrary downstream tasks.

Backdoor Attack

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.