no code implementations • 30 Dec 2024 • Yibo Wen, Chenwei Xu, Jerry Yao-Chieh Hu, Han Liu
We present a three-stage framework for training deep learning models specializing in antibody sequence-structure co-design.
no code implementations • 26 Nov 2024 • Jerry Yao-Chieh Hu, Weimin Wu, Yi-Chen Lee, Yu-Chao Huang, Minshuo Chen, Han Liu
We investigate the approximation and estimation rates of conditional diffusion transformers (DiTs) with classifier-free guidance.
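As a concrete reference for the guidance mechanism studied here, below is a minimal sketch of classifier-free guidance at sampling time; the `model` interface and guidance weight `w` are illustrative assumptions, not the paper's API.

```python
import torch

def cfg_noise_estimate(model, x_t, t, cond, w: float):
    """Classifier-free guidance: blend conditional and unconditional
    noise predictions with guidance weight w (w = 0 recovers the
    conditional-free model; larger w strengthens conditioning).
    `model` is a hypothetical denoiser taking an optional condition."""
    eps_uncond = model(x_t, t, cond=None)   # unconditional branch
    eps_cond = model(x_t, t, cond=cond)     # conditional branch
    return eps_uncond + w * (eps_cond - eps_uncond)
```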
no code implementations • 25 Nov 2024 • Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani, Chenyang Li, Zhao Song, Han Liu
Our key contributions show that prompt tuning on \textit{single-head} transformers with only a \textit{single} self-attention layer (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH).
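To fix ideas, here is a minimal sketch of the setting the result concerns: a frozen single-head, single-layer self-attention model where only prepended prompt vectors are trained. The layer choice and initialization below are our assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn

class PromptTunedAttention(nn.Module):
    """Single-head, single-layer self-attention with a learnable prompt.
    Only `self.prompt` is trainable; the attention weights stay frozen."""
    def __init__(self, d_model: int, prompt_len: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        for p in self.attn.parameters():
            p.requires_grad = False           # freeze the pretrained layer
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

    def forward(self, x):                     # x: (batch, seq, d_model)
        p = self.prompt.unsqueeze(0).expand(x.size(0), -1, -1)
        xp = torch.cat([p, x], dim=1)         # prepend prompt tokens
        out, _ = self.attn(xp, xp, xp)
        return out[:, self.prompt.size(0):]   # drop the prompt positions
```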
no code implementations • 25 Nov 2024 • Weimin Wu, Maojiang Su, Jerry Yao-Chieh Hu, Zhao Song, Han Liu
We investigate the transformer's capability for in-context learning (ICL) to simulate the training process of deep models.
no code implementations • 8 Nov 2024 • Jerry Yao-Chieh Hu, Erzhi Liu, Han Liu, Zhao Song, Lichen Zhang
Given a database of bit strings $A_1,\ldots, A_m\in \{0, 1\}^n$, a fundamental data structure task is to estimate the distances between a given query $B\in \{0, 1\}^n$ and all the strings in the database.
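A brute-force baseline makes the task concrete; the sublinear-time data structures in question aim to beat this $O(mn)$ scan with approximate answers. The sketch below is purely illustrative, not the paper's construction.

```python
import numpy as np

def hamming_distances(database: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Exact baseline: Hamming distance between a query B in {0,1}^n
    and every A_i in the database (shape (m, n)). Approximate data
    structures trade this O(mn) scan for faster, inexact answers."""
    return np.count_nonzero(database != query, axis=1)

A = np.random.randint(0, 2, size=(5, 16))   # m = 5 strings, n = 16 bits
B = np.random.randint(0, 2, size=16)
print(hamming_distances(A, B))              # one distance per database string
```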
no code implementations • 30 Oct 2024 • Jerry Yao-Chieh Hu, Dennis Wu, Han Liu
We show that the optimal capacity of kernelized Hopfield models (KHMs) occurs when the feature space allows stored memories to form an optimal spherical code.
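For intuition on "optimal spherical code": a regular simplex of $d+1$ unit vectors attains the minimal possible maximum pairwise inner product, $-1/d$. The sketch below illustrates the geometry only; it is not the paper's construction.

```python
import numpy as np

def simplex_code(d: int) -> np.ndarray:
    """Return d+1 unit vectors (in a d-dimensional subspace of R^{d+1})
    forming a regular simplex: every distinct pair has inner product
    -1/d, the optimal spherical code for d+1 points."""
    c = np.eye(d + 1) - 1.0 / (d + 1)         # center the standard basis
    return c / np.linalg.norm(c, axis=1, keepdims=True)

M = simplex_code(4)                           # 5 maximally separated memories
G = M @ M.T                                   # Gram matrix
print(np.round(G, 3))                         # 1 on diagonal, -0.25 off-diagonal
```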
no code implementations • 3 Sep 2024 • Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu
In this paper, we improve the best previous result [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024] in three aspects:
- We reduce the query time by a factor of $\alpha^{-1} \log n$.
no code implementations • 1 Jul 2024 • Jerry Yao-Chieh Hu, Weimin Wu, Zhao Song, Han Liu
For backward computation, we leverage the low-rank structure within the gradient computation of DiTs training for possible algorithmic speedup.
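As a minimal illustration of the low-rank idea (not the paper's algorithm): when a gradient block is nearly low rank, a truncated SVD lets downstream computation operate on rank-$r$ factors instead of the full matrix.

```python
import numpy as np

def low_rank_approx(G: np.ndarray, r: int):
    """Rank-r factorization G ~ L @ R of a gradient block; storing the
    factors costs O((m + n) r) instead of O(m n), the kind of structure
    that enables faster backward passes when gradients are nearly low rank."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]        # (m, r) and (r, n) factors

G = np.outer(np.random.randn(64), np.random.randn(32))   # rank-1 example
L, R = low_rank_approx(G, r=1)
print(np.allclose(L @ R, G))                  # True: exact for rank-1 G
```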
no code implementations • 5 Jun 2024 • Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu
We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory.
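For reference, below is the standard LoRA parameterization that such complexity analyses concern: a frozen weight plus a trainable rank-$r$ update. The hyperparameter values are conventional defaults, not values from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update:
    y = base(x) + (alpha / r) * x A^T B^T, so only the rank-r factors
    A (r x in) and B (out x r) receive gradients."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```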
no code implementations • 3 Jun 2024 • Haozheng Luo, Jiahao Yu, Wenxin Zhang, Jialong Li, Jerry Yao-Chieh Hu, Xinyu Xing, Han Liu
We introduce a low-resource safety enhancement method for aligning large language models (LLMs) without the need for supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF).
no code implementations • 31 May 2024 • Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, Wenbo Guo, Han Liu, Xinyu Xing
Attackers carefully craft jailbreaking prompts so that a target LLM will respond to harmful questions.
1 code implementation • 5 Apr 2024 • Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu
We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant.
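For context, the retrieval dynamics shared by modern Hopfield models, whose one-step form coincides with softmax attention over the stored patterns; the paper's nonparametric construction and efficient variant differ in how this map is built. A minimal sketch:

```python
import numpy as np

def hopfield_retrieve(X: np.ndarray, q: np.ndarray, beta: float = 1.0,
                      steps: int = 3) -> np.ndarray:
    """Modern Hopfield retrieval: repeatedly apply the update
    q <- X^T softmax(beta X q), whose one-step form is softmax
    attention over the stored patterns (rows of X)."""
    for _ in range(steps):
        logits = beta * (X @ q)
        p = np.exp(logits - logits.max())     # numerically stable softmax
        p /= p.sum()
        q = X.T @ p
    return q

X = np.random.randn(10, 32)                   # 10 stored patterns
noisy = X[0] + 0.3 * np.random.randn(32)      # corrupted copy of pattern 0
rec = hopfield_retrieve(X, noisy, beta=8.0)
print(rec @ X[0] / (np.linalg.norm(rec) * np.linalg.norm(X[0])))  # cosine ~ 1
```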
1 code implementation • 4 Apr 2024 • Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu
We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of training gigantic transformer-based models.
Ranked #1 on Quantization on Wiki-40B
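The outlier-efficiency theme is often realized with an attention normalizer that admits an implicit extra zero logit (sometimes written $\mathrm{Softmax}_1$), letting heads assign near-zero total mass instead of producing the outlier activations that hurt quantization. The sketch below is our assumption of that mechanism, not necessarily the paper's exact layer.

```python
import torch

def softmax_1(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax with an implicit extra zero logit (an assumed mechanism):
    exp(z_i) / (1 + sum_j exp(z_j)). Rows may sum to < 1, so a head can
    'abstain' rather than dump mass on a few tokens -- the behavior
    associated with suppressing quantization-hostile outliers."""
    m = torch.clamp(logits.max(dim=dim, keepdim=True).values, min=0.0)
    e = torch.exp(logits - m)                 # stable: shift by max(0, max_z)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))
```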
1 code implementation • 4 Apr 2024 • Chenwei Xu, Yu-Chao Huang, Jerry Yao-Chieh Hu, Weijian Li, Ammar Gilani, Hsi-Sheng Goan, Han Liu
We introduce the Bi-Directional Sparse Hopfield Network (BiSHop), a novel end-to-end framework for deep tabular learning.
1 code implementation • 4 Apr 2024 • Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu
Specifically, we accomplish this by constructing a separation loss $\mathcal{L}_\Phi$ that separates the local minima of the kernelized energy by separating the stored memory patterns in kernel space.
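A minimal sketch of such a separation loss, assuming a learnable linear feature map $\Phi$ and a penalty on pairwise similarity of the mapped memories; the paper's kernel and exact objective may differ.

```python
import torch
import torch.nn as nn

def separation_loss(phi: nn.Module, memories: torch.Tensor) -> torch.Tensor:
    """Illustrative L_Phi: spread the stored patterns out in feature
    space by penalizing pairwise inner products of the unit-normalized
    mapped memories, which separates the energy's local minima."""
    z = phi(memories)                               # (m, d_feat)
    z = z / z.norm(dim=1, keepdim=True)
    sim = z @ z.T                                   # pairwise similarities
    off_diag = sim - torch.diag(torch.diag(sim))    # zero out self-similarity
    return off_diag.pow(2).sum() / (z.size(0) * (z.size(0) - 1))

phi = nn.Linear(32, 64)                             # assumed learnable feature map
mem = torch.randn(10, 32)
separation_loss(phi, mem).backward()                # gradients flow into phi only
```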
no code implementations • 7 Feb 2024 • Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song, Han Liu
Specifically, we establish an upper bound criterion for the norm of input query patterns and memory patterns.
1 code implementation • 28 Dec 2023 • Dennis Wu, Jerry Yao-Chieh Hu, Weijian Li, Bo-Yu Chen, Han Liu
We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities.
no code implementations • 28 Dec 2023 • Chenwei Xu, Jerry Yao-Chieh Hu, Aakaash Narayanan, Mattson Thieme, Vladimir Nagaslaev, Mark Austin, Jeremy Arnold, Jose Berlioz, Pierrick Hanlet, Aisha Ibrahim, Dennis Nicklaus, Jovan Mitrevski, Jason Michael St. John, Gauri Pradhan, Andrea Saewert, Kiyomi Seiya, Brian Schupbach, Randy Thurman-Keup, Nhan Tran, Rui Shi, Seda Ogrenci, Alexis Maya-Isabelle Shuping, Kyle Hazelwood, Han Liu
We introduce a novel Proximal Policy Optimization (PPO) algorithm aimed at addressing the challenge of maintaining a uniform proton beam intensity delivery in the Muon to Electron Conversion Experiment (Mu2e) at Fermi National Accelerator Laboratory (Fermilab).
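Any PPO variant is built around the clipped surrogate objective below; the Mu2e-specific state, action, and reward design are not shown here. A minimal sketch:

```python
import torch

def ppo_clip_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                  advantages: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate: limit the policy ratio to
    [1 - eps, 1 + eps] so each update stays close to the policy that
    collected the data; returned as a loss to minimize."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```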
1 code implementation • NeurIPS 2023 • Jerry Yao-Chieh Hu, Donglin Yang, Dennis Wu, Chenwei Xu, Bo-Yu Chen, Han Liu
Building upon this, we derive the sparse memory retrieval dynamics from the sparse energy function and show its one-step approximation is equivalent to the sparse-structured attention.
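To make the correspondence concrete: replacing softmax with sparsemax in the retrieval update assigns exactly zero weight to far-away memories, mirroring sparse-structured attention. A sketch assuming the standard sparsemax of Martins & Astudillo (2016):

```python
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    """Sparsemax: Euclidean projection of z onto the probability
    simplex; unlike softmax, it returns exact zeros."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z_sorted)
    support = z_sorted + (1.0 - cssv) / k > 0   # find the support size
    k_max = k[support][-1]
    tau = (cssv[k_max - 1] - 1.0) / k_max       # threshold
    return np.maximum(z - tau, 0.0)

def sparse_hopfield_step(X: np.ndarray, q: np.ndarray, beta: float = 1.0):
    """One retrieval step q <- X^T sparsemax(beta X q): the one-step
    approximation of the sparse energy's dynamics, matching
    sparse-structured attention over the stored patterns (rows of X)."""
    return X.T @ sparsemax(beta * (X @ q))
```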
1 code implementation • 9 Jun 2023 • Alex Reneau, Jerry Yao-Chieh Hu, Chenwei Xu, Weijian Li, Ammar Gilani, Han Liu
We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework.