no code implementations • 15 Oct 2024 • Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou
Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants.
no code implementations • 15 Oct 2024 • Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Our results demonstrate that as long as the input data has a constant condition number, e.g., $n = O(d)$, linear looped Transformers can achieve a small error via multi-step gradient descent during in-context learning.
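To make the claim concrete, here is a minimal numpy sketch of the multi-step gradient descent being emulated in context (an illustration only, not the paper's looped-Transformer construction; the dimensions and step size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 16                      # in-context examples, feature dimension (n = O(d))
X = rng.normal(size=(n, d))        # random design; well-conditioned with high probability
w_star = rng.normal(size=d)
y = X @ w_star                     # in-context labels

w = np.zeros(d)                    # prediction weights carried across "loops"
eta = 1.0 / np.linalg.norm(X, 2) ** 2
for _ in range(200):               # each loop iteration plays the role of one descent step
    w -= eta * X.T @ (X @ w - y)   # gradient step on the in-context least-squares objective

print(np.linalg.norm(w - w_star))  # small error after multi-step gradient descent
```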
no code implementations • 15 Oct 2024 • Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Recent empirical studies have identified fixed point iteration phenomena in deep neural networks, where the hidden state tends to stabilize after several layers, showing minimal change in subsequent layers.
no code implementations • 14 Oct 2024 • Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
Our approach achieves a running time of $O(mn^{4/5})$, significantly faster than the naive $O(mn)$ approach for attention generation, where $n$ is the context length, $m$ is the query length, and $d$ is the hidden dimension.
no code implementations • 12 Oct 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
For small cache sizes, we provide an algorithm that improves over existing methods and achieves the tight bounds.
no code implementations • 12 Oct 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
In contrast, the multi-layer perceptron with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons.
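As a toy illustration of that expressiveness (a random-features simplification in which only the output layer is fit by least squares, not the construction analyzed in the paper), a wide two-layer ReLU network drives the error on a 1-D target down to near zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 2000                               # sample grid size, number of hidden neurons
x = np.linspace(-1, 1, n)[:, None]
target = np.sin(3 * x).ravel()                 # smooth 1-D target function

W, b = rng.normal(size=(1, m)), rng.uniform(-1, 1, m)   # random first layer
H = np.maximum(x @ W + b, 0)                   # hidden ReLU features
a, *_ = np.linalg.lstsq(H, target, rcond=None) # fit only the output layer
print(np.max(np.abs(H @ a - target)))          # near-zero error on the sample grid
```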
1 code implementation • 25 Sep 2024 • Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty
Our research introduces a novel approach for the long context bottleneck to accelerate LLM inference and reduce GPU memory consumption.
no code implementations • 23 Aug 2024 • Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou
The computational complexity of the self-attention mechanism in popular transformer architectures poses significant challenges for training and inference, and becomes the bottleneck for long inputs.
no code implementations • 22 Aug 2024 • Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
In this work, we improve the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1, 1, a)-a})$ for any $a \in [0, 1]$, where $\omega$ is the exponent of matrix multiplication.
no code implementations • 12 Aug 2024 • Jiuxiang Gu, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Junwei Yu
Determining the John ellipsoid, the largest-volume ellipsoid contained within a convex polytope, is a fundamental problem with applications in machine learning, optimization, and data analytics.
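For background, the John ellipsoid of $\{x : Ax \le b\}$ admits a classical convex log-det formulation; the cvxpy sketch below solves that textbook formulation on a toy polytope and is not the fast algorithm developed in the paper.

```python
import cvxpy as cp
import numpy as np

# Toy polytope {x : A x <= b}: an off-center axis-aligned box.
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.array([1.5, 0.5, 1.0, 1.0])

B = cp.Variable((2, 2), PSD=True)    # ellipsoid shape: E = {B u + d : ||u|| <= 1}
d = cp.Variable(2)                   # ellipsoid center
constraints = [cp.norm(B @ A[i]) + A[i] @ d <= b[i] for i in range(len(b))]
cp.Problem(cp.Maximize(cp.log_det(B)), constraints).solve()

print("center:", d.value)            # roughly (0.5, 0.0) for this box
print("shape:\n", B.value)           # roughly the identity (inscribed unit circle)
```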
1 code implementation • 22 Jul 2024 • Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples.
no code implementations • 20 Jul 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
In addition, our data structure can guarantee that the process of answering user query satisfies $(\epsilon, \delta)$-DP with $\widetilde{O}(n^{-1} \epsilon^{-1} \alpha^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (\alpha + \epsilon_s)$ relative error between our output and the true answer.
no code implementations • 18 Jul 2024 • Jiuxiang Gu, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
Training data privacy is a fundamental problem in modern Artificial Intelligence (AI) applications, such as face recognition, recommendation systems, language generation, and many others, as it may contain sensitive user information related to legal issues.
1 code implementation • 20 Jun 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks.
no code implementations • 30 May 2024 • Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang
This sheds light on where transformers pay attention and how that affects ICL.
no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention.
no code implementations • 26 May 2024 • Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou
We prove that if the target distribution is a $k$-mixture of Gaussians, the density of the entire diffusion process will also be a $k$-mixture of Gaussians.
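A standard calculation (not reproduced from the paper) illustrates why the mixture structure is preserved: under a forward process of the form $x_t = a_t x_0 + \sigma_t \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, each Gaussian component is mapped to another Gaussian, so
\[
x_0 \sim \sum_{i=1}^{k} \pi_i \, \mathcal{N}(\mu_i, \Sigma_i)
\;\Longrightarrow\;
x_t \sim \sum_{i=1}^{k} \pi_i \, \mathcal{N}\!\big(a_t \mu_i,\; a_t^2 \Sigma_i + \sigma_t^2 I\big).
\]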
no code implementations • 8 May 2024 • Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Zhuoyan Xu, Junze Yin
We then design a fast algorithm to approximate the attention matrix via a sum of such $k$ convolution matrices.
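The appeal of a sum of $k$ convolution (Toeplitz) matrices is that each can be applied in $O(n \log n)$ time with an FFT rather than materializing an $n \times n$ matrix; the numpy sketch below shows only that standard trick, not the paper's attention approximation.

```python
import numpy as np
from scipy.linalg import toeplitz

def causal_conv_apply(kernel, v):
    """Apply the lower-triangular convolution matrix C[i, j] = kernel[i - j] (j <= i)
    to v in O(n log n) with an FFT, without ever forming the n x n matrix."""
    n = len(v)
    m = 1 << (2 * n - 1).bit_length()      # FFT length >= 2n - 1 avoids circular wrap-around
    out = np.fft.irfft(np.fft.rfft(kernel, m) * np.fft.rfft(v, m), m)
    return out[:n]

rng = np.random.default_rng(0)
n, k = 1024, 4
v = rng.normal(size=n)
kernels = rng.normal(size=(k, n))          # k convolution matrices, summed

fast = sum(causal_conv_apply(h, v) for h in kernels)

# Dense O(n^2) check against the explicit sum of lower-triangular Toeplitz matrices.
dense = sum(toeplitz(h, np.r_[h[0], np.zeros(n - 1)]) for h in kernels)
print(np.allclose(fast, dense @ v))        # True
```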
no code implementations • 6 May 2024 • Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song
The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture.
1 code implementation • 22 Feb 2024 • Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang
An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before its adaptation to a target task with limited labeled samples.
no code implementations • 12 Feb 2024 • Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou
We direct our focus to the complex algebraic learning task of modular addition involving $k$ inputs.
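Concretely, the task can be generated as follows (a hypothetical data-generation sketch; the modulus $p$, arity $k$, and one-hot encoding are illustrative choices, not the paper's setup):

```python
import numpy as np

def modular_addition_dataset(p=97, k=3, n_samples=5000, seed=0):
    """Sample (a_1, ..., a_k) uniformly from Z_p and label with a_1 + ... + a_k (mod p)."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, p, size=(n_samples, k))
    labels = a.sum(axis=1) % p
    # One-hot encode each input coordinate, giving an (n_samples, k * p) feature matrix.
    X = np.zeros((n_samples, k * p))
    rows = np.repeat(np.arange(n_samples), k)
    cols = (a + p * np.arange(k)).ravel()
    X[rows, cols] = 1.0
    return X, labels

X, y = modular_addition_dataset()
print(X.shape, y[:5])
```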
1 code implementation • 9 Aug 2023 • Yiyou Sun, Zhenmei Shi, Yingyu Liang, Yixuan Li
This paper bridges the gap by providing an analytical framework to formalize and investigate when and how known classes can help discover novel classes.
no code implementations • 27 May 2023 • Nils Palumbo, Yang Guo, Xi Wu, Jiefeng Chen, Yingyu Liang, Somesh Jha
Nevertheless, under recent strong adversarial attacks (GMSA, which has been shown to be much more effective than AutoAttack against transduction), Goldwasser et al.'s work was shown to have low performance in a practical deep-learning setting.
1 code implementation • 2 May 2023 • Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha
We theoretically analyze the stratified rejection setting and propose a novel defense method -- Adversarial Training with Consistent Prediction-based Rejection (CPR) -- for building a robust selective classifier.
1 code implementation • 13 Mar 2023 • Zhenmei Shi, Yifei Ming, Ying Fan, Frederic Sala, Yingyu Liang
In this paper, we propose a simple and effective regularization method based on the nuclear norm of the learned features for domain generalization.
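Such a regularizer is typically a one-line addition to the training loss; a hedged PyTorch sketch (the helper name, weight lambda, and toy tensors are placeholders, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def loss_with_nuclear_norm(logits, labels, features, lam=0.01):
    """Cross-entropy plus a nuclear-norm penalty on the batch's feature matrix
    (batch_size x feature_dim)."""
    ce = F.cross_entropy(logits, labels)
    nuc = torch.linalg.matrix_norm(features, ord="nuc")
    return ce + lam * nuc

# Toy usage with random tensors standing in for a model's features and logits.
feats = torch.randn(32, 128, requires_grad=True)
logits = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 10, (32,))
loss_with_nuclear_norm(logits, labels, feats).backward()
```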
1 code implementation • 28 Feb 2023 • Zhenmei Shi, Jiefeng Chen, Kunyang Li, Jayaram Raghuram, Xi Wu, Yingyu Liang, Somesh Jha
Pre-training representations (a.k.a. foundation models) has recently become a prevalent learning paradigm, where one first pre-trains a representation using large-scale unlabeled data, and then learns simple predictors on top of the representation using small labeled data from the downstream tasks.
no code implementations • ICLR 2022 • Zhenmei Shi, Junyi Wei, Yingyu Liang
These results provide theoretical evidence that feature learning in neural networks depends strongly on the input structure and leads to superior performance.
no code implementations • 31 Jan 2022 • Xiaomin Zhang, Xucheng Zhang, Po-Ling Loh, Yingyu Liang
Mixtures of ranking models are standard tools for ranking problems.
no code implementations • AAAI Workshop AdvML 2022 • Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha
Motivated by this metric, we propose novel loss functions and a robust training method -- \textit{stratified adversarial training with rejection} (SATR) -- for a classifier with reject option, where the goal is to accept and correctly-classify small input perturbations, while allowing the rejection of larger input perturbations that cannot be correctly classified.
1 code implementation • ICLR 2022 • Jiefeng Chen, Xi Wu, Yang Guo, Yingyu Liang, Somesh Jha
There has been emerging interest in using transductive learning for adversarial robustness (Goldwasser et al., NeurIPS 2020; Wu et al., ICML 2020; Wang et al., ArXiv 2021).
1 code implementation • 6 Oct 2021 • Mehmet F. Demirel, Shengchao Liu, Siddhant Garg, Zhenmei Shi, Yingyu Liang
Our experiments demonstrate the strong performance of AWARE in graph-level prediction tasks in the standard setting in the domains of molecular property prediction and social networks.
1 code implementation • NeurIPS 2021 • Jiefeng Chen, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh Jha
This observation leads to two challenging tasks: (1) unsupervised accuracy estimation, which aims to estimate the accuracy of a pre-trained classifier on a set of unlabeled test inputs; (2) error detection, which aims to identify mis-classified test inputs.
no code implementations • 15 Jun 2021 • Jiefeng Chen, Yang Guo, Xi Wu, Tianqi Li, Qicheng Lao, Yingyu Liang, Somesh Jha
Compared to traditional "test-time" defenses, these defense mechanisms "dynamically retrain" the model based on test time input via transductive learning; and theoretically, attacking these defenses boils down to bilevel optimization, which seems to raise the difficulty for adaptive attacks.
no code implementations • EACL 2021 • Zhongkai Sun, Prathusha K Sarma, Yingyu Liang, William Sethares
Imposing the style of one image onto another is called style transfer.
1 code implementation • 2 Feb 2021 • Zhenmei Shi, Fuhao Shi, Wei-Sheng Lai, Chia-Kai Liang, Yingyu Liang
We present a deep neural network (DNN) that uses both sensor data (gyroscope) and image content (optical flow) to stabilize videos through unsupervised learning.
no code implementations • 1 Jan 2021 • Xi Wu, Yang Guo, Tianqi Li, Jiefeng Chen, Qicheng Lao, Yingyu Liang, Somesh Jha
On the positive side, we show that, if one is allowed to access the training data, then Domain Adversarial Neural Networks (${\sf DANN}$), an algorithm designed for unsupervised domain adaptation, can provide nontrivial robustness in the test-time maximin threat model against strong transfer attacks and adaptive fixed point attacks.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zhao Jinman, Shawn Zhong, Xiaomin Zhang, Yingyu Liang
We look into the task of \emph{generalizing} word embeddings: given a set of pre-trained word vectors over a finite vocabulary, the goal is to predict embedding vectors for out-of-vocabulary words, \emph{without} extra contextual information.
1 code implementation • 12 Oct 2020 • Minyi Dai, Mehmet F. Demirel, Yingyu Liang, Jia-Mian Hu
Various machine learning models have been used to predict the properties of polycrystalline materials, but none of them directly consider the physical interactions among neighboring grains despite such microscopic interactions critically determining macroscopic material properties.
Materials Science
no code implementations • 28 Sep 2020 • Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, Somesh Jha
We show that, by mining informative auxiliary OOD data, one can significantly improve OOD detection performance, and somewhat surprisingly, generalize to unseen adversarial attacks.
Out-of-Distribution (OOD) Detection
1 code implementation • NeurIPS 2020 • Siddhant Garg, Yingyu Liang
Unsupervised and self-supervised learning approaches have become a crucial tool to learn representations for downstream prediction tasks.
1 code implementation • 4 Aug 2020 • Siddhant Garg, Adarsh Kumar, Vibhor Goel, Yingyu Liang
We introduce adversarial perturbations in the model weights using a composite loss on the predictions of the original model and the desired trigger through projected gradient descent.
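Schematically, this kind of attack is projected gradient descent in weight space with a loss that mixes clean-behavior and trigger-behavior terms; the sketch below is a generic rendition on a toy linear model (the model, trigger, labels, and projection radius are all made up for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 2)                             # stand-in for the victim model
w0 = [p.detach().clone() for p in model.parameters()]      # original weights to stay close to

x_clean = torch.randn(64, 20)
x_trigger = x_clean.clone()
x_trigger[:, 0] = 5.0                                      # toy trigger pattern
y_target = torch.zeros(64, dtype=torch.long)               # attacker-chosen label on triggered inputs
with torch.no_grad():
    y_orig = model(x_clean).argmax(dim=1)                  # original model's clean predictions

opt = torch.optim.SGD(model.parameters(), lr=0.05)
eps = 0.5                                                  # radius of the allowed weight perturbation
for _ in range(200):
    opt.zero_grad()
    # Composite loss: keep clean predictions close to the original model's,
    # while forcing the target label on triggered inputs.
    loss = F.cross_entropy(model(x_clean), y_orig) + F.cross_entropy(model(x_trigger), y_target)
    loss.backward()
    opt.step()
    with torch.no_grad():                                  # project weights back into the eps-ball
        for p, p0 in zip(model.parameters(), w0):
            p.copy_(p0 + (p - p0).clamp(-eps, eps))
```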
no code implementations • 10 Jul 2020 • Yingyu Liang, Hui Yuan
In the setting of entangled single-sample distributions, the goal is to estimate some common parameter shared by a family of $n$ distributions, given one single sample from each distribution.
1 code implementation • 26 Jun 2020 • Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, Somesh Jha
We show that, by mining informative auxiliary OOD data, one can significantly improve OOD detection performance, and somewhat surprisingly, generalize to unseen adversarial attacks.
Out-of-Distribution (OOD) Detection
no code implementations • 22 Apr 2020 • Xi Wu, Yang Guo, Jiefeng Chen, Yingyu Liang, Somesh Jha, Prasad Chalasani
Recent studies provide hints and failure examples for domain invariant representation learning, a common approach for this problem, but the explanations provided are somewhat different and do not provide a unified picture.
no code implementations • 20 Apr 2020 • Hui Yuan, Yingyu Liang
We study mean estimation and linear regression under general conditions, and analyze a simple and computationally efficient method based on iteratively trimming samples and re-estimating the parameter on the trimmed sample set.
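The idea fits in a few lines; here is an illustrative numpy version for mean estimation (the trimming fraction and iteration count are arbitrary choices, not the paper's tuned values):

```python
import numpy as np

def iterative_trimmed_mean(x, trim_frac=0.2, n_iters=10):
    """Iteratively re-estimate the mean on the samples closest to the current estimate."""
    est = np.median(x)                                   # robust initialization
    keep = int(np.ceil((1 - trim_frac) * len(x)))
    for _ in range(n_iters):
        closest = np.argsort(np.abs(x - est))[:keep]     # trim the farthest samples
        est = x[closest].mean()                          # re-estimate on the trimmed set
    return est

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(20.0, 1.0, 100)])  # 10% gross outliers
print(np.mean(x), iterative_trimmed_mean(x))  # naive mean is pulled toward 20; trimmed mean is near 0
```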
no code implementations • ICLR 2020 • Fangzhou Mu, Yingyu Liang, Yin Li
We address the challenging problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Siddhant Garg, Rohit Kumar Sharma, Yingyu Liang
In this paper we show that concatenating the embeddings from the pre-trained model with those from a simple sentence embedding model trained only on the target data, can improve over the performance of FT for few-sample tasks.
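The combination itself is plain feature concatenation ahead of a light-weight classifier; a sketch of that step with random arrays standing in for the two embedding models (all shapes and names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train = 200                                        # few-sample regime
emb_pretrained = rng.normal(size=(n_train, 768))     # stand-in for pre-trained sentence embeddings
emb_simple = rng.normal(size=(n_train, 100))         # stand-in for embeddings trained on target data only
y = rng.integers(0, 2, size=n_train)

features = np.concatenate([emb_pretrained, emb_simple], axis=1)  # concatenate the two views
clf = LogisticRegression(max_iter=1000).fit(features, y)
```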
1 code implementation • AAAI Workshop AdvML 2022 • Jiefeng Chen, Yixuan Li, Xi Wu, Yingyu Liang, Somesh Jha
Formally, we extensively study the problem of Robust Out-of-Distribution Detection on common OOD detection approaches, and show that state-of-the-art OOD detectors can be easily fooled by adding small perturbations to the in-distribution and OOD inputs.
Out-of-Distribution (OOD) Detection
no code implementations • 23 Feb 2020 • Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang
We show that our approach obtains small error and is efficient in both space and time.
no code implementations • 13 Nov 2019 • Zhongkai Sun, Prathusha Sarma, William Sethares, Yingyu Liang
Multimodal language analysis often considers relationships between features based on text and those based on acoustical and visual properties.
no code implementations • IJCNLP 2019 • Prathusha K Sarma, Yingyu Liang, William A. Sethares
This paper proposes a way to improve the performance of existing algorithms for text classification in domains with strong language semantics.
1 code implementation • NeurIPS 2019 • Jiefeng Chen, Xi Wu, Vaibhav Rastogi, Yingyu Liang, Somesh Jha
An emerging problem in trustworthy machine learning is to train models that produce robust interpretations for their predictions.
no code implementations • NeurIPS 2019 • Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang
In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations.
no code implementations • 31 Oct 2018 • Hongyang R. Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang
We consider the tensor completion problem of predicting the missing entries of a tensor.
1 code implementation • EMNLP 2018 • Jinman Zhao, Sidharth Mudgal, Yingyu Liang
We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information.
no code implementations • NeurIPS 2018 • Yuanzhi Li, Yingyu Liang
Neural networks have many successful applications, while much less theoretical understanding has been gained.
1 code implementation • NeurIPS 2019 • Shengchao Liu, Mehmet Furkan Demirel, Yingyu Liang
This paper introduces the N-gram graph, a simple unsupervised representation for molecules.
Ranked #3 on Molecular Property Prediction on QM7
1 code implementation • 20 May 2018 • Jiefeng Chen, Xi Wu, Vaibhav Rastogi, Yingyu Liang, Somesh Jha
We analyze our results in a theoretical framework and offer strong evidence that pixel discretization is unlikely to work on all but the simplest of the datasets.
1 code implementation • ACL 2018 • Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora
Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.
Ranked #3 on Sentiment Analysis on MPQA
1 code implementation • ACL 2018 • Prathusha K Sarma, Yingyu Liang, William A. Sethares
Generic word embeddings are trained on large-scale generic corpora; Domain Specific (DS) word embeddings are trained only on data from a domain of interest.
no code implementations • 22 Feb 2018 • Yuanzhi Li, Yingyu Liang
Mixtures of Linear Regressions (MLR) is an important mixture model with many applications.
no code implementations • ICML 2017 • Maria-Florina Balcan, Travis Dick, Yingyu Liang, Wenlong Mou, Hongyang Zhang
We study the problem of clustering sensitive data while preserving the privacy of individuals represented in the dataset, which has broad applications in practical machine learning and data analysis tasks.
1 code implementation • ICML 2017 • Yuanzhi Li, Yingyu Liang
Non-negative matrix factorization is a basic tool for decomposing data into the feature and weight matrices under non-negativity constraints, and in practice is often solved in the alternating minimization framework.
no code implementations • 27 Apr 2017 • Maria-Florina Balcan, Yingyu Liang, David P. Woodruff, Hongyang Zhang
This work studies the strong duality of non-convex matrix factorization problems: we show that under certain dual conditions, these problems and their duals have the same optimum.
1 code implementation • ICML 2017 • Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang
We show that training of generative adversarial networks (GANs) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.
no code implementations • 8 Dec 2016 • Nan Du, Yingyu Liang, Maria-Florina Balcan, Manuel Gomez-Rodriguez, Hongyuan Zha, Le Song
A typical viral marketing model identifies influential users in a social network to maximize a single product adoption assuming unlimited user attention, campaign budgets, and time.
no code implementations • NeurIPS 2016 • Yuanzhi Li, Yingyu Liang, Andrej Risteski
Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints.
no code implementations • 9 Nov 2016 • Bo Xie, Yingyu Liang, Le Song
In this paper, we answer these questions by analyzing one-hidden-layer neural networks with ReLU activation, and show that despite the non-convexity, neural networks with diverse units have no spurious local minima.
1 code implementation • 13 Oct 2016 • Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Christopher Baldassano, Janice Chen, Esther Yong, Christopher Honey, Uri Hasson, Peter Ramadge, Ken Norman, Sanjeev Arora
Several research groups have shown how to correlate fMRI responses to the meanings of presented stimuli.
no code implementations • 6 Feb 2016 • Yuanzhi Li, Yingyu Liang, Andrej Risteski
We show that the properties only need to hold in an average sense and can be achieved by the clipping step.
1 code implementation • TACL 2018 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.
no code implementations • 18 Nov 2015 • Sanjeev Arora, Yingyu Liang, Tengyu Ma
Under this assumption, which is experimentally tested on real-life nets like AlexNet, it is formally proved that the feedforward net is a correct inference method for recovering the hidden layer.
no code implementations • NeurIPS 2015 • Bo Xie, Yingyu Liang, Le Song
We propose a simple, computationally efficient, and memory friendly algorithm based on the "doubly stochastic gradients" to scale up a range of kernel nonlinear component analysis, such as kernel PCA, CCA and SVD.
no code implementations • 23 Mar 2015 • Maria-Florina Balcan, Yingyu Liang, Le Song, David Woodruff, Bo Xie
Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality?
4 code implementations • TACL 2016 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.
no code implementations • NeurIPS 2014 • Nan Du, Yingyu Liang, Maria-Florina F. Balcan, Le Song
Coverage functions are an important class of discrete functions that capture laws of diminishing returns.
no code implementations • NeurIPS 2014 • Maria-Florina Balcan, Vandana Kanchanapally, Yingyu Liang, David Woodruff
We give new algorithms and analyses for distributed PCA which lead to improved communication and computational costs for $k$-means clustering and related problems.
1 code implementation • NeurIPS 2014 • Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, Le Song
The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems.
no code implementations • 9 Apr 2014 • Aurélien Bellet, Yingyu Liang, Alireza Bagheri Garakani, Maria-Florina Balcan, Fei Sha
We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication cost required to construct an $\epsilon$-approximate solution.
no code implementations • 1 Jan 2014 • Maria-Florina Balcan, Yingyu Liang, Pramod Gupta
One of the most widely used techniques for data clustering is agglomerative clustering.
no code implementations • 8 Dec 2013 • Nan Du, Yingyu Liang, Maria Florina Balcan, Le Song
The typical algorithmic problem in viral marketing aims to identify a set of influential users in a social network, who, when convinced to adopt a product, shall influence other users in the network and trigger a large cascade of adoptions.
no code implementations • NeurIPS 2013 • Maria Florina Balcan, Steven Ehrlich, Yingyu Liang
We provide a distributed method for constructing a global coreset which improves over the previous methods by reducing the communication complexity, and which works over general communication topologies.
no code implementations • 5 Dec 2011 • Maria Florina Balcan, Yingyu Liang
For $k$-median, a center-based objective of special interest, we additionally give algorithms for a more relaxed assumption in which we allow the optimal solution to change in a small $\epsilon$ fraction of the points after perturbation.