Search Results for author: Ian En-Hsu Yen

Found 18 papers, 3 papers with code

FP8-BERT: Post-Training Quantization for Transformer

no code implementations10 Dec 2023 Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu

Transformer-based models, such as BERT, have been widely applied in a wide range of natural language processing tasks.


S4: a High-sparsity, High-performance AI Accelerator

no code implementations16 Jul 2022 Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu

And the degree of sparsity one can exploit has become higher as larger model sizes have been considered along with the trend of pre-training giant models.

Quantization Vocal Bursts Intensity Prediction

Efficient Global String Kernel with Random Features: Beyond Counting Substructures

no code implementations25 Nov 2019 Lingfei Wu, Ian En-Hsu Yen, Siyu Huo, Liang Zhao, Kun Xu, Liang Ma, Shouling Ji, Charu Aggarwal

In this paper, we present a new class of global string kernels that aims to (i) discover global properties hidden in the strings through global alignments, (ii) maintain positive-definiteness of the kernel, without introducing a diagonal dominant kernel matrix, and (iii) have a training cost linear with respect to not only the length of the string but also the number of training string samples.

From Node Embedding to Graph Embedding: Scalable Global Graph Kernel via Random Features

no code implementations NIPS 2018 2018 Lingfei Wu, Ian En-Hsu Yen, Kun Xu, Liang Zhao, Yinglong Xia, Michael Witbrock

Graph kernels are one of the most important methods for graph data analysis and have been successfully applied in diverse applications.

Graph Embedding

Random Warping Series: A Random Features Method for Time-Series Embedding

1 code implementation14 Sep 2018 Lingfei Wu, Ian En-Hsu Yen, Jin-Feng Yi, Fangli Xu, Qi Lei, Michael Witbrock

The proposed kernel does not suffer from the issue of diagonal dominance while naturally enjoys a \emph{Random Features} (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in terms of both the number and the length of time-series.

Clustering Dynamic Time Warping +2

Loss Decomposition for Fast Learning in Large Output Spaces

no code implementations ICML 2018 Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar

For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space.

Word Embeddings

D2KE: From Distance to Kernel and Embedding

no code implementations14 Feb 2018 Lingfei Wu, Ian En-Hsu Yen, Fangli Xu, Pradeep Ravikumar, Michael Witbrock

For many machine learning problem settings, particularly with structured inputs such as sequences or sets of objects, a distance measure between inputs can be specified more naturally than a feature representation.

Time Series Analysis

Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization

no code implementations ICML 2017 Qi Lei, Ian En-Hsu Yen, Chao-yuan Wu, Inderjit S. Dhillon, Pradeep Ravikumar

We consider the popular problem of sparse empirical risk minimization with linear predictors and a large number of both features and observations.

Latent Feature Lasso

no code implementations ICML 2017 Ian En-Hsu Yen, Wei-Cheng Lee, Sung-En Chang, Arun Sai Suggala, Shou-De Lin, Pradeep Ravikumar

The latent feature model (LFM), proposed in (Griffiths \& Ghahramani, 2005), but possibly with earlier origins, is a generalization of a mixture model, where each instance is generated not from a single latent class but from a combination of latent features.

Dual Decomposed Learning with Factorwise Oracle for Structural SVM of Large Output Domain

no code implementations NeurIPS 2016 Ian En-Hsu Yen, Xiangru Huang, Kai Zhong, Ruohan Zhang, Pradeep K. Ravikumar, Inderjit S. Dhillon

In this work, we show that, by decomposing training of Structural Support Vector Machine (SVM) into a series of multiclass SVM problems connected through messages, one can replace expensive structured oracle with Factorwise Maximization Oracle (FMO) that allows efficient implementation of complexity sublinear to the factor domain.

PD-Sparse : A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification

1 code implementation ICML 2016 Ian En-Hsu Yen, Xiangru Huang, Pradeep Ravikumar, Kai Zhong, Inderjit S. Dhillon

In this work, we show that a margin-maximizing loss with l1 penalty, in case of Extreme Classification, yields extremely sparse solution both in primal and in dual without sacrificing the expressive power of predictor.

General Classification Text Classification

Sparse Linear Programming via Primal and Dual Augmented Coordinate Descent

no code implementations NeurIPS 2015 Ian En-Hsu Yen, Kai Zhong, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon

Over the past decades, Linear Programming (LP) has been widely used in different areas and considered as one of the mature technologies in numerical optimization.

A Dual Augmented Block Minimization Framework for Learning with Limited Memory

no code implementations NeurIPS 2015 Ian En-Hsu Yen, Shan-Wei Lin, Shou-De Lin

In past few years, several techniques have been proposed for training of linear Support Vector Machine (SVM) in limited-memory setting, where a dual block-coordinate descent (dual-BCD) method was used to balance cost spent on I/O and computation.

Constant Nullspace Strong Convexity and Fast Convergence of Proximal Methods under High-Dimensional Settings

no code implementations NeurIPS 2014 Ian En-Hsu Yen, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon

State of the art statistical estimators for high-dimensional problems take the form of regularized, and hence non-smooth, convex programs.

Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space

no code implementations NeurIPS 2014 Ian En-Hsu Yen, Ting-Wei Lin, Shou-De Lin, Pradeep K. Ravikumar, Inderjit S. Dhillon

In this paper, we propose a Sparse Random Feature algorithm, which learns a sparse non-linear predictor by minimizing an $\ell_1$-regularized objective function over the Hilbert Space induced from kernel function.

Cannot find the paper you are looking for? You can Submit a new open access paper.