Search Results for author: Patrick H. Chen

Found 10 papers, 1 paper with code

ELIAS: End-to-End Learning to Index and Search in Large Output Spaces

1 code implementation 16 Oct 2022 Nilesh Gupta, Patrick H. Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon

A popular approach for dealing with the large label space is to arrange the labels into a shallow tree-based index and then learn an ML model to efficiently search this index via beam search.

Extreme Multi-Label Classification
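
The ELIAS snippet above describes arranging labels into a shallow tree-based index and searching it with beam search. Below is a minimal sketch of that search pattern for a two-level index, with illustrative linear scorers and a toy cluster-to-label map; it is not the ELIAS model itself, only the shape of the beam-searched index lookup.

```python
import numpy as np

def beam_search_labels(x, cluster_w, label_w, cluster_to_labels, beam=10, topk=5):
    """Two-level label-tree search: score clusters, keep a beam of them,
    then score only the labels inside the surviving clusters."""
    cluster_scores = cluster_w @ x                      # (num_clusters,)
    top_clusters = np.argsort(-cluster_scores)[:beam]   # beam over level 1
    cand_labels = np.concatenate([cluster_to_labels[c] for c in top_clusters])
    label_scores = label_w[cand_labels] @ x             # score candidates only, not all labels
    best = np.argsort(-label_scores)[:topk]
    return cand_labels[best], label_scores[best]

# toy usage: 8 clusters, 1000 labels, 64-dim inputs
rng = np.random.default_rng(0)
d, C, L = 64, 8, 1000
cluster_w, label_w = rng.normal(size=(C, d)), rng.normal(size=(L, d))
cluster_to_labels = np.array_split(np.arange(L), C)
labels, scores = beam_search_labels(rng.normal(size=d), cluster_w, label_w,
                                    cluster_to_labels, beam=3, topk=5)
```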

FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search

no code implementations 22 Jun 2022 Patrick H. Chen, Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit S. Dhillon, Cho-Jui Hsieh

Approximate K-Nearest Neighbor Search (AKNNS) has now become ubiquitous in modern applications, for example, as a fast search procedure with two tower deep learning models.
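
Graph-based AKNNS methods of the kind FINGER accelerates typically walk a proximity graph greedily, repeatedly moving to whichever neighbor is closer to the query. The sketch below shows only that generic greedy walk; graph construction and FINGER's fast distance approximation are omitted, and the brute-force 5-NN graph in the usage example is purely illustrative.

```python
import numpy as np

def greedy_graph_search(query, points, neighbors, start=0, max_steps=100):
    """Walk a proximity graph: jump to the neighbor closest to the query,
    stop when no neighbor improves on the current node (a local minimum)."""
    cur = start
    cur_dist = np.linalg.norm(points[cur] - query)
    for _ in range(max_steps):
        nbr_ids = neighbors[cur]
        dists = np.linalg.norm(points[nbr_ids] - query, axis=1)
        best = int(np.argmin(dists))
        if dists[best] >= cur_dist:      # no closer neighbor: stop
            return cur, cur_dist
        cur, cur_dist = nbr_ids[best], dists[best]
    return cur, cur_dist

# toy usage: brute-force 5-NN graph over random points (illustration only)
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 16))
nbrs = [np.argsort(np.linalg.norm(pts - p, axis=1))[1:6] for p in pts]
node, dist = greedy_graph_search(rng.normal(size=16), pts, nbrs)
```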

Overcoming Catastrophic Forgetting by Generative Regularization

no code implementations 3 Dec 2019 Patrick H. Chen, Wei Wei, Cho-Jui Hsieh, Bo Dai

In this paper, we propose a new method to overcome catastrophic forgetting by adding generative regularization to the Bayesian inference framework.

Bayesian Inference, Continual Learning
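
The snippet only names the idea of a generative regularizer for continual learning. As a loose illustration of the general shape such an objective can take, here is a generic generative-replay-style penalty added to the current-task loss; the `generator.sample` API, the KL term, and the weighting are assumptions, not the paper's Bayesian formulation.

```python
import torch
import torch.nn.functional as F

def regularized_step(model, generator, old_model, x, y, lam=1.0):
    """Illustrative loss: current-task cross-entropy plus a penalty that keeps
    the new model consistent with the previous model on generated samples."""
    task_loss = F.cross_entropy(model(x), y)
    x_gen = generator.sample(x.size(0))        # replayed pseudo-data (assumed API)
    with torch.no_grad():
        old_logits = old_model(x_gen)
    reg = F.kl_div(F.log_softmax(model(x_gen), dim=-1),
                   F.softmax(old_logits, dim=-1), reduction="batchmean")
    return task_loss + lam * reg
```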

MulCode: A Multiplicative Multi-way Model for Compressing Neural Language Model

no code implementations IJCNLP 2019 Yukun Ma, Patrick H. Chen, Cho-Jui Hsieh

For example, input embedding and Softmax matrices in the IWSLT-2014 German-to-English data set account for more than 80% of the total model parameters.

Language Modelling, Machine Translation +2
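
MulCode targets the embedding and softmax matrices that, as the snippet notes, can dominate the parameter count. As a rough illustration of why compressing them pays off, here is a plain truncated-SVD factorization of an embedding matrix; this is a generic low-rank baseline, not MulCode's multiplicative multi-way model, and the sizes are toy values.

```python
import numpy as np

V, d, r = 10000, 512, 64                      # vocab size, embedding dim, rank
E = np.random.randn(V, d).astype(np.float32)  # stand-in embedding matrix

# Truncated SVD: E ~= A @ B, stored as two thin matrices
U, s, Vt = np.linalg.svd(E, full_matrices=False)
A = U[:, :r] * s[:r]          # (V, r)
B = Vt[:r]                    # (r, d)

orig_params = V * d           # 5.12M parameters
compressed  = V * r + r * d   # ~0.67M parameters, roughly 7.6x smaller
approx_row = A[123] @ B       # reconstruct one word's embedding on demand
```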

Learning to Learn with Better Convergence

no code implementations 25 Sep 2019 Patrick H. Chen, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh

We consider the learning to learn problem, where the goal is to leverage deep learning models to automatically learn (iterative) optimization algorithms for training machine learning models.
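
In this "learning to learn" setting, a small network replaces a hand-designed update rule and maps gradients to parameter updates. The sketch below shows a toy coordinate-wise updater and an unrolled inner loop; the architecture and training details are placeholders, not the method proposed in the paper.

```python
import torch
import torch.nn as nn

class LearnedUpdater(nn.Module):
    """Toy coordinate-wise learned optimizer: maps each gradient entry to an update."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, grad):
        flat = grad.reshape(-1, 1)
        return self.net(flat).reshape(grad.shape)

def inner_loop(params, loss_fn, updater, steps=5):
    """Apply the learned update rule for a few steps; in meta-training the final
    loss would be backpropagated through this unrolled loop."""
    for _ in range(steps):
        loss = loss_fn(params)
        (grad,) = torch.autograd.grad(loss, params, create_graph=True)
        params = params + updater(grad)     # learned update instead of -lr * grad
    return params, loss_fn(params)

# toy usage: minimize a quadratic with the (untrained) learned updater
updater = LearnedUpdater()
theta = torch.randn(10, requires_grad=True)
theta_T, final_loss = inner_loop(theta, lambda p: (p ** 2).sum(), updater)
```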

Efficient Contextual Representation Learning Without Softmax Layer

no code implementations 28 Feb 2019 Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, Kai-Wei Chang

Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all the trainable parameters of the softmax layer and performs language modeling without truncating the vocabulary.

Dimensionality Reduction, Language Modelling +2
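
One common way to remove the trainable softmax layer, roughly in the spirit the snippet describes, is to train the encoder to regress onto fixed pretrained word embeddings instead of predicting a distribution over the vocabulary. The sketch below uses a cosine loss against a frozen embedding table; the exact objective and embedding source are assumptions, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def embedding_prediction_loss(hidden, target_ids, pretrained_emb):
    """Instead of a V-way softmax, predict the frozen embedding of the target
    word; the cost is independent of vocabulary size and adds no trainable
    output-layer parameters."""
    targets = pretrained_emb[target_ids]            # (batch, dim), frozen lookup
    return 1.0 - F.cosine_similarity(hidden, targets, dim=-1).mean()
```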

Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks

no code implementations ICLR 2019 Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh

The algorithm achieves an order of magnitude faster inference than the original softmax layer for predicting top-$k$ words in various tasks such as beam search in machine translation or next words prediction.

Clustering, Machine Translation +1
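
The "Clustering" tag hints at the screening idea: group contexts into clusters offline, attach a short candidate word list to each cluster, and at inference compute logits only over that list rather than the full vocabulary. The sketch below assumes the centroids and candidate sets are already built; how they are learned is the paper's contribution and is not shown here.

```python
import numpy as np

def screened_topk(h, centroids, cluster_candidates, W_out, k=5):
    """Approximate top-k word prediction: pick the nearest context cluster,
    then score only that cluster's precomputed candidate words."""
    c = int(np.argmin(np.linalg.norm(centroids - h, axis=1)))  # nearest cluster
    cand = cluster_candidates[c]                               # small candidate set
    logits = W_out[cand] @ h                                   # |cand| dot products, not |V|
    top = np.argsort(-logits)[:k]
    return cand[top], logits[top]
```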

GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

no code implementations NeurIPS 2018 Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh

Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses.

Language Modelling, Model Compression +1
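
The title describes block-wise low-rank approximation: partition the vocabulary-sized embedding/softmax matrix into row blocks (for example by word frequency) and give each block its own truncated factorization, so frequent words can keep a higher rank. The grouping and ranks below are illustrative, not the paper's learned partition.

```python
import numpy as np

def blockwise_lowrank(W, group_sizes, ranks):
    """Split the rows of W into groups and store a separate truncated SVD per
    group; frequent-word groups can be given larger ranks."""
    factors, start = [], 0
    for size, r in zip(group_sizes, ranks):
        block = W[start:start + size]
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        factors.append((U[:, :r] * s[:r], Vt[:r]))   # block ~= A @ B
        start += size
    return factors

# toy usage: 3 frequency groups with decreasing rank
W = np.random.randn(3000, 256).astype(np.float32)
factors = blockwise_lowrank(W, group_sizes=[500, 1000, 1500], ranks=[128, 64, 32])
```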

A comparison of second-order methods for deep convolutional neural networks

no code implementations ICLR 2018 Patrick H. Chen, Cho-Jui Hsieh

Although many second-order methods have been proposed for training neural networks, most results were obtained on smaller single-layer fully connected networks, so we still cannot conclude whether they are useful for training deep convolutional networks.

Second-order methods
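
Most practical second-order training methods avoid forming the Hessian explicitly and instead rely on Hessian-vector products computed by double backpropagation. The sketch below shows that generic building block only; it is not a specific method compared in the paper.

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Compute H @ vec without materializing the Hessian, via double backprop.
    params: list of parameter tensors; vec: flat vector with the same total
    number of elements as all params combined."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat @ vec, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])
```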
