Search Results for author: Andrew M. Dai

Found 32 papers, 16 papers with code

Finetuned Language Models Are Zero-Shot Learners

1 code implementation · 3 Sep 2021 · Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

We show that instruction tuning (finetuning language models on a collection of tasks described via instructions) substantially boosts zero-shot performance on unseen tasks.

Tasks: Common Sense Reasoning, Language Modelling (+5 more)

BEDS-Bench: Behavior of EHR-models under Distributional Shift--A Benchmark

1 code implementation · 17 Jul 2021 · Anand Avati, Martin Seneviratne, Emily Xue, Zhen Xu, Balaji Lakshminarayanan, Andrew M. Dai

Most ML approaches focus on generalization performance on unseen data that are similar to the training data (In-Distribution, or IND).

MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records

No code implementations · 3 Feb 2021 · Zhen Xu, David R. So, Andrew M. Dai

One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure.

Tasks: Neural Architecture Search

Learning to Select Best Forecast Tasks for Clinical Outcome Prediction

No code implementations · NeurIPS 2020 · Yuan Xue, Nan Du, Anne Mottram, Martin Seneviratne, Andrew M. Dai

The paradigm of 'pretraining' from a set of relevant auxiliary tasks and then 'finetuning' on a target task has been successfully applied in many different domains.

Tasks: Meta-Learning

Learnability and Complexity of Quantum Samples

1 code implementation · 22 Oct 2020 · Murphy Yuezhen Niu, Andrew M. Dai, Li Li, Augustus Odena, Zhengli Zhao, Vadim Smelyanskyi, Hartmut Neven, Sergio Boixo

Given a quantum circuit, a quantum computer can sample the output distribution exponentially faster in the number of bits than classical computers.

Training independent subnetworks for robust prediction

1 code implementation · ICLR 2021 · Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran

Recent approaches to efficiently ensemble neural networks have shown that strong robustness and uncertainty performance can be achieved with a negligible gain in parameters over the original network.

Learning Unstable Dynamical Systems with Time-Weighted Logarithmic Loss

No code implementations · 10 Jul 2020 · Kamil Nar, Yuan Xue, Andrew M. Dai

When training the parameters of a linear dynamical model, the gradient descent algorithm is likely to fail to converge if the squared-error loss is used as the training loss function.

Flow Contrastive Estimation of Energy-Based Models

1 code implementation · CVPR 2020 · Ruiqi Gao, Erik Nijkamp, Diederik P. Kingma, Zhen Xu, Andrew M. Dai, Ying Nian Wu

The update of the flow model approximately minimizes the Jensen-Shannon divergence between the flow model and the data distribution.

Tasks: Image Generation

Modelling EHR timeseries by restricting feature interaction

No code implementations · 14 Nov 2019 · Kun Zhang, Yuan Xue, Gerardo Flores, Alvin Rajkomar, Claire Cui, Andrew M. Dai

Time series data are prevalent in electronic health records, mostly in the form of physiological parameters such as vital signs and lab tests.

Tasks: Mortality Prediction, Time Series

Federated and Differentially Private Learning for Electronic Health Records

No code implementations · 13 Nov 2019 · Stephen R. Pfohl, Andrew M. Dai, Katherine Heller

The use of collaborative and decentralized machine learning techniques such as federated learning has the potential to enable the development and deployment of clinical risk prediction models in low-resource settings without requiring that sensitive data be shared or stored in a central repository.

Tasks: Federated Learning

Capacity, Bandwidth, and Compositionality in Emergent Language Learning

1 code implementation · 24 Oct 2019 · Cinjon Resnick, Abhinav Gupta, Jakob Foerster, Andrew M. Dai, Kyunghyun Cho

In this paper, we investigate the learning biases that affect the efficacy and compositionality of emergent languages.

Tasks: Systematic Generalization

Learning an Adaptive Learning Rate Schedule

No code implementations · 20 Sep 2019 · Zhen Xu, Andrew M. Dai, Jonas Kemp, Luke Metz

The learning rate is one of the most important hyper-parameters for model training and generalization.

Improved Hierarchical Patient Classification with Language Model Pretraining over Clinical Notes

1 code implementation · 6 Sep 2019 · Jonas Kemp, Alvin Rajkomar, Andrew M. Dai

Clinical notes in electronic health records contain highly heterogeneous writing styles, including non-standard terminology or abbreviations.

Tasks: Classification, General Classification (+1 more)

Learning the Graphical Structure of Electronic Health Records with Graph Convolutional Transformer

2 code implementations · 11 Jun 2019 · Edward Choi, Zhen Xu, Yujia Li, Michael W. Dusenberry, Gerardo Flores, Yuan Xue, Andrew M. Dai

A recent study showed that using the graphical structure underlying EHR data (e.g., the relationship between diagnoses and treatments) improves the performance of prediction tasks such as heart failure prediction.

Tasks: Graph Reconstruction, Readmission Prediction (+1 more)

Analyzing the Role of Model Uncertainty for Electronic Health Records

No code implementations · 10 Jun 2019 · Michael W. Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew M. Dai

We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups.

Gmail Smart Compose: Real-Time Assisted Writing

No code implementations · 17 May 2019 · Mia Xu Chen, Benjamin N Lee, Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M. Dai, Zhifeng Chen, Timothy Sohn, Yonghui Wu

In this paper, we present Smart Compose, a novel system for generating interactive, real-time suggestions in Gmail that assists users in writing emails by reducing repetitive typing.

Tasks: Language Modelling, Model Selection

Music Transformer

8 code implementations · ICLR 2019 · Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

Existing approaches to representing relative positional information in the Transformer are impractical for long sequences such as musical compositions, since their memory complexity for intermediate relative information is quadratic in the sequence length.

Tasks: Music Modeling

Peptide-Spectra Matching from Weak Supervision

No code implementations · 20 Aug 2018 · Samuel S. Schoenholz, Sean Hackett, Laura Deming, Eugene Melamud, Navdeep Jaitly, Fiona McAllister, Jonathon O'Brien, George Dahl, Bryson Bennett, Andrew M. Dai, Daphne Koller

As in many other scientific domains, we face a fundamental problem when using machine learning to identify proteins from mass spectrometry data: large ground truth datasets mapping inputs to correct outputs are extremely difficult to obtain.

Embedding Text in Hyperbolic Spaces

No code implementations · WS 2018 · Bhuwan Dhingra, Christopher J. Shallue, Mohammad Norouzi, Andrew M. Dai, George E. Dahl

Ideally, we could incorporate our prior knowledge of the hierarchical structure of natural language into unsupervised learning algorithms that work on text data.

Tasks: Hierarchical Structure, Sentence Embeddings

A Goal-oriented Neural Conversation Model by Self-Play

No code implementations · ICLR 2018 · Wei Wei, Quoc V. Le, Andrew M. Dai, Li-Jia Li

One challenge in applying such techniques to building goal-oriented conversation models is that maximum likelihood-based models are not optimized toward accomplishing goals.

Tasks: Language Modelling, Natural Language Understanding

MaskGAN: Better Text Generation via Filling in the _______

No code implementations · ICLR 2018 · William Fedus, Ian Goodfellow, Andrew M. Dai

Neural autoregressive and seq2seq models that generate text by sampling words sequentially, with each word conditioned on the previous word, are state-of-the-art for several machine translation and summarization benchmarks.

Tasks: Image Generation, Machine Translation (+3 more)

Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step

1 code implementation · ICLR 2018 · William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, Ian Goodfellow

Unlike other generative models, generative adversarial networks (GANs) learn the data distribution via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost.

Who Said What: Modeling Individual Labelers Improves Classification

1 code implementation · 26 Mar 2017 · Melody Y. Guan, Varun Gulshan, Andrew M. Dai, Geoffrey E. Hinton

We also show that our method performs better than competing algorithms by Welinder and Perona (2010), and by Mnih and Hinton (2012).

Tasks: Classification, General Classification

Adversarial Training Methods for Semi-Supervised Text Classification

4 code implementations · 25 May 2016 · Takeru Miyato, Andrew M. Dai, Ian Goodfellow

Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting.

Tasks: Classification, General Classification (+4 more)

Generating Sentences from a Continuous Space

11 code implementations · CoNLL 2016 · Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation.

Tasks: Language Modelling

Semi-supervised Sequence Learning

169 code implementations · NeurIPS 2015 · Andrew M. Dai, Quoc V. Le

In our experiments, we find that long short-term memory (LSTM) recurrent networks pretrained with either of the two approaches (a recurrent language model or a sequence autoencoder) are more stable and generalize better.

Tasks: Language Modelling, Text Classification

Document Embedding with Paragraph Vectors

5 code implementations · 29 Jul 2015 · Andrew M. Dai, Christopher Olah, Quoc V. Le

Paragraph Vectors has recently been proposed as an unsupervised method for learning distributed representations for pieces of text.

Tasks: Document Embedding, Sentiment Analysis (+1 more)

The supervised hierarchical Dirichlet process

No code implementations · 17 Dec 2014 · Andrew M. Dai, Amos J. Storkey

However, until now, Hierarchical Dirichlet Process (HDP) mixtures have not seen significant use in supervised problems with grouped data since a straightforward application of the HDP on the grouped data results in learnt clusters that are not predictive of the responses.
