1 code implementation • 31 Jan 2023 • Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022).
1 code implementation • 21 Oct 2022 • Ziqi Wang, Yuexin Wu, Frederick Liu, Daogao Liu, Le Hou, Hongkun Yu, Jing Li, Heng Ji
However, these data augmentation methods either potentially cause shifts in decision boundaries (representation interpolation), are not expressive enough (token replacement), or introduce too much computational overhead (augmentation with models).
no code implementations • 20 Oct 2022 • Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4%->82.1% on GSM8K, 78.2%->83.0% on DROP, 90.0%->94.4% on OpenBookQA, and 63.4%->67.9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label.
Ranked #1 on Natural Language Inference on ANLI-A3
1 code implementation • 20 Oct 2022 • Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei
We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).
Ranked #1 on Multi-task Language Understanding on BBH-nlp
Cross-Lingual Question Answering
Multi-task Language Understanding
no code implementations • 21 May 2022 • Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi
Although chain-of-thought prompting has shown impressive results on many natural language reasoning tasks, it often performs poorly on tasks that require solving problems harder than the demonstration examples.
Ranked #14 on Arithmetic Reasoning on GSM8K
no code implementations • ACL 2022 • Le Hou, Richard Yuanzhe Pang, Tianyi Zhou, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou
Transformer-based models generally allocate the same amount of computation for each token in a given sequence.
no code implementations • 8 Oct 2021 • Shuo Yang, Le Hou, Xiaodan Song, Qiang Liu, Denny Zhou
Our approach exploits the special structure of BERT that contains a stack of repeated modules (i.e., transformer encoders).
no code implementations • 1 Jan 2021 • Shuo Yang, Le Hou, Xiaodan Song, Qiang Liu, Denny Zhou
It has been widely observed that increasing deep learning model sizes often leads to significant performance improvements on a variety of natural language processing and computer vision tasks.
4 code implementations • 5 Mar 2020 • Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou
We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
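As a rough illustration of the idea in this abstract, the NumPy sketch below mixes attention logits across the heads dimension right before the softmax and mixes the resulting weights again right after it. The function and parameter names (`talking_heads_attention`, `pre_proj`, `post_proj`), the single-example tensor layout, and the 1/sqrt(d) logit scaling are assumptions made for this example and are not taken from the paper's code.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def talking_heads_attention(q, k, v, pre_proj, post_proj):
    """Sketch of attention with head-mixing projections around the softmax.

    Shapes (hypothetical layout, one example, no batch dimension):
      q:         (h, n_q, d)   queries per head
      k:         (h, n_k, d)   keys per head
      v:         (h_v, n_k, d) values per head
      pre_proj:  (h, g)        mixes logits across heads before the softmax
      post_proj: (g, h_v)      mixes weights across heads after the softmax
    """
    d = q.shape[-1]
    # Per-head dot-product logits; the scaling is an assumed, conventional choice.
    logits = np.einsum("hqd,hkd->hqk", q, k) / np.sqrt(d)
    # Linear projection across the heads dimension, immediately before the softmax.
    logits = np.einsum("hqk,hg->gqk", logits, pre_proj)
    weights = softmax(logits, axis=-1)
    # Linear projection across the heads dimension, immediately after the softmax.
    weights = np.einsum("gqk,gv->vqk", weights, post_proj)
    # Weighted sum of values per output head.
    return np.einsum("vqk,vkd->vqd", weights, v)
```

With identity matrices for both projections, this sketch reduces to ordinary multi-head attention, which is a convenient sanity check. The extra parameters are only the two small head-by-head matrices, consistent with the abstract's note that few parameters are added.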
1 code implementation • 18 Feb 2020 • Le Hou, Rajarsi Gupta, John S. Van Arnam, Yuwei Zhang, Kaustubh Sivalenka, Dimitris Samaras, Tahsin M. Kurc, Joel H. Saltz
To address this, we developed an analysis pipeline that segments nuclei in whole slide tissue images from multiple cancer types with a quality control process.
no code implementations • 26 Sep 2019 • Robert M. Patton, J. Travis Johnston, Steven R. Young, Catherine D. Schuman, Thomas E. Potok, Derek C. Rose, Seung-Hwan Lim, Junghoon Chae, Le Hou, Shahira Abousamra, Dimitris Samaras, Joel Saltz
Using MENNDL--an HPC-enabled software stack for neural architecture search--we generate a neural network with comparable accuracy to state-of-the-art networks on a cancer pathology dataset that is also $16\times$ faster at inference.
1 code implementation • 6 Sep 2019 • Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song
It is infeasible to train CNN models directly on such high resolution images, because neural activations of a single image do not fit in the memory of a single GPU/TPU, and naive data and model parallelism approaches do not work.
no code implementations • 9 Jul 2019 • Shahira Abousamra, Le Hou, Rajarsi Gupta, Chao Chen, Dimitris Samaras, Tahsin Kurc, Rebecca Batiste, Tianhao Zhao, Shroyer Kenneth, Joel Saltz
This allows for a much larger training set that reflects visual variability across multiple cancer types, and thus for training a single network that can be applied automatically to each cancer type without human adjustment.
no code implementations • CVPR 2019 • Le Hou, Ayush Agarwal, Dimitris Samaras, Tahsin M. Kurc, Rajarsi R. Gupta, Joel H. Saltz
In addition, we propose a hybrid synthesis pipeline that utilizes textures in real histopathology patches and GAN models, to tackle heterogeneity in tissue textures.
1 code implementation • 26 May 2019 • Han Le, Rajarsi Gupta, Le Hou, Shahira Abousamra, Danielle Fassler, Tahsin Kurc, Dimitris Samaras, Rebecca Batiste, Tianhao Zhao, Arvind Rao, Alison L. Van Dyke, Ashish Sharma, Erich Bremer, Jonas S. Almeida, Joel Saltz
Quantitative assessment of Tumor-TIL spatial relationships is increasingly important in both basic science and clinical aspects of breast cancer research.
no code implementations • ICLR 2019 • Kolya Malkin, Caleb Robinson, Le Hou, Nebojsa Jojic
We present a deep learning-based method for super-resolving coarse (low-resolution) labels assigned to groups of image pixels into pixel-level (high-resolution) labels, given the joint distribution between those low- and high-resolution labels.
no code implementations • 9 Apr 2019 • Maozheng Zhao, Le Hou, Han Le, Dimitris Samaras, Nebojsa Jojic, Danielle Fassler, Tahsin Kurc, Rajarsi Gupta, Kolya Malkin, Shroyer Kenneth, Joel Saltz
On the other hand, collecting low resolution labels (labels for a block of pixels) for these high resolution images is much more cost efficient.
no code implementations • 13 Dec 2017 • Le Hou, Ayush Agarwal, Dimitris Samaras, Tahsin M. Kurc, Rajarsi R. Gupta, Joel H. Saltz
We propose a unified pipeline that: a) generates a set of initial synthetic histopathology images with paired information about the nuclei such as segmentation masks; b) refines the initial synthetic images through a Generative Adversarial Network (GAN) to reference styles; c) trains a task-specific CNN and boosts the performance of the task-specific CNN with on-the-fly generated adversarial examples.
no code implementations • 3 Apr 2017 • Le Hou, Vu Nguyen, Dimitris Samaras, Tahsin M. Kurc, Yi Gao, Tianhao Zhao, Joel H. Saltz
In this work, we propose a sparse Convolutional Autoencoder (CAE) for fully unsupervised, simultaneous nucleus detection and feature extraction in histopathology tissue images.
no code implementations • 20 Dec 2016 • Veda Murthy, Le Hou, Dimitris Samaras, Tahsin M. Kurc, Joel H. Saltz
Classifying the various shapes and attributes of a glioma cell nucleus is crucial for diagnosis and understanding the disease.
3 code implementations • 17 Nov 2016 • Le Hou, Chen-Ping Yu, Dimitris Samaras
In this work, we propose to leverage these relationships between classes by training deep nets with the exact squared Earth Mover's Distance (also known as Wasserstein distance) for single-label classification.
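To make the class-relationship idea concrete, here is a minimal sketch for the special case where classes lie on a line (for example, ordinal ratings), so that the squared distance can be computed from cumulative distributions. The function name `squared_emd_loss` and the restriction to linearly ordered classes are assumptions of this example; the paper's formulation may be more general than what is shown.

```python
from itertools import accumulate


def squared_emd_loss(pred, target):
    """Squared EMD between two distributions over linearly ordered classes.

    Assumes both inputs are probability vectors of equal length and that
    adjacent classes are one unit apart. Under that assumption the loss is
    the sum of squared differences between the two cumulative distributions.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of classes")
    cdf_pred = accumulate(pred)
    cdf_target = accumulate(target)
    return sum((p - t) ** 2 for p, t in zip(cdf_pred, cdf_target))


# Hypothetical example: the true class is the first of three ordered classes.
target = [1.0, 0.0, 0.0]
near_miss = squared_emd_loss([0.0, 1.0, 0.0], target)  # all mass on the adjacent class
far_miss = squared_emd_loss([0.0, 0.0, 1.0], target)   # all mass on the farthest class
```

Unlike cross-entropy, which treats every wrong class alike, this loss grows with how far the predicted mass sits from the true class, so `far_miss` exceeds `near_miss`.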
no code implementations • 23 Aug 2016 • Le Hou, Dimitris Samaras, Tahsin M. Kurc, Yi Gao, Joel H. Saltz
In this paper, we propose and apply AAFs on feedforward NNs for regression tasks.
1 code implementation • CVPR 2016 • Le Hou, Dimitris Samaras, Tahsin M. Kurc, Yi Gao, James E. Davis, Joel H. Saltz
However, to recognize cancer subtypes automatically, training a CNN on gigapixel resolution Whole Slide Tissue Images (WSI) is currently computationally impossible.