Search Results for author: Joel Hestness

Found 20 papers, 8 papers with code

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models

no code implementations • 1 Mar 2024 • Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie

Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e.g., biomedicine).

Question Answering

Position Interpolation Improves ALiBi Extrapolation

1 code implementation • 18 Oct 2023 • Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness

Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) to extrapolate to longer sequence lengths.

Language Modelling · Position +1
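The idea behind linear position interpolation can be sketched briefly: instead of extrapolating to position indices the model never saw in training, positions are rescaled so that a longer evaluation sequence is squeezed back into the trained position range before computing rotary angles. The sketch below is illustrative, not the paper's implementation; the sequence lengths, embedding dimension, and RoPE base are assumed values.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotary embedding angles for each (position, frequency pair)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape: (num_positions, dim // 2)

def interpolated_positions(seq_len, train_len):
    """Linearly rescale positions so they fall inside the trained range."""
    scale = min(1.0, train_len / seq_len)
    return np.arange(seq_len) * scale

# Hypothetical sizes: model trained on 2048 tokens, evaluated on 8192.
pos = interpolated_positions(8192, 2048)
angles = rope_angles(pos, dim=64)
# Every interpolated position stays within the trained range [0, 2048).
assert pos.max() < 2048
```

The key design point is that interpolation keeps the model in the position regime it was optimized for, which is why it tends to degrade less than naive extrapolation at long contexts.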

SlimPajama-DC: Understanding Data Combinations for LLM Training

1 code implementation • 19 Sep 2023 • Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing

This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the pretraining of large language models using SlimPajama.

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

2 code implementations • 6 Apr 2023 • Nolan Dey, Gurpreet Gosal, Zhiming Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness

We study recent research advances that improve large language models through efficient pre-training and scaling, and open datasets and tools.

RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network

1 code implementation • 28 Jun 2022 • Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, Dennis Decoste

However, training such networks requires substantial accelerator memory for saving large, multi-resolution activations.

Ranked #314 on Image Classification on ImageNet (using extra training data)

General Classification · Image Classification +2

Time Dependency, Data Flow, and Competitive Advantage

no code implementations • 17 Mar 2022 • Ehsan Valavi, Joel Hestness, Marco Iansiti, Newsha Ardalani, Feng Zhu, Karim R. Lakhani

Relating the text topics to various business areas of interest, we argue that competing in a business area in which data value decays rapidly alters strategies to acquire competitive advantage.

Time and the Value of Data

no code implementations • 17 Mar 2022 • Ehsan Valavi, Joel Hestness, Newsha Ardalani, Marco Iansiti

In addition, we argue that increasing the stock of data by including older datasets may, in fact, damage the model's accuracy.

BIG-bench Machine Learning

Efficiently Disentangle Causal Representations

1 code implementation • 6 Jan 2022 • Yuanpeng Li, Joel Hestness, Mohamed Elhoseiny, Liang Zhao, Kenneth Church

This paper proposes an efficient approach to learning disentangled representations with causal mechanisms based on the difference of conditional probabilities in original and new distributions.

Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation

no code implementations • 19 Apr 2021 • Mihir Pendse, Vithursan Thangarasa, Vitaliy Chiley, Ryan Holmdahl, Joel Hestness, Dennis Decoste

The inverted residual bottleneck block uses lightweight depthwise separable convolutions to reduce computation by decomposing convolutions into a pointwise convolution and a depthwise convolution.

Brain Tumor Segmentation · Tumor Segmentation
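The compute saving from the decomposition described in the abstract is easy to see with a multiply-accumulate count: a standard k×k convolution mixes channels and spatial positions jointly, while a depthwise k×k convolution followed by a 1×1 pointwise convolution splits the two. The shapes below are illustrative assumptions, not taken from the paper.

```python
def conv_macs(c_in, c_out, k, h, w):
    """Multiply-accumulates for a standard k x k convolution."""
    return c_in * c_out * k * k * h * w

def separable_macs(c_in, c_out, k, h, w):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = c_in * k * k * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

# Illustrative shapes: 64 -> 128 channels, 3x3 kernel, 32x32 feature map.
std = conv_macs(64, 128, 3, 32, 32)
sep = separable_macs(64, 128, 3, 32, 32)
print(f"reduction: {std / sep:.1f}x")  # → reduction: 8.4x
```

For a k×k kernel the saving approaches a factor of k² as the channel count grows, which is why 3×3 separable blocks land near an 8–9× reduction.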

Gradient Descent Resists Compositionality

no code implementations • 1 Jan 2021 • Yuanpeng Li, Liang Zhao, Joel Hestness, Kenneth Church, Mohamed Elhoseiny

In this paper, we argue that gradient descent is one of the reasons that make compositionality learning hard during neural network optimization.

Transferability of Compositionality

no code implementations • 1 Jan 2021 • Yuanpeng Li, Liang Zhao, Joel Hestness, Ka Yee Lun, Kenneth Church, Mohamed Elhoseiny

To the best of our knowledge, this is the first work to focus on the transferability of compositionality, and it is orthogonal to existing efforts to learn compositional representations in the training distribution.

Out-of-Distribution Generalization

Compositional Generalization for Primitive Substitutions

1 code implementation • IJCNLP 2019 • Yuanpeng Li, Liang Zhao, Jian-Yu Wang, Joel Hestness

Compositional generalization is a basic mechanism in human language learning, but current neural networks lack such ability.

Few-Shot Learning · Machine Translation +2

Beyond Human-Level Accuracy: Computational Challenges in Deep Learning

1 code implementation • 3 Sep 2019 • Joel Hestness, Newsha Ardalani, Greg Diamos

However, recent prior work shows that as dataset sizes grow, DL model accuracy and model size grow predictably.

Empirically Characterizing Overparameterization Impact on Convergence

no code implementations • ICLR 2019 • Newsha Ardalani, Joel Hestness, Gregory Diamos

A long-held conventional wisdom states that larger models train more slowly when using gradient descent.

A Proposed Hierarchy of Deep Learning Tasks

no code implementations • 27 Sep 2018 • Joel Hestness, Sharan Narang, Newsha Ardalani, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou, Gregory Diamos, Kenneth Church

As the pace of deep learning innovation accelerates, it becomes increasingly important to organize the space of problems by relative difficulty.

Deep Learning Scaling is Predictable, Empirically

no code implementations • 1 Dec 2017 • Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art.

Language Modelling · Machine Translation +3
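The predictable scaling the abstract refers to is an empirical power-law relationship between training set size and generalization error, which becomes a straight line in log-log space. A minimal sketch of fitting such a curve, using synthetic data that follows an assumed power law (the coefficients here are invented, not the paper's measurements):

```python
import numpy as np

# Synthetic (training set size, validation loss) pairs following an
# assumed power law: loss = alpha * m ** beta.
sizes = np.array([1e4, 1e5, 1e6, 1e7])
losses = 5.0 * sizes ** -0.1

# Fit log(loss) = log(alpha) + beta * log(m) with least squares.
beta, log_alpha = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha = np.exp(log_alpha)
print(round(beta, 3), round(alpha, 3))  # recovers beta = -0.1, alpha = 5.0
```

Fits like this are what let practitioners project how much additional data or compute a target accuracy will require before running the full experiment.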
