Language Models

OPT

Introduced by Zhang et al. in OPT: Open Pre-trained Transformer Language Models

OPT is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters. The models are trained with the AdamW optimizer and a weight decay of 0.1, following a linear learning-rate schedule: the learning rate warms up from 0 to the maximum over the first 2000 steps for OPT-175B (or over 375M tokens for the smaller models) and then decays linearly to 10% of the maximum over 300B tokens. Batch sizes range from 0.5M to 4M tokens depending on model size and are kept constant throughout training.
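The schedule above can be expressed as a step-dependent multiplier on the peak learning rate. Below is a minimal PyTorch sketch of that warmup-then-linear-decay rule combined with AdamW and weight decay 0.1; the peak LR, the 2M-token batch, and the `nn.Linear` stand-in model are illustrative assumptions for this sketch, not the exact settings of any particular OPT size.

```python
import torch

# Stand-in module for illustration; the real OPT models are decoder-only transformers.
model = torch.nn.Linear(8, 8)

MAX_LR = 1.2e-4              # assumed peak learning rate for this sketch
WARMUP_STEPS = 2_000         # OPT-175B warms up over the first 2000 steps
TOKENS_PER_STEP = 2_000_000  # assumed constant batch size of 2M tokens
TOTAL_STEPS = int(300e9 // TOKENS_PER_STEP)  # decay is spread over 300B tokens

# AdamW with weight decay 0.1, as described above.
optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR, weight_decay=0.1)

def lr_factor(step: int) -> float:
    """Linear warmup from 0 to 1, then linear decay to 0.1 of the peak LR."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return max(0.1, 1.0 - 0.9 * progress)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

# A training loop would call optimizer.step() followed by scheduler.step()
# once per update so the multiplier tracks the current step count.
```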

Source: OPT: Open Pre-trained Transformer Language Models

Tasks


Task                   Papers   Share
Language Modelling         38   9.95%
Quantization               22   5.76%
Large Language Model       15   3.93%
Question Answering         13   3.40%
In-Context Learning        12   3.14%
Text Generation             9   2.36%
Retrieval                   7   1.83%
Translation                 7   1.83%
Sentence                    6   1.57%
