OPT is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters. The models are trained with the AdamW optimizer and a weight decay of 0.1. They follow a linear learning-rate schedule, warming up from 0 to the maximum learning rate over the first 2000 steps for OPT-175B (or over 375M tokens for the smaller models), then decaying down to 10% of the maximum learning rate over 300B tokens. Batch sizes range from 0.5M to 4M tokens depending on model size and are kept constant throughout training.
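A minimal PyTorch sketch of this optimizer and schedule, assuming a generic stand-in model; the peak learning rate, total step count, and batch-size assumption are illustrative placeholders, not the paper's exact per-model configuration (which lives in the authors' metaseq code):

```python
import torch

# Illustrative sketch only, not the authors' training code.
model = torch.nn.Linear(512, 512)   # placeholder for the actual transformer
max_lr = 1.2e-4                     # assumed peak LR; per-model values are in the paper
warmup_steps = 2000                 # linear warmup over the first 2000 steps (OPT-175B)
total_steps = 150_000               # assumed: ~300B tokens at a ~2M-token batch
min_lr_ratio = 0.1                  # decay down to 10% of the peak LR

# AdamW with weight decay 0.1, as described above.
optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr, weight_decay=0.1)

def lr_lambda(step: int) -> float:
    """Linear warmup to the peak LR, then linear decay to 10% of it."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 1.0 - (1.0 - min_lr_ratio) * progress

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    # ... forward pass, loss.backward() ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()                # one LR update per optimizer step
```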
Source: OPT: Open Pre-trained Transformer Language Models
Task | Papers | Share |
---|---|---|
Language Modelling | 38 | 9.95% |
Quantization | 22 | 5.76% |
Large Language Model | 15 | 3.93% |
Question Answering | 13 | 3.40% |
In-Context Learning | 12 | 3.14% |
Text Generation | 9 | 2.36% |
Retrieval | 7 | 1.83% |
Translation | 7 | 1.83% |
Sentence | 6 | 1.57% |