GPT-3

Introduced by Brown et al. in Language Models are Few-Shot Learners

GPT-3 is an autoregressive transformer model with 175 billion parameters. It uses the same model and architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization, with the exception that GPT-3 uses alternating dense and locally banded sparse attention patterns in the transformer layers, similar to the Sparse Transformer.

Source: Language Models are Few-Shot Learners
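
To make "alternating dense and locally banded sparse attention" concrete, here is a minimal NumPy sketch that builds boolean attention masks alternating between a fully causal (dense) pattern and a fixed-width local band. The function names, the even/odd layer schedule, and the bandwidth value are illustrative assumptions; the paper describes the pattern but not this exact code.

```python
import numpy as np

def dense_causal_mask(seq_len: int) -> np.ndarray:
    # Dense causal attention: every position attends to itself
    # and to all earlier positions.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def banded_causal_mask(seq_len: int, bandwidth: int) -> np.ndarray:
    # Locally banded causal attention: each position attends only
    # to the `bandwidth` most recent positions (itself included).
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        mask[i, max(0, i - bandwidth + 1): i + 1] = True
    return mask

def alternating_masks(num_layers: int, seq_len: int, bandwidth: int):
    # Alternate dense and locally banded masks across layers,
    # in the style GPT-3 borrows from the Sparse Transformer.
    # The even/odd schedule here is an assumption for illustration.
    return [
        dense_causal_mask(seq_len) if layer % 2 == 0
        else banded_causal_mask(seq_len, bandwidth)
        for layer in range(num_layers)
    ]

# Example: 4 layers over an 8-token sequence with a 3-token band.
masks = alternating_masks(num_layers=4, seq_len=8, bandwidth=3)
```

In a real transformer these masks would be applied to the attention logits before the softmax; the banded layers cut per-layer attention cost from O(n²) to roughly O(n · bandwidth).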

Latest Papers

PAPER · AUTHORS · DATE
One Model to Rule them All: Towards Zero-Shot Learning for Databases
Benjamin Hilprecht, Carsten Binnig
2021-05-03
Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks
Tatyana Iazykova, Denis Kapelyushnik, Olga Bystrova, Andrey Kutuzov
2021-05-03
Entailment as Few-Shot Learner
Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, Hao Ma
2021-04-29
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, ZhenZhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang, Han Zhang, Lingfeng Deng, Yehong Zhang, Zhe Lin, Chao Zhang, Shaojie Zhang, Mingyue Guo, Shanzhi Gu, Gaojun Fan, YaoWei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian
2021-04-26
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park
2021-04-18
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
2021-04-18
Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions
Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi
2021-04-18
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, Bill Dolan
2021-04-18
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant
2021-04-18
An Adversarially-Learned Turing Test for Dialog Generation Models
Xiang Gao, Yizhe Zhang, Michel Galley, Bill Dolan
2021-04-16
Text2App: A Framework for Creating Android Apps from Text Descriptions
Masum Hasan, Kazi Sajeed Mehrab, Wasi Uddin Ahmad, Rifat Shahriyar
2021-04-16
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Ari Holtzman, Peter West, Vered Shwartz, Yejin Choi, Luke Zettlemoyer
2021-04-16
Meta-tuning Language Models to Answer Prompts Better
Ruiqi Zhong, Kristy Lee, Zheng Zhang, Dan Klein
2021-04-10
Automatic Graph Partitioning for Very Large-scale Deep Learning
Masahiro Tanaka, Kenjiro Taura, Toshihiro Hanawa, Kentaro Torisawa
2021-03-30
Detecting Hate Speech with GPT-3
Ke-Li Chiu, Rohan Alexander
2021-03-23
Calibrate Before Use: Improving Few-Shot Performance of Language Models
Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh
2021-02-19
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
2021-02-16
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Laria Reynolds, Kyle McDonell
2021-02-15
Multiversal views on language models
Laria Reynolds, Kyle McDonell
2021-02-12
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers
Chaoyang He, Shen Li, Mahdi Soltanolkotabi, Salman Avestimehr
2021-02-05
Understanding Emails and Drafting Responses -- An Approach Using GPT-3
Jonas Thiergart, Stefan Huber, Thomas Übellacker
2021-02-05
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
Alex Tamkin, Miles Brundage, Jack Clark, Deep Ganguli
2021-02-04
"Is depression related to cannabis?": A knowledge-infused model for Entity and Relation Extraction with Limited Supervision
Kaushik Roy, Usha Lokala, Vedant Khandelwal, Amit Sheth
2021-02-01
Persistent Anti-Muslim Bias in Large Language Models
Abubakar Abid, Maheen Farooqi, James Zou
2021-01-14
How Multipurpose Are Language Models?
Anonymous
2021-01-01
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao, Adam Fisch, Danqi Chen
2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
2020-12-31
Revisiting Linformer with a modified self-attention with linear complexity
Madhusudan Verma
2020-12-16
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
Julien Launay, Iacopo Poli, Kilian Müller, Gustave Pariente, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan
2020-12-11
CPM: A Large-scale Generative Chinese Pre-trained Language Model
Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun
2020-12-01
Do Fine-tuned Commonsense Language Models Really Generalize?
Mayank Kejriwal, Ke Shen
2020-11-18
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi
2020-10-12
Toward a Thermodynamics of Meaning
Jonathan Scott Enderle
2020-09-24
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze
2020-09-15
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
Kris McGuffie, Alex Newhouse
2020-09-15
Unit Test Case Generation with Transformers
Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, Neel Sundaresan
2020-09-11
Measuring Massive Multitask Language Understanding
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
2020-09-07
Discrete Word Embedding for Logical Natural Language Understanding
Masataro Asai, Zilu Tang
2020-08-26
Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems
Andrea Madotto, Zihan Liu, Zhaojiang Lin, Pascale Fung
2020-08-14
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
2020-05-28