
Linear Warmup With Cosine Annealing


Linear Warmup With Cosine Annealing is a learning rate schedule in which the learning rate is increased linearly over the first $n$ updates and then annealed according to a cosine schedule for the remainder of training.
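A minimal sketch of the schedule, assuming one common parameterization. Papers differ on details such as whether warmup starts from zero and whether the cosine decays to a nonzero floor. Let $t$ be the update index (starting at 0), $n$ the number of warmup updates, $T$ the total number of updates, $\eta_{\max}$ the peak learning rate and $\eta_{\min}$ the final learning rate (these symbols are illustrative, not taken from a specific paper):

$$\eta_t = \begin{cases} \eta_{\max} \dfrac{t}{n}, & t < n \\ \eta_{\min} + \tfrac{1}{2}(\eta_{\max} - \eta_{\min})\left(1 + \cos\left(\pi \dfrac{t - n}{T - n}\right)\right), & n \le t \le T \end{cases}$$

Both branches give $\eta_{\max}$ at $t = n$, so the schedule is continuous. The function below computes $\eta_t$ and holds it at $\eta_{\min}$ once $t > T$. `warmup_cosine_lr` is a hypothetical helper, not a library API.

```python
import math


def warmup_cosine_lr(step: int, max_lr: float, warmup_steps: int,
                     total_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate for update `step` (0-indexed).

    Hypothetical helper: linear warmup from 0 to `max_lr` over `warmup_steps`
    updates, then cosine annealing from `max_lr` to `min_lr` at `total_steps`.
    """
    if warmup_steps < 0 or total_steps < warmup_steps:
        raise ValueError("need 0 <= warmup_steps <= total_steps")
    if step < warmup_steps:
        # Linear phase: eta_t = max_lr * t / n (warmup_steps > 0 here).
        return max_lr * step / warmup_steps
    # Cosine phase: fraction of the annealing period completed, clamped to [0, 1]
    # so the rate stays at min_lr after total_steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Illustrative settings: 10 warmup updates out of 100 total.
    for t in (0, 5, 10, 55, 100, 120):
        print(t, warmup_cosine_lr(t, max_lr=1e-3, warmup_steps=10, total_steps=100))
```

Calling it with `max_lr=1.0` and `min_lr=0.0` returns a multiplicative factor rather than an absolute rate. That is the form PyTorch's `torch.optim.lr_scheduler.LambdaLR` expects from its `lr_lambda` argument, which is one way to use a sketch like this in a training loop.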

Latest Papers

PAPER DATE
One Model to Rule them All: Towards Zero-Shot Learning for Databases
Benjamin Hilprecht, Carsten Binnig
2021-05-03
Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks
Tatyana Iazykova, Denis Kapelyushnik, Olga Bystrova, Andrey Kutuzov
2021-05-03
Mitigating Political Bias in Language Models Through Reinforced Calibration
Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Lili Wang, Soroush Vosoughi
2021-04-30
Entailment as Few-Shot Learner
Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, Hao Ma
2021-04-29
UoT-UWF-PartAI at SemEval-2021 Task 5: Self Attention Based Bi-GRU with Multi-Embedding Representation for Toxicity Highlighter
Hamed Babaei Giglou, Taher Rahgooy, Mostafa Rahgouy, Jafar Razmara
2021-04-27
Extractive and Abstractive Explanations for Fact-Checking and Evaluation of News
Ashkan Kazemi, Zehua Li, Verónica Pérez-Rosas, Rada Mihalcea
2021-04-27
Easy and Efficient Transformer : Scalable Inference Solution For large NLP model
Gongzheng Li, Yadong Xi, Jingzhen Ding, Duan Wang, Bai Liu, Changjie Fan, Xiaoxi Mao, Zeng Zhao
2021-04-26
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, ZhenZhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang, Han Zhang, Lingfeng Deng, Yehong Zhang, Zhe Lin, Chao Zhang, Shaojie Zhang, Mingyue Guo, Shanzhi Gu, Gaojun Fan, YaoWei Wang, Xuefeng Jin, Qun Liu, Yonghong Tian
2021-04-26
Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention
Soo Hyun Ryu, Richard L. Lewis
2021-04-26
"What's The Context?" : Long Context NLM Adaptation for ASR Rescoring in Conversational Agents
Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff
2021-04-21
Efficient pre-training objectives for Transformers
Luca Di Liello, Matteo Gabburo, Alessandro Moschitti
2021-04-20
Analyzing COVID-19 Tweets with Transformer-based Language Models
Philip Feldman, Sim Tiwari, Charissa S. L. Cheah, James R. Foulds, Shimei Pan
2021-04-20
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park
2021-04-18
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp
2021-04-18
Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions
Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi
2021-04-18
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, Bill Dolan
2021-04-18
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester, Rami Al-Rfou, Noah Constant
2021-04-18
An Adversarially-Learned Turing Test for Dialog Generation Models
Xiang Gao, Yizhe Zhang, Michel Galley, Bill Dolan
2021-04-16
Text2App: A Framework for Creating Android Apps from Text Descriptions
Masum Hasan, Kazi Sajeed Mehrab, Wasi Uddin Ahmad, Rifat Shahriyar
2021-04-16
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
Ari Holtzman, Peter West, Vered Schwartz, Yejin Choi, Luke Zettlemoyer
2021-04-16
NAREOR: The Narrative Reordering Problem
Varun Gangal, Steven Y. Feng, Eduard Hovy, Teruko Mitamura
2021-04-14
Understanding Transformers for Bot Detection in Twitter
Andres Garcia-Silva, Cristian Berrio, Jose Manuel Gomez-Perez
2021-04-13
Meta-tuning Language Models to Answer Prompts Better
Ruiqi Zhong, Kristy Lee, Zheng Zhang, Dan Klein
2021-04-10
Knowledge-Aware Graph-Enhanced GPT-2 for Dialogue State Tracking
Weizhe Lin, Bo-Hsian Tseng, Bill Byrne
2021-04-09
KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
Keyur Faldu, Amit Sheth, Prashant Kikani, Hemang Akabari
2021-04-09
Using GPT-2 to Create Synthetic Data to Improve the Prediction Performance of NLP Machine Learning Classification Models
Dewayne Whitfield
2021-04-02
Automatic Graph Partitioning for Very Large-scale Deep Learning
Masahiro Tanaka, Kenjiro Taura, Toshihiro Hanawa, Kentaro Torisawa
2021-03-30
Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2
Gregor Betz, Kyle Richardson, Christian Voigt
2021-03-24
FastMoE: A Fast Mixture-of-Expert Training System
Jiaao He, Jiezhong Qiu, Aohan Zeng, Zhilin Yang, Jidong Zhai, Jie Tang
2021-03-24
Detecting Hate Speech with GPT-3
Ke-Li Chiu, Rohan Alexander
2021-03-23
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Sushant Singh, Ausif Mahmood
2021-03-23
Play the Shannon Game With Language Models: A Human-Free Approach to Summary Evaluation
Nicholas Egan, Oleg Vasilyev, John Bohannon
2021-03-19
GPT Understands, Too
Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang
2021-03-18
Long Document Summarization in a Low Resource Setting using Pretrained Language Models
Ahsaas Bajaj, Pavitra Dangati, Kalpesh Krishna, Pradhiksha Ashok Kumar, Rheeya Uppaal, Bradford Windsor, Eliot Brenner, Dominic Dotterrer, Rajarshi Das, Andrew McCallum
2021-03-01
From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection
Quang Huu Pham, Viet Anh Nguyen, Linh Bao Doan, Ngoc N. Tran, Ta Minh Thanh
2021-02-24
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models
Harold Ott, Jasmin Bogatinovski, Alexander Acker, Sasho Nedelkoski, Odej Kao
2021-02-23
Calibrate Before Use: Improving Few-Shot Performance of Language Models
Tony Z. Zhao, Eric Wallace, Shi Feng, Dan Klein, Sameer Singh
2021-02-19
THEaiTRE 1.0: Interactive generation of theatre play scripts
Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka
2021-02-17
Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet
M. Onat Topal, Anil Bas, Imke van Heerden
2021-02-16
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
2021-02-16
The corruptive force of AI-generated advice
Margarita Leib, Nils C. Köbis, Rainer Michael Rilke, Marloes Hagens, Bernd Irlenbusch
2021-02-15
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Laria Reynolds, Kyle McDonell
2021-02-15
Multiversal views on language models
Laria Reynolds, Kyle McDonell
2021-02-12
AuGPT: Dialogue with Pre-trained Language Models and Data Augmentation
Jonáš Kulhánek, Vojtěch Hudeček, Tomáš Nekvinda, Ondřej Dušek
2021-02-09
Generating Fake Cyber Threat Intelligence Using Transformer-Based Models
Priyanka Ranade, Aritran Piplai, Sudip Mittal, Anupam Joshi, Tim Finin
2021-02-08
How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases
Hannah Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frederic A. Dreyer, Aleksandar Shtedritski, Yuki M. Asano
2021-02-08
A Hybrid Task-Oriented Dialog System with Domain and Task Adaptive Pretraining
Boliang Zhang, Ying Lyu, Ning Ding, Tianhao Shen, Zhaoyang Jia, Kun Han, Kevin Knight
2021-02-08
Neural Data-to-Text Generation with LM-based Text Augmentation
Ernie Chang, Xiaoyu Shen, Dawei Zhu, Vera Demberg, Hui Su
2021-02-06
Jointly Improving Language Understanding and Generation with Quality-Weighted Weak Supervision of Automatic Labeling
Ernie Chang, Vera Demberg, Alex Marin
2021-02-06
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers
Chaoyang He, Shen Li, Mahdi Soltanolkotabi, Salman Avestimehr
2021-02-05
Understanding Emails and Drafting Responses -- An Approach Using GPT-3
Jonas Thiergart, Stefan Huber, Thomas Übellacker
2021-02-05
Adaptive Semiparametric Language Models
Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong
2021-02-04
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
Alex Tamkin, Miles Brundage, Jack Clark, Deep Ganguli
2021-02-04
"Is depression related to cannabis?": A knowledge-infused model for Entity and Relation Extraction with Limited Supervision
Kaushik Roy, Usha Lokala, Vedant Khandelwal, Amit Sheth
2021-02-01
Synthesizing Monolingual Data for Neural Machine Translation
Benjamin Marie, Atsushi Fujita
2021-01-29
BERT Transformer model for Detecting Arabic GPT2 Auto-Generated Tweets
Fouzi Harrag, Maria Debbah, Kareem Darwish, Ahmed Abdelali
2021-01-22
Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach
Ashish Sharma, Inna W. Lin, Adam S. Miner, David C. Atkins, Tim Althoff
2021-01-19
Persistent Anti-Muslim Bias in Large Language Models
Abubakar Abid, Maheen Farooqi, James Zou
2021-01-14
Polyjuice: Automated, General-purpose Counterfactual Generation
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld
2021-01-01
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li, Percy Liang
2021-01-01
Cluster-Former: Clustering-based Sparse Transformer for Question Answering
Anonymous
2021-01-01
Subformer: A Parameter Reduced Transformer
Anonymous
2021-01-01
Pretrain Knowledge-Aware Language Models
Anonymous
2021-01-01
Adding Recurrence to Pretrained Transformers
Anonymous
2021-01-01
KETG: A Knowledge Enhanced Text Generation Framework
Anonymous
2021-01-01
How Multipurpose Are Language Models?
Anonymous
2021-01-01
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao, Adam Fisch, Danqi Chen
2020-12-31
Conditional Generation of Temporally-ordered Event Sequences
Shih-ting Lin, Nathanael Chambers, Greg Durrett
2020-12-31
Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation
Damian Pascual, Beni Egressy, Florian Bolli, Roger Wattenhofer
2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
2020-12-31
Robust Dialogue Utterance Rewriting as Sequence Tagging
Jie Hao, Linfeng Song, LiWei Wang, Kun Xu, Zhaopeng Tu, Dong Yu
2020-12-29
Uncertainty and Surprisal Jointly Deliver the Punchline: Exploiting Incongruity-Based Features for Humor Recognition
Yubo Xie, Junze Li, Pearl Pu
2020-12-22
Breaking Writer's Block: Low-cost Fine-tuning of Natural Language Generation Models
Alexandre Duval, Thomas Lamson, Gael de Leseleuc de Kerouara, Matthias Gallé
2020-12-19
Query expansion with artificially generated texts
Vincent Claveau
2020-12-16
Revisiting Linformer with a modified self-attention with linear complexity
Madhusudan Verma
2020-12-16
RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation
Michał Bień, Michał Gilski, Martyna Maciejewska, Wojciech Taisner, Dawid Wiśniewski, Agnieszka Ławrynowicz
2020-12-15
Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
2020-12-14
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
Julien Launay, Iacopo Poli, Kilian Müller, Gustave Pariente, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan
2020-12-11
As good as new. How to successfully recycle English GPT-2 to make models for other languages
Wietse de Vries, Malvina Nissim
2020-12-10
Towards Neural Programming Interfaces
Zachary C. Brown, Nathaniel Robinson, David Wingate, Nancy Fulda
2020-12-10
UBAR: Towards Fully End-to-End Task-Oriented Dialog Systems with GPT-2
Yunyi Yang, Yunhao Li, Xiaojun Quan
2020-12-07
CX DB8: A queryable extractive summarizer and semantic search engine
Allen Roush
2020-12-07
Enhanced Offensive Language Detection Through Data Augmentation
Ruibo Liu, Guangxuan Xu, Soroush Vosoughi
2020-12-05
RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation
Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, Xiaoyong Du, Guoliang Li, Sam Madden, Mourad Ouzzani
2020-12-04
Adversarial Sparse Transformer for Time Series Forecasting
Sifan Wu, Xi Xiao, Qianggang Ding, Peilin Zhao, Ying Wei, Junzhou Huang
2020-12-01
CPM: A Large-scale Generative Chinese Pre-trained Language Model
Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun
2020-12-01
Generative Pre-training for Paraphrase Generation by Representing and Predicting Spans in Exemplars
Tien-Cuong Bui, Van-Duc Le, Hai-Thien To, Sang Kyun Cha
2020-11-29
Do Fine-tuned Commonsense Language Models Really Generalize?
Mayank Kejriwal, Ke Shen
2020-11-18
An Efficient and Scalable Deep Learning Approach for Road Damage Detection
Sadra Naddaf-sh, M-Mahdi Naddaf-Sh, Amir R. Kashani, Hassan Zargarzadeh
2020-11-18
Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey
Benyamin Ghojogh, Ali Ghodsi
2020-11-17
DebateSum: A large-scale argument mining and summarization dataset
Allen Roush, Arvind Balaji
2020-11-14
Adapting a Language Model for Controlled Affective Text Generation
Ishika Singh, Ahsan Barkati, Tushar Goswamy, Ashutosh Modi
2020-11-08
Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation
Haryo Akbarianto Wibowo, Tatag Aziz Prawiro, Muhammad Ihsan, Alham Fikri Aji, Radityo Eko Prasojo, Rahmad Mahendra, Suci Fitriany
2020-11-06
Tabular Transformers for Modeling Multivariate Time Series
Inkit Padhi, Yair Schiff, Igor Melnyk, Mattia Rigotti, Youssef Mroueh, Pierre Dognin, Jerret Ross, Ravi Nair, Erik Altman
2020-11-03
Topic-Preserving Synthetic News Generation: An Adversarial Deep Reinforcement Learning Approach
Ahmadreza Mosallanezhad, Kai Shu, Huan Liu
2020-10-30
Unsupervised Paraphrase Generation via Dynamic Blocking
Tong Niu, Semih Yavuz, Yingbo Zhou, Huan Wang, Nitish Shirish Keskar, Caiming Xiong
2020-10-24
Topic Modeling with Contextualized Word Representation Clusters
Laure Thompson, David Mimno
2020-10-23
LightSeq: A High Performance Inference Library for Transformers
Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li
2020-10-23
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li
2020-10-22
Transferable Graph Optimizers for ML Compilers
Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Phitchaya Mangpo Phothilimthana, Shen Wang, Anna Goldie, Azalia Mirhoseini, James Laudon
2020-10-21
Performance of Transfer Learning Model vs. Traditional Neural Network in Low System Resource Environment
William Hui
2020-10-20
Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering
Jeroen Offerijns, Suzan Verberne, Tessa Verhoef
2020-10-19
Decoding Methods for Neural Narrative Generation
Alexandra DeLucia, Aaron Mueller, Xiang Lisa Li, João Sedoc
2020-10-14
DA-Transformer: Distance-aware Transformer
Chuhan Wu, Fangzhao Wu, Yongfeng Huang
2020-10-14
Memformer: The Memory-Augmented Transformer
Qingyang Wu, Zhenzhong Lan, Jing Gu, Zhou Yu
2020-10-14
The workweek is the best time to start a family -- A Study of GPT-2 Based Claim Generation
Shai Gretz, Yonatan Bilu, Edo Cohen-Karlik, Noam Slonim
2020-10-13
Meta-Context Transformers for Domain-Specific Response Generation
Debanjana Kar, Suranjana Samanta, Amar Prakash Azad
2020-10-12
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi
2020-10-12
Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU
Brielen Madureira, David Schlangen
2020-10-11
Scene Graph Modification Based on Natural Language Commands
Xuanli He, Quan Hung Tran, Gholamreza Haffari, Walter Chang, Trung Bui, Zhe Lin, Franck Dernoncourt, Nhan Dam
2020-10-06
Investigating African-American Vernacular English in Transformer-Based Text Generation
Sophie Groenwold, Lily Ou, Aesha Parekh, Samhita Honnavalli, Sharon Levy, Diba Mirza, William Yang Wang
2020-10-06
GenAug: Data Augmentation for Finetuning Text Generators
Steven Y. Feng, Varun Gangal, Dongyeop Kang, Teruko Mitamura, Eduard Hovy
2020-10-05
Inquisitive Question Generation for High Level Text Comprehension
Wei-Jen Ko, Te-Yuan Chen, Yiyan Huang, Greg Durrett, Junyi Jessy Li
2020-10-04
Examining the rhetorical capacities of neural language models
Zining Zhu, Chuer Pan, Mohamed Abdalla, Frank Rudzicz
2020-10-01
The design and implementation of Language Learning Chatbot with XAI using Ontology and Transfer Learning
Nuobei Shi, Qin Zeng, Raymond Lee
2020-09-29
Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions
Peter A. Jansen
2020-09-29
Toward a Thermodynamics of Meaning
Jonathan Scott Enderle
2020-09-24
On Data Augmentation for Extreme Multi-label Classification
Danqing Zhang, Tao Li, Haiyang Zhang, Bing Yin
2020-09-22
Prior Art Search and Reranking for Generated Patent Text
Jieh-Sheng Lee, Jieh Hsiang
2020-09-19
Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
Jihyeon Roh, Huiseong Gim, Soo-Young Lee
2020-09-18
Critical Thinking for Language Models
Gregor Betz, Christian Voigt, Kyle Richardson
2020-09-15
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze
2020-09-15
Dialogue Response Ranking Training with Large-Scale Human Feedback Data
Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan
2020-09-15
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
Kris McGuffie, Alex Newhouse
2020-09-15
GeDi: Generative Discriminator Guided Sequence Generation
Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, Nazneen Fatema Rajani
2020-09-14
Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu
2020-09-13
Unit Test Case Generation with Transformers
Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, Neel Sundaresan
2020-09-11
Modern Methods for Text Generation
Dimas Munoz Montesinos
2020-09-10
Brain2Word: Decoding Brain Activity for Language Generation
Nicolas Affolter, Beni Egressy, Damian Pascual, Roger Wattenhofer
2020-09-10
Sparsifying Transformer Models with Trainable Representation Pooling
Michał Pietruszka, Łukasz Borchmann, Łukasz Garncarek
2020-09-10
Pay Attention when Required
Swetha Mandava, Szymon Migacz, Alex Fit Florea
2020-09-09
Measuring Massive Multitask Language Understanding
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
2020-09-07
Black Box to White Box: Discover Model Characteristics Based on Strategic Probing
Josh Kalin, Matthew Ciolino, David Noever, Gerry Dozier
2020-09-07
Improving Language Generation with Sentence Coherence Objective
Ruixiao Sun, Jie Yang, Mehrdad Yousefzadeh
2020-09-07
Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading
Sasi Kiran Gaddipati, Deebul Nair, Paul G. Plöger
2020-09-02
Knowledge Efficient Deep Learning for Natural Language Processing
Hai Wang
2020-08-28
DAVE: Deriving Automatically Verilog from English
Hammond Pearce, Benjamin Tan, Ramesh Karri
2020-08-27
Discrete Word Embedding for Logical Natural Language Understanding
Masataro Asai, Zilu Tang
2020-08-26
ETC-NLG: End-to-end Topic-Conditioned Natural Language Generation
Ginevra Carbone, Gabriele Sarti
2020-08-25
Dynamics of feed forward induced interference training
Shirui Tang
2020-08-24
Narrative Interpolation for Generating and Understanding Stories
Su Wang, Greg Durrett, Katrin Erk
2020-08-17
Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study
Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins
2020-08-17
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Davis Yoshida, Allyson Ettinger, Kevin Gimpel
2020-08-16
Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems
Andrea Madotto, Zihan Liu, Zhaojiang Lin, Pascale Fung
2020-08-14
Navigating Human Language Models with Synthetic Agents
Philip Feldman, Antonio Bucchiarone
2020-08-10
The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-composed Music through Quantitative Measures
Shih-Lun Wu, Yi-Hsuan Yang
2020-08-04
Trojaning Language Models for Fun and Profit
Xinyang Zhang, Zheng Zhang, Shouling Ji, Ting Wang
2020-08-01
Multi-node Bert-pretraining: Cost-efficient Approach
Jiahuang Lin, Xin Li, Gennady Pekhimenko
2020-08-01
Language Modelling for Source Code with Transformer-XL
Thomas Dowdell, Hongyu Zhang
2020-07-31
TweepFake: about Detecting Deepfake Tweets
Tiziano Fagni, Fabrizio Falchi, Margherita Gambini, Antonio Martella, Maurizio Tesconi
2020-07-31
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining
TJ Tsai, Kevin Ji
2020-07-29
Generative Pretraining from Pixels
Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
2020-07-17
Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR
Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik
2020-07-14
The Go Transformer: Natural Language Modeling for Game Play
Matthew Ciolino, David Noever, Josh Kalin
2020-07-07
Do Transformers Need Deep Long-Range Memory
Jack W. Rae, Ali Razavi
2020-07-07
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
Roei Schuster, Congzheng Song, Eran Tromer, Vitaly Shmatikov
2020-07-05
On-The-Fly Information Retrieval Augmentation for Language Models
Hai Wang, David McAllester
2020-07-03
Do Transformers Need Deep Long-Range Memory?
Jack Rae, Ali Razavi
2020-07-01
Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation
Bo Pang, Erik Nijkamp, Wenjuan Han, Linqi Zhou, Yixian Liu, Kewei Tu
2020-07-01
Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
Jae-young Jo, Sung-Hyon Myaeng
2020-07-01
Probing for Referential Information in Language Models
Ionut-Teodor Sorodoc, Kristina Gulordava, Gemma Boleda
2020-07-01
LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity
Jordan J. Bird, Diego R. Faria, Anikó Ekárt, Cristiano Premebida, Pedro P. S. Ayrosa
2020-07-01
The Summary Loop: Learning to Write Abstractive Summaries Without Examples
Philippe Laban, Andrew Hsi, John Canny, Marti A. Hearst
2020-07-01
Knowledge-Aware Language Model Pretraining
Corby Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul Bennett, Saurabh Tiwary
2020-06-29
Progressive Generation of Long Text with Pretrained Language Models
Bowen Tan, Zichao Yang, Maruan Al-Shedivat, Eric P. Xing, Zhiting Hu
2020-06-28
Mind The Facts: Knowledge-Boosted Coherent Abstractive Text Summarization
Beliz Gunel, Chenguang Zhu, Michael Zeng, Xuedong Huang
2020-06-27
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le, Steven C. H. Hoi
2020-06-27
A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19
David Oniani, Yanshan Wang
2020-06-19
Memory-Efficient Pipeline-Parallel DNN Training
Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia
2020-06-16
Unsupervised Paraphrase Generation using Pre-trained Language Models
Chaitra Hegde, Shrikumar Patil
2020-06-09
Few-Shot Generative Conversational Query Rewriting
Shi Yu, Jiahua Liu, Jingqin Yang, Chenyan Xiong, Paul Bennett, Jianfeng Gao, Zhiyuan Liu
2020-06-09
GMAT: Global Memory Augmentation for Transformers
Ankit Gupta, Jonathan Berant
2020-06-05
Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2
Virapat Kieuvongngam, Bowen Tan, Yiming Niu
2020-06-03
Emergence of Separable Manifolds in Deep Language Representations
Jonathan Mamou, Hang Le, Miguel Del Rio, Cory Stephenson, Hanlin Tang, Yoon Kim, SueYeon Chung
2020-06-01
First Neural Conjecturing Datasets and Experiments
Josef Urban, Jan Jakubův
2020-05-29
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
2020-05-28
Artificial Intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry
Nils Köbis, Luca Mossink
2020-05-20
Exploring Transformers for Large-Scale Speech Recognition
Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
2020-05-19
Large Scale Multi-Actor Generative Dialog Modeling
Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro
2020-05-13
On the Generation of Medical Dialogues for COVID-19
Wenmian Yang, Guangtao Zeng, Bowen Tan, Zeqian Ju, Subrato Chakravorty, Xuehai He, Shu Chen, Xingyi Yang, Qingyang Wu, Zhou Yu, Eric Xing, Pengtao Xie
2020-05-11
Distributional Discrepancy: A Metric for Unconditional Text Generation
Ping Cai, Xingyuan Chen, Peng Jin, Hongjun Wang, Tianrui Li
2020-05-04
Spying on your neighbors: Fine-grained probing of contextual embeddings for information about surrounding words
Josef Klafka, Allyson Ettinger
2020-05-04
Transformer-based End-to-End Question Generation
Luis Enrico Lopez, Diane Kathryn Cruz, Jan Christian Blaise Cruz, Charibeth Cheng
2020-05-03
A Simple Language Model for Task-Oriented Dialogue
Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher
2020-05-02
A Controllable Model of Grounded Response Generation
Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan
2020-05-01
POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan
2020-05-01
Multilingual Corpus Creation for Multilingual Semantic Similarity Task
Mahtab Ahmed, Chahna Dixit, Robert E. Mercer, Atif Khan, Muhammad Rifayat Samee, Felipe Urra
2020-05-01
Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings
Abdul Moeed, Yang An, Gerhard Hagerer, Georg Groh
2020-05-01
Improving Neural Language Generation with Spectrum Control
Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu
2020-05-01
PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking
Hannah Rashkin, Asli Celikyilmaz, Yejin Choi, Jianfeng Gao
2020-04-30
Segatron: Segment-Aware Transformer for Language Modeling and Understanding
He Bai, Peng Shi, Jimmy Lin, Yuqing Xie, Luchen Tan, Kun Xiong, Wen Gao, Ming Li
2020-04-30
GePpeTto Carves Italian into a Language Model
Lorenzo De Mattei, Michele Cafagna, Felice Dell'Orletta, Malvina Nissim, Marco Guerini
2020-04-29
LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning
Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu
2020-04-27
Assessing Discourse Relations in Language Generation from GPT-2
Wei-Jen Ko, Junyi Jessy Li
2020-04-26
A Tailored Pre-Training Model for Task-Oriented Dialog Generation
Jing Gu, Qingyang Wu, Chongruo Wu, Weiyan Shi, Zhou Yu
2020-04-24
Mirror Ritual: An Affective Interface for Emotional Self-Reflection
Nina Rajcic, Jon McCormack
2020-04-21
StereoSet: Measuring stereotypical bias in pretrained language models
Moin Nadeem, Anna Bethke, Siva Reddy
2020-04-20
ResNeSt: Split-Attention Networks
Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola
2020-04-19
Generating Counter Narratives against Online Hate Speech: Data and Strategies
Serra Sinem Tekiroglu, Yi-Ling Chung, Marco Guerini
2020-04-08
Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
Hamza Harkous, Isabel Groves, Amir Saffari
2020-04-08
TextGAIL: Generative Adversarial Imitation Learning for Text Generation
Qingyang Wu, Lei Li, Zhou Yu
2020-04-07
DARE: Data Augmented Relation Extraction with GPT-2
Yannis Papanikolaou, Andrea Pierleoni
2020-04-06
Sparse Text Generation
Pedro Henrique Martins, Zita Marinho, André F. T. Martins
2020-04-06
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang, Jianfeng Gao
2020-04-05
Generating Rationales in Visual Question Answering
Hammad A. Ayyubi, Md. Mehrab Tanjim, Julian J. McAuley, Garrison W. Cottrell
2020-04-04
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
2020-03-23
Finnish Language Modeling with Deep Transformer Models
Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo
2020-03-14
Generating Major Types of Chinese Classical Poetry in a Uniformed Framework
Jinyi Hu, Maosong Sun
2020-03-13
RecipeGPT: Generative Pre-training Based Cooking Recipe Generation and Evaluation System
Helena H. Lee, Ke Shu, Palakorn Achananuparp, Philips Kokoh Prasetyo, Yue Liu, Ee-Peng Lim, Lav R. Varshney
2020-03-05
Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation
Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz
2020-03-03
Training Question Answering Models From Synthetic Data
Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
2020-02-22
Transformer on a Diet
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola
2020-02-14
Training Large Neural Networks with Constant Memory using a New Execution Algorithm
Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Sujeeth Bharadwaj
2020-02-13
CBAG: Conditional Biomedical Abstract Generation
Justin Sybrandt, Ilya Safro
2020-02-13
Introducing Aspects of Creativity in Automatic Poetry Generation
Brendan Bena, Jugal Kalita
2020-02-06
Joint Contextual Modeling for ASR Correction and Language Understanding
Yue Weng, Sai Sumanth Miryala, Chandra Khatri, Runze Wang, Huaixiu Zheng, Piero Molino, Mahdi Namazifar, Alexandros Papangelis, Hugh Williams, Franziska Bell, Gokhan Tur
2020-01-28
Reducing Non-Normative Text Generation from Language Models
Xiangyu Peng, Siyan Li, Spencer Frazier, Mark Riedl
2020-01-23
Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network
Jungkyu Lee, Taeryun Won, Tae Kwan Lee, Hyemin Lee, Geonmo Gu, Kiho Hong
2020-01-17
PatentTransformer-2: Controlling Patent Text Generation by Structural Metadata
Jieh-Sheng Lee, Jieh Hsiang
2020-01-11
OTEANN: Estimating the Transparency of Orthographies with an Artificial Neural Network
Xavier Marjou
2019-12-31
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun
2019-12-25
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song
2019-12-10
Personalized Patent Claim Generation and Measurement
Jieh-Sheng Lee
2019-12-07
A Comparative Study of Pretrained Language Models on Thai Social Text Categorization
Thanapapas Horsuwan, Kasidis Kanwatchara, Peerapon Vateekul, Boonserm Kijsirikul
2019-12-03
Neural Academic Paper Generation
Samet Demir, Uras Mutlu, Özgur Özdemir
2019-12-02
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
2019-11-27
Evaluating Commonsense in Pre-trained Language Models
Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang
2019-11-27
Paraphrasing with Large Language Models
Sam Witteveen, Martin Andrews
2019-11-21
Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks
Saurabh Singh, Shankar Krishnan
2019-11-21
EfficientDet: Scalable and Efficient Object Detection
Mingxing TanRuoming PangQuoc V. Le
2019-11-20
Unsupervised Natural Question Answering with a Small Model
Martin AndrewsSam Witteveen
2019-11-19
Compressive Transformers for Long-Range Sequence Modelling
Jack W. RaeAnna PotapenkoSiddhant M. JayakumarTimothy P. Lillicrap
2019-11-13
Attending to Entities for Better Text Understanding
Pengxiang ChengKatrin Erk
2019-11-11
INSET: Sentence Infilling with INter-SEntential Transformer
Yichen HuangYizhe ZhangOussama ElachqarYu Cheng
2019-11-10
Zero-Shot Paraphrase Generation with Multilingual Language Models
Yinpeng GuoYi LiaoXin JiangQing ZhangYibo ZhangQun Liu
2019-11-09
Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs
Houyu ZhangZheng-Hao LiuChenyan XiongZhiyuan Liu
2019-11-07
Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
Tassilo KleinMoin Nabi
2019-11-06
Assessing Social and Intersectional Biases in Contextualized Word Representations
Yi Chern TanL. Elisa Celis
2019-11-04
Selecting, Planning, and Rewriting: A Modular Approach for Data-to-Document Generation and Translation
Lesly MiculicichMarc MaroneHany Hassan
2019-11-01
Inspecting Unification of Encoding and Matching with Transformer: A Case Study of Machine Reading Comprehension
Hangbo BaoLi DongFuru WeiWenhui WangNan YangLei CuiSonghao PiaoMing Zhou
2019-11-01
GEM: Generative Enhanced Model for adversarial attacks
Piotr NiewinskiMaria PszonaMaria Janicka
2019-11-01
Natural Language Generation for Effective Knowledge Distillation
Raphael TangYao LuJimmy Lin
2019-11-01
Masked Language Model Scoring
Julian SalazarDavis LiangToan Q. NguyenKatrin Kirchhoff
2019-10-31
An Empirical Study of Efficient ASR Rescoring with Transformers
Hongzhao HuangFuchun Peng
2019-10-24
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii HrinchukMariya PopovaBoris Ginsburg
2019-10-23
Evolution of transfer learning in natural language processing
Aditya MaltePratik Ratadiya
2019-10-16
Q8BERT: Quantized 8Bit BERT
Ofir ZafrirGuy BoudoukhPeter IzsakMoshe Wasserblat
2019-10-14
Stabilizing Transformers for Reinforcement Learning
Emilio ParisottoH. Francis SongJack W. RaeRazvan PascanuCaglar GulcehreSiddhant M. JayakumarMax JaderbergRaphael Lopez KaufmanAidan ClarkSeb NouryMatthew M. BotvinickNicolas HeessRaia Hadsell
2019-10-13
Multilingual Question Answering from Formatted Text applied to Conversational Agents
Wissam SibliniCharlotte PasqualAxel LavielleMohamed ChallalCyril Cauchois
2019-10-10
Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models
Qingyang WuYichi ZhangYu LiZhou Yu
2019-10-09
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation
Hang GaoXizhou ZhuSteve LinJifeng Dai
2019-10-07
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam RajbhandariJeff RasleyOlatunji RuwaseYuxiong He
2019-10-04
Towards Understanding of Medical Randomized Controlled Trials by Conclusion Generation
Alexander Te-Wei ShiehYung-Sung ChuangShang-Yu SuYun-Nung Chen
2019-10-03
TMLab: Generative Enhanced Model (GEM) for adversarial attacks
Piotr NiewinskiMaria PszonaMaria Janicka
2019-10-01
GDP: Generalized Device Placement for Dataflow Graphs
Yanqi ZhouSudip RoyAmirali AbdolrashidiDaniel WongPeter C. MaQiumin XuMing ZhongHanxiao LiuAnna GoldieAzalia MirhoseiniJames Laudon
2019-09-28
Extremely Small BERT Models from Mixed-Vocabulary Training
Sanqiang ZhaoRaghav GuptaYang SongDenny Zhou
2019-09-25
How Additional Knowledge can Improve Natural Language Commonsense Question Answering?
Arindam MitraPratyay BanerjeeKuntal Kumar PalSwaroop MishraChitta Baral
2019-09-19
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad ShoeybiMostofa PatwaryRaul PuriPatrick LegresleyJared CasperBryan Catanzaro
2019-09-17
A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning
Fang LiuGe LiBolin WeiXin XiaZhiyi FuZhi Jin
2019-09-16
Ouroboros: On Accelerating Training of Transformer-Based Language Models
Qian YangZhouyuan HuoWenlin WangHeng HuangLawrence Carin
2019-09-14
Reasoning Over Semantic-Level Graph for Fact Checking
Wanjun ZhongJingjing XuDuyu TangZenan XuNan DuanMing ZhouJiahai WangJian Yin
2019-09-09
Semantics-aware BERT for Language Understanding
Zhuosheng ZhangYuwei WuHai ZhaoZuchao LiShuailiang ZhangXi ZhouXiang Zhou
2019-09-05
Effective Use of Transformer Networks for Entity Tracking
Aditya GuptaGreg Durrett
2019-09-05
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings
Kawin Ethayarajh
2019-09-02
Quantity doesn't buy quality syntax with neural language models
Marten van SchijndelAaron MuellerTal Linzen
2019-08-31
Adversarial Learning with Contextual Embeddings for Zero-resource Cross-lingual Classification and NER
Phillip KeungYichao LuVikas Bhardwaj
2019-08-31
Pre-training A Neural Language Model Improves The Sample Efficiency of an Emergency Room Classification Model
Binbin XuCédric Gil-JardinéFrantz ThiessardEric TellierMarta AvalosEmmanuel Lagarde
2019-08-30
Adaptively Sparse Transformers
Gonçalo M. CorreiaVlad NiculaeAndré F. T. Martins
2019-08-30
Measuring Patent Claim Generation by Span Relevancy
Jieh-Sheng LeeJieh Hsiang
2019-08-26
Release Strategies and the Social Impacts of Language Models
Irene SolaimanMiles BrundageJack ClarkAmanda AskellAriel Herbert-VossJeff WuAlec RadfordGretchen KruegerJong Wook KimSarah KrepsMiles McCainAlex NewhouseJason BlazakisKris McGuffieJasmine Wang
2019-08-24
Universal Adversarial Triggers for Attacking and Analyzing NLP
Eric WallaceShi FengNikhil KandpalMatt GardnerSameer Singh
2019-08-20
BioFLAIR: Pretrained Pooled Contextualized Embeddings for Biomedical Sequence Labeling Tasks
Shreyas SharmaRon Daniel Jr
2019-08-13
Attentive Normalization
Xilai LiWei SunTianfu Wu
2019-08-04
Noisy Channel for Low Resource Grammatical Error Correction
Simon FlachsOphélie LacroixAnders Søgaard
2019-08-01
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Sascha RotheShashi NarayanAliaksei Severyn
2019-07-29
DLGNet: A Transformer-based Model for Dialogue Response Generation
Oluwatobi OlabiyiErik T. Mueller
2019-07-26
Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection
David Ifeoluwa AdelaniHaotian MaiFuming FangHuy H. NguyenJunichi YamagishiIsao Echizen
2019-07-22
Patent Claim Generation by Fine-Tuning OpenAI GPT-2
Jieh-Sheng LeeJieh Hsiang
2019-07-01
GPT-based Generation for Classical Chinese Poetry
Yi LiaoYasheng WangQun LiuXin Jiang
2019-06-29
A Tensorized Transformer for Language Modeling
Xindian MaPeng ZhangShuai ZhangNan DuanYuexian HouDawei SongMing Zhou
2019-06-24
Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction
Christoph AltMarc HübnerLeonhard Hennig
2019-06-19
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin YangZihang DaiYiming YangJaime CarbonellRuslan SalakhutdinovQuoc V. Le
2019-06-19
One Epoch Is All You Need
Aran Komatsuzaki
2019-06-16
A Multiscale Visualization of Attention in the Transformer Model
Jesse Vig
2019-06-12
Analyzing the Structure of Attention in a Transformer Language Model
Jesse VigYonatan Belinkov
2019-06-07
CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense
Michael ChenMike D'ArcyAlisa LiuJared FernandezDoug Downey
2019-06-01
Figure Eight at SemEval-2019 Task 3: Ensemble of Transfer Learning Methods for Contextual Emotion Detection
Joan Xiao
2019-06-01
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Mariya TonevaLeila Wehbe
2019-05-28
Story Ending Prediction by Transferable BERT
Zhongyang LiXiao DingTing Liu
2019-05-17
Transformer-XL: Language Modeling with Longer-Term Dependency
Zihang Dai*Zhilin Yang*Yiming YangWilliam W. CohenJaime CarbonellQuoc V. LeRuslan Salakhutdinov
2019-05-01
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
Yue CaoJiarui XuStephen LinFangyun WeiHan Hu
2019-04-25
Generating Long Sequences with Sparse Transformers
Rewon ChildScott GrayAlec RadfordIlya Sutskever
2019-04-23
Language Models with Transformers
Chenguang WangMu LiAlexander J. Smola
2019-04-20
CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Brandon YangGabriel BenderQuoc V. LeJiquan Ngiam
2019-04-10
NLPR@SRPOL at SemEval-2019 Task 6 and Task 5: Linguistically enhanced deep learning offensive sentence classifier
Alessandro SegantiHelena SobolIryna OrlovaHannam KimJakub StaniszewskiTymoteusz KrumholcKrystian Koziel
2019-04-10
Visualizing Attention in Transformer-Based Language Representation Models
Jesse Vig
2019-04-04
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
Raphael TangYao LuLinqing LiuLili MouOlga VechtomovaJimmy Lin
2019-03-28
Language Models are Unsupervised Multitask Learners
Alec RadfordJeffrey WuRewon ChildDavid LuanDario AmodeiIlya Sutskever
2019-02-14
Passage Re-ranking with BERT
Rodrigo NogueiraKyunghyun Cho
2019-01-13
Linguistic Analysis of Pretrained Sentence Encoders with Acceptability Judgments
Alex WarstadtSamuel R. Bowman
2019-01-11
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang DaiZhilin YangYiming YangJaime CarbonellQuoc V. LeRuslan Salakhutdinov
2019-01-09
Improving Language Understanding by Generative Pre-Training
Alec RadfordKarthik NarasimhanTim SalimansIlya Sutskever
2018-06-11

Categories