For example, at a 500M parameter size, Primer improves the original T5 architecture on C4 auto-regressive language modeling, reducing the training cost by 4X.
Transformers have become one of the most important architectural innovations in deep learning and have enabled many breakthroughs over the past few years.
One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure.
However, this progress has largely focused on the architecture of neural networks, relying on sophisticated expert-designed layers as building blocks, or on similarly restrictive search spaces.
We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations.
Recent work has highlighted the strength of the Transformer architecture on sequence tasks, while neural architecture search (NAS) has begun to outperform human-designed models.
The Machine Recognition of Crystallization Outcomes (MARCO) initiative has assembled roughly half a million annotated images of macromolecular crystallization experiments from various sources and setups.