Search Results for author: Yi Tay

Found 112 papers, 46 papers with code

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

no code implementations • 18 Apr 2024 • Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei LI, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie

On text benchmarks, Core not only performs competitively to other frontier models on a set of well-established benchmarks (e. g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation.

GSM8K Question Answering +2

Paper
Add Code

PaLI-X: On Scaling up a Multilingual Vision and Language Model

2 code implementations • 29 May 2023 • Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture.

Ranked #1 on Fine-Grained Image Recognition on OVEN

Chart Question Answering document understanding +9

Paper
Code

PaLM 2 Technical Report

1 code implementation • 17 May 2023 • Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Ranked #1 on Question Answering on StrategyQA

Code Generation Common Sense Reasoning +6

Paper
Code

Symbol tuning improves in-context learning in language models

no code implementations • 15 May 2023 • Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le

We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e. g., "positive/negative sentiment") are replaced with arbitrary symbols (e. g., "foo/bar").

In-Context Learning

Paper
Add Code

UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

1 code implementation • 18 Apr 2023 • Hyung Won Chung, Noah Constant, Xavier Garcia, Adam Roberts, Yi Tay, Sharan Narang, Orhan Firat

As part of our contribution, we release: (i) an improved and refreshed mC4 multilingual corpus consisting of 29 trillion characters across 107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained with UniMax sampling.

1,199

Paper
Code

CoLT5: Faster Long-Range Transformers with Conditional Computation

no code implementations • 17 Mar 2023 • Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token.

Ranked #1 on Long-range modeling on SCROLLS

Long-range modeling

Paper
Add Code

Larger language models do in-context learning differently

no code implementations • 7 Mar 2023 • Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e. g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.

In-Context Learning

Paper
Add Code

Scaling Vision Transformers to 22 Billion Parameters

1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

The scaling of Transformers has driven breakthrough capabilities for language models.

Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet

Action Classification Fairness +3

192

Paper
Code

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

1 code implementation • 31 Jan 2023 • Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts

We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022).

1,391

Paper
Code

DSI++: Updating Transformer Memory with New Documents

no code implementations • 19 Dec 2022 • Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler

In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents.

Continual Learning Natural Questions +1

Paper
Add Code

Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification

no code implementations • 16 Dec 2022 • Jai Gupta, Yi Tay, Chaitanya Kamath, Vinh Q. Tran, Donald Metzler, Shailesh Bavadekar, Mimi Sun, Evgeniy Gabrilovich

With the devastating outbreak of COVID-19, vaccines are one of the crucial lines of defense against mass infection in this global pandemic.

Natural Language Understanding

Paper
Add Code

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

1 code implementation • 9 Dec 2022 • Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby

In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.

510

Paper
Code

Inverse scaling can become U-shaped

no code implementations • 3 Nov 2022 • Jason Wei, Najoung Kim, Yi Tay, Quoc V. Le

The Inverse Scaling Prize (McKenzie et al. 2022) identified eleven such inverse scaling tasks, evaluated on models of up to 280B parameters and up to 500 zettaFLOPs of training compute.

Attribute

Paper
Add Code

Scaling Instruction-Finetuned Language Models

6 code implementations • 20 Oct 2022 • Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei

We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation).

Ranked #1 on Multi-task Language Understanding on BBH-nlp

Coreference Resolution Cross-Lingual Question Answering +2

1,391

Paper
Code

Transcending Scaling Laws with 0.1% Extra Compute

no code implementations • 20 Oct 2022 • Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani

This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute.

Ranked #2 on Cross-Lingual Question Answering on TyDiQA-GoldP

Arithmetic Reasoning Cross-Lingual Question Answering +4

Paper
Add Code

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

1 code implementation • 17 Oct 2022 • Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models.

Language Modelling

375

Paper
Code

Language Models are Multilingual Chain-of-Thought Reasoners

2 code implementations • 6 Oct 2022 • Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei

Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.

GSM8K Math

166

Paper
Code

Recitation-Augmented Language Models

1 code implementation • 4 Oct 2022 • Zhiqing Sun, Xuezhi Wang, Yi Tay, Yiming Yang, Denny Zhou

We propose a new paradigm to help Large Language Models (LLMs) generate more accurate factual knowledge without retrieving from an external corpus, called RECITation-augmented gEneration (RECITE).

Natural Questions Question Answering +2

Paper
Code

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

no code implementations • 21 Jul 2022 • Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler

There have been a lot of interest in the scaling properties of Transformer models.

Inductive Bias

Paper
Add Code

Confident Adaptive Language Modeling

no code implementations • 14 Jul 2022 • Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler

Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks.

Language Modelling Text Generation

Paper
Add Code

Emergent Abilities of Large Language Models

no code implementations • 15 Jun 2022 • Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks.

Language Modelling

Paper
Add Code

UL2: Unifying Language Learning Paradigms

1 code implementation • 10 May 2022 • Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler

Our model also achieve strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.

Ranked #1 on Long-range modeling on SCROLLS (CNLI metric)

Arithmetic Reasoning Common Sense Reasoning +11

32,798

Paper
Code

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

no code implementations • Findings (ACL) 2022 • Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos santos, Yi Tay, Don Metzler

This results in significant inference time speedups since the decoder-only architecture only needs to learn to interpret static encoder embeddings during inference.

Information Retrieval Language Modelling +2

Paper
Add Code

PaLM: Scaling Language Modeling with Pathways

5 code implementations • Google Research 2022 • Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel

To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM.

Ranked #1 on Coreference Resolution on Winograd Schema Challenge

Auto Debugging Code Generation +17

975

Paper
Code

HyperPrompt: Prompt-based Task-Conditioning of Transformers

no code implementations • 1 Mar 2022 • Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, Yaguang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi

Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way.

Computational Efficiency Multi-Task Learning +1

Paper
Add Code

A New Generation of Perspective API: Efficient Multilingual Character-level Transformers

no code implementations • 22 Feb 2022 • Alyssa Lees, Vinh Q. Tran, Yi Tay, Jeffrey Sorensen, Jai Gupta, Donald Metzler, Lucy Vasserman

As such, it is crucial to develop models that are effective across a diverse range of languages, usages, and styles.

Toxic Comment Classification

Paper
Add Code

Transformer Memory as a Differentiable Search Index

1 code implementation • 14 Feb 2022 • Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.

Information Retrieval Retrieval

148

Paper
Code

Self-Instantiated Recurrent Units with Dynamic Soft Recursion

no code implementations • NeurIPS 2021 • Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan Guo Wei, Shuai Zhang

On the other hand, the extent of the Self-IRU recursion is controlled by gates whose values are between 0 and 1 and may vary across the temporal dimension of sequences, enabling dynamic soft recursion depth at each time step.

Inductive Bias

Paper
Add Code

PolyViT: Co-training Vision Transformers on Images, Videos and Audio

no code implementations • 25 Nov 2021 • Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters?

Audio Classification

Paper
Add Code

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

3 code implementations • ICLR 2022 • Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler

Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training.

Denoising Multi-Task Learning

5,899

Paper
Code

The Efficiency Misnomer

no code implementations • ICLR 2022 • Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay

We further present suggestions to improve reporting of efficiency metrics.

Paper
Add Code

SCENIC: A JAX Library for Computer Vision Research and Beyond

1 code implementation • CVPR 2022 • Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay

Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond.

2,995

Paper
Code

Sharpness-Aware Minimization Improves Language Model Generalization

no code implementations • ACL 2022 • Dara Bahri, Hossein Mobahi, Yi Tay

The allure of superhuman-level capabilities has led to considerable interest in language models like GPT-3 and T5, wherein the research has, by and large, revolved around new model architectures, training tasks, and loss objectives, along with substantial engineering efforts to scale up model capacity and dataset size.

Language Modelling Natural Questions

Paper
Add Code

Improving Compositional Generalization with Self-Training for Data-to-Text Generation

1 code implementation • ACL 2022 • Sanket Vaibhav Mehta, Jinfeng Rao, Yi Tay, Mihir Kale, Ankur P. Parikh, Emma Strubell

Data-to-text generation focuses on generating fluent natural language responses from structured meaning representations (MRs).

Data-to-Text Generation

32,798

Paper
Code

Improving Neural Ranking via Lossless Knowledge Distillation

no code implementations • 30 Sep 2021 • Zhen Qin, Le Yan, Yi Tay, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Marc Najork

We explore a novel perspective of knowledge distillation (KD) for learning to rank (LTR), and introduce Self-Distilled neural Rankers (SDR), where student rankers are parameterized identically to their teachers.

Knowledge Distillation Learning-To-Rank

Paper
Add Code

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

no code implementations • ICLR 2022 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that aside from only the model size, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Paper
Add Code

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

3 code implementations • 22 Sep 2021 • Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

32,804

Paper
Code

Are Pretrained Convolutions Better than Pretrained Transformers?

1 code implementation • ACL 2021 • Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive to Transformers when pre-trained?

32,804

Paper
Code

On Orthogonality Constraints for Transformers

no code implementations • ACL 2021 • Aston Zhang, Alvin Chan, Yi Tay, Jie Fu, Shuohang Wang, Shuai Zhang, Huajie Shao, Shuochao Yao, Roy Ka-Wei Lee

Orthogonality constraints encourage matrices to be orthogonal for numerical stability.

Dialogue Generation Machine Translation +1

Paper
Add Code

The Benchmark Lottery

no code implementations • 14 Jul 2021 • Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Benchmarking BIG-bench Machine Learning +3

Paper
Add Code

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

no code implementations • ICLR 2022 • Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler

Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data.

Contrastive Learning Representation Learning +1

Paper
Add Code

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

2 code implementations • ICLR 2022 • Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler

In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.

Ranked #3 on Paraphrase Identification on Quora Question Pairs

Inductive Bias Linguistic Acceptability +3

32,798

Paper
Code

Knowledge Router: Learning Disentangled Representations for Knowledge Graphs

no code implementations • NAACL 2021 • Shuai Zhang, Xi Rao, Yi Tay, Ce Zhang

To this end, this paper proposes to learn disentangled representations of KG entities - a new method that disentangles the inner latent properties of KG entities.

Knowledge Graphs Representation Learning

Paper
Add Code

How Reliable are Model Diagnostics?

no code implementations • Findings (ACL) 2021 • Vamsi Aribandi, Yi Tay, Donald Metzler

In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU.

Paper
Add Code

Are Pre-trained Convolutions Better than Pre-trained Transformers?

1 code implementation • 7 May 2021 • Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive to Transformers when pre-trained?

32,798

Paper
Code

Rethinking Search: Making Domain Experts out of Dilettantes

no code implementations • 5 May 2021 • Donald Metzler, Yi Tay, Dara Bahri, Marc Najork

When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead.

Information Retrieval Question Answering +1

Paper
Add Code

OmniNet: Omnidirectional Representations from Transformers

1 code implementation • 1 Mar 2021 • Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.

Ranked #1 on Machine Translation on WMT2017 Russian-English

Few-Shot Learning Language Modelling +2

Paper
Code

Do Transformer Modifications Transfer Across Implementations and Applications?

1 code implementation • EMNLP 2021 • Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption.

32,804

Paper
Code

Switch Spaces: Learning Product Spaces with Sparse Gating

no code implementations • 17 Feb 2021 • Shuai Zhang, Yi Tay, Wenqi Jiang, Da-Cheng Juan, Ce Zhang

In order for learned representations to be effective and efficient, it is ideal that the geometric inductive bias aligns well with the underlying structure of the data.

Inductive Bias Knowledge Graph Completion +1

Paper
Add Code

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters

3 code implementations • 17 Feb 2021 • Aston Zhang, Yi Tay, Shuai Zhang, Alvin Chan, Anh Tuan Luu, Siu Cheung Hui, Jie Fu

Recent works have demonstrated reasonable success of representation learning in hypercomplex space.

Machine Translation Natural Language Inference +4

Paper
Code

Label Smoothed Embedding Hypothesis for Out-of-Distribution Detection

no code implementations • 9 Feb 2021 • Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler

Detecting out-of-distribution (OOD) examples is critical in many applications.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Paper
Add Code

Long Range Arena : A Benchmark for Efficient Transformers

no code implementations • ICLR 2021 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity.

16k Benchmarking

Paper
Add Code

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

no code implementations • ICLR 2021 • Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning Natural Language Understanding

Paper
Add Code

Parameterization of Hypercomplex Multiplications

no code implementations • ICLR 2021 • Aston Zhang, Yi Tay, Shuai Zhang, Alvin Chan, Anh Tuan Luu, Siu Hui, Jie Fu

Recent works have demonstrated reasonable success of representation learning in hypercomplex space.

Machine Translation Natural Language Inference +4

Paper
Add Code

Neural Rankers are hitherto Outperformed by Gradient Boosted Decision Trees

no code implementations • ICLR 2021 • Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

We first validate this concern by showing that most recent neural LTR models are, by a large margin, inferior to the best publicly available Gradient Boosted Decision Trees (GBDT) in terms of their reported ranking accuracy on benchmark datasets.

Learning-To-Rank

Paper
Add Code

Synthesizer: Rethinking Self-Attention for Transformer Models

no code implementations • 1 Jan 2021 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Language Modelling Machine Translation +2

Paper
Add Code

Recurrently Controlling a Recurrent Network with Recurrent Networks Controlled by More Recurrent Networks

no code implementations • 1 Jan 2021 • Yi Tay, Yikang Shen, Alvin Chan, Aston Zhang, Shuai Zhang

This paper explores an intriguing idea of recursively parameterizing recurrent nets.

Code Generation Inductive Bias +4

Paper
Add Code

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

2 code implementations • ACL 2021 • Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville

There are two major classes of natural language grammar -- the dependency grammar that models one-to-one correspondences between words and the constituency grammar that models the assembly of one or several corresponded words.

Constituency Parsing Language Modelling +2

32,804

Paper
Code

Long Range Arena: A Benchmark for Efficient Transformers

5 code implementations • 8 Nov 2020 • Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In the recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.

Ranked #18 on Long-range modeling on LRA (Pathfinder metric)

16k Benchmarking +1

681

Paper
Code

Surprise: Result List Truncation via Extreme Value Theory

no code implementations • 19 Oct 2020 • Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins

Work in information retrieval has largely been centered around ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval Retrieval +1

Paper
Add Code

Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

2 code implementations • Findings of the Association for Computational Linguistics 2020 • Alvin Chan, Yi Tay, Yew-Soon Ong, Aston Zhang

This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems.

General Classification Natural Language Inference +2

Paper
Code

Efficient Transformers: A Survey

no code implementations • 14 Sep 2020 • Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.

Navigate reinforcement-learning +1

Paper
Add Code

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

no code implementations • 17 Aug 2020 • Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins

Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning.

Paper
Add Code

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

no code implementations • 12 Jul 2020 • Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning Natural Language Understanding

Paper
Add Code

Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences

no code implementations • ACL 2020 • Yi Tay, Donovan Ong, Jie Fu, Alvin Chan, Nancy Chen, Anh Tuan Luu, Chris Pal

Understanding human preferences, along with cultural and social nuances, lives at the heart of natural language understanding.

Natural Language Inference Natural Language Understanding

Paper
Add Code

Synthesizer: Rethinking Self-Attention in Transformer Models

1 code implementation • 2 May 2020 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric, using extra training data)

Abstractive Text Summarization Dialogue Generation +6

Paper
Code

Choppy: Cut Transformer For Ranked List Truncation

no code implementations • 26 Apr 2020 • Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Andrew Tomkins

Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval Retrieval

Paper
Add Code

Reverse Engineering Configurations of Neural Text Generation Models

no code implementations • ACL 2020 • Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

This paper seeks to develop a deeper understanding of the fundamental properties of neural text generations models.

Text Generation

Paper
Add Code

Sparse Sinkhorn Attention

1 code implementation • ICML 2020 • Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan

We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend.

Document Classification Image Generation +2

247

Paper
Code

Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching

no code implementations • 20 Jan 2020 • Shuohang Wang, Yunshi Lan, Yi Tay, Jing Jiang, Jingjing Liu

Transformer has been successfully applied to many natural language processing tasks.

QQP

Paper
Add Code

Metagross: Meta Gated Recursive Controller Units for Sequence Modeling

no code implementations • ICLR 2020 • Yi Tay, Yikang Shen, Alvin Chan, Yew Soon Ong

This paper proposes Metagross (Meta Gated Recursive Controller), a new neural sequence modeling unit.

Code Generation Inductive Bias +4

Paper
Add Code

Jacobian Adversarially Regularized Networks for Robustness

1 code implementation • ICLR 2020 • Alvin Chan, Yi Tay, Yew Soon Ong, Jie Fu

Adversarial examples are crafted with imperceptible perturbations with the intent to fool neural networks.

Paper
Code

What it Thinks is Important is Important: Robustness Transfers through Input Gradients

2 code implementations • CVPR 2020 • Alvin Chan, Yi Tay, Yew-Soon Ong

Learned weights of models robust to such perturbations are previously found to be transferable across different tasks but this applies only if the model architecture for the source and target tasks is the same.

Adversarial Robustness

Paper
Code

Compositional De-Attention Networks

no code implementations • NeurIPS 2019 • Yi Tay, Anh Tuan Luu, Aston Zhang, Shuohang Wang, Siu Cheung Hui

Attentional models are distinctly characterized by their ability to learn relative importance, i. e., assigning a different weight to input values.

Machine Translation Natural Language Inference +4

Paper
Add Code

Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling

no code implementations • IJCNLP 2019 • Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin

A core problem of information retrieval (IR) is relevance matching, which is to rank documents by relevance to a user{'}s query.

Information Retrieval Paraphrase Identification +3

Paper
Add Code

R2D2: Reuse & Reduce via Dynamic Weight Diffusion for Training Efficient NLP Models

no code implementations • 25 Sep 2019 • Yi Tay, Aston Zhang, Shuai Zhang, Alvin Chan, Luu Anh Tuan, Siu Cheung Hui

We propose R2D2 layers, a new neural block for training efficient NLP models.

Paper
Add Code

Interactive Machine Comprehension with Information Seeking Agents

1 code implementation • ACL 2020 • Xingdi Yuan, Jie Fu, Marc-Alexandre Cote, Yi Tay, Christopher Pal, Adam Trischler

Existing machine reading comprehension (MRC) models do not scale effectively to real-world applications like web-level information retrieval and question answering (QA).

Decision Making Information Retrieval +3

Paper
Code

Holographic Factorization Machines for Recommendation

1 code implementation • AAAI 2019 • Yi Tay, Shuai Zhang, Anh Tuan Luu, Siu Cheung Hui, Lina Yao, Tran Dang Quang Vinh

Factorization Machines (FMs) are a class of popular algorithms that have been widely adopted for collaborative filtering and recommendation tasks.

Collaborative Filtering Retrieval

780

Paper
Code

Robust Representation Learning of Biomedical Names

no code implementations • ACL 2019 • Minh C. Phan, Aixin Sun, Yi Tay

Moreover, our proposed method is also able to compute meaningful representations for unseen names, resulting in high practical utility in real-world applications.

Representation Learning Retrieval

Paper
Add Code

Confusionset-guided Pointer Networks for Chinese Spelling Check

no code implementations • ACL 2019 • Dingmin Wang, Yi Tay, Li Zhong

This paper proposes Confusionset-guided Pointer Networks for Chinese Spell Check (CSC) task.

Sentence

Paper
Add Code

Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks

1 code implementation • ACL 2019 • Yi Tay, Aston Zhang, Luu Anh Tuan, Jinfeng Rao, Shuai Zhang, Shuohang Wang, Jie Fu, Siu Cheung Hui

Many state-of-the-art neural models for NLP are heavily parameterized and thus memory inefficient.

Paper
Code

Quaternion Collaborative Filtering for Recommendation

no code implementations • 6 Jun 2019 • Shuai Zhang, Lina Yao, Lucas Vinh Tran, Aston Zhang, Yi Tay

All in all, we conduct extensive experiments on six real-world datasets, demonstrating the effectiveness of Quaternion algebra in recommender systems.

Collaborative Filtering Inductive Bias +2

Paper
Add Code

Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives

no code implementations • ACL 2019 • Yi Tay, Shuohang Wang, Luu Anh Tuan, Jie Fu, Minh C. Phan, Xingdi Yuan, Jinfeng Rao, Siu Cheung Hui, Aston Zhang

This paper tackles the problem of reading comprehension over long narratives where documents easily span over thousands of tokens.

Reading Comprehension

Paper
Add Code

DeepRec: An Open-source Toolkit for Deep Learning based Recommendation

4 code implementations • 25 May 2019 • Shuai Zhang, Yi Tay, Lina Yao, Bin Wu, Aixin Sun

In this toolkit, we have implemented a number of deep learning based recommendation algorithms using Python and the widely used deep learning package - Tensorflow.

Sequential Recommendation

7,345

Paper
Code

Quaternion Knowledge Graph Embeddings

1 code implementation • NeurIPS 2019 • Shuai Zhang, Yi Tay, Lina Yao, Qi Liu

In this work, we move beyond the traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings.

Ranked #6 on Link Prediction on FB15k

Knowledge Graph Embedding Knowledge Graph Embeddings +1

103

Paper
Code

Recurrently Controlled Recurrent Networks

1 code implementation • NeurIPS 2018 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Recurrent neural networks (RNNs) such as long short-term memory and gated recurrent units are pivotal building blocks across a broad spectrum of sequence modeling problems.

Answer Selection General Classification +2

Paper
Code

Holistic Multi-modal Memory Network for Movie Question Answering

no code implementations • 12 Nov 2018 • Anran Wang, Anh Tuan Luu, Chuan-Sheng Foo, Hongyuan Zhu, Yi Tay, Vijay Chandrasekhar

In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework which fully considers the interactions between different input sources (multi-modal context, question) in each hop.

Question Answering Retrieval +1

Paper
Add Code

Densely Connected Attention Propagation for Reading Comprehension

2 code implementations • NeurIPS 2018 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su

Secondly, the dense connectors in our network are learned via attention instead of standard residual skip-connectors.

Ranked #2 on Question Answering on NewsQA

Open-Domain Question Answering Reading Comprehension

Paper
Code

Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences

no code implementations • EMNLP 2018 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui

This task enables many potential applications such as question answering and paraphrase identification.

Paraphrase Identification Question Answering

Paper
Add Code

Multi-Granular Sequence Encoding via Dilated Compositional Units for Reading Comprehension

no code implementations • EMNLP 2018 • Yi Tay, Anh Tuan Luu, Siu Cheung Hui

Sequence encoders are crucial components in many neural architectures for learning to read and comprehend.

Ranked #7 on Question Answering on NarrativeQA

Open-Domain Question Answering Reading Comprehension

Paper
Add Code

Attentive Gated Lexicon Reader with Contrastive Contextual Co-Attention for Sentiment Classification

no code implementations • EMNLP 2018 • Yi Tay, Anh Tuan Luu, Siu Cheung Hui, Jian Su

This paper proposes a new neural architecture that exploits readily available sentiment lexicon resources.

General Classification Opinion Mining +4

Paper
Add Code

HyperML: A Boosting Metric Learning Approach in Hyperbolic Space for Recommender Systems

no code implementations • 5 Sep 2018 • Lucas Vinh Tran, Yi Tay, Shuai Zhang, Gao Cong, Xiao-Li Li

This paper investigates the notion of learning user and item representations in non-Euclidean space.

Ranked #1 on Recommendation Systems on MovieLens 20M (HR@10 metric)

Collaborative Filtering Metric Learning +2

Paper
Add Code

Next Item Recommendation with Self-Attention

no code implementations • 20 Aug 2018 • Shuai Zhang, Yi Tay, Lina Yao, Aixin Sun

In this paper, we propose a novel sequence-aware recommendation model.

Metric Learning

Paper
Add Code

Self-Attentive Neural Collaborative Filtering

no code implementations • 17 Jun 2018 • Yi Tay, Shuai Zhang, Luu Anh Tuan, Siu Cheung Hui

This paper has been withdrawn as we discovered a bug in our tensorflow implementation that involved accidental mixing of vectors across batches.

Collaborative Filtering

Paper
Add Code

Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction

no code implementations • 3 Jun 2018 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Attention is typically used to select informative sub-phrases that are used for prediction.

Question Answering Representation Learning +2

Paper
Add Code

CoupleNet: Paying Attention to Couples with Coupled Attention for Relationship Recommendation

no code implementations • 29 May 2018 • Yi Tay, Anh Tuan Luu, Siu Cheung Hui

Our approach, the CoupleNet is an end-to-end deep learning based estimator that analyzes the social profiles of two users and subsequently performs a similarity match between the users.

Recommendation Systems

Paper
Add Code

Reasoning with Sarcasm by Reading In-between

no code implementations • ACL 2018 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su

Sarcasm is a sophisticated speech act which commonly manifests on social communities such as Twitter and Reddit.

Sarcasm Detection

Paper
Add Code

Interact and Decide: Medley of Sub-Attention Networks for Effective Group Recommendation

no code implementations • 12 Apr 2018 • Lucas Vinh Tran, Tuan-Anh Nguyen Pham, Yi Tay, Yiding Liu, Gao Cong, Xiao-Li Li

Our proposed approach hinges upon the key intuition that the decision making process (in groups) is generally dynamic, i. e., a user's decision is highly dependent on the other group members.

Decision Making

Paper
Add Code

Multi-range Reasoning for Machine Comprehension

no code implementations • 24 Mar 2018 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Similarly, we achieve competitive performance relative to AMANDA on the SearchQA benchmark and BiDAF on the NarrativeQA benchmark without using any LSTM/GRU layers.

Ranked #5 on Question Answering on RACE

Reading Comprehension

Paper
Add Code

Metric Factorization: Recommendation beyond Matrix Factorization

2 code implementations • 13 Feb 2018 • Shuai Zhang, Lina Yao, Yi Tay, Xiwei Xu, Xiang Zhang, Liming Zhu

In the past decade, matrix factorization has been extensively researched and has become one of the most popular techniques for personalized recommendations.

Paper
Code

Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All

no code implementations • 4 Feb 2018 • Minh C. Phan, Aixin Sun, Yi Tay, Jialong Han, Chenliang Li

For the first time, we show that the semantic relationships between the mentioned entities are in fact less dense than expected.

Decision Making Entity Disambiguation

Paper
Add Code

Multi-Pointer Co-Attention Networks for Recommendation

2 code implementations • 28 Jan 2018 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Our model operates on a multi-hierarchical paradigm and is based on the intuition that not all reviews are created equal, i. e., only a select few are important.

Recommendation Systems Representation Learning

151

Paper
Code

Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference

no code implementations • EMNLP 2018 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Firstly, we introduce a new architecture where alignment pairs are compared, compressed and then propagated to upper layers for enhanced representation learning.

Ranked #7 on Natural Language Inference on SciTail

Natural Language Inference Representation Learning

Paper
Add Code

Learning to Attend via Word-Aspect Associative Fusion for Aspect-based Sentiment Analysis

1 code implementation • 14 Dec 2017 • Yi Tay, Anh Tuan Luu, Siu Cheung Hui

Our novel model, \textit{Aspect Fusion LSTM} (AF-LSTM) learns to attend based on associative relationships between sentence words and aspect which allows our model to adaptively focus on the correct words given an aspect term.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1

Paper
Code

Cross Temporal Recurrent Networks for Ranking Question Answer Pairs

1 code implementation • 21 Nov 2017 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui

This paper explores the idea of learning temporal gates for sequence pairs (question and answer), jointly influencing the learned representations in a pairwise manner.

Paper
Code

SkipFlow: Incorporating Neural Coherence Features for End-to-End Automatic Text Scoring

1 code implementation • 14 Nov 2017 • Yi Tay, Minh C. Phan, Luu Anh Tuan, Siu Cheung Hui

Our new method proposes a new \textsc{SkipFlow} mechanism that models relationships between snapshots of the hidden representations of a long short-term memory (LSTM) network as it reads.

Ranked #4 on Automated Essay Scoring on ASAP

Automated Essay Scoring Feature Engineering +1

Paper
Code

Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs

no code implementations • 16 Aug 2017 • Yi Tay, Luu Anh Tuan, Minh C. Phan, Siu Cheung Hui

Unfortunately, many state-of-the-art relational learning models ignore this information due to the challenging nature of dealing with non-discrete data types in the inherently binary-natured knowledge graphs.

Attribute Knowledge Graphs +3

Paper
Add Code

Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering

1 code implementation • 25 Jul 2017 • Yi Tay, Luu Anh Tuan, Siu Cheung Hui

The dominant neural architectures in question answer retrieval are based on recurrent or convolutional encoders configured with complex word matching layers.

Ranked #1 on Question Answering on SemEvalCQA