Search Results for author: Yi Tay

Found 114 papers, 47 papers with code

PaLM 2 Technical Report

1 code implementation 17 May 2023 Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Code Generation Common Sense Reasoning +6

Symbol tuning improves in-context learning in language models

no code implementations 15 May 2023 Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le

We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").

In-Context Learning

UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining

1 code implementation 18 Apr 2023 Hyung Won Chung, Noah Constant, Xavier Garcia, Adam Roberts, Yi Tay, Sharan Narang, Orhan Firat

As part of our contribution, we release: (i) an improved and refreshed mC4 multilingual corpus consisting of 29 trillion characters across 107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained with UniMax sampling.

CoLT5: Faster Long-Range Transformers with Conditional Computation

no code implementations 17 Mar 2023 Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token.

Long-range modeling

Larger language models do in-context learning differently

no code implementations 7 Mar 2023 Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.

In-Context Learning

The Flan Collection: Designing Data and Methods for Effective Instruction Tuning

1 code implementation 31 Jan 2023 Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts

We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022).

DSI++: Updating Transformer Memory with New Documents

no code implementations 19 Dec 2022 Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler

In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents.

Continual Learning Natural Questions +1

Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification

no code implementations 16 Dec 2022 Jai Gupta, Yi Tay, Chaitanya Kamath, Vinh Q. Tran, Donald Metzler, Shailesh Bavadekar, Mimi Sun, Evgeniy Gabrilovich

With the devastating outbreak of COVID-19, vaccines are one of the crucial lines of defense against mass infection in this global pandemic.

Natural Language Understanding

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

1 code implementation 9 Dec 2022 Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby

In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.

Inverse scaling can become U-shaped

no code implementations 3 Nov 2022 Jason Wei, Najoung Kim, Yi Tay, Quoc V. Le

The Inverse Scaling Prize (McKenzie et al. 2022) identified eleven such inverse scaling tasks, evaluated on models of up to 280B parameters and up to 500 zettaFLOPs of training compute.


Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

2 code implementations 17 Oct 2022 Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models.

Language Modelling

Language Models are Multilingual Chain-of-Thought Reasoners

4 code implementations 6 Oct 2022 Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei

Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.

GSM8K Math

Recitation-Augmented Language Models

1 code implementation 4 Oct 2022 Zhiqing Sun, Xuezhi Wang, Yi Tay, Yiming Yang, Denny Zhou

We propose a new paradigm to help Large Language Models (LLMs) generate more accurate factual knowledge without retrieving from an external corpus, called RECITation-augmented gEneration (RECITE).

Natural Questions Question Answering +2

Confident Adaptive Language Modeling

no code implementations 14 Jul 2022 Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, Donald Metzler

Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks.

Language Modelling Text Generation

UL2: Unifying Language Learning Paradigms

2 code implementations 10 May 2022 Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler

Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.

Ranked #1 on Long-range modeling on SCROLLS (CNLI metric)

Arithmetic Reasoning Common Sense Reasoning +11

Transformer Memory as a Differentiable Search Index

1 code implementation 14 Feb 2022 Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.

Information Retrieval Retrieval

Self-Instantiated Recurrent Units with Dynamic Soft Recursion

no code implementations NeurIPS 2021 Aston Zhang, Yi Tay, Yikang Shen, Alvin Chan Guo Wei, Shuai Zhang

On the other hand, the extent of the Self-IRU recursion is controlled by gates whose values are between 0 and 1 and may vary across the temporal dimension of sequences, enabling dynamic soft recursion depth at each time step.

Inductive Bias

PolyViT: Co-training Vision Transformers on Images, Videos and Audio

no code implementations 25 Nov 2021 Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters?

Audio Classification

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

3 code implementations ICLR 2022 Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler

Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training.

Denoising Multi-Task Learning

The Efficiency Misnomer

no code implementations ICLR 2022 Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay

We further present suggestions to improve reporting of efficiency metrics.

SCENIC: A JAX Library for Computer Vision Research and Beyond

1 code implementation CVPR 2022 Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay

Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond.

Sharpness-Aware Minimization Improves Language Model Generalization

no code implementations ACL 2022 Dara Bahri, Hossein Mobahi, Yi Tay

The allure of superhuman-level capabilities has led to considerable interest in language models like GPT-3 and T5, wherein the research has, by and large, revolved around new model architectures, training tasks, and loss objectives, along with substantial engineering efforts to scale up model capacity and dataset size.

Language Modelling Natural Questions

Improving Neural Ranking via Lossless Knowledge Distillation

no code implementations 30 Sep 2021 Zhen Qin, Le Yan, Yi Tay, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Marc Najork

We explore a novel perspective of knowledge distillation (KD) for learning to rank (LTR), and introduce Self-Distilled neural Rankers (SDR), where student rankers are parameterized identically to their teachers.

Knowledge Distillation Learning-To-Rank

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

no code implementations ICLR 2022 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that aside from only the model size, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

3 code implementations 22 Sep 2021 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) we show that aside from only the model size, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Are Pretrained Convolutions Better than Pretrained Transformers?

1 code implementation ACL 2021 Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

The Benchmark Lottery

no code implementations 14 Jul 2021 Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Benchmarking BIG-bench Machine Learning +3

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

1 code implementation ICLR 2022 Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler

Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data.

Contrastive Learning Representation Learning +1

Knowledge Router: Learning Disentangled Representations for Knowledge Graphs

no code implementations NAACL 2021 Shuai Zhang, Xi Rao, Yi Tay, Ce Zhang

To this end, this paper proposes to learn disentangled representations of KG entities - a new method that disentangles the inner latent properties of KG entities.

Knowledge Graphs Representation Learning

How Reliable are Model Diagnostics?

no code implementations Findings (ACL) 2021 Vamsi Aribandi, Yi Tay, Donald Metzler

In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU.

Are Pre-trained Convolutions Better than Pre-trained Transformers?

1 code implementation 7 May 2021 Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

In the context of language models, are convolutional models competitive with Transformers when pre-trained?

Rethinking Search: Making Domain Experts out of Dilettantes

no code implementations 5 May 2021 Donald Metzler, Yi Tay, Dara Bahri, Marc Najork

When experiencing an information need, users want to engage with a domain expert, but often turn to an information retrieval system, such as a search engine, instead.

Information Retrieval Question Answering +1

OmniNet: Omnidirectional Representations from Transformers

1 code implementation 1 Mar 2021 Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network.

de-en Few-Shot Learning +3

Do Transformer Modifications Transfer Across Implementations and Applications?

1 code implementation EMNLP 2021 Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption.

Switch Spaces: Learning Product Spaces with Sparse Gating

no code implementations 17 Feb 2021 Shuai Zhang, Yi Tay, Wenqi Jiang, Da-Cheng Juan, Ce Zhang

In order for learned representations to be effective and efficient, it is ideal that the geometric inductive bias aligns well with the underlying structure of the data.

Inductive Bias Knowledge Graph Completion +1

Neural Rankers are hitherto Outperformed by Gradient Boosted Decision Trees

no code implementations ICLR 2021 Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

We first validate this concern by showing that most recent neural LTR models are, by a large margin, inferior to the best publicly available Gradient Boosted Decision Trees (GBDT) in terms of their reported ranking accuracy on benchmark datasets.


Synthesizer: Rethinking Self-Attention for Transformer Models

no code implementations 1 Jan 2021 Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Language Modelling Machine Translation +2

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

no code implementations ICLR 2021 Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning Natural Language Understanding

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

2 code implementations ACL 2021 Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville

There are two major classes of natural language grammar -- the dependency grammar that models one-to-one correspondences between words and the constituency grammar that models the assembly of one or several corresponded words.

Constituency Parsing Language Modelling +2

Long Range Arena: A Benchmark for Efficient Transformers

5 code implementations 8 Nov 2020 Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models.

Ranked #18 on Long-range modeling on LRA (Pathfinder metric)

16k Benchmarking +1

Surprise: Result List Truncation via Extreme Value Theory

no code implementations 19 Oct 2020 Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins

Work in information retrieval has largely been centered around ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval Retrieval +1

Efficient Transformers: A Survey

no code implementations 14 Sep 2020 Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.

Navigate reinforcement-learning +1

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

no code implementations 17 Aug 2020 Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins

Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning.

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

no code implementations 12 Jul 2020 Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning Natural Language Understanding

Synthesizer: Rethinking Self-Attention in Transformer Models

1 code implementation 2 May 2020 Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric, using extra training data)

Abstractive Text Summarization Dialogue Generation +6

Choppy: Cut Transformer For Ranked List Truncation

no code implementations 26 Apr 2020 Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Andrew Tomkins

Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval Retrieval

Reverse Engineering Configurations of Neural Text Generation Models

no code implementations ACL 2020 Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

This paper seeks to develop a deeper understanding of the fundamental properties of neural text generation models.

Text Generation

Sparse Sinkhorn Attention

1 code implementation ICML 2020 Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan

We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend.

Document Classification Image Generation +2

Jacobian Adversarially Regularized Networks for Robustness

1 code implementation ICLR 2020 Alvin Chan, Yi Tay, Yew Soon Ong, Jie Fu

Adversarial examples are crafted with imperceptible perturbations with the intent to fool neural networks.

What it Thinks is Important is Important: Robustness Transfers through Input Gradients

2 code implementations CVPR 2020 Alvin Chan, Yi Tay, Yew-Soon Ong

Learned weights of models robust to such perturbations were previously found to be transferable across different tasks, but this applies only if the model architecture for the source and target tasks is the same.

Adversarial Robustness

Compositional De-Attention Networks

no code implementations NeurIPS 2019 Yi Tay, Anh Tuan Luu, Aston Zhang, Shuohang Wang, Siu Cheung Hui

Attentional models are distinctly characterized by their ability to learn relative importance, i.e., assigning a different weight to input values.

Machine Translation Natural Language Inference +4

Interactive Machine Comprehension with Information Seeking Agents

1 code implementation ACL 2020 Xingdi Yuan, Jie Fu, Marc-Alexandre Cote, Yi Tay, Christopher Pal, Adam Trischler

Existing machine reading comprehension (MRC) models do not scale effectively to real-world applications like web-level information retrieval and question answering (QA).

Decision Making Information Retrieval +3

Holographic Factorization Machines for Recommendation

1 code implementation AAAI 2019 Yi Tay, Shuai Zhang, Anh Tuan Luu, Siu Cheung Hui, Lina Yao, Tran Dang Quang Vinh

Factorization Machines (FMs) are a class of popular algorithms that have been widely adopted for collaborative filtering and recommendation tasks.

Collaborative Filtering Retrieval

Robust Representation Learning of Biomedical Names

no code implementations ACL 2019 Minh C. Phan, Aixin Sun, Yi Tay

Moreover, our proposed method is also able to compute meaningful representations for unseen names, resulting in high practical utility in real-world applications.

Representation Learning Retrieval

Confusionset-guided Pointer Networks for Chinese Spelling Check

no code implementations ACL 2019 Dingmin Wang, Yi Tay, Li Zhong

This paper proposes Confusionset-guided Pointer Networks for the Chinese Spell Check (CSC) task.


Quaternion Collaborative Filtering for Recommendation

no code implementations 6 Jun 2019 Shuai Zhang, Lina Yao, Lucas Vinh Tran, Aston Zhang, Yi Tay

All in all, we conduct extensive experiments on six real-world datasets, demonstrating the effectiveness of Quaternion algebra in recommender systems.

Collaborative Filtering Inductive Bias +2

DeepRec: An Open-source Toolkit for Deep Learning based Recommendation

4 code implementations 25 May 2019 Shuai Zhang, Yi Tay, Lina Yao, Bin Wu, Aixin Sun

In this toolkit, we have implemented a number of deep learning based recommendation algorithms using Python and the widely used deep learning package TensorFlow.

Sequential Recommendation

Quaternion Knowledge Graph Embeddings

1 code implementation NeurIPS 2019 Shuai Zhang, Yi Tay, Lina Yao, Qi Liu

In this work, we move beyond the traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embeddings.

Knowledge Graph Embedding Knowledge Graph Embeddings +1

Recurrently Controlled Recurrent Networks

1 code implementation NeurIPS 2018 Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Recurrent neural networks (RNNs) such as long short-term memory and gated recurrent units are pivotal building blocks across a broad spectrum of sequence modeling problems.

Answer Selection General Classification +2

Holistic Multi-modal Memory Network for Movie Question Answering

no code implementations 12 Nov 2018 Anran Wang, Anh Tuan Luu, Chuan-Sheng Foo, Hongyuan Zhu, Yi Tay, Vijay Chandrasekhar

In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework which fully considers the interactions between different input sources (multi-modal context, question) in each hop.

Question Answering Retrieval +1

Next Item Recommendation with Self-Attention

no code implementations 20 Aug 2018 Shuai Zhang, Yi Tay, Lina Yao, Aixin Sun

In this paper, we propose a novel sequence-aware recommendation model.

Metric Learning

Self-Attentive Neural Collaborative Filtering

no code implementations 17 Jun 2018 Yi Tay, Shuai Zhang, Luu Anh Tuan, Siu Cheung Hui

This paper has been withdrawn as we discovered a bug in our TensorFlow implementation that involved accidental mixing of vectors across batches.

Collaborative Filtering

CoupleNet: Paying Attention to Couples with Coupled Attention for Relationship Recommendation

no code implementations 29 May 2018 Yi Tay, Anh Tuan Luu, Siu Cheung Hui

Our approach, CoupleNet, is an end-to-end deep learning based estimator that analyzes the social profiles of two users and subsequently performs a similarity match between the users.

Recommendation Systems

Reasoning with Sarcasm by Reading In-between

no code implementations ACL 2018 Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su

Sarcasm is a sophisticated speech act which commonly manifests on social communities such as Twitter and Reddit.

Sarcasm Detection

Interact and Decide: Medley of Sub-Attention Networks for Effective Group Recommendation

no code implementations 12 Apr 2018 Lucas Vinh Tran, Tuan-Anh Nguyen Pham, Yi Tay, Yiding Liu, Gao Cong, Xiao-Li Li

Our proposed approach hinges upon the key intuition that the decision making process (in groups) is generally dynamic, i.e., a user's decision is highly dependent on the other group members.

Decision Making

Multi-range Reasoning for Machine Comprehension

no code implementations 24 Mar 2018 Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Similarly, we achieve competitive performance relative to AMANDA on the SearchQA benchmark and BiDAF on the NarrativeQA benchmark without using any LSTM/GRU layers.

Reading Comprehension

Metric Factorization: Recommendation beyond Matrix Factorization

2 code implementations 13 Feb 2018 Shuai Zhang, Lina Yao, Yi Tay, Xiwei Xu, Xiang Zhang, Liming Zhu

In the past decade, matrix factorization has been extensively researched and has become one of the most popular techniques for personalized recommendations.

Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All

no code implementations 4 Feb 2018 Minh C. Phan, Aixin Sun, Yi Tay, Jialong Han, Chenliang Li

For the first time, we show that the semantic relationships between the mentioned entities are in fact less dense than expected.

Decision Making Entity Disambiguation

Multi-Pointer Co-Attention Networks for Recommendation

2 code implementations 28 Jan 2018 Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Our model operates on a multi-hierarchical paradigm and is based on the intuition that not all reviews are created equal, i.e., only a select few are important.

Recommendation Systems Representation Learning

Learning to Attend via Word-Aspect Associative Fusion for Aspect-based Sentiment Analysis

no code implementations 14 Dec 2017 Yi Tay, Anh Tuan Luu, Siu Cheung Hui

Our novel model, Aspect Fusion LSTM (AF-LSTM), learns to attend based on associative relationships between sentence words and the aspect, which allows our model to adaptively focus on the correct words given an aspect term.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1

Cross Temporal Recurrent Networks for Ranking Question Answer Pairs

1 code implementation 21 Nov 2017 Yi Tay, Luu Anh Tuan, Siu Cheung Hui

This paper explores the idea of learning temporal gates for sequence pairs (question and answer), jointly influencing the learned representations in a pairwise manner.

SkipFlow: Incorporating Neural Coherence Features for End-to-End Automatic Text Scoring

1 code implementation 14 Nov 2017 Yi Tay, Minh C. Phan, Luu Anh Tuan, Siu Cheung Hui

Our method proposes a new SkipFlow mechanism that models relationships between snapshots of the hidden representations of a long short-term memory (LSTM) network as it reads.

Automated Essay Scoring Feature Engineering +1

Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs

no code implementations 16 Aug 2017 Yi Tay, Luu Anh Tuan, Minh C. Phan, Siu Cheung Hui

Unfortunately, many state-of-the-art relational learning models ignore this information due to the challenging nature of dealing with non-discrete data types in the inherently binary-natured knowledge graphs.

Attribute Knowledge Graphs +3

Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering

1 code implementation 25 Jul 2017 Yi Tay, Luu Anh Tuan, Siu Cheung Hui

The dominant neural architectures in question answer retrieval are based on recurrent or convolutional encoders configured with complex word matching layers.

Efficient Neural Network Feature Engineering +3

Deep Learning based Recommender System: A Survey and New Perspectives

8 code implementations 24 Jul 2017 Shuai Zhang, Lina Yao, Aixin Sun, Yi Tay

This article aims to provide a comprehensive review of recent research efforts on deep learning based recommender systems.

Information Retrieval Recommendation Systems +1

Latent Relational Metric Learning via Memory-based Attention for Collaborative Ranking

1 code implementation 17 Jul 2017 Yi Tay, Anh Tuan Luu, Siu Cheung Hui

Our model, LRML (Latent Relational Metric Learning), is a novel metric learning approach for recommendation.

Ranked #1 on Recommendation Systems on Netflix (nDCG@10 metric)

Attribute Collaborative Ranking +2
