1 code implementation • 13 Dec 2023 • Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber
The costly self-attention layers in modern Transformers require memory and compute quadratic in sequence length.
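For orientation, a minimal NumPy sketch (not the authors' code) of standard softmax self-attention, showing where the quadratic cost comes from: the score matrix has one entry per pair of positions, so memory and compute grow as the square of the sequence length.

```python
import numpy as np

def softmax_self_attention(x, Wq, Wk, Wv):
    """Naive single-head self-attention over a length-L sequence x of shape (L, d).

    The score matrix has shape (L, L): both the memory to store it and the
    matmuls that produce and consume it grow quadratically with L.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv              # each (L, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])       # (L, L) -- the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (L, d)

rng = np.random.default_rng(0)
L, d = 512, 64
x = rng.standard_normal((L, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out = softmax_self_attention(x, Wq, Wk, Wv)       # allocates an L x L attention matrix
```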
1 code implementation • 1 Dec 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
General-purpose learning systems should improve themselves in open-ended fashion in ever-changing environments.
1 code implementation • 24 Oct 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions.
2 code implementations • 16 Oct 2023 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Unlike prior work that compares MoEs with dense baselines under the compute-equal condition, our evaluation condition is parameter-equal, which is crucial to properly evaluate LMs.
1 code implementation • 30 May 2023 • Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber
To scale to such challenging tasks, we focus on certain well-known neural architectures with element-wise recurrence, allowing for tractable RTRL without approximation.
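A hedged sketch of why element-wise recurrence makes exact RTRL cheap (an illustrative toy, not the paper's implementation): with a recurrent cell of the form h_t = tanh(W x_t + u ⊙ h_{t-1}), where u is a vector, each unit depends only on its own previous state, so the RTRL sensitivities for u reduce to a vector update rather than a cubic-size tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, T = 3, 5, 20
W = rng.standard_normal((d_h, d_in)) * 0.5       # input weights (kept fixed here)
u = rng.standard_normal(d_h) * 0.5               # element-wise recurrent weights

h = np.zeros(d_h)
s = np.zeros(d_h)                                # RTRL sensitivity s_i = dh_i/du_i
grad_u = np.zeros(d_h)                           # exact online gradient w.r.t. u

for t in range(T):
    x = rng.standard_normal(d_in)
    target = rng.standard_normal(d_h)            # toy per-step regression target
    h_prev = h
    a = W @ x + u * h_prev                       # element-wise recurrence
    h = np.tanh(a)
    # Because unit i depends only on u_i and h_{t-1,i}, the sensitivity stays a
    # vector: dh_t/du = tanh'(a) * (h_{t-1} + u * dh_{t-1}/du), no approximation.
    s = (1.0 - h**2) * (h_prev + u * s)
    # Accumulate the exact gradient of the per-step loss 0.5 * ||h - target||^2.
    grad_u += (h - target) * s
```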
no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber
What should be the social structure of such a natural language-based society of mind (NLSOM)?
1 code implementation • NeurIPS 2023 • Aleksandar Stanić, Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber
Current state-of-the-art object-centric models use slots and attention-based routing for binding.
1 code implementation • 2 May 2023 • Kazuki Irie, Jürgen Schmidhuber
Few-shot learning with sequence-processing neural networks (NNs) has recently attracted a new wave of attention in the context of large language models.
1 code implementation • 15 Feb 2023 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
Unsupervised learning of discrete representations from continuous ones in neural networks (NNs) is the cornerstone of several applications today.
no code implementations • 17 Nov 2022 • Kazuki Irie, Jürgen Schmidhuber
Short-term memory in standard, general-purpose, sequence-processing recurrent neural networks (RNNs) is stored as activations of nodes or "neurons."
1 code implementation • 12 Oct 2022 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
While the original CTL is used to test length generalization or productivity, CTL++ is designed to test systematicity of NNs, that is, their capability to generalize to unseen compositions of known functions.
1 code implementation • 7 Oct 2022 • Kazuki Irie, Jürgen Schmidhuber
Work on fast weight programmers has demonstrated the effectiveness of key/value outer product-based learning rules for sequentially generating a weight matrix (WM) of a neural net (NN) by another NN or itself.
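A minimal sketch of the basic outer-product programming rule referred to here (one common FWP variant; the update rules studied in the paper may differ, e.g. by using delta-rule writes with learned strengths): at each step a slow net emits a key, a value, and a query; the fast weight matrix is rewritten by the outer product of value and key and then queried.

```python
import numpy as np

def fwp_step(W_fast, k, v, q):
    """One purely additive fast-weight-programmer step (simplified variant)."""
    W_fast = W_fast + np.outer(v, k)   # write: program the fast net with v k^T
    y = W_fast @ q                     # read: execute the fast net on the query
    return W_fast, y

rng = np.random.default_rng(0)
d_key, d_val = 8, 8
W_fast = np.zeros((d_val, d_key))      # the generated weight matrix, starts empty
for _ in range(10):
    k, v, q = (rng.standard_normal(d_key) for _ in range(3))
    W_fast, y = fwp_step(W_fast, k, v, q)
```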
2 code implementations • 3 Jun 2022 • Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber
Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed.
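As a reminder of the correspondence invoked here (a generic sketch, not the paper's models): Euler-discretising dh/dt = f(h, t) with unit step size recovers exactly the residual update h ← h + f(h), which is why neural ODEs are read as continuous-time residual nets.

```python
import numpy as np

def f(h, t, W):
    # Toy vector field; in a residual net, a block's transformation plays this role.
    return np.tanh(W @ h)

def euler_integrate(h0, W, n_steps, dt=1.0):
    """Fixed-step Euler solve of dh/dt = f(h, t).

    With dt = 1 each step is exactly one residual-network layer: h <- h + f(h).
    """
    h = h0
    for step in range(n_steps):
        h = h + dt * f(h, step * dt, W)
    return h

rng = np.random.default_rng(0)
d = 4
W = rng.standard_normal((d, d)) * 0.3
h_final = euler_integrate(rng.standard_normal(d), W, n_steps=6)
```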
1 code implementation • 25 Mar 2022 • Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber, Sjoerd van Steenkiste
The discovery of reusable sub-routines simplifies decision-making and planning in complex reinforcement learning problems.
1 code implementation • 11 Feb 2022 • Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the entire training experience.
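A small numerical check of that statement (a sketch of the dual form as stated in the snippet, not the authors' code): after SGD training, the linear layer's output on a test input equals the output of the initial weights plus unnormalised dot attention over the stored training inputs, with the learning-rate-scaled output-error signals acting as values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, steps, lr = 5, 3, 50, 0.05

W0 = rng.standard_normal((d_out, d_in)) * 0.1
W = W0.copy()
keys, values = [], []                 # stored training inputs / error signals

for _ in range(steps):
    x_t = rng.standard_normal(d_in)
    target = rng.standard_normal(d_out)
    y_t = W @ x_t
    g_t = y_t - target                # dLoss/dy for 0.5 * ||y - target||^2
    W = W - lr * np.outer(g_t, x_t)   # one SGD step on the linear layer
    keys.append(x_t)
    values.append(-lr * g_t)          # value = learning-rate-scaled error signal

x_test = rng.standard_normal(d_in)
primal = W @ x_test
# Dual form: initial weights plus unnormalised dot attention over all training inputs.
dual = W0 @ x_test + sum(v * (k @ x_test) for k, v in zip(keys, values))
assert np.allclose(primal, dual)
```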
2 code implementations • 11 Feb 2022 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
The weight matrix (WM) of a neural network (NN) is its program.
1 code implementation • ICLR Workshop Neural_Compression 2021 • Kazuki Irie, Jürgen Schmidhuber
The inputs and/or outputs of some neural nets are weight matrices of other neural nets.
1 code implementation • 31 Dec 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts.
1 code implementation • 14 Oct 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite progress across a broad range of applications, Transformers have limited success in systematic generalization.
no code implementations • NeurIPS Workshop AIPLANS 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization.
no code implementations • ICLR 2022 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Despite successes across a broad range of applications, Transformers have limited capability in systematic generalization.
1 code implementation • EMNLP 2021 • Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS.
5 code implementations • NeurIPS 2021 • Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber
Transformers with linearised attention ("linear Transformers") have demonstrated the practical scalability and effectiveness of outer product-based Fast Weight Programmers (FWPs) from the '90s.
9 code implementations • 22 Feb 2021 • Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber
We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early '90s, where a "slow" neural net learns by gradient descent to program the "fast weights" of another net through sequences of elementary programming instructions which are additive outer products of self-invented activation patterns (today called keys and values).
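A small numerical illustration of that equivalence (a generic sketch with a positive feature map φ, not the authors' code): computing linearised attention step by step with a fast weight matrix accumulated from outer products of values and mapped keys, plus a running normaliser, reproduces the batch attention outputs exactly.

```python
import numpy as np

def phi(x):
    # Simple positive feature map (ELU + 1), as used in some linear Transformers.
    return np.where(x > 0, x + 1.0, np.exp(x))

rng = np.random.default_rng(0)
L, d = 16, 8
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))

# 1) Batch formulation: causal, unnormalised scores phi(q) phi(k)^T.
scores = phi(q) @ phi(k).T
mask = np.tril(np.ones((L, L)))
y_attention = ((scores * mask) @ v) / (scores * mask).sum(axis=1, keepdims=True)

# 2) Recurrent fast-weight formulation: write outer products, read with the query.
W_fast = np.zeros((d, d))       # running sum of v_i phi(k_i)^T
z = np.zeros(d)                 # running normaliser
y_fwp = np.zeros((L, d))
for t in range(L):
    W_fast += np.outer(v[t], phi(k[t]))
    z += phi(k[t])
    y_fwp[t] = (W_fast @ phi(q[t])) / (z @ phi(q[t]))

assert np.allclose(y_attention, y_fwp)
```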
no code implementations • 2 Apr 2020 • Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney
We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus.
no code implementations • 10 May 2019 • Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney
We explore deep autoregressive Transformer models in language modeling for speech recognition.
2 code implementations • 8 May 2019 • Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney
To the best of the authors' knowledge, the results obtained when training on the full LibriSpeech training set are the best published to date, both for the hybrid DNN/HMM and the attention-based systems.
Ranked #24 on Speech Recognition on LibriSpeech test-other
2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.
3 code implementations • 5 Feb 2019 • Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen
We also investigate model complementarity: we find that we can improve WERs by up to 9% relative by rescoring N-best lists generated from a strong word-piece based baseline with either the phoneme or the grapheme model.
Ranked #42 on Speech Recognition on LibriSpeech test-clean
14 code implementations • 8 May 2018 • Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.
Ranked #43 on Speech Recognition on LibriSpeech test-clean (using extra training data)