no code implementations • 26 May 2019 • Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Denis Kazakov, Yoshua Bengio, Michael C. Mozer
Machine learning promises methods that generalize well from finite labeled data.
no code implementations • ICLR 2019 • Michael C. Mozer, Denis Kazakov, Robert V. Lindsey
Attractor dynamics are incorporated into the hidden state to "clean up" representations at each step of a sequence.
1 code implementation • 11 Oct 2017 • Michael C. Mozer, Denis Kazakov, Robert V. Lindsey
The CT-GRU arises by interpreting the gates of a GRU as selecting a time scale of memory. It generalizes the GRU by incorporating multiple time scales of memory and performing context-dependent selection of time scales for information storage and retrieval.
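The idea of selecting among multiple time scales for storage and retrieval can be illustrated with a small sketch. This is not the paper's implementation: the function names (`scale_weights`, `store_and_decay`, `retrieve`), the log-distance softmax used for soft selection, and the single scalar memory unit are all assumptions made for illustration. In a real network the queried time scales would be computed from the input and hidden state; here they are passed in by hand.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def scale_weights(query_tau, taus):
    """Soft selection of a time scale (illustrative assumption).

    Weights peak at the time constant closest to query_tau in log space.
    """
    return softmax([-(math.log(query_tau) - math.log(t)) ** 2 for t in taus])


def store_and_decay(traces, taus, candidate, store_tau, dt):
    """Write `candidate` into the selected scale, then decay every trace.

    Each trace decays exponentially with its own time constant over the
    elapsed time dt, so slow scales retain information longer.
    """
    w = scale_weights(store_tau, taus)
    return [
        ((1.0 - wi) * h + wi * candidate) * math.exp(-dt / t)
        for h, wi, t in zip(traces, w, taus)
    ]


def retrieve(traces, taus, retrieve_tau):
    """Read out memory as a weighted sum over the selected time scale."""
    w = scale_weights(retrieve_tau, taus)
    return sum(wi * h for wi, h in zip(w, traces))


# Hypothetical setup: one memory unit with three time scales.
taus = [1.0, 10.0, 100.0]
traces = [0.0, 0.0, 0.0]
traces = store_and_decay(traces, taus, candidate=1.0, store_tau=10.0, dt=5.0)
readout = retrieve(traces, taus, retrieve_tau=10.0)
```

Because the selection is soft, a stored value is spread mostly onto the nearest time scale, with small amounts leaking to its neighbors. The choice of a squared log-distance score is one convenient option among many and is assumed here, not taken from the listing.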