1 code implementation • 11 Oct 2017 • Michael C. Mozer, Denis Kazakov, Robert V. Lindsey
The CT-GRU arises by interpreting the gates of a GRU as selecting a time scale of memory; it generalizes the GRU by incorporating multiple time scales of memory and performing context-dependent selection of time scales for information storage and retrieval.
no code implementations • ICLR 2019 • Michael C. Mozer, Denis Kazakov, Robert V. Lindsey
Attractor dynamics are incorporated into the hidden state to "clean up" representations at each step of a sequence.
no code implementations • 26 May 2019 • Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Denis Kazakov, Yoshua Bengio, Michael C. Mozer
Machine learning promises methods that generalize well from finite labeled data.