1 code implementation • ICCV 2015 • Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang
The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters and consume the majority of the memory required to store the network.
Ranked #54 on Image Classification on MNIST
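To make the 90% figure concrete, here is a quick back-of-the-envelope count for a VGG-16-style layer configuration (the layer shapes below are the standard VGG-16 ones, used purely as an illustration; biases are ignored):

```python
# Parameter count for a VGG-16-style network, illustrating how the
# fully connected layers dominate the total. Biases are ignored.
conv_params = sum(
    k * k * c_in * c_out
    for (k, c_in, c_out) in [
        (3, 3, 64), (3, 64, 64),
        (3, 64, 128), (3, 128, 128),
        (3, 128, 256), (3, 256, 256), (3, 256, 256),
        (3, 256, 512), (3, 512, 512), (3, 512, 512),
        (3, 512, 512), (3, 512, 512), (3, 512, 512),
    ]
)
fc_params = 7 * 7 * 512 * 4096 + 4096 * 4096 + 4096 * 1000
total = conv_params + fc_params
print(f"conv: {conv_params/1e6:.1f}M  fc: {fc_params/1e6:.1f}M  "
      f"fc share: {fc_params/total:.1%}")
```

The fully connected layers account for roughly 124M of the ~138M parameters, i.e. about 89% of the total.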
no code implementations • 23 Dec 2014 • Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio
The convergence of SGD depends on the careful choice of learning rate and the amount of noise in stochastic estimates of the gradients.
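As a toy illustration of this dependence (not taken from the paper), the snippet below runs SGD on a one-dimensional quadratic with artificially noisy gradients; the final error degrades as either the learning rate or the gradient noise grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_on_quadratic(lr, noise_std, steps=500):
    """Minimize f(w) = 0.5 * w**2 with noisy gradient estimates g = w + noise."""
    w = 5.0
    for _ in range(steps):
        g = w + rng.normal(0.0, noise_std)  # stochastic gradient estimate
        w -= lr * g
    return abs(w)

for lr in (0.01, 0.1, 1.0):
    for noise in (0.0, 1.0):
        print(f"lr={lr:<4} noise={noise}: final |w| = {sgd_on_quadratic(lr, noise):.4f}")
```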
2 code implementations • 18 Nov 2015 • Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas
Finally, this paper provides a connection between structured linear transforms used in deep learning and the field of Fourier optics, illustrating how ACDC could in principle be implemented with lenses and diffractive elements.
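For reference, the core transform the paper builds on is ACDC⁻¹, with A and D diagonal and C the discrete cosine transform; a minimal numpy sketch (using scipy's DCT, with orthonormal normalization as an assumption) looks like this:

```python
import numpy as np
from scipy.fft import dct, idct

def acdc_layer(x, a, d):
    """One ACDC^-1 transform: y = A . C . D . C^-1 . x, where A and D are
    diagonal (stored as vectors) and C is the orthonormal DCT."""
    z = idct(x, norm="ortho")   # C^-1 x
    z = d * z                   # D z  (element-wise: diagonal matrix)
    z = dct(z, norm="ortho")    # C z
    return a * z                # A z

n = 8
rng = np.random.default_rng(0)
x = rng.normal(size=n)
a, d = rng.normal(size=n), rng.normal(size=n)
print(acdc_layer(x, a, d))
```

Because A and D are stored as vectors and C is applied with a fast transform, the layer needs O(n) parameters and O(n log n) operations, versus O(n²) for a dense linear layer.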
no code implementations • 19 Nov 2015 • Marcin Moczulski, Kelvin Xu, Aaron Courville, Kyunghyun Cho
Recently there has been growing interest in building active visual object recognizers, as opposed to the usual passive recognizers, which classify a given static image into a predefined set of object categories.
1 code implementation • 1 Mar 2016 • Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio
Common nonlinear activation functions used in neural networks can cause training difficulties due to their saturation behavior, which may hide dependencies that are not visible to vanilla SGD (which uses first-order gradients only).
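The snippet below first shows how quickly tanh gradients vanish in the saturated regime, then sketches a simplified noisy activation that injects noise in proportion to the saturation; this is an illustrative variant only, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

# Gradients vanish in the saturated regime, starving vanilla SGD of signal.
for x in (0.0, 2.0, 5.0):
    print(f"x={x}: d tanh/dx = {tanh_grad(x):.5f}")

def noisy_tanh(x, scale=1.0):
    """Simplified noisy activation: inject noise proportional to how
    saturated the unit is, so a gradient signal can still flow there.
    (A sketch of the idea only, not the paper's exact formulation.)"""
    saturation = 1.0 - tanh_grad(x)  # ~1 deep in the flat region
    noise = rng.normal(0.0, 1.0, size=np.shape(x))
    return np.tanh(x) + scale * saturation * noise

print(noisy_tanh(np.array([0.0, 2.0, 5.0])))
```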
no code implementations • 17 Aug 2016 • Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio
The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function; e.g., it can involve pathological landscapes such as saddle surfaces that are difficult to escape for algorithms based on simple gradient descent.
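One way to picture the mollification idea is to optimize a Monte-Carlo estimate of a smoothed objective E[loss(w + σu)] while annealing σ to zero; the sketch below does this on a toy non-convex function (the objective, sample size, and schedule are all illustrative assumptions, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # A toy non-convex objective with many shallow local minima.
    return np.sin(5.0 * w) + 0.1 * w ** 2

def mollified_grad(w, sigma, n_samples=64, eps=1e-3):
    """Monte-Carlo gradient of the smoothed loss E[loss(w + sigma * u)].
    Annealing sigma -> 0 recovers the original objective (a sketch of
    mollification only, not the paper's exact training procedure)."""
    u = rng.normal(size=n_samples)
    perturbed = w + sigma * u
    return np.mean((loss(perturbed + eps) - loss(perturbed - eps)) / (2 * eps))

w, lr = 2.0, 0.05
for step in range(200):
    sigma = 2.0 * (1.0 - step / 200)  # anneal the smoothing toward zero
    w -= lr * mollified_grad(w, sigma)
print(f"final w = {w:.3f}, loss = {loss(w):.3f}")
```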
1 code implementation • 2 Mar 2017 • Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio
Information about the element-wise curvature of the loss function is estimated from the local statistics of the stochastic first-order gradients.
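A rough sketch of this family of methods is diagonally preconditioned SGD, where the per-parameter scale comes from running statistics of the stochastic gradients; the estimator below is RMSProp-like and is only meant to convey the idea, not to reproduce the paper's exact update:

```python
import numpy as np

def precond_sgd_step(w, g, state, lr=0.1, beta=0.9, eps=1e-8):
    """One step of diagonally preconditioned SGD. The per-parameter scale is
    derived from running statistics of the stochastic gradients, in the
    spirit of estimating element-wise curvature from local gradient
    statistics (an RMSProp-like sketch, not the paper's estimator)."""
    state["mean"] = beta * state["mean"] + (1 - beta) * g
    state["sq"] = beta * state["sq"] + (1 - beta) * g * g
    var = state["sq"] - state["mean"] ** 2       # local gradient variance
    scale = np.sqrt(np.maximum(var, 0.0)) + eps  # element-wise curvature proxy
    return w - lr * g / scale

w = np.array([5.0, -3.0])
state = {"mean": np.zeros_like(w), "sq": np.zeros_like(w)}
rng = np.random.default_rng(0)
for _ in range(300):
    g = w + rng.normal(0.0, 0.5, size=w.shape)  # noisy gradient of 0.5*||w||^2
    w = precond_sgd_step(w, g, state)
print(w)
```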
no code implementations • ICLR 2019 • Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, Honglak Lee
This paper investigates whether learning contingency-awareness and controllable aspects of an environment can lead to better exploration in reinforcement learning.
Ranked #8 on Atari Games on Atari 2600 Montezuma's Revenge
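A minimal sketch of the exploration-bonus side of this idea: key a visit counter on the controllable part of the state (the agent's location) and pay a bonus that decays as 1/sqrt(count). How the location is extracted from pixels, which the paper learns with an attentive dynamics model, is elided here:

```python
from collections import defaultdict
import math

class LocationCountBonus:
    """Count-based exploration bonus keyed on the controllable part of the
    state, e.g. the agent's discovered (room, x, y) location, in the spirit
    of contingency-aware exploration."""
    def __init__(self):
        self.counts = defaultdict(int)

    def bonus(self, location):
        self.counts[location] += 1
        return 1.0 / math.sqrt(self.counts[location])

explorer = LocationCountBonus()
for loc in [(1, 3, 4), (1, 3, 4), (2, 0, 0)]:
    print(loc, explorer.bonus(loc))  # bonus shrinks as a location is revisited
```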
no code implementations • NeurIPS 2020 • Yijie Guo, Jongwook Choi, Marcin Moczulski, Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee
Reinforcement learning with sparse rewards is challenging because an agent can rarely obtain non-zero rewards, and hence gradient-based optimization of parameterized policies can be incremental and slow.
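The slowness has a simple mechanical explanation under policy-gradient methods: the REINFORCE estimator weights each log-probability gradient by the return, so episodes with all-zero rewards contribute exactly zero gradient. A minimal illustration, with made-up gradient vectors:

```python
import numpy as np

def reinforce_grad(logp_grads, returns):
    """REINFORCE estimator: grad J = sum_t return_t * grad log pi(a_t|s_t).
    With sparse rewards, most episodes have all-zero returns and therefore
    contribute a zero gradient, so learning is incremental and slow."""
    return sum(r * g for g, r in zip(logp_grads, returns))

logp_grads = [np.array([0.3, -0.1]), np.array([-0.2, 0.4])]
print(reinforce_grad(logp_grads, returns=[0.0, 0.0]))  # [0. 0.] -> no signal
print(reinforce_grad(logp_grads, returns=[0.0, 1.0]))  # signal only from the rewarded step
```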
no code implementations • 25 Sep 2019 • Yijie Guo, Jongwook Choi, Marcin Moczulski, Samy Bengio, Mohammad Norouzi, Honglak Lee
We propose a new method of learning a trajectory-conditioned policy to imitate diverse trajectories from the agent's own past experience. We show that such self-imitation helps avoid myopic behavior and increases the chance of finding a globally optimal solution for hard-exploration tasks, especially when rewards are misleading.
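A sketch of the buffering side of such a method, under simplified assumptions: keep the best trajectory found per abstract state "cell" and sample demonstrations across cells rather than only the single highest-return one (the cell abstraction and the policy itself are placeholders here, not the paper's components):

```python
import random

class DiverseTrajectoryBuffer:
    """Keep the best trajectory per abstract state 'cell' and sample from
    them as demonstrations for a trajectory-conditioned policy to imitate.
    (A sketch of the buffering/sampling idea only; the cell abstraction
    and the policy are hypothetical placeholders.)"""
    def __init__(self):
        self.best = {}  # cell -> (return, trajectory)

    def add(self, trajectory, total_return, cell):
        if cell not in self.best or total_return > self.best[cell][0]:
            self.best[cell] = (total_return, trajectory)

    def sample_demonstration(self):
        # Sampling across cells (not just the best return overall) preserves
        # diversity and guards against myopically chasing misleading rewards.
        ret, traj = random.choice(list(self.best.values()))
        return traj

buf = DiverseTrajectoryBuffer()
buf.add(trajectory=["a1", "a2"], total_return=1.0, cell=(1, 1))
buf.add(trajectory=["a3"], total_return=0.0, cell=(2, 0))
print(buf.sample_demonstration())
```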