3 code implementations • 13 Oct 2021 • Shlomi Hod, Daniel Filan, Stephen Casper, Andrew Critch, Stuart Russell
These results suggest that graph-based partitioning can reveal local specialization and that statistical methods can be used to automatically screen for sets of neurons that can be understood abstractly.
no code implementations • 29 Sep 2021 • Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell
These results suggest that graph-based partitioning can reveal modularity and help us understand how deep neural networks function.
2 code implementations • 4 Mar 2021 • Daniel Filan, Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell
We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy.
no code implementations • 1 Jan 2021 • Shlomi Hod, Stephen Casper, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell
We apply these methods to partitionings generated by a spectral clustering algorithm that uses a graph representation of the network's neurons and weights.
1 code implementation • 10 Mar 2020 • Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell
To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images.
no code implementations • 13 Jul 2018 • Chris Cundy, Daniel Filan
We introduce a new generative model for human planning under the Bayesian Inverse Reinforcement Learning (BIRL) framework which takes into account the fact that humans often plan using hierarchical strategies.
no code implementations • 10 May 2016 • Tom Everitt, Daniel Filan, Mayank Daswani, Marcus Hutter
As we continue to create more and more intelligent agents, chances increase that they will learn about this ability.
no code implementations • 12 Apr 2016 • Daniel Filan, Marcus Hutter, Jan Leike
On a polynomial-time computable sequence, our speed prior is computable in exponential time.