We propose the Spherical Channel Network (SCN) to model atomic energies and forces.
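As a rough illustration only (not the SCN architecture itself, which operates on spherical channel representations), one common pattern in neural interatomic potentials is to predict a scalar energy and recover forces as its negative gradient via autograd; the tiny `energy_model` below is a hypothetical stand-in for any such energy network:

```python
import torch

# Hypothetical stand-in for an energy model: any module mapping atomic
# positions (N, 3) to a scalar total energy works in this pattern.
energy_model = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1)
)

positions = torch.randn(8, 3, requires_grad=True)  # 8 atoms, xyz coordinates
energy = energy_model(positions).sum()             # scalar total energy

# Forces are the negative gradient of energy w.r.t. positions, so an
# energy model yields energy-consistent forces through autograd.
forces = -torch.autograd.grad(energy, positions)[0]  # shape (8, 3)
```

Note that some models, SCN included, can also predict forces directly rather than by differentiating an energy; the gradient route shown here is simply the energy-conserving variant.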
1 code implementation • 17 Jun 2022 • Richard Tran, Janice Lan, Muhammed Shuaibi, Siddharth Goyal, Brandon M. Wood, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Zachary Ulissi, C. Lawrence Zitnick
The dataset and baseline models are open-sourced, and a public leaderboard will follow to encourage continued community development on the total energy tasks and data.
However, prior work has implicitly assumed that the training configuration that is best for model performance is also the best configuration for mask discovery.
Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities.
Standard gradient descent methods are susceptible to a range of issues that can impede training, such as strong correlations between parameters and widely differing scales across parameter space. These difficulties can be addressed by second-order approaches, which apply a pre-conditioning matrix to the gradient to improve convergence.
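A hedged toy example (an ill-conditioned quadratic with hand-picked step sizes, not any specific published optimizer) shows why pre-conditioning helps: multiplying the gradient by the inverse curvature equalizes the scales that slow plain gradient descent down.

```python
import numpy as np

# Quadratic loss L(theta) = 0.5 * theta^T H theta, with curvature that
# differs by a factor of 100 between the two directions.
H = np.diag([1.0, 100.0])
grad = lambda theta: H @ theta

theta_gd = np.array([1.0, 1.0])
theta_pre = np.array([1.0, 1.0])
P = np.linalg.inv(H)  # pre-conditioner; second-order methods approximate this

for _ in range(10):
    theta_gd = theta_gd - 0.01 * grad(theta_gd)        # plain gradient descent
    theta_pre = theta_pre - 0.9 * P @ grad(theta_pre)  # pre-conditioned step

print(theta_gd)   # still ~0.9 in the low-curvature direction after 10 steps
print(theta_pre)  # ~1e-10 in both directions: the scales are equalized
```

Plain gradient descent must keep its step size below 2/100 to stay stable in the stiff direction, which leaves it crawling in the flat one; the pre-conditioned update faces no such trade-off.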
We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters.
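A minimal first-order sketch of the idea follows; the actual LCA method uses a more careful path integral with higher-order integration, and the model, data, and variable names here are placeholders. Each step's loss change is allocated across parameters as gradient times parameter movement:

```python
import torch

# First-order sketch of Loss Change Allocation: each step's loss change
# is split across parameters as grad_i * delta_theta_i and accumulated.
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 4), torch.randn(32, 1)

lca = [torch.zeros_like(p) for p in model.parameters()]
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    before = [p.detach().clone() for p in model.parameters()]
    opt.step()
    # Allocate this step's loss change: grad * (theta_after - theta_before).
    for acc, p, b in zip(lca, model.parameters(), before):
        acc += p.grad * (p.detach() - b)

# Negative entries are parameters credited with reducing the loss; the
# entries sum (to first order) to the total loss change over training.
print(sum(a.sum().item() for a in lca))
```

The allocation is "conservative" in the sense that the per-parameter contributions sum to the overall loss change, so no credit is created or lost.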
The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keeping the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights.