1 code implementation • ICLR 2022 • Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto
Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and attempts at straightforwardly applying Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead.
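For context, a minimal sketch of the vanilla DP-SGD step the abstract refers to (per-example gradient clipping followed by Gaussian noise). The clipping norm C, noise multiplier sigma, and the micro-batching loop are illustrative; this is not the paper's memory-efficient implementation.

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, C=1.0, sigma=1.0):
    """One DP-SGD step: clip each example's gradient to norm C, sum, add noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):                      # micro-batches of size 1 give per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(C / (norm + 1e-12), max=1.0)   # clip to norm at most C
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * sigma * C         # Gaussian noise scaled by sigma * C
            p.add_(-(lr / len(xs)) * (s + noise))           # noisy averaged gradient update
```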
no code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
no code implementations • 23 Jun 2021 • Zhongliang Li, Zhihao Jin, Xuechen Li, Linlin Shen
The pairs of normal and pseudo COVID-19 images were then used to train a U-Net-based encoder-decoder architecture for image restoration, which does not require any labelled data.
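A toy sketch of such a U-Net-style encoder-decoder restoration network in PyTorch; the layer sizes and skip-connection layout are illustrative only, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy encoder-decoder with a single skip connection (illustrative only)."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 2, stride=2), nn.ReLU())
        self.out = nn.Conv2d(ch * 2, 1, 3, padding=1)   # concatenated skip doubles the channels

    def forward(self, x):
        e1 = self.enc1(x)                               # (B, ch, H, W)
        e2 = self.enc2(e1)                              # (B, 2ch, H/2, W/2)
        d1 = self.dec1(e2)                              # (B, ch, H, W)
        return self.out(torch.cat([d1, e1], dim=1))     # skip connection then 1-channel output

x = torch.randn(4, 1, 64, 64)      # toy image batch
restored = TinyUNet()(x)
print(restored.shape)              # torch.Size([4, 1, 64, 64])
```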
no code implementations • 28 May 2021 • Xuechen Li, Chris J. Maddison, Daniel Tarlow
Source code spends most of its time in a broken or incomplete state during software development.
2 code implementations • NeurIPS 2021 • Patrick Kidger, James Foster, Xuechen Li, Terry Lyons
This reduces computational cost (giving up to a $1.87\times$ speedup) and removes the numerical truncation errors associated with gradient penalty.
1 code implementation • 12 Feb 2021 • Winnie Xu, Ricky T. Q. Chen, Xuechen Li, David Duvenaud
We perform scalable approximate inference in continuous-depth Bayesian neural networks.
1 code implementation • 6 Feb 2021 • Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, Terry Lyons
Stochastic differential equations (SDEs) are a staple of mathematical modelling of temporal dynamics.
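For context, a toy neural SDE written against the open-source torchsde library; the network sizes and solver choice are illustrative, not the paper's SDE-GAN setup.

```python
import torch
import torchsde

class NeuralSDE(torch.nn.Module):
    noise_type = "diagonal"   # diffusion g(t, y) has the same shape as y
    sde_type = "ito"

    def __init__(self, dim=2):
        super().__init__()
        self.drift = torch.nn.Linear(dim, dim)
        self.diffusion = torch.nn.Linear(dim, dim)

    def f(self, t, y):                       # drift term
        return torch.tanh(self.drift(y))

    def g(self, t, y):                       # diagonal diffusion term
        return torch.sigmoid(self.diffusion(y))

sde = NeuralSDE()
y0 = torch.zeros(32, 2)                      # batch of 32 initial states
ts = torch.linspace(0.0, 1.0, 50)            # solve on [0, 1]
ys = torchsde.sdeint(sde, y0, ts, method="euler")   # shape (50, 32, 2)
```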
no code implementations • 1 Jan 2021 • Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, Terry Lyons
Several authors have introduced \emph{Neural Stochastic Differential Equations} (Neural SDEs), often involving complex theory with various limitations.
no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.
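For context, the natural gradient update discussed here is the standard one: the gradient is preconditioned by the inverse Fisher information matrix,

$$\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1}\, \nabla_\theta \mathcal{L}(\theta_t), \qquad F(\theta) = \mathbb{E}\big[\nabla_\theta \log p_\theta(y\mid x)\, \nabla_\theta \log p_\theta(y\mid x)^{\top}\big].$$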
2 code implementations • 5 Jan 2020 • Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, David Duvenaud
The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations.
Ranked #1 on Video Prediction on CMU Mocap-2
no code implementations • AABI Symposium 2019 • Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, David K. Duvenaud
We derive reverse-mode (or adjoint) automatic differentiation for solutions of stochastic differential equations (SDEs), allowing time-efficient and constant-memory computation of pathwise gradients, a continuous-time analogue of the reparameterization trick.
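A minimal sketch of how such pathwise gradients are exposed in the open-source torchsde library; the model and solver settings here are illustrative.

```python
import torch
import torchsde

class SDE(torch.nn.Module):
    noise_type = "diagonal"
    sde_type = "stratonovich"   # constant diffusion, so Ito and Stratonovich coincide

    def __init__(self, dim=2):
        super().__init__()
        self.theta = torch.nn.Parameter(torch.randn(dim))

    def f(self, t, y):
        return -self.theta * y              # parameterized drift

    def g(self, t, y):
        return 0.3 * torch.ones_like(y)     # constant diagonal diffusion

sde = SDE()
y0 = torch.full((16, 2), 0.5)
ts = torch.linspace(0.0, 1.0, 20)
ys = torchsde.sdeint_adjoint(sde, y0, ts)   # constant-memory backward via the stochastic adjoint
loss = ys[-1].pow(2).mean()
loss.backward()                             # gradients w.r.t. sde.theta through the SDE solve
print(sde.theta.grad)
```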
no code implementations • NeurIPS 2019 • Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu
In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme.
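As a concrete instance of this setup (a toy example assumed here, not the paper's analysis), the unadjusted Langevin algorithm is exactly such a discretization: Euler-Maruyama applied to the overdamped Langevin diffusion $dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dB_t$.

```python
import numpy as np

def grad_U(x):            # potential U(x) = ||x||^2 / 2 targets a standard Gaussian
    return x

rng = np.random.default_rng(0)
x = np.zeros(2)
step = 0.05
samples = []
for _ in range(10_000):
    noise = rng.standard_normal(2)
    x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise   # Euler-Maruyama update
    samples.append(x.copy())

samples = np.array(samples)
print(samples.mean(axis=0), samples.var(axis=0))   # roughly 0 mean and unit variance
```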
no code implementations • 25 Mar 2019 • Yinlong Liu, Xuechen Li, Manning Wang, Guang Chen, Zhijian Song, Alois Knoll
In this paper, we consider pairwise constraints and propose a globally optimal algorithm for solving the absolute pose estimation problem.
no code implementations • 29 Dec 2018 • Xuechen Li, Yinlong Liu, Yiru Wang, Chen Wang, Manning Wang, Zhijian Song
However, the existing global methods are slow for two main reasons: the computational complexity of BnB is exponential in the problem dimensionality (which is six for 3D rigid registration), and the bound evaluation used in BnB is inefficient.
1 code implementation • 30 Apr 2018 • George Barmpalias, Neng Huang, Andrew Lewis-Pye, Angsheng Li, Xuechen Li, YiCheng Pan, Tim Roughgarden
We introduce the \emph{idemetric} property, which formalises the idea that most nodes in a graph have similar distances between them, and which turns out to be quite standard amongst small-world network models.
Social and Information Networks • Discrete Mathematics
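As a rough illustration of the idemetric idea (an informal empirical check, not the paper's formal definition), one can sample pairwise shortest-path distances in a small-world graph and see how tightly they concentrate around the mean.

```python
import random
import networkx as nx

# Connected Watts-Strogatz small-world graph (parameters are illustrative).
G = nx.connected_watts_strogatz_graph(n=2000, k=10, p=0.1, seed=0)
nodes = list(G.nodes)

random.seed(0)
dists = []
for _ in range(1000):
    u, v = random.sample(nodes, 2)
    dists.append(nx.shortest_path_length(G, u, v))

mean = sum(dists) / len(dists)
frac_near_mean = sum(abs(d - mean) <= 1 for d in dists) / len(dists)
print(f"mean distance ~ {mean:.2f}, fraction within +/-1 of mean: {frac_near_mean:.2f}")
```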
8 code implementations • NeurIPS 2018 • Ricky T. Q. Chen, Xuechen Li, Roger Grosse, David Duvenaud
We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables.
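One standard way to write that decomposition (with $q(z)=\mathbb{E}_{p(x)}[q(z\mid x)]$ the aggregate posterior) splits the averaged KL term of the evidence lower bound into index-code mutual information, total correlation, and dimension-wise KL:

$$\mathbb{E}_{p(x)}\big[\mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big)\big] = \underbrace{I_q(x;z)}_{\text{index-code MI}} \;+\; \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod\nolimits_j q(z_j)\Big)}_{\text{total correlation}} \;+\; \underbrace{\sum\nolimits_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}.$$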
2 code implementations • ICML 2018 • Chris Cremer, Xuechen Li, David Duvenaud
Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.
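For reference, the gap decomposition underlying this analysis separates the total inference gap into an approximation gap and an amortization gap, where $q^*$ denotes the best distribution within the variational family and $q$ the amortized (encoder-produced) distribution:

$$\underbrace{\log p(x) - \mathcal{L}[q]}_{\text{inference gap}} \;=\; \underbrace{\log p(x) - \mathcal{L}[q^*]}_{\text{approximation gap}} \;+\; \underbrace{\mathcal{L}[q^*] - \mathcal{L}[q]}_{\text{amortization gap}}.$$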