1 code implementation • 7 Sep 2023 • Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong
Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context.
2 code implementations • 3 May 2023 • Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou
In this study, we attempt to render the training of LLMs for program synthesis more efficient by unifying four key components: (1) model architectures, (2) learning methods, (3) infill sampling, and, (4) data distributions.
no code implementations • 31 Jan 2023 • Omead Pooladzandi, Pasha Khosravi, Erik Nijkamp, Baharan Mirzasoleiman
Generative models can synthesize data points drawn from the data distribution; however, not all generated samples are of high quality.
1 code implementation • 29 Oct 2022 • Mitch Hill, Erik Nijkamp, Jonathan Mitchell, Bo Pang, Song-Chun Zhu
This work proposes a method for using any generator network as the foundation of an Energy-Based Model (EBM).
no code implementations • 21 Jul 2022 • Paul Kassianik, Erik Nijkamp, Bo Pang, Yingbo Zhou, Caiming Xiong
As machine learning tools progress, the inevitable question arises: How can machine learning help us write better code?
4 code implementations • 27 Jun 2022 • Erik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani
Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial intelligence-driven protein design.
7 code implementations • 25 Mar 2022 • Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong
To democratize this, we train and release a family of large language models of up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open-source the training library JAXFORMER (see the usage sketch below).
Ranked #89 on Code Generation on HumanEval
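As a usage illustration only (not part of the paper), the following minimal sketch samples a completion from a released CODEGEN checkpoint via the Hugging Face transformers API; the checkpoint name "Salesforce/codegen-350M-mono" and the generation hyperparameters are assumptions and may need adjusting to the checkpoints available in your environment.

```python
# Minimal sampling sketch for a CODEGEN checkpoint via Hugging Face transformers.
# Assumption: the "Salesforce/codegen-350M-mono" checkpoint name; swap in the
# model size you actually want to use.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Salesforce/codegen-350M-mono"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a completion; generation hyperparameters are illustrative only.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.2,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```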
no code implementations • 15 Mar 2022 • Bo Pang, Erik Nijkamp, Wojciech Kryściński, Silvio Savarese, Yingbo Zhou, Caiming Xiong
Critical to the success of a summarization model is the faithful inference of latent representations of words or tokens in the source documents.
Ranked #1 on Text Summarization on PubMed
no code implementations • ICLR 2022 • Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu
However, MCMC sampling of EBMs in high-dimensional data space generally does not mix, because the energy function, which is usually parametrized by a deep network, is highly multi-modal in the data space.
no code implementations • 29 Sep 2021 • Bo Pang, Erik Nijkamp, Wojciech Maciej Kryscinski, Silvio Savarese, Yingbo Zhou, Caiming Xiong
Critical to the success of a summarization model is the faithful inference of latent representations of words or tokens in the source documents.
no code implementations • NAACL 2021 • Erik Nijkamp, Bo Pang, Ying Nian Wu, Caiming Xiong
We introduce Self-CRItic Pretraining Transformers (SCRIPT) for representation learning of text.
1 code implementation • EACL 2021 • Bo Pang, Erik Nijkamp, Tian Han, Ying Nian Wu
It is initialized from the prior distribution of the latent variable and then runs a small number (e.g., 20) of Langevin dynamics steps guided by its posterior distribution.
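As a sketch of the short-run inference idea described above, the snippet below runs a few Langevin steps on a latent code, starting from a standard Gaussian prior and guided by the unnormalized posterior gradient; the function names, the Gaussian prior, and the step size are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative short-run Langevin inference of a latent code z, assuming a
# standard Gaussian prior and a caller-supplied decoder log-likelihood
# log p(x|z); names and hyperparameters are hypothetical.
import torch

def short_run_langevin(x, decoder_log_lik, z_dim, n_steps=20, step_size=0.1):
    # Initialize from the prior N(0, I).
    z = torch.randn(x.shape[0], z_dim, requires_grad=True)
    for _ in range(n_steps):
        log_prior = -0.5 * (z ** 2).sum(dim=1)        # log N(z; 0, I) up to a constant
        log_post = log_prior + decoder_log_lik(x, z)  # unnormalized log p(z | x)
        grad = torch.autograd.grad(log_post.sum(), z)[0]
        noise = torch.randn_like(z)
        # Langevin update guided by the posterior gradient.
        z = (z + 0.5 * step_size ** 2 * grad
             + step_size * noise).detach().requires_grad_(True)
    return z.detach()
```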
no code implementations • NeurIPS Workshop ICBINB 2020 • Bo Pang, Erik Nijkamp, Jiali Cui, Tian Han, Ying Nian Wu
This paper proposes a latent space energy-based prior model for semi-supervised learning.
1 code implementation • ACL 2020 • Bo Pang, Erik Nijkamp, Wenjuan Han, Linqi Zhou, Yixian Liu, Kewei Tu
Open-domain dialogue generation has gained increasing attention in Natural Language Processing.
1 code implementation • NeurIPS 2020 • Bo Pang, Tian Han, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu
Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well.
no code implementations • 12 Jun 2020 • Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu
Learning an energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm.
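For orientation, here is a generic sketch of that inner loop under the usual maximum-likelihood gradient: negative samples are drawn with short-run Langevin dynamics on the current energy, and the loss contrasts data and model samples. The energy network and hyperparameters are placeholders, not the paper's code.

```python
# Generic MCMC inner loop for maximum-likelihood EBM training; energy_net is
# assumed to be a torch module returning per-sample energies.
import torch

def langevin_sample(energy_net, x_init, n_steps=50, step_size=0.01):
    x = x_init.clone().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(energy_net(x).sum(), x)[0]
        # Gradient descent on the energy plus Gaussian noise.
        x = (x - 0.5 * step_size ** 2 * grad
             + step_size * torch.randn_like(x)).detach().requires_grad_(True)
    return x.detach()

def ebm_loss(energy_net, x_data):
    # Inner loop: sample from the current model, starting from noise.
    x_neg = langevin_sample(energy_net, torch.randn_like(x_data))
    # Contrastive objective: push down energy of data, push up energy of samples.
    return energy_net(x_data).mean() - energy_net(x_neg).mean()
```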
no code implementations • CVPR 2020 • Tian Han, Erik Nijkamp, Linqi Zhou, Bo Pang, Song-Chun Zhu, Ying Nian Wu
This paper proposes a joint training method to learn both the variational auto-encoder (VAE) and the latent energy-based model (EBM).
no code implementations • ECCV 2020 • Erik Nijkamp, Bo Pang, Tian Han, Linqi Zhou, Song-Chun Zhu, Ying Nian Wu
Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables.
2 code implementations • CVPR 2020 • Ruiqi Gao, Erik Nijkamp, Diederik P. Kingma, Zhen Xu, Andrew M. Dai, Ying Nian Wu
(2) The update of the flow model approximately minimizes the Jensen-Shannon divergence between the flow model and the data distribution.
no code implementations • 26 Nov 2019 • Jianwen Xie, Ruiqi Gao, Erik Nijkamp, Song-Chun Zhu, Ying Nian Wu
Learning representations of data is an important problem in statistics and machine learning.
no code implementations • 7 May 2019 • Lior Deutsch, Erik Nijkamp, Yu Yang
Recent work on mode connectivity in the loss landscape of deep neural networks has demonstrated that the locus of (sub-)optimal weight vectors lies on continuous paths.
no code implementations • NeurIPS 2019 • Erik Nijkamp, Mitch Hill, Song-Chun Zhu, Ying Nian Wu
We treat this non-convergent short-run MCMC as a learned generator model or a flow model.
3 code implementations • 29 Mar 2019 • Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, Ying Nian Wu
On the other hand, ConvNet potentials learned with non-convergent MCMC do not have a valid steady-state and cannot be considered approximate unnormalized densities of the training data because long-run MCMC samples differ greatly from observed images.
1 code implementation • 28 Dec 2018 • Tian Han, Erik Nijkamp, Xiaolin Fang, Mitch Hill, Song-Chun Zhu, Ying Nian Wu
This paper proposes the divergence triangle as a framework for the joint training of a generator model, an energy-based model, and an inference model.
no code implementations • 2 Mar 2018 • Mitch Hill, Erik Nijkamp, Song-Chun Zhu
However, characterizing a learned probability density to uncover the Hopfield memories of the model, encoded by the structure of the local modes, remains an open challenge.
no code implementations • CVPR 2017 • Jianwen Xie, Yifei Xu, Erik Nijkamp, Ying Nian Wu, Song-Chun Zhu
This paper proposes a method for generative learning of hierarchical random field models.