no code implementations • 23 Aug 2024 • Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda
This is not only because there is a complicated correlation between learning rate, batch size, number of training tokens, model size, and other hyperparameters, but also because it is prohibitively expensive to perform a hyperparameter search for large language models with billions or trillions of parameters.
1 code implementation • 18 Jul 2024 • Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda
This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens.
2 code implementations • 7 May 2024 • Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Yi Zhou, Chris Johnson, Aanchal Goyal, Hima Patel, Yousaf Shah, Petros Zerfos, Heiko Ludwig, Asim Munawar, Maxwell Crouse, Pavan Kapanipathi, Shweta Salaria, Bob Calio, Sophia Wen, Seetharami Seelam, Brian Belgodere, Carlos Fonseca, Amith Singhee, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda
Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously.
no code implementations • 2 Mar 2024 • Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Kai Xu, David D. Cox, Akash Srivastava
This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training.
1 code implementation • NeurIPS 2021 • Joel Dapello, Jenelle Feather, Hang Le, Tiago Marques, David D. Cox, Josh H. McDermott, James J. DiCarlo, SueYeon Chung
Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems.
no code implementations • 21 Dec 2020 • Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan
In contrast, symbolic and modular models have a relatively better grounding and robustness, though at the cost of accuracy.
no code implementations • 25 Oct 2020 • Akash Srivastava, Yamini Bansal, Yukun Ding, Cole Lincoln Hurwitz, Kai Xu, Bernhard Egger, Prasanna Sattigeri, Joshua B. Tenenbaum, Agus Sudjianto, Phuong Le, Arun Prakash R, Nengfeng Zhou, Joel Vaughan, Yaqun Wang, Anwesha Bhattacharyya, Kristjan Greenewald, David D. Cox, Dan Gutfreund
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
no code implementations • 9 Sep 2020 • Seungwook Han, Akash Srivastava, Cole Hurwitz, Prasanna Sattigeri, David D. Cox
First, we generate images in low-frequency bands by training a sampler in the wavelet domain.
no code implementations • ICLR 2020 • Akash Srivastava, Yamini Bansal, Yukun Ding, Bernhard Egger, Prasanna Sattigeri, Josh Tenenbaum, David D. Cox, Dan Gutfreund
In this work, we tackle a slightly more intricate scenario where the observations are generated from a conditional distribution of some known control variate and some latent noise variate.
no code implementations • 19 Nov 2019 • Akash Srivastava, Jessie Rosenberg, Dan Gutfreund, David D. Cox
Then an inference network (encoder) is trained to invert the decoder.
no code implementations • 3 Jun 2018 • Yamini Bansal, Madhu Advani, David D. Cox, Andrew M. Saxe
To solve this constrained optimization problem, our method employs Lagrange multipliers that act as integrators of error over training and identify 'support vector'-like examples.
no code implementations • 17 Feb 2015 • Chuan-Yung Tsai, David D. Cox
A central challenge in sensory neuroscience is describing how the activity of populations of neurons can represent useful features of the external environment.
1 code implementation • SciPy 2013 • James Bergstra, Dan Yamins, David D. Cox
Sequential model-based optimization (also known as Bayesian optimization) is one of the most efficient methods (per function evaluation) of function minimization.
no code implementations • 14 Jun 2013 • James Bergstra, David D. Cox
This paper also introduces a new ensemble construction variant that combines hyperparameter optimization with the construction of ensembles.