no code implementations • 6 Jan 2025 • Junhong Shen, Kushal Tirumala, Michihiro Yasunaga, Ishan Misra, Luke Zettlemoyer, Lili Yu, Chunting Zhou
Most existing image tokenizers encode images into a fixed number of tokens or patches, overlooking the inherent variability in image complexity.
no code implementations • 20 Dec 2024 • Vivek Ramanujan, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer, Ali Farhadi
Current image generation methods, such as latent diffusion and discrete token-based generation, depend on a two-stage training approach.
3 code implementations • 20 Aug 2024 • Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
Our experiments show that Transfusion scales significantly better than quantizing images and training a language model over discrete image tokens.
no code implementations • 29 Jun 2024 • Aaditya K. Singh, Yu Yang, Kushal Tirumala, Mostafa Elhoushi, Ari S. Morcos
Specifically, many have shown that de-duplicating data, or sub-selecting higher quality data, can lead to efficiency or performance improvements.
no code implementations • 27 May 2024 • Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, Pietro Astolfi, Reyhane Askari Hemmat, Jun Chen, Kushal Tirumala, Rim Assouel, Mazda Moayeri, Arjang Talattof, Kamalika Chaudhuri, Zechun Liu, Xilun Chen, Quentin Garrido, Karen Ullrich, Aishwarya Agrawal, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
Then, we present and discuss approaches to evaluate VLMs.
no code implementations • 26 Apr 2024 • Vasu Sharma, Karthik Padthe, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Shang-Wen Li, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer
In recent times, training Language Models (LMs) has relied on computationally heavy training over massive datasets, which makes the process extremely laborious.
2 code implementations • 26 Mar 2024 • Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.
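As a rough sketch of what such block layer-pruning looks like in practice (assuming a Hugging Face Llama-style model whose decoder blocks live in `model.model.layers`; the checkpoint name and the hard-coded block indices below are placeholders, and the paper selects the block with a representation-similarity score rather than by hand):

```python
# Minimal, illustrative block layer-pruning for a Llama-style causal LM.
# Not the authors' implementation: the checkpoint and pruned indices are placeholders,
# and the block to drop would normally be chosen by a similarity score between
# layer representations rather than hard-coded.
from torch import nn
from transformers import AutoModelForCausalLM

def drop_layer_block(model, start: int, n_drop: int):
    """Remove n_drop consecutive decoder layers starting at index `start`."""
    layers = model.model.layers                              # nn.ModuleList of decoder blocks
    kept = [layer for i, layer in enumerate(layers)
            if not (start <= i < start + n_drop)]
    model.model.layers = nn.ModuleList(kept)
    model.config.num_hidden_layers = len(kept)
    return model

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = drop_layer_block(model, start=14, n_drop=6)          # prune 6 consecutive layers, then evaluate on QA benchmarks
```

The paper additionally reports that a small amount of parameter-efficient finetuning after pruning helps recover quality.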
1 code implementation • 9 Jan 2024 • Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos
Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training.
no code implementations • 5 Dec 2023 • Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani
Armed with this knowledge, we devise novel pruning metrics that operate in embedding space to identify and remove low-quality entries in the Stack dataset.
2 code implementations • 16 Mar 2023 • Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos
Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.
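As an illustration of the general recipe (not the authors' released code), semantic deduplication can be sketched as: embed every example with a pretrained encoder, cluster the embeddings, and within each cluster keep a single representative from any group whose pairwise cosine similarity exceeds a threshold. The `embeddings` array, cluster count, and 0.95 threshold below are assumed placeholders.

```python
# Illustrative embedding-space deduplication in the spirit of SemDeDup (not the reference code).
# Assumes `embeddings` is an (N, d) array of L2-normalized embeddings from a pretrained encoder.
import numpy as np
from sklearn.cluster import KMeans

def semantic_dedup(embeddings: np.ndarray, n_clusters: int = 100, threshold: float = 0.95):
    """Return sorted indices of the examples kept after removing semantic near-duplicates."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    keep = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        sims = embeddings[idx] @ embeddings[idx].T        # cosine similarities (unit-norm inputs)
        removed = np.zeros(len(idx), dtype=bool)
        for i in range(len(idx)):
            if removed[i]:
                continue                                  # already covered by an earlier keeper
            keep.append(idx[i])
            removed |= sims[i] > threshold                # drop everything too close to this keeper
    return np.array(sorted(keep))
```

Lowering the threshold prunes more aggressively; in this sketch, that is the knob one would tune to reach a target reduction such as the 50% reported above.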
no code implementations • 22 May 2022 • Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan
Despite their wide adoption, the underlying training and memorization dynamics of very large language models are not well understood.
1 code implementation • 8 May 2022 • Alexander R. Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue
Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$.
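As a quick, illustrative computation of that quantity (variable names and synthetic data are assumptions, not the paper's code):

```python
# Compute the normalized margin gamma / ||w|| of a linear binary classifier on synthetic data.
import numpy as np

def normalized_margin(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    margins = y * (X @ w)                  # per-example unnormalized margins
    gamma = margins.min()                  # the margin is the worst case over the dataset
    return gamma / np.linalg.norm(w)       # scale-invariant: unchanged if w is rescaled

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w = rng.normal(size=5)
y = np.sign(X @ w)                         # labels consistent with w, so the margin is positive

# Equal outputs: rescaling w changes gamma and ||w|| by the same factor.
print(normalized_margin(w, X, y), normalized_margin(10 * w, X, y))
```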
1 code implementation • ACL 2022 • Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela
We introduce Dynatask: an open-source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models, as well as for conducting model-in-the-loop data collection with crowdworkers.
1 code implementation • 11 Apr 2019 • Dmitry A. Duev, Ashish Mahabal, Quan-Zhi Ye, Kushal Tirumala, Justin Belicki, Richard Dekany, Sara Frederick, Matthew J. Graham, Russ R. Laher, Frank J. Masci, Thomas A. Prince, Reed Riddle, Philippe Rosnet, Maayane T. Soumagnac
We present DeepStreaks, a convolutional-neural-network, deep-learning system designed to efficiently identify streaking fast-moving near-Earth objects detected in the data of the Zwicky Transient Facility (ZTF), a wide-field, time-domain survey using a dedicated 47 sq. deg camera.
Instrumentation and Methods for Astrophysics • Earth and Planetary Astrophysics
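For orientation only, a minimal binary CNN of the kind used to separate streak from non-streak cutouts is sketched below; the layer sizes and 64x64 single-channel input are hypothetical placeholders, not the actual DeepStreaks architecture.

```python
# Hypothetical streak / non-streak cutout classifier (placeholder architecture, not DeepStreaks itself).
import torch
from torch import nn

class StreakClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64x64 -> 32x32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 1),                                             # single logit
        )

    def forward(self, x):
        return self.head(self.features(x))

# Score a batch of four dummy 64x64 grayscale cutouts; sigmoid maps the logit to P(streak).
scores = torch.sigmoid(StreakClassifier()(torch.randn(4, 1, 64, 64)))
```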
1 code implementation • 5 Feb 2019 • Ashish Mahabal, Umaa Rebbapragada, Richard Walters, Frank J. Masci, Nadejda Blagorodnova, Jan van Roestel, Quan-Zhi Ye, Rahul Biswas, Kevin Burdge, Chan-Kao Chang, Dmitry A. Duev, V. Zach Golkhou, Adam A. Miller, Jakob Nordin, Charlotte Ward, Scott Adams, Eric C. Bellm, Doug Branton, Brian Bue, Chris Cannella, Andrew Connolly, Richard Dekany, Ulrich Feindt, Tiara Hung, Lucy Fortson, Sara Frederick, C. Fremling, Suvi Gezari, Matthew Graham, Steven Groom, Mansi M. Kasliwal, Shrinivas Kulkarni, Thomas Kupfer, Hsing Wen Lin, Chris Lintott, Ragnhild Lunnan, John Parejko, Thomas A. Prince, Reed Riddle, Ben Rusholme, Nicholas Saunders, Nima Sedaghat, David L. Shupe, Leo P. Singer, Maayane T. Soumagnac, Paula Szkody, Yutaro Tachibana, Kushal Tirumala, Sjoert van Velzen, Darryl Wright
The Zwicky Transient Facility is a large optical survey in multiple filters producing hundreds of thousands of transient alerts per night.
Instrumentation and Methods for Astrophysics