Search Results for author: Kushal Tirumala

Found 15 papers, 8 papers with code

CAT: Content-Adaptive Image Tokenization

no code implementations6 Jan 2025 Junhong Shen, Kushal Tirumala, Michihiro Yasunaga, Ishan Misra, Luke Zettlemoyer, Lili Yu, Chunting Zhou

Most existing image tokenizers encode images into a fixed number of tokens or patches, overlooking the inherent variability in image complexity.

Image Reconstruction

When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization

no code implementations20 Dec 2024 Vivek Ramanujan, Kushal Tirumala, Armen Aghajanyan, Luke Zettlemoyer, Ali Farhadi

Current image generation methods, such as latent diffusion and discrete token-based generation, depend on a two-stage training approach.

Image Generation

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

3 code implementations20 Aug 2024 Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy

Our experiments show that Transfusion scales significantly better than quantizing images and training a language model over discrete image tokens.

Language Modeling
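
The excerpt above reports a scaling result rather than a recipe; as a rough illustration of the idea in the title (a single model trained with a next-token loss on text and a diffusion-style loss on images), here is a minimal sketch. The model interface (`text_logits`, `denoise`), the noise schedule, and the loss weighting are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def transfusion_style_loss(model, text_tokens, image_latents, lambda_img=1.0):
    """Hypothetical combined objective: next-token cross-entropy on text plus a
    denoising (diffusion-style) MSE on image latents, computed by one model."""
    # Next-token prediction on text: predict token t+1 from tokens up to t.
    logits = model.text_logits(text_tokens[:, :-1])            # assumed interface
    lm_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        text_tokens[:, 1:].reshape(-1),
    )

    # Simple denoising objective: predict the noise mixed into the latents.
    # Assumes image_latents has shape (batch, channels, height, width).
    noise = torch.randn_like(image_latents)
    t = torch.rand(image_latents.size(0), device=image_latents.device)
    t4 = t.view(-1, 1, 1, 1)
    noisy = (1 - t4) * image_latents + t4 * noise
    pred_noise = model.denoise(noisy, t)                        # assumed interface
    diff_loss = F.mse_loss(pred_noise, noise)

    return lm_loss + lambda_img * diff_loss
```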

Brevity is the soul of wit: Pruning long files for code generation

no code implementations29 Jun 2024 Aaditya K. Singh, Yu Yang, Kushal Tirumala, Mostafa Elhoushi, Ari S. Morcos

Specifically, many works have shown that de-duplicating data, or sub-selecting higher-quality data, can lead to efficiency or performance improvements.

Code Generation, HumanEval
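
The excerpt motivates data sub-selection in general; the title points at the concrete heuristic of pruning overly long files from a code-generation training corpus. A minimal sketch of such a length filter, where the tokenizer interface and the threshold are illustrative rather than the paper's settings:

```python
def prune_long_files(files, tokenizer, max_tokens=10_000):
    """Keep only files whose token count is at or below a length threshold.

    `files` is an iterable of (path, text) pairs; `tokenizer.encode` is an
    assumed Hugging Face-style interface. Threshold is illustrative only.
    """
    kept = []
    for path, text in files:
        if len(tokenizer.encode(text)) <= max_tokens:
            kept.append((path, text))
    return kept
```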

Text Quality-Based Pruning for Efficient Training of Language Models

no code implementations26 Apr 2024 Vasu Sharma, Karthik Padthe, Newsha Ardalani, Kushal Tirumala, Russell Howes, Hu Xu, Po-Yao Huang, Shang-Wen Li, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer

In recent times, training Language Models (LMs) has relied on computationally heavy training over massive datasets, which makes the training process extremely laborious.

The Unreasonable Ineffectiveness of the Deeper Layers

2 code implementations26 Mar 2024 Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.

Quantization, Question Answering
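
As a rough sketch of what a simple layer-pruning strategy can look like, the snippet below removes a contiguous block of transformer layers from a LLaMA-style causal LM. The attribute path (`model.model.layers`) and the choice of which block to drop are assumptions; the paper's own criterion for selecting layers, and any subsequent fine-tuning, are not reproduced here.

```python
import copy
import torch.nn as nn

def drop_layer_block(model, start, n_drop):
    """Return a copy of a decoder-only LM with `n_drop` consecutive
    transformer blocks removed, starting at index `start`.

    Assumes the blocks live at model.model.layers (LLaMA-style layout).
    """
    pruned = copy.deepcopy(model)
    layers = pruned.model.layers
    kept = [layer for i, layer in enumerate(layers)
            if not (start <= i < start + n_drop)]
    pruned.model.layers = nn.ModuleList(kept)
    pruned.config.num_hidden_layers = len(kept)
    return pruned
```

The sketch shows only the surgery; per the excerpt, the paper evaluates the pruned models on question-answering benchmarks and finds little degradation until up to half the layers are removed.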

Effective pruning of web-scale datasets based on complexity of concept clusters

1 code implementation9 Jan 2024 Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos

Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training.
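
The excerpt does not spell out the complexity measure; as a hedged sketch of the general idea named in the title (prune within concept clusters, keeping relatively more examples from more "complex" clusters), with the clustering, the dispersion-based complexity proxy, and the keep-budget allocation all assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_complexity_prune(embeddings, total_keep_frac=0.25, n_clusters=100, seed=0):
    """Illustrative pruning: cluster example embeddings, score each cluster's
    'complexity' by its within-cluster dispersion, and keep proportionally
    more examples from more dispersed clusters."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(embeddings)

    # Complexity proxy: mean distance of a cluster's members to its centroid.
    dists = np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)
    complexity = np.array([dists[km.labels_ == c].mean() for c in range(n_clusters)])
    weights = complexity / complexity.sum()

    budget = int(total_keep_frac * len(embeddings))
    kept_idx = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        k = min(len(members), int(round(weights[c] * budget)))
        kept_idx.extend(rng.choice(members, size=k, replace=False))
    return np.array(kept_idx)
```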

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

2 code implementations16 Mar 2023 Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos

Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.
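
A rough sketch of semantic deduplication in the spirit of the excerpt: embed examples, cluster the embeddings, and within each cluster keep only one representative of any group of near-duplicates whose cosine similarity exceeds a threshold. The clustering method, the threshold, and the assumption of L2-normalized embeddings are illustrative choices, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def semantic_dedup(embeddings, n_clusters=100, sim_threshold=0.95, seed=0):
    """Return indices of examples to keep after removing semantic near-duplicates.
    Embeddings are assumed L2-normalized, so a dot product is cosine similarity."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(embeddings)
    keep = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        kept_in_cluster = []
        for i in members:
            # Keep i only if it is not too similar to anything already kept.
            if all(embeddings[i] @ embeddings[j] < sim_threshold
                   for j in kept_in_cluster):
                kept_in_cluster.append(i)
        keep.extend(kept_in_cluster)
    return np.array(keep)
```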

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

no code implementations22 May 2022 Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan

Despite their wide adoption, the underlying training and memorization dynamics of very large language models are not well understood.

Language Modeling +2

Investigating Generalization by Controlling Normalized Margin

1 code implementation8 May 2022 Alexander R. Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue

Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$.

Learning Theory
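
The quantity in the excerpt is straightforward to compute; a minimal sketch for a binary linear classifier, where the labeling convention (y in {-1, +1}) and the Euclidean norm are assumptions:

```python
import numpy as np

def normalized_margin(w, X, y):
    """Normalized margin gamma / ||w|| for a linear classifier w on data (X, y),
    with labels y in {-1, +1}. gamma is the smallest signed margin over the data."""
    gamma = np.min(y * (X @ w))        # raw (unnormalized) margin
    return gamma / np.linalg.norm(w)   # scale-invariant normalized margin
```

Rescaling w by any positive constant leaves gamma/||w|| unchanged, which is why generalization bounds are typically stated in terms of the normalized rather than the raw margin.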

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

1 code implementation ACL 2022 Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela

We introduce Dynatask: an open-source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models, as well as for conducting model-in-the-loop data collection with crowdworkers.

Benchmarking

DeepStreaks: identifying fast-moving objects in the Zwicky Transient Facility data with deep learning

1 code implementation11 Apr 2019 Dmitry A. Duev, Ashish Mahabal, Quan-Zhi Ye, Kushal Tirumala, Justin Belicki, Richard Dekany, Sara Frederick, Matthew J. Graham, Russ R. Laher, Frank J. Masci, Thomas A. Prince, Reed Riddle, Philippe Rosnet, Maayane T. Soumagnac

We present DeepStreaks, a deep-learning system based on convolutional neural networks, designed to efficiently identify streaking fast-moving near-Earth objects detected in data from the Zwicky Transient Facility (ZTF), a wide-field, time-domain survey using a dedicated 47 sq. deg camera.

Instrumentation and Methods for Astrophysics, Earth and Planetary Astrophysics
