Search Results for author: Kushal Tirumala

Found 9 papers, 6 papers with code

The Unreasonable Ineffectiveness of the Deeper Layers

no code implementations • 26 Mar 2024 • Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.

Quantization • Question Answering
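For intuition, here is a minimal sketch of dropping a contiguous block of decoder layers from a toy PyTorch stack rather than an actual open-weight LLM; the `ToyDecoder` module and the choice of which block to drop are assumptions for illustration, and the paper's similarity-based block selection and post-pruning "healing" finetune are not shown.

```python
# Minimal sketch of contiguous layer pruning on a toy transformer stack.
# NOT the paper's exact procedure: the block to drop is chosen arbitrarily
# here, and no post-pruning finetuning ("healing") is performed.
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=16):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def prune_layers(model: ToyDecoder, start: int, n_drop: int) -> ToyDecoder:
    """Remove a contiguous block of n_drop layers starting at index `start`."""
    kept = [layer for i, layer in enumerate(model.layers)
            if not (start <= i < start + n_drop)]
    model.layers = nn.ModuleList(kept)
    return model

model = ToyDecoder(n_layers=16)
x = torch.randn(2, 8, 256)
y_full = model(x)

# Drop 8 of 16 layers (up to half, as in the paper's pruning fractions).
pruned = prune_layers(model, start=7, n_drop=8)
y_pruned = pruned(x)
print(len(pruned.layers), y_full.shape, y_pruned.shape)
```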

Effective pruning of web-scale datasets based on complexity of concept clusters

1 code implementation • 9 Jan 2024 • Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos

Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training.
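As a rough illustration of pruning guided by cluster complexity, the sketch below clusters stand-in embeddings with k-means and keeps a per-cluster quota proportional to a simple complexity proxy (mean distance to the centroid); the proxy, the quotas, and the random embeddings are assumptions for illustration, not the paper's exact measure or pipeline.

```python
# Hedged sketch of complexity-aware cluster pruning on toy embeddings.
# The complexity proxy and proportional keep-budget are simplifications.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
emb = rng.normal(size=(10_000, 64)).astype(np.float32)  # stand-in embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

k = 50
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(emb)
labels, centers = km.labels_, km.cluster_centers_
dists = np.linalg.norm(emb - centers[labels], axis=1)

budget = 2_500  # total examples to keep (~25% of the data)
# Per-cluster "complexity": average distance of members to their centroid.
complexity = np.array([dists[labels == c].mean() for c in range(k)])
quota = np.maximum(1, (budget * complexity / complexity.sum()).astype(int))

keep = []
for c in range(k):
    idx = np.where(labels == c)[0]
    # Within a cluster, prefer the least redundant examples
    # (farthest from the centroid), up to this cluster's quota.
    order = idx[np.argsort(-dists[idx])]
    keep.extend(order[: quota[c]].tolist())

print(f"kept {len(keep)} of {len(emb)} examples")
```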

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

no code implementations • 5 Dec 2023 • Yu Yang, Aaditya K. Singh, Mostafa Elhoushi, Anas Mahmoud, Kushal Tirumala, Fabian Gloeckle, Baptiste Rozière, Carole-Jean Wu, Ari S. Morcos, Newsha Ardalani

Armed with this knowledge, we devise novel pruning metrics that operate in embedding space to identify and remove low-quality entries in the Stack dataset.

Code Generation
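A hedged sketch of embedding-space pruning signals of the kind described above: distance to the nearest concept centroid and membership in very small clusters. The embeddings are random stand-ins, the thresholds are invented, and the synthetic-corruption study the paper uses to motivate and calibrate such signals is not reproduced.

```python
# Hedged sketch of embedding-space pruning signals for code data.
# Thresholds and embeddings are placeholders, not the paper's metrics.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
code_emb = rng.normal(size=(5_000, 128)).astype(np.float32)  # stand-in embeddings
code_emb /= np.linalg.norm(code_emb, axis=1, keepdims=True)

km = KMeans(n_clusters=100, n_init=10, random_state=0).fit(code_emb)
dist_to_centroid = np.linalg.norm(code_emb - km.cluster_centers_[km.labels_], axis=1)
cluster_size = np.bincount(km.labels_)[km.labels_]

# Flag entries that sit far from their concept centroid or in tiny clusters,
# two signals of the kind the paper investigates as markers of low quality.
far = dist_to_centroid > np.quantile(dist_to_centroid, 0.9)
tiny = cluster_size < np.quantile(cluster_size, 0.05)
prune_mask = far | tiny
print(f"pruning {prune_mask.sum()} of {len(code_emb)} entries")
```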

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

1 code implementation • 16 Mar 2023 • Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos

Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.
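The core operation can be sketched as k-means clustering of embeddings followed by removal of cosine-similarity near-duplicates within each cluster; the random embeddings and the 0.9 threshold below are placeholders, whereas the paper uses embeddings from a pretrained foundation model and selects the threshold to reach a target deduplication rate.

```python
# Minimal sketch of semantic deduplication within embedding clusters.
# Embeddings and the similarity threshold are illustrative placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
emb = rng.normal(size=(2_000, 64)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize for cosine

km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(emb)
threshold = 0.9
keep = np.ones(len(emb), dtype=bool)

for c in range(km.n_clusters):
    idx = np.where(km.labels_ == c)[0]
    sims = emb[idx] @ emb[idx].T  # pairwise cosine similarity within the cluster
    for a in range(len(idx)):
        if not keep[idx[a]]:
            continue
        # Drop later cluster members that are semantic near-duplicates of this one.
        dups = np.where(sims[a, a + 1:] > threshold)[0] + a + 1
        keep[idx[dups]] = False

print(f"kept {keep.sum()} of {len(emb)} examples after semantic dedup")
```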

Investigating Generalization by Controlling Normalized Margin

1 code implementation • 8 May 2022 • Alexander R. Farhang, Jeremy Bernstein, Kushal Tirumala, Yang Liu, Yisong Yue

Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$.

Learning Theory
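For concreteness, the snippet below computes the weight norm, margin, and normalized margin for a linear binary classifier on toy data; the data and the least-squares fit are assumptions for illustration, and the snippet does not reproduce the paper's experiments on directly controlling the normalized margin during training.

```python
# Weight norm ||w||, hard margin gamma, and normalized margin gamma/||w||
# for a linear classifier f(x) = w·x on synthetic binary data.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=200) >= 0, 1.0, -1.0)  # labels in {-1, +1}
w = np.linalg.lstsq(X, y, rcond=None)[0]  # some fitted weight vector

margins = y * (X @ w)        # per-example margins (negative if misclassified)
gamma = margins.min()        # hard margin of the classifier
norm_w = np.linalg.norm(w)
print(f"margin = {gamma:.3f}, ||w|| = {norm_w:.3f}, "
      f"normalized margin = {gamma / norm_w:.3f}")
```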

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

1 code implementation • ACL 2022 • Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela

We introduce Dynatask: an open-source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required to host and evaluate state-of-the-art NLP models, as well as to conduct model-in-the-loop data collection with crowdworkers.

Benchmarking

DeepStreaks: identifying fast-moving objects in the Zwicky Transient Facility data with deep learning

1 code implementation • 11 Apr 2019 • Dmitry A. Duev, Ashish Mahabal, Quan-Zhi Ye, Kushal Tirumala, Justin Belicki, Richard Dekany, Sara Frederick, Matthew J. Graham, Russ R. Laher, Frank J. Masci, Thomas A. Prince, Reed Riddle, Philippe Rosnet, Maayane T. Soumagnac

We present DeepStreaks, a convolutional-neural-network, deep-learning system designed to efficiently identify streaking fast-moving near-Earth objects that are detected in the data of the Zwicky Transient Facility (ZTF), a wide-field, time-domain survey using a dedicated 47 sq. deg camera.

Instrumentation and Methods for Astrophysics • Earth and Planetary Astrophysics
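As a loose illustration of the classification task, here is a small binary CNN over single-channel image cutouts; the `StreakClassifier` architecture, input size, and random inputs are assumptions for illustration, not the published DeepStreaks models.

```python
# Hedged sketch of a binary "streak vs. not a streak" CNN classifier.
# Architecture and 64x64 single-channel input size are illustrative only.
import torch
import torch.nn as nn

class StreakClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)  # single logit for the "streak" class

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = StreakClassifier()
cutouts = torch.randn(8, 1, 64, 64)       # a batch of fake image cutouts
probs = torch.sigmoid(model(cutouts))     # probability of being a real streak
print(probs.squeeze(1))
```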
