no code implementations • 13 Oct 2024 • Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie
Specifically, when pruning six decoder blocks on Llama3.1-8B Instruct (i.e., 32 to 26 layers, reducing the model size from 8.03B to 6.72B parameters), our method retains 91.2% of the original model's accuracy compared to 81.7% with SFT, while reducing real-world FLOPs by 16.3%.
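A minimal sketch of what removing decoder blocks from a Llama-style model can look like, assuming a HuggingFace-style layout where the blocks live in `model.model.layers`; the indices dropped here are illustrative and not the ones chosen by the paper's method:

```python
# Hedged sketch: prune decoder blocks from a Llama-style model.
# Assumes `model.model.layers` is an nn.ModuleList and that the config
# exposes `num_hidden_layers`; both are assumptions about the model layout.
import torch.nn as nn

def prune_decoder_blocks(model, drop_indices):
    """Remove the decoder blocks at `drop_indices` and keep the rest in order."""
    drop = set(drop_indices)
    kept = [blk for i, blk in enumerate(model.model.layers) if i not in drop]
    model.model.layers = nn.ModuleList(kept)
    model.config.num_hidden_layers = len(kept)  # keep the config consistent
    return model

# e.g., dropping 6 of 32 blocks (32 -> 26 layers); indices are hypothetical:
# model = prune_decoder_blocks(model, drop_indices=range(20, 26))
```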
1 code implementation • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Sarah Luger, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren
We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark.
no code implementations • 1 Mar 2024 • Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie
Large language models (LLMs) are typically trained on general-purpose data spanning many domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models on domain-specific tasks (e.g., biomedicine).
Ranked #11 on Question Answering on PubMedQA
2 code implementations • 21 Mar 2023 • Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie
To the best of our knowledge, this is the first work to demonstrate the use of sparsity for improving the accuracy of dense models through a set of simple-to-use sparse transformations.
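One way to picture a sparse transformation of this kind is to widen a layer while masking its weights, so the non-zero parameter count stays roughly constant. The sketch below is a loose illustration under that assumption; the widening factor, random masks, and MLP structure are illustrative, not the paper's exact transformations:

```python
# Hedged sketch: replace a dense MLP block with a wider, unstructured-sparse one
# whose number of non-zero weights is roughly unchanged.
import torch
import torch.nn as nn

class SparseWideMLP(nn.Module):
    def __init__(self, dim, hidden, sparsity=0.75):
        super().__init__()
        wide = int(hidden / (1.0 - sparsity))  # widen hidden dim to offset sparsity
        self.fc1 = nn.Linear(dim, wide)
        self.fc2 = nn.Linear(wide, dim)
        self.act = nn.GELU()
        # Fixed random masks; a real method would choose/update the sparsity pattern.
        self.register_buffer("m1", (torch.rand_like(self.fc1.weight) > sparsity).float())
        self.register_buffer("m2", (torch.rand_like(self.fc2.weight) > sparsity).float())

    def forward(self, x):
        x = self.act(nn.functional.linear(x, self.fc1.weight * self.m1, self.fc1.bias))
        return nn.functional.linear(x, self.fc2.weight * self.m2, self.fc2.bias)
```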
no code implementations • 18 Mar 2023 • Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis Decoste, Sean Lie, Shreyas Saxena
In this work, we show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training (Sparse Pre-training) and then recover the representational capacity by allowing the zeroed weights to learn (Dense Fine-tuning).
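A minimal sketch of the sparse-pre-training / dense-fine-tuning idea, assuming fixed random unstructured masks and standard training loops (both assumptions; the paper's mask selection and schedules are not shown here):

```python
# Hedged sketch: train with weight masks applied (sparse pre-training),
# then stop applying them so the zeroed weights can learn (dense fine-tuning).
import torch

def make_masks(model, sparsity=0.5):
    """Fixed random unstructured masks for every weight matrix (assumption)."""
    return {n: (torch.rand_like(p) > sparsity).float()
            for n, p in model.named_parameters() if p.dim() >= 2}

def apply_masks(model, masks):
    """Zero the masked-out weights; call after each optimizer step."""
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                p.mul_(masks[n])

# Sparse pre-training: re-apply the masks after every step so only a subset of
# weights is trained.
#   loss.backward(); optimizer.step(); apply_masks(model, masks)
# Dense fine-tuning: simply stop calling apply_masks, so the previously zeroed
# weights are free to learn and recover representational capacity.
```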
1 code implementation • 28 Jun 2022 • Vitaliy Chiley, Vithursan Thangarasa, Abhay Gupta, Anshul Samar, Joel Hestness, Dennis Decoste
However, training them requires substantial accelerator memory for saving large, multi-resolution activations.
Ranked #339 on Image Classification on ImageNet
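The memory cost of storing multi-resolution activations is what reversible designs avoid: if a block's inputs can be recomputed exactly from its outputs, activations need not be saved for the backward pass. The sketch below shows a generic additive-coupling reversible block, not the paper's RevBiFPN architecture; F and G are placeholder sub-networks:

```python
# Hedged sketch of a reversible (additive-coupling) block.
import torch.nn as nn

class ReversibleBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.F = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.G = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

    def forward(self, x1, x2):
        y1 = x1 + self.F(x2)
        y2 = x2 + self.G(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Recompute the inputs from the outputs instead of storing them.
        x2 = y2 - self.G(y1)
        x1 = y1 - self.F(x2)
        return x1, x2
```

Realizing the memory saving in practice requires a custom backward pass that recomputes activations via `inverse`; the sketch only shows the invertibility that makes this possible.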
no code implementations • 19 Apr 2021 • Mihir Pendse, Vithursan Thangarasa, Vitaliy Chiley, Ryan Holmdahl, Joel Hestness, Dennis Decoste
The inverted residual bottleneck block uses lightweight depthwise separable convolutions to reduce computation by decomposing convolutions into a pointwise convolution and a depthwise convolution.
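A minimal sketch of an inverted residual bottleneck in the MobileNetV2 style: a pointwise expansion, a depthwise convolution, and a pointwise projection, with a skip connection when the shapes match. The expansion factor and stride-1 assumption are illustrative:

```python
# Hedged sketch: inverted residual bottleneck with depthwise separable convolutions.
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, channels, expansion=6):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),   # pointwise expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1,
                      groups=hidden, bias=False),                     # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),   # pointwise project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection (stride 1, same shape)
```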
no code implementations • 30 Jun 2020 • Vithursan Thangarasa, Thomas Miconi, Graham W. Taylor
Continual learning is the problem of sequentially learning new tasks or knowledge while protecting previously acquired knowledge.
no code implementations • 25 Sep 2019 • Vithursan Thangarasa, Thomas Miconi, Graham W. Taylor
Continual learning is the problem of sequentially learning new tasks or knowledge while protecting previously acquired knowledge.
1 code implementation • 24 Jul 2018 • Vithursan Thangarasa, Graham W. Taylor
Selecting the most appropriate data examples to present a deep neural network (DNN) at different stages of training is an unsolved challenge.
no code implementations • ICLR 2018 • Vithursan Thangarasa, Graham W. Taylor
The student CNN classifier dynamically selects samples to form a mini-batch based on the easiness from cross-entropy losses and true diverseness of examples from the representation space sculpted by the embedding CNN.
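A minimal sketch of that selection idea, assuming per-example cross-entropy as the easiness score and pairwise distance in the embedding space as the diversity criterion; the greedy rule and distance threshold are illustrative assumptions, not the paper's exact procedure:

```python
# Hedged sketch: pick a mini-batch of easy-but-diverse examples from a candidate pool.
import torch
import torch.nn.functional as F

def select_minibatch(student, embedder, x, y, batch_size, min_dist=1.0):
    with torch.no_grad():
        losses = F.cross_entropy(student(x), y, reduction="none")  # easiness score
        emb = embedder(x)                                          # representation space
    chosen = []
    for i in losses.argsort():             # easiest examples first
        i = i.item()
        if all(torch.dist(emb[i], emb[j]) > min_dist for j in chosen):
            chosen.append(i)               # keep only examples far from those already picked
        if len(chosen) == batch_size:
            break
    return x[chosen], y[chosen]
```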