1 code implementation • 2 Apr 2024 • Sherin Muckatira, Vijeta Deshpande, Vladislav Lialin, Anna Rumshisky
Large language models can solve new tasks without task-specific fine-tuning.
1 code implementation • 2 Apr 2024 • Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky
In contrast, the underlying pre-trained LLMs they use as a backbone are known to be brittle in this respect.
no code implementations • 3 Mar 2024 • Hyewon Jeong, Sarah Jabbour, Yuzhe Yang, Rahul Thapta, Hussein Mozannar, William Jongwon Han, Nikita Mehandru, Michael Wornow, Vladislav Lialin, Xin Liu, Alejandro Lozano, Jiacheng Zhu, Rafal Dariusz Kocielnik, Keith Harrigian, Haoran Zhang, Edward Lee, Milos Vukadinovic, Aparna Balagopalan, Vincent Jeanselme, Katherine Matton, Ilker Demirel, Jason Fries, Parisa Rashidi, Brett Beaulieu-Jones, Xuhai Orson Xu, Matthew McDermott, Tristan Naumann, Monica Agrawal, Marinka Zitnik, Berk Ustun, Edward Choi, Kristen Yeom, Gamze Gursoy, Marzyeh Ghassemi, Emma Pierson, George Chen, Sanjat Kanjilal, Michael Oberst, Linying Zhang, Harvineet Singh, Tom Hartvigsen, Helen Zhou, Chinasa T. Okolo
The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables.
no code implementations • 10 Nov 2023 • Sarah Pan, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky
While recent advances have boosted LM proficiency in linguistic benchmarks, LMs consistently struggle to reason correctly on complex tasks like mathematics.
3 code implementations • 11 Jul 2023 • Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky
Despite the dominance and effectiveness of scaling, which has produced networks with hundreds of billions of parameters, the necessity of training overparameterized models remains poorly understood, while training costs grow exponentially.
1 code implementation • 26 May 2023 • Vijeta Deshpande, Dan Pechi, Shree Thatte, Vladislav Lialin, Anna Rumshisky
The majority of recent scaling-law studies have focused on high-compute, high-parameter-count settings, leaving the question of when these abilities begin to emerge largely unanswered.
no code implementations • 4 Apr 2023 • Vladislav Lialin, Stephen Rawls, David Chan, Shalini Ghosh, Anna Rumshisky, Wael Hamza
The currently popular approach to video-text data mining via automatic speech recognition (ASR), used in HowTo100M, produces low-quality captions that often do not refer to the video content.
1 code implementation • 29 Mar 2023 • Namrata Shivagunde, Vladislav Lialin, Anna Rumshisky
Finally, we observe that while GPT-3 generated all the examples in ROLE-1500, it is only able to solve 24.6% of them during probing.
no code implementations • 28 Mar 2023 • Vladislav Lialin, Vijeta Deshpande, Anna Rumshisky
This paper presents a systematic overview and comparison of parameter-efficient fine-tuning methods covering over 40 papers published between February 2019 and February 2023.
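To make the family of methods such a survey covers concrete, below is a minimal PyTorch sketch of one representative technique, a LoRA-style low-rank adapter. The class name, rank, and scaling factor here are illustrative assumptions, not a reference implementation from the survey or any of the papers it covers.

```python
# Minimal LoRA-style adapter sketch (illustrative; not the survey's reference code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pre-trained weight
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero, so initial output is unchanged
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
# Only the low-rank factors are trainable: 768*8 + 8*768 = 12,288 parameters
# versus ~590k in the frozen base layer.
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```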
1 code implementation • NAACL (ClinicalNLP) 2022 • Eric Lehman, Vladislav Lialin, Katelyn Y. Legaspi, Anne Janelle R. Sy, Patricia Therese S. Pile, Nicole Rose I. Alberto, Richard Raymund R. Ragasa, Corinna Victoria M. Puyat, Isabelle Rose I. Alberto, Pia Gabrielle I. Alfonso, Marianne Taliño, Dana Moukheiber, Byron C. Wallace, Anna Rumshisky, Jenifer J. Liang, Preethi Raghavan, Leo Anthony Celi, Peter Szolovits
The questions are generated by medical experts from 100+ MIMIC-III discharge summaries.
1 code implementation • ACL 2022 • Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky
Existing analyses of pre-trained transformers usually focus on only one or two model families at a time, overlooking variability in architectures and pre-training objectives.
no code implementations • 15 Oct 2020 • Vladislav Lialin, Rahul Goel, Andrey Simanovsky, Anna Rumshisky, Rushin Shah
To reduce training time, one can fine-tune the previously trained model on each patch, but naive fine-tuning exhibits catastrophic forgetting: degradation of model performance on data not represented in the patch.
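For context on the kind of mitigation this line of work weighs against naive fine-tuning, below is a minimal sketch of rehearsal: replaying a sample of the old training data alongside the new patch. The function name, datasets, and hyperparameters are illustrative assumptions, not the paper's exact method.

```python
# Generic rehearsal sketch for fine-tuning on a data patch.
# `model`, `loss_fn`, `patch_data`, and `old_data` are placeholders, not the paper's setup.
import random
import torch

def finetune_on_patch(model, loss_fn, patch_data, old_data,
                      replay_ratio=0.2, epochs=1, lr=1e-5):
    """Fine-tune on patch_data while replaying a sample of old_data
    to reduce catastrophic forgetting on the original distribution."""
    n_replay = min(int(len(patch_data) * replay_ratio), len(old_data))
    mixed = list(patch_data) + random.sample(list(old_data), n_replay)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        random.shuffle(mixed)                # interleave patch and replay examples
        for x, y in mixed:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
```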
2 code implementations • 16 Oct 2019 • David Donahue, Vladislav Lialin, Anna Rumshisky
The Transformer architecture has become increasingly popular over the past two years, owing to its impressive performance on a number of natural language processing (NLP) tasks.
no code implementations • 29 Aug 2019 • Anna Rogers, Marzena Karpinska, Ankita Gupta, Vladislav Lialin, Gregory Smelkov, Anna Rumshisky
For the past decade, temporal annotation has been sparse: only a small portion of event pairs in a text was annotated.