no code implementations • 30 Mar 2024 • Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, Aleksandr Drozd, Jordan Clive, Kshitij Gupta, Liangyu Chen, Qi Sun, Ken Tsui, Noah Persaud, Nour Fahmy, Tianlong Chen, Mohit Bansal, Nicolo Monti, Tai Dang, Ziyang Luo, Tien-Tung Bui, Roberto Navigli, Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, Sampo Pyysalo
Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility.
1 code implementation • 22 Nov 2023 • Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU.
1 code implementation • 11 Oct 2023 • Adyasha Maharana, Prateek Yadav, Mohit Bansal
There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics.
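To make the contrast between the two families concrete, here is a minimal sketch of each; the embedding matrix and per-sample difficulty scores are assumed to be given (e.g., from a pretrained encoder and from training dynamics), and these functions are illustrative rather than the paper's actual pipeline.

```python
import numpy as np

def kcenter_coreset(embeddings: np.ndarray, budget: int) -> list[int]:
    """Geometry-based selection: greedy k-center, maximizing coverage/diversity."""
    n = embeddings.shape[0]
    selected = [int(np.random.default_rng(0).integers(n))]
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(dists.argmax())          # farthest point from the current coreset
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected

def difficulty_coreset(difficulty: np.ndarray, budget: int) -> list[int]:
    """Score-based selection: keep the hardest samples by a training-dynamics score."""
    return np.argsort(-difficulty)[:budget].tolist()
```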
1 code implementation • 2 Oct 2023 • Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
Sparsely activated Mixture-of-Experts (SMoE) has shown promise for scaling up the learning capacity of neural networks; however, it has issues such as (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts, and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse.
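For readers unfamiliar with the architecture, the sketch below shows a top-k SMoE layer in PyTorch; the dimensions, expert count, and routing details are assumptions for illustration, and the comments mark where the two issues above arise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal top-k SMoE layer (illustrative sizes and routing, not any paper's code)."""
    def __init__(self, d_model=256, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        # (a) High memory usage: the FFN is duplicated once per expert.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # Learned router; (b) if its outputs collapse, experts become redundant.
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        topv, topi = gates.topk(self.k, dim=-1)  # each token activates only k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out
```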
no code implementations • 5 Jul 2023 • Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Xiaofei Ma, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang
Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance.
2 code implementations • NeurIPS 2023 • Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign.
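A simplified sketch of those three steps on flat task vectors (finetuned minus pretrained weights) might look like the following; the trimming density and tie-handling details are simplifications, not the reference implementation.

```python
import torch

def ties_merge(task_vectors, density=0.2):
    """Simplified TIES-Merging sketch over flat task vectors."""
    trimmed = []
    for tv in task_vectors:
        # 1) Trim: reset parameters that changed only a small amount during fine-tuning.
        k = max(1, int(density * tv.numel()))
        thresh = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= thresh, tv, torch.zeros_like(tv)))
    stacked = torch.stack(trimmed)
    # 2) Elect sign: per parameter, keep the sign carrying the larger total magnitude.
    elected = torch.sign(stacked.sum(dim=0))
    # 3) Disjoint merge: average only the values whose sign agrees with the elected sign.
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return merged  # add to the pretrained weights to obtain the merged model
```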
1 code implementation • NeurIPS 2023 • Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
SeViLA framework consists of two modules: Localizer and Answerer, where both are parameter-efficiently fine-tuned from BLIP-2.
Ranked #3 on Zero-Shot Video Question Answer on IntentQA (using extra training data)
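As a rough sketch of that two-module flow, the snippet below scores frames with a question-aware Localizer and answers only from the selected keyframes; `score_frame` and `answer_from_frames` are hypothetical interfaces standing in for the underlying BLIP-2 calls, not the released API.

```python
def sevila_pipeline(frames, question, localizer, answerer, top_k=4):
    """Localizer -> Answerer sketch: select keyframes, then answer from them."""
    # Localizer: score each frame for relevance to the question.
    scores = [localizer.score_frame(f, question) for f in frames]
    top_idx = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:top_k]
    keyframes = [frames[i] for i in sorted(top_idx)]   # keep temporal order
    # Answerer: answer conditioned only on the selected keyframes.
    return answerer.answer_from_frames(keyframes, question)
```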
1 code implementation • 18 Oct 2022 • Prateek Yadav, Mohit Bansal
Although there is no forgetting, the performance of SupSup is sub-optimal because fixed weights restrict its representational power.
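For context, SupSup-style methods keep the backbone weights frozen and learn only a per-task binary supermask over them; a minimal sketch of such a layer is below (straight-through top-k masking, with sizes and initialization as assumptions), which also makes the limitation visible: whatever the mask selects, expressivity is capped by the fixed random weights.

```python
import torch
import torch.nn as nn

class SupermaskLinear(nn.Module):
    """Sketch of a supermask layer: weights stay frozen, only mask scores train."""
    def __init__(self, in_f, out_f, sparsity=0.5):
        super().__init__()
        # Frozen random backbone: never overwritten, so no forgetting across tasks.
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02, requires_grad=False)
        # Per-task scores are the only trainable parameters.
        self.scores = nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        self.k = max(1, int((1 - sparsity) * in_f * out_f))

    def forward(self, x):
        flat = self.scores.flatten()
        thresh = flat.kthvalue(flat.numel() - self.k + 1).values
        hard = (self.scores >= thresh).float()
        # Straight-through estimator: forward uses the hard mask, gradients flow to scores.
        mask = hard + self.scores - self.scores.detach()
        return x @ (self.weight * mask).t()
```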
1 code implementation • ACL 2022 • Swarnadeep Saha, Prateek Yadav, Mohit Bansal
In this work, we study pre-trained language models that generate explanation graphs in an end-to-end manner and analyze their ability to learn the structural constraints and semantics of such graphs.
1 code implementation • 1 Nov 2021 • Prateek Yadav, Peter Hase, Mohit Bansal
Current approaches try to optimize for the cost incurred by users when adopting a recourse, but they assume that all users share the same cost function.
1 code implementation • NAACL 2021 • Swarnadeep Saha, Prateek Yadav, Mohit Bansal
In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph.
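As a small illustration of that representation (the rule text is a made-up example, not drawn from the dataset), each proof can be stored as a directed graph whose edges point from premises toward the conclusion they derive, and the target for a question is then a set of such graphs generated jointly.

```python
import networkx as nx

# Toy proof graph for a hypothetical question; contents are invented for illustration.
proof = nx.DiGraph()
proof.add_edge("fact: Erin is young", "rule: young things are furry")
proof.add_edge("rule: young things are furry", "conclusion: Erin is furry")

# One question may admit several such graphs; the task is to generate the whole set.
proofs = [proof]
print(list(nx.topological_sort(proof)))   # premises appear before the conclusion
```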
1 code implementation • EMNLP 2021 • Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal
Recent commonsense-reasoning tasks are typically discriminative in nature, where a model answers a multiple-choice question for a certain context.
1 code implementation • NeurIPS 2019 • Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, Partha Talukdar
In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise associations.
no code implementations • ICLR 2019 • Naganand Yadati, Vikram Nitin, Madhav Nimishakavi, Prateek Yadav, Anand Louis, Partha Talukdar
Additionally, there is a need to represent the direction from reactants to products.
1 code implementation • 24 Jan 2019 • Shikhar Vashishth, Prateek Yadav, Manik Bhandari, Partha Talukdar
Graph-based Semi-Supervised Learning (SSL) methods aim to address this problem by labeling a small subset of the nodes as seeds and then utilizing the graph structure to predict label scores for the rest of the nodes in the graph.
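As background, the generic graph-based SSL recipe that sentence describes can be sketched as plain label propagation; this is the setup the paper builds on rather than its specific model, and the dense adjacency matrix and iteration count are illustrative choices.

```python
import numpy as np

def label_propagation(adj, seed_labels, n_classes, n_iters=50):
    """Seeds are clamped to their labels; label scores diffuse to the remaining nodes."""
    n = adj.shape[0]
    P = adj / adj.sum(axis=1, keepdims=True).clip(min=1)   # row-normalized transitions
    scores = np.zeros((n, n_classes))
    for node, label in seed_labels.items():
        scores[node, label] = 1.0
    for _ in range(n_iters):
        scores = P @ scores                                  # propagate along edges
        for node, label in seed_labels.items():              # re-clamp the seed nodes
            scores[node] = 0.0
            scores[node, label] = 1.0
    return scores.argmax(axis=1)                             # predicted class per node
```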
no code implementations • 14 Jan 2019 • CHIME/FRB Collaboration, Mandana Amiri, Kevin Bandura, Mohit Bhardwaj, Paula Boubel, Michelle M. Boyce, Patrick J. Boyle, Charanjot Brar, Maya Burhanpurkar, Pragya Chawla, Jean F. Cliche, Davor Cubranic, Meiling Deng, Nolan Denman, Matthew Dobbs, M. Fandino, Emmanuel Fonseca, Bryan M. Gaensler, Adam J. Gilbert, Utkarsh Giri, Deborah C. Good, Mark Halpern, David Hanna, Alexander S. Hill, Gary Hinshaw, C. Höfer, Alexander Josephy, Victoria M. Kaspi, Thomas L. Landecker, Dustin A. Lang, Kiyoshi W. Masui, Ryan Mckinven, Juan Mena-Parra, Marcus Merryfield, Nikola Milutinovic, Charles Moatti, Arun Naidu, Laura B. Newburgh, Cherry Ng, Chitrang Patel, Ue-Li Pen, Tristan Pinsonneault-Marotte, Ziggy Pleunis, Masoud Rafiei-Ravandi, Scott M. Ransom, Andre Renard, Paul Scholz, J. R. Shaw, Seth R. Siegel, Kendrick M. Smith, Ingrid H. Stairs, Shriharsh P. Tendulkar, Ian Tretyakov, Keith Vanderlinde, Prateek Yadav
Emission in multiple events is seen down to 400 MHz, the lowest radio frequency to which we are sensitive.
High Energy Astrophysical Phenomena
1 code implementation • ACL 2019 • Shikhar Vashishth, Manik Bhandari, Prateek Yadav, Piyush Rai, Chiranjib Bhattacharyya, Partha Talukdar
Word embeddings have been widely adopted across several NLP applications.
1 code implementation • 7 Sep 2018 • Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, Partha Talukdar
In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise associations.
1 code implementation • 29 May 2018 • Prateek Yadav, Madhav Nimishakavi, Naganand Yadati, Shikhar Vashishth, Arun Rajkumar, Partha Talukdar
We analyse local and global properties of graphs and demonstrate settings where LCNs tend to work better than GCNs.