no code implementations • 30 Mar 2024 • Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, Aleksandr Drozd, Jordan Clive, Kshitij Gupta, Liangyu Chen, Qi Sun, Ken Tsui, Noah Persaud, Nour Fahmy, Tianlong Chen, Mohit Bansal, Nicolo Monti, Tai Dang, Ziyang Luo, Tien-Tung Bui, Roberto Navigli, Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, Sampo Pyysalo
Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility.
1 code implementation • 22 Nov 2023 • Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU.
1 code implementation • 11 Oct 2023 • Adyasha Maharana, Prateek Yadav, Mohit Bansal
There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics.
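To make the contrast between the two families concrete, here is a minimal sketch of each; the embedding matrix and per-sample difficulty scores are assumed to be given (e.g., from a pretrained encoder and from training dynamics), and these functions are illustrative rather than the paper's actual pipeline.

```python
import numpy as np

def kcenter_coreset(embeddings: np.ndarray, budget: int) -> list[int]:
    """Geometry-based selection: greedy k-center, maximizing coverage/diversity."""
    n = embeddings.shape[0]
    selected = [int(np.random.default_rng(0).integers(n))]
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(dists.argmax())          # farthest point from the current coreset
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected

def difficulty_coreset(difficulty: np.ndarray, budget: int) -> list[int]:
    """Score-based selection: keep the hardest samples by a training-dynamics score."""
    return np.argsort(-difficulty)[:budget].tolist()
```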
1 code implementation • 2 Oct 2023 • Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
Sparsely activated Mixture-of-Experts (SMoE) has shown promise for scaling up the learning capacity of neural networks; however, it has issues such as (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts, and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse.
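For readers unfamiliar with the architecture, the sketch below shows a top-k SMoE layer in PyTorch; the dimensions, expert count, and routing details are assumptions for illustration, and the comments mark where the two issues above arise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal top-k SMoE layer (illustrative sizes and routing, not any paper's code)."""
    def __init__(self, d_model=256, d_ff=1024, n_experts=8, k=2):
        super().__init__()
        # (a) High memory usage: the FFN is duplicated once per expert.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # Learned router; (b) if its outputs collapse, experts become redundant.
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        topv, topi = gates.topk(self.k, dim=-1)  # each token activates only k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out
```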
no code implementations • 5 Jul 2023 • Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Xiaofei Ma, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang
Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance.
2 code implementations • NeurIPS 2023 • Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal
To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign.
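A simplified sketch of those three steps on flat task vectors (finetuned minus pretrained weights) might look like the following; the trimming density and tie-handling details are simplifications, not the reference implementation.

```python
import torch

def ties_merge(task_vectors, density=0.2):
    """Simplified TIES-Merging sketch over flat task vectors."""
    trimmed = []
    for tv in task_vectors:
        # 1) Trim: reset parameters that changed only a small amount during fine-tuning.
        k = max(1, int(density * tv.numel()))
        thresh = tv.abs().flatten().kthvalue(tv.numel() - k + 1).values
        trimmed.append(torch.where(tv.abs() >= thresh, tv, torch.zeros_like(tv)))
    stacked = torch.stack(trimmed)
    # 2) Elect sign: per parameter, keep the sign carrying the larger total magnitude.
    elected = torch.sign(stacked.sum(dim=0))
    # 3) Disjoint merge: average only the values whose sign agrees with the elected sign.
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return merged  # add to the pretrained weights to obtain the merged model
```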
1 code implementation • NeurIPS 2023 • Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
SeViLA framework consists of two modules: Localizer and Answerer, where both are parameter-efficiently fine-tuned from BLIP-2.
Ranked #3 on Zero-Shot Video Question Answer on IntentQA (using extra training data)
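As a rough sketch of that two-module flow, the snippet below scores frames with a question-aware Localizer and answers only from the selected keyframes; `score_frame` and `answer_from_frames` are hypothetical interfaces standing in for the underlying BLIP-2 calls, not the released API.

```python
def sevila_pipeline(frames, question, localizer, answerer, top_k=4):
    """Localizer -> Answerer sketch: select keyframes, then answer from them."""
    # Localizer: score each frame for relevance to the question.
    scores = [localizer.score_frame(f, question) for f in frames]
    top_idx = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:top_k]
    keyframes = [frames[i] for i in sorted(top_idx)]   # keep temporal order
    # Answerer: answer conditioned only on the selected keyframes.
    return answerer.answer_from_frames(keyframes, question)
```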
1 code implementation • 18 Oct 2022 • Prateek Yadav, Mohit Bansal
Although there is no forgetting, the performance of SupSup is sub-optimal because fixed weights restrict its representational power.
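For context, SupSup-style methods keep the backbone weights frozen and learn only a per-task binary supermask over them; a minimal sketch of such a layer is below (straight-through top-k masking, with sizes and initialization as assumptions), which also makes the limitation visible: whatever the mask selects, expressivity is capped by the fixed random weights.

```python
import torch
import torch.nn as nn

class SupermaskLinear(nn.Module):
    """Sketch of a supermask layer: weights stay frozen, only mask scores train."""
    def __init__(self, in_f, out_f, sparsity=0.5):
        super().__init__()
        # Frozen random backbone: never overwritten, so no forgetting across tasks.
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02, requires_grad=False)
        # Per-task scores are the only trainable parameters.
        self.scores = nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        self.k = max(1, int((1 - sparsity) * in_f * out_f))

    def forward(self, x):
        flat = self.scores.flatten()
        thresh = flat.kthvalue(flat.numel() - self.k + 1).values
        hard = (self.scores >= thresh).float()
        # Straight-through estimator: forward uses the hard mask, gradients flow to scores.
        mask = hard + self.scores - self.scores.detach()
        return x @ (self.weight * mask).t()
```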
1 code implementation • ACL 2022 • Swarnadeep Saha, Prateek Yadav, Mohit Bansal
In this work, we study pre-trained language models that generate explanation graphs in an end-to-end manner and analyze their ability to learn the structural constraints and semantics of such graphs.
1 code implementation • 1 Nov 2021 • Prateek Yadav, Peter Hase, Mohit Bansal
Current approaches try to optimize for the cost incurred by users when adopting a recourse, but they assume that all users share the same cost function.
1 code implementation • NAACL 2021 • Swarnadeep Saha, Prateek Yadav, Mohit Bansal
In order to jointly learn from all proof graphs and exploit the correlations between multiple proofs for a question, we pose this task as a set generation problem over structured output spaces where each proof is represented as a directed graph.
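As a small illustration of that representation (the rule text is a made-up example, not drawn from the dataset), each proof can be stored as a directed graph whose edges point from premises toward the conclusion they derive, and the target for a question is then a set of such graphs generated jointly.

```python
import networkx as nx

# Toy proof graph for a hypothetical question; contents are invented for illustration.
proof = nx.DiGraph()
proof.add_edge("fact: Erin is young", "rule: young things are furry")
proof.add_edge("rule: young things are furry", "conclusion: Erin is furry")

# One question may admit several such graphs; the task is to generate the whole set.
proofs = [proof]
print(list(nx.topological_sort(proof)))   # premises appear before the conclusion
```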
1 code implementation • EMNLP 2021 • Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal
Recent commonsense-reasoning tasks are typically discriminative in nature, where a model answers a multiple-choice question for a certain context.
1 code implementation • NeurIPS 2019 • Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, Partha Talukdar
In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise associations.
no code implementations • ICLR 2019 • Naganand Yadati, Vikram Nitin, Madhav Nimishakavi, Prateek Yadav, Anand Louis, Partha Talukdar
Additionally, there is a need to represent the direction from reactants to products.
1 code implementation • 24 Jan 2019 • Shikhar Vashishth, Prateek Yadav, Manik Bhandari, Partha Talukdar
Graph-based Semi-Supervised Learning (SSL) methods aim to address this problem by labeling a small subset of the nodes as seeds and then utilizing the graph structure to predict label scores for the rest of the nodes in the graph.
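As background, the generic graph-based SSL recipe that sentence describes can be sketched as plain label propagation; this is the setup the paper builds on rather than its specific model, and the dense adjacency matrix and iteration count are illustrative choices.

```python
import numpy as np

def label_propagation(adj, seed_labels, n_classes, n_iters=50):
    """Seeds are clamped to their labels; label scores diffuse to the remaining nodes."""
    n = adj.shape[0]
    P = adj / adj.sum(axis=1, keepdims=True).clip(min=1)   # row-normalized transitions
    scores = np.zeros((n, n_classes))
    for node, label in seed_labels.items():
        scores[node, label] = 1.0
    for _ in range(n_iters):
        scores = P @ scores                                  # propagate along edges
        for node, label in seed_labels.items():              # re-clamp the seed nodes
            scores[node] = 0.0
            scores[node, label] = 1.0
    return scores.argmax(axis=1)                             # predicted class per node
```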
no code implementations • 14 Jan 2019 • CHIME/FRB Collaboration, Mandana Amiri, Kevin Bandura, Mohit Bhardwaj, Paula Boubel, Michelle M. Boyce, Patrick J. Boyle, Charanjot Brar, Maya Burhanpurkar, Pragya Chawla, Jean F. Cliche, Davor Cubranic, Meiling Deng, Nolan Denman, Matthew Dobbs, M. Fandino, Emmanuel Fonseca, Bryan M. Gaensler, Adam J. Gilbert, Utkarsh Giri, Deborah C. Good, Mark Halpern, David Hanna, Alexander S. Hill, Gary Hinshaw, C. Höfer, Alexander Josephy, Victoria M. Kaspi, Thomas L. Landecker, Dustin A. Lang, Kiyoshi W. Masui, Ryan Mckinven, Juan Mena-Parra, Marcus Merryfield, Nikola Milutinovic, Charles Moatti, Arun Naidu, Laura B. Newburgh, Cherry Ng, Chitrang Patel, Ue-Li Pen, Tristan Pinsonneault-Marotte, Ziggy Pleunis, Masoud Rafiei-Ravandi, Scott M. Ransom, Andre Renard, Paul Scholz, J. R. Shaw, Seth R. Siegel, Kendrick M. Smith, Ingrid H. Stairs, Shriharsh P. Tendulkar, Ian Tretyakov, Keith Vanderlinde, Prateek Yadav
Emission in multiple events is seen down to 400 MHz, the lowest radio frequency to which we are sensitive.
High Energy Astrophysical Phenomena
1 code implementation • ACL 2019 • Shikhar Vashishth, Manik Bhandari, Prateek Yadav, Piyush Rai, Chiranjib Bhattacharyya, Partha Talukdar
Word embeddings have been widely adopted across several NLP applications.
1 code implementation • 7 Sep 2018 • Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Vikram Nitin, Anand Louis, Partha Talukdar
In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise associations.
1 code implementation • 29 May 2018 • Prateek Yadav, Madhav Nimishakavi, Naganand Yadati, Shikhar Vashishth, Arun Rajkumar, Partha Talukdar
We analyse local and global properties of graphs and demonstrate settings where LCNs tend to work better than GCNs.