no code implementations • EMNLP 2021 • Meghana Moorthy Bhat, Alessandro Sordoni, Subhabrata Mukherjee
While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process.
1 code implementation • 26 Sep 2024 • Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee
Our experiments reveal that all LLMs are vulnerable to RED QUEEN ATTACK, reaching an 87.62% attack success rate on GPT-4o and 75.4% on Llama3-70B.
no code implementations • 22 Apr 2024 • Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Ruhle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah
Large language models (LLMs) excel at most NLP tasks but, due to their size, require expensive cloud servers for deployment, while smaller models that can be deployed on lower-cost (e.g., edge) devices tend to lag behind in response quality.
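To make the cost-quality trade-off concrete, here is a minimal, hypothetical routing sketch (not the paper's method): a router sends a query to a cheap small model when its predicted quality drop is acceptable, and to the large cloud model otherwise. The `quality_gap_score` function and both model callables are stand-ins.

```python
# Hypothetical sketch of cost-aware query routing between a cheap edge model
# and an expensive cloud LLM; the router and both models are stand-ins.
from dataclasses import dataclass
from typing import Callable

@dataclass
class HybridRouter:
    small_model: Callable[[str], str]          # cheap, lower quality
    large_model: Callable[[str], str]          # expensive, higher quality
    quality_gap_score: Callable[[str], float]  # predicted quality loss if routed small
    threshold: float = 0.2                     # tolerated quality drop

    def answer(self, query: str) -> str:
        # Route to the small model only when the predicted quality drop is acceptable.
        if self.quality_gap_score(query) <= self.threshold:
            return self.small_model(query)
        return self.large_model(query)

# Toy usage with stub components.
router = HybridRouter(
    small_model=lambda q: f"[small] {q[:20]}",
    large_model=lambda q: f"[large] {q[:20]}",
    quality_gap_score=lambda q: min(len(q) / 200.0, 1.0),  # longer query => assume harder
)
print(router.answer("What is 2 + 2?"))
print(router.answer("Explain the trade-offs of deploying mixture-of-experts models on edge devices." * 3))
```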
no code implementations • 20 Mar 2024 • Subhabrata Mukherjee, Paul Gamble, Markel Sanz Ausin, Neel Kant, Kriti Aggarwal, Neha Manjunath, Debajyoti Datta, Zhengliang Liu, Jiayuan Ding, Sophia Busacca, Cezanne Bianco, Swapnil Sharma, Rae Lasko, Michelle Voisard, Sanchay Harneja, Darya Filippova, Gerry Meixiong, Kevin Cha, Amir Youssefi, Meyhaa Buvanesh, Howard Weingram, Sebastian Bierman-Lytle, Harpreet Singh Mangat, Kim Parikh, Saad Godil, Alex Miller
We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents.
no code implementations • 10 Oct 2023 • Erik Jones, Hamid Palangi, Clarisse Simões, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Awadallah, Ece Kamar
We also find that optimizing the system message rather than the model weights can be critical; fine-tuning the entire model on the synthetic task can counterintuitively increase hallucination.
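As a rough illustration of optimizing the system message instead of the weights, the following sketch scores a few candidate system messages with a hypothetical hallucination evaluator and keeps the best one; it is not the paper's optimization procedure.

```python
# Hypothetical sketch: pick the system message that minimizes a hallucination
# metric on a synthetic evaluation set, leaving model weights untouched.
from typing import Callable, Sequence

def optimize_system_message(
    candidates: Sequence[str],
    evaluate: Callable[[str], float],  # returns hallucination rate for a system message
) -> str:
    scored = [(evaluate(msg), msg) for msg in candidates]
    best_rate, best_msg = min(scored)
    print(f"best hallucination rate: {best_rate:.2f}")
    return best_msg

# Toy usage with a stub evaluator standing in for runs on a synthetic task.
candidates = [
    "You are a helpful assistant.",
    "Answer only from the provided document; say 'unknown' if unsure.",
    "Be concise.",
]
stub_eval = {c: r for c, r in zip(candidates, [0.31, 0.12, 0.27])}
best = optimize_system_message(candidates, lambda m: stub_eval[m])
print(best)
```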
no code implementations • 30 Aug 2023 • Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla
The mixture-of-experts (MoE) architecture has proven to be a powerful method for training deep models on diverse tasks across many applications.
no code implementations • 5 Jul 2023 • Luciano del Corro, Allie Del Giorno, Sahaj Agarwal, Bin Yu, Ahmed Awadallah, Subhabrata Mukherjee
While existing token-level early exit methods show promising results for online inference, they cannot be readily applied to batch inference and Key-Value caching.
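For readers unfamiliar with token-level early exit, a simplified PyTorch sketch is below (hypothetical, not the paper's method): a per-layer gate marks each token as "done" once its representation looks confident. Note that the batched tensor still flows through every layer for every token, which illustrates why naive per-token exits clash with batch inference and KV caching; the causal mask is omitted for brevity.

```python
import torch
import torch.nn as nn

class EarlyExitLM(nn.Module):
    # Minimal illustrative decoder with a per-token exit gate after each layer.
    def __init__(self, vocab=100, d=64, n_layers=4, threshold=0.9):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True) for _ in range(n_layers))
        self.exit_gates = nn.ModuleList(nn.Linear(d, 1) for _ in range(n_layers))
        self.lm_head = nn.Linear(d, vocab)
        self.threshold = threshold

    def forward(self, ids):
        h = self.emb(ids)
        done = torch.zeros(ids.shape, dtype=torch.bool, device=ids.device)
        out = torch.zeros_like(h)
        for layer, gate in zip(self.layers, self.exit_gates):
            h = layer(h)
            confident = torch.sigmoid(gate(h)).squeeze(-1) > self.threshold
            newly_done = confident & ~done
            out[newly_done] = h[newly_done]   # freeze tokens that "exit" at this layer
            done |= newly_done
        out[~done] = h[~done]                 # tokens that never exited use the last layer
        # In this batched sketch every layer is still computed for every token,
        # which is exactly why per-token early exit is hard to exploit in batches.
        return self.lm_head(out)

model = EarlyExitLM()
logits = model(torch.randint(0, 100, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 100])
```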
1 code implementation • 19 Jun 2023 • Venkata Prabhakara Sarath Nookala, Gaurav Verma, Subhabrata Mukherjee, Srijan Kumar
Our results on six GLUE tasks indicate that compared to fully fine-tuned models, vanilla FSL methods lead to a notable relative drop in task performance (i.e., are less robust) in the face of adversarial perturbations.
4 code implementations • 5 Jun 2023 • Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah
To address these challenges, we develop Orca (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy to be published at https://aka.ms/orca-lm), a 13-billion parameter model that learns to imitate the reasoning process of LFMs.
no code implementations • 24 May 2023 • Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren
Generalization to unseen tasks is an important ability for few-shot learners to achieve better zero-/few-shot performance on diverse tasks.
2 code implementations • 23 May 2023 • Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu
Augmented Language Models (ALMs) blend the reasoning capabilities of Large Language Models (LLMs) with tools that allow for knowledge retrieval and action execution.
1 code implementation • 3 May 2023 • Nitay Calderon, Subhabrata Mukherjee, Roi Reichart, Amir Kantor
In this work, we study the potential of compressing them, which is crucial for real-world applications serving millions of users.
2 code implementations • CVPR 2023 • Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu
Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones.
1 code implementation • 31 Oct 2022 • Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters and storing a large copy of the PLM weights for every task, resulting in increased costs for storing, sharing, and serving the models.
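For context, a generic bottleneck-adapter sketch is shown below (an illustration of parameter-efficient fine-tuning in general, not this paper's AdaMix method): the PLM backbone stays frozen and only a tiny adapter plus task head are trained and stored per task.

```python
# Generic bottleneck-adapter sketch (illustrative, not the paper's AdaMix):
# freeze the backbone and train/store only the tiny adapter per task.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))  # residual adapter

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
for p in backbone.parameters():
    p.requires_grad_(False)            # PLM weights stay frozen and shared across tasks

adapter = Adapter(d_model=64)
head = nn.Linear(64, 2)                # task-specific classifier
trainable = list(adapter.parameters()) + list(head.parameters())
print(sum(p.numel() for p in trainable), "trainable parameters instead of",
      sum(p.numel() for p in backbone.parameters()))

x = torch.randn(4, 10, 64)             # stand-in for token embeddings
logits = head(adapter(backbone(x)).mean(dim=1))
optimizer = torch.optim.AdamW(trainable, lr=1e-3)  # only adapter + head get updated
```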
1 code implementation • 14 Oct 2022 • Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah, Sebastien Bubeck, Jianfeng Gao
Furthermore, existing MoE works do not consider computational constraints (e.g., FLOPs, latency) to guide their design.
no code implementations • 6 Oct 2022 • Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo Henrique de Rosa, Shital Shah
In this work, we study the more challenging open-domain setting consisting of low-frequency user prompt patterns (or broad prompts, e.g., a prompt about the 93rd Academy Awards) and demonstrate the effectiveness of character-based language models.
1 code implementation • 24 Aug 2022 • Yue Zhao, Guoqing Zheng, Subhabrata Mukherjee, Robert McCann, Ahmed Awadallah
In this work, we propose a method to leverage weak/noisy labels (e.g., risk scores generated by machine rules for detecting malware) that are cheaper to obtain for anomaly detection.
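A minimal sketch of the underlying idea, using cheap, noisy risk scores as weak labels to train a detector, is given below on synthetic data with scikit-learn; it is illustrative only and not the paper's actual weak-supervision method.

```python
# Illustrative sketch: convert cheap machine-rule risk scores into weak labels
# and train a detector on them (not the paper's actual weak-supervision method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                        # feature vectors (e.g., telemetry)
true_anomaly = (X[:, 0] + X[:, 1] > 2.5).astype(int)  # hidden ground truth (unknown in practice)

# Noisy risk scores from heuristic rules: correlated with the truth but imperfect.
risk_score = X[:, 0] + X[:, 1] + rng.normal(scale=1.0, size=1000)
weak_label = (risk_score > 2.5).astype(int)           # cheap, noisy weak labels

detector = LogisticRegression().fit(X, weak_label)    # train only on weak labels
pred = detector.predict(X)
print("agreement with hidden ground truth:", (pred == true_anomaly).mean())
```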
1 code implementation • 24 May 2022 • Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters and storing a large copy of the PLM weights for every task, resulting in increased costs for storing, sharing, and serving the models.
no code implementations • 16 Apr 2022 • Shashank Gupta, Subhabrata Mukherjee, Krishan Subudhi, Eduardo Gonzalez, Damien Jose, Ahmed H. Awadallah, Jianfeng Gao
Traditional multi-task learning (MTL) methods rely on dense networks that share the same set of weights across several different tasks.
1 code implementation • 4 Mar 2022 • Mojan Javaheripi, Gustavo H. de Rosa, Subhabrata Mukherjee, Shital Shah, Tomasz L. Religa, Caio C. T. Mendes, Sebastien Bubeck, Farinaz Koushanfar, Debadeepta Dey
Results show that the perplexity of 16-layer GPT-2 and Transformer-XL can be achieved with up to 1.5x and 2.5x faster runtime and 1.2x and 2.0x lower peak memory utilization, respectively.
no code implementations • 29 Jan 2022 • Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
Our framework, AutoDistil, addresses the above challenges with the following steps: (a) it incorporates inductive bias and heuristics to partition the Transformer search space into K compact sub-spaces (K=3 for typical student sizes: base, small, and tiny); (b) it trains one SuperLM for each sub-space using a task-agnostic objective (e.g., self-attention distillation) with weight sharing among students; and (c) it performs a lightweight search for the optimal student without re-training.
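A heavily stubbed skeleton of this three-step flow follows; the sub-space values, training, and evaluation functions are placeholders rather than the actual AutoDistil implementation.

```python
# Skeleton of the three-step flow (stubbed; not the actual AutoDistil implementation).
import itertools
import random

# (a) Partition the Transformer search space into K compact sub-spaces.
SUB_SPACES = {
    "tiny":  {"layers": [3, 4],   "hidden": [128, 192], "heads": [2, 4]},
    "small": {"layers": [6, 8],   "hidden": [256, 384], "heads": [4, 8]},
    "base":  {"layers": [10, 12], "hidden": [512, 768], "heads": [8, 12]},
}

def train_super_lm(sub_space):
    # (b) Stand-in for training one weight-shared SuperLM per sub-space
    # with a task-agnostic objective such as self-attention distillation.
    return {"sub_space": sub_space}

def evaluate_student(super_lm, config):
    # Stand-in for extracting a student from the SuperLM and scoring it
    # without any re-training.
    random.seed(str(config))
    return random.random()

super_lms = {name: train_super_lm(space) for name, space in SUB_SPACES.items()}

# (c) Lightweight search: enumerate student configurations and keep the best.
best = {}
for name, space in SUB_SPACES.items():
    candidates = [dict(zip(space, values)) for values in itertools.product(*space.values())]
    best[name] = max(candidates, key=lambda cfg: evaluate_student(super_lms[name], cfg))
print(best)
```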
1 code implementation • 4 Nov 2021 • Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao
We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks.
no code implementations • 16 Oct 2021 • Mengnan Du, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, Ahmed Hassan Awadallah
Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream tasks.
1 code implementation • Findings (NAACL) 2022 • Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
The first is the use of self-training to leverage large amounts of unlabeled data for prompt-based fine-tuning (FN) in few-shot settings.
no code implementations • 17 Sep 2021 • Meghana Moorthy Bhat, Alessandro Sordoni, Subhabrata Mukherjee
While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process.
no code implementations • NeurIPS 2021 • Mengnan Du, Subhabrata Mukherjee, Guanchu Wang, Ruixiang Tang, Ahmed Hassan Awadallah, Xia Hu
This process not only requires many instance-level annotations for sensitive attributes but also does not guarantee that all fairness-sensitive information has been removed from the encoder.
1 code implementation • 8 Jun 2021 • Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao
While deep and large pre-trained models are the state-of-the-art for various natural language processing tasks, their huge size poses significant challenges for practical use in resource-constrained settings.
2 code implementations • NAACL 2021 • Mengzhou Xia, Guoqing Zheng, Subhabrata Mukherjee, Milad Shokouhi, Graham Neubig, Ahmed Hassan Awadallah
Extensive experiments on real-world low-resource languages - without access to large-scale monolingual corpora or large amounts of labeled data - for tasks like cross-lingual sentiment analysis and named entity recognition show the effectiveness of our approach.
1 code implementation • NAACL 2021 • Giannis Karamanolakis, Subhabrata Mukherjee, Guoqing Zheng, Ahmed Hassan Awadallah
In this work, we develop a weak supervision framework (ASTRA) that leverages all the available data for a given task.
no code implementations • 1 Jan 2021 • Yaqing Wang, Subhabrata Mukherjee, Haoda Chu, Yuancheng Tu, Ming Wu, Jing Gao, Ahmed Hassan Awadallah
Neural sequence labeling is an important technique employed for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER), slot tagging for dialog systems and semantic parsing.
no code implementations • NeurIPS 2020 • Subhabrata Mukherjee, Ahmed Awadallah
Recent success of pre-trained language models crucially hinges on fine-tuning them on large amounts of labeled data for the downstream task, which are typically expensive to acquire or difficult to access for many applications.
no code implementations • 7 Oct 2020 • Yaqing Wang, Subhabrata Mukherjee, Haoda Chu, Yuancheng Tu, Ming Wu, Jing Gao, Ahmed Hassan Awadallah
While self-training serves as an effective mechanism to learn from large amounts of unlabeled data, meta-learning helps with adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.
no code implementations • ACL 2020 • Sudipto Mukherjee, Subhabrata Mukherjee, Marcello Hasegawa, Ahmed Hassan Awadallah, Ryen White
Intelligent features in email service applications aim to increase productivity by helping people organize their folders, compose their emails and respond to pending tasks.
no code implementations • NeurIPS 2020 • Subhabrata Mukherjee, Ahmed Hassan Awadallah
The standard self-training mechanism randomly samples instances from the unlabeled pool to pseudo-label and augment the labeled data.
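The standard loop described here can be sketched as follows on synthetic data (random sampling from the unlabeled pool, pseudo-labeling, and augmenting the labeled set); the paper's contribution replaces the random sampling step, which this illustrative sketch intentionally keeps.

```python
# Sketch of the standard self-training loop the sentence describes (random
# sampling + pseudo-labeling); illustrative only, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

labeled_idx = list(range(50))                        # small labeled seed set
unlabeled_idx = list(range(50, 2000))                # large unlabeled pool

X_lab, y_lab = X[labeled_idx], y[labeled_idx]
for step in range(5):
    model = LogisticRegression().fit(X_lab, y_lab)
    sampled = rng.choice(unlabeled_idx, size=100, replace=False)   # random sampling
    pseudo = model.predict(X[sampled])                             # pseudo-label
    X_lab = np.vstack([X_lab, X[sampled]])                         # augment labeled data
    y_lab = np.concatenate([y_lab, pseudo])
    unlabeled_idx = [i for i in unlabeled_idx if i not in set(sampled)]
    print(f"step {step}: labeled set size = {len(y_lab)}")
```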
no code implementations • 26 May 2020 • Kai Shu, Subhabrata Mukherjee, Guoqing Zheng, Ahmed Hassan Awadallah, Milad Shokouhi, Susan Dumais
In this paper, we propose to leverage user actions as a source of weak supervision, in addition to a limited set of annotated examples, to detect intents in emails.
no code implementations • 18 May 2020 • Nikitha Rao, Chetan Bansal, Subhabrata Mukherjee, Chandra Maddila
Web search engines are frequently used to access information about products.
no code implementations • 5 May 2020 • Sudipto Mukherjee, Subhabrata Mukherjee, Marcello Hasegawa, Ahmed Hassan Awadallah, Ryen White
Intelligent features in email service applications aim to increase productivity by helping people organize their folders, compose their emails and respond to pending tasks.
1 code implementation • ACL 2020 • Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, Ahmed Hassan Awadallah
In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
no code implementations • ACL 2020 • Subhabrata Mukherjee, Ahmed Awadallah
Deep and large pre-trained language models are the state-of-the-art for various natural language processing tasks.
no code implementations • 3 Apr 2020 • Kai Shu, Guoqing Zheng, Yichuan Li, Subhabrata Mukherjee, Ahmed Hassan Awadallah, Scott Ruston, Huan Liu
Social media has greatly enabled people to participate in online activities at an unprecedented rate.
1 code implementation • IJCNLP 2019 • Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum
Controversial claims are abundant in online media and discussion forums.
no code implementations • 4 Oct 2019 • Subhabrata Mukherjee, Ahmed Hassan Awadallah
We show that our student models can compress the huge teacher by up to 26x while still matching or even marginally exceeding the teacher's performance in low-resource settings with small amounts of labeled data.
no code implementations • 15 May 2019 • Subhabrata Mukherjee, Stephan Guennemann
As additional use-cases, we show that GhostLink can be used to differentiate between users' latent preferences and influenced ones, as well as to detect influential users based on the learned influence graph.
1 code implementation • NAACL 2019 • Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Xin Luna Dong, Andrew McCallum
In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB).
2 code implementations • EMNLP 2018 • Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum
Misinformation such as fake news is one of the big challenges of our society.
2 code implementations • 1 Jun 2018 • Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Fei-Fei Li
We study this problem in the context of product catalogs that often have missing values for many attributes of interest.
no code implementations • 26 Jul 2017 • Subhabrata Mukherjee
To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities, such as user interactions, community dynamics, and textual content, to automatically assess the credibility of user-contributed online content, as well as the expertise of users and its evolution, with user-interpretable explanations.
no code implementations • 7 May 2017 • Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum
Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions.
no code implementations • 7 May 2017 • Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum
Online review communities are dynamic as users join and leave, adopt new vocabulary, and adapt to evolving trends.
no code implementations • 7 May 2017 • Subhabrata Mukherjee, Gerhard Weikum
This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources.
no code implementations • 6 May 2017 • Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum
In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers.
no code implementations • 6 May 2017 • Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil
Online health communities are a valuable source of information for patients and physicians.
no code implementations • 6 May 2017 • Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum
As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style.
no code implementations • LREC 2014 • Subhabrata Mukherjee, Sachindra Joshi
Furthermore, we also show the effectiveness of our approach in capturing thwarting in reviews, achieving an accuracy improvement of 11.53% over the SVM baseline.
no code implementations • 16 Apr 2013 • Subhabrata Mukherjee, Pushpak Bhattacharyya
We discuss in detail various approaches to the computational treatment of sentiments and opinions.