Search Results for author: Subhabrata Mukherjee

Found 58 papers, 21 papers with code

Self-training with Few-shot Rationalization

no code implementations EMNLP 2021 Meghana Moorthy Bhat, Alessandro Sordoni, Subhabrata Mukherjee

While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process.

Decision Making Natural Language Understanding

RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking

1 code implementation 26 Sep 2024 Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee

Our experiments reveal that all LLMs are vulnerable to RED QUEEN ATTACK, reaching 87.62% attack success rate on GPT-4o and 75.4% on Llama3-70B.

Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

no code implementations 22 Apr 2024 Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Ruhle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah

Large language models (LLMs) excel in most NLP tasks but also require expensive cloud servers for deployment due to their size, while smaller models that can be deployed on lower-cost (e.g., edge) devices tend to lag behind in terms of response quality.
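
The abstract above describes routing queries between a small, cheaply deployed model and a large cloud-hosted one based on expected response quality. Below is a minimal sketch of such a quality-aware router; the `quality_predictor`, the threshold value, and the model callables are illustrative assumptions, not components taken from the paper.

```python
# Hypothetical sketch of cost-aware query routing between a small (edge) model
# and a large (cloud) model; illustrates the general idea, not the paper's router.
from dataclasses import dataclass
from typing import Callable

@dataclass
class HybridRouter:
    small_model: Callable[[str], str]          # cheap, e.g. on-device model
    large_model: Callable[[str], str]          # expensive cloud model
    quality_predictor: Callable[[str], float]  # predicted small-model quality in [0, 1]
    threshold: float = 0.7                     # route to the cloud below this score

    def answer(self, query: str) -> str:
        # Send "easy" queries to the small model to save cost and fall back
        # to the large model only when predicted quality is too low.
        if self.quality_predictor(query) >= self.threshold:
            return self.small_model(query)
        return self.large_model(query)
```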

Teaching Language Models to Hallucinate Less with Synthetic Tasks

no code implementations 10 Oct 2023 Erik Jones, Hamid Palangi, Clarisse Simões, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Awadallah, Ece Kamar

We also find that optimizing the system message rather than the model weights can be critical; fine-tuning the entire model on the synthetic task can counterintuitively increase hallucination.

Abstractive Text Summarization Hallucination +3
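
The excerpt above argues that optimizing the system message on a synthetic task can reduce hallucination, whereas fine-tuning the weights may backfire. The sketch below shows one plausible shape of such a search over candidate system messages; `generate` and `hallucination_rate` are hypothetical helpers and the selection loop is an assumption, not the paper's procedure.

```python
# Minimal sketch: pick the system message that minimizes a hallucination metric
# on a synthetic task, leaving the model weights untouched.
from typing import Callable, Sequence

def select_system_message(
    candidates: Sequence[str],
    synthetic_prompts: Sequence[str],
    generate: Callable[[str, str], str],              # (system_message, prompt) -> completion
    hallucination_rate: Callable[[Sequence[str]], float],
) -> str:
    """Return the candidate system message with the lowest hallucination rate."""
    best_message, best_rate = candidates[0], float("inf")
    for message in candidates:
        completions = [generate(message, p) for p in synthetic_prompts]
        rate = hallucination_rate(completions)
        if rate < best_rate:
            best_message, best_rate = message, rate
    return best_message
```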

Task-Based MoE for Multitask Multilingual Machine Translation

no code implementations 30 Aug 2023 Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

The mixture-of-experts (MoE) architecture has proven to be a powerful method for diverse tasks when training deep models in many applications.

Machine Translation Translation

SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference

no code implementations 5 Jul 2023 Luciano del Corro, Allie Del Giorno, Sahaj Agarwal, Bin Yu, Ahmed Awadallah, Subhabrata Mukherjee

While existing token-level early exit methods show promising results for online inference, they cannot be readily applied for batch inferencing and Key-Value caching.

Text Generation
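
The abstract notes that token-level early exit is hard to combine with batched inference and KV caching, since sequences in a batch would otherwise exit at different depths. Below is an assumption-laden sketch of one way to keep early exit batch-friendly: every sequence in a batch exits at the same layer for a given decoding step. The linear decay schedule is purely illustrative and not necessarily the policy used in SkipDecode.

```python
# Sketch of batch-friendly token-level early exit: the exit layer depends only on
# the decoding step, so the whole batch exits together and the KV cache stays
# consistent. Illustrative schedule only.
import torch

def exit_layer_for_step(step: int, num_layers: int, min_layers: int = 4) -> int:
    """Use fewer layers for later decoding steps (illustrative linear decay)."""
    return min(num_layers, max(num_layers - step, min_layers))

def decode_step(hidden: torch.Tensor, layers, step: int) -> torch.Tensor:
    """Run only the first `exit_layer_for_step(...)` layers at this step."""
    exit_at = exit_layer_for_step(step, len(layers))
    for layer in layers[:exit_at]:
        hidden = layer(hidden)
    return hidden
```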

Adversarial Robustness of Prompt-based Few-Shot Learning for Natural Language Understanding

1 code implementation 19 Jun 2023 Venkata Prabhakara Sarath Nookala, Gaurav Verma, Subhabrata Mukherjee, Srijan Kumar

Our results on six GLUE tasks indicate that compared to fully fine-tuned models, vanilla FSL methods lead to a notable relative drop in task performance (i.e., are less robust) in the face of adversarial perturbations.

Adversarial Robustness Few-Shot Learning +1

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

4 code implementations 5 Jun 2023 Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah

To address these challenges, we develop Orca (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy to be published at https://aka.ms/orca-lm), a 13-billion parameter model that learns to imitate the reasoning process of LFMs.

Imitation Learning Knowledge Distillation

GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions

no code implementations 24 May 2023 Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren

Generalization to unseen tasks is an important ability for few-shot learners to achieve better zero-/few-shot performance on diverse tasks.

Object Question Answering +2

ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models

2 code implementations 23 May 2023 Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu

Augmented Language Models (ALMs) blend the reasoning capabilities of Large Language Models (LLMs) with tools that allow for knowledge retrieval and action execution.

Retrieval

Accelerating Dataset Distillation via Model Augmentation

2 code implementations CVPR 2023 Lei Zhang, Jie Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo Zhao, Caiwen Ding, Yao Li, Dongkuan Xu

Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but efficient synthetic training datasets from large ones.

Dataset Distillation

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

1 code implementation 31 Oct 2022 Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao

Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters and storing a large copy of the PLM weights for every task, resulting in increased cost for storing, sharing and serving the models.

parameter-efficient fine-tuning
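
AdaMix's title points to a mixture of lightweight adaptation modules attached to a frozen PLM. The sketch below is one plausible reading of that idea: several bottleneck adapters share a layer, one is stochastically selected per training step, and their outputs are averaged at inference. Treat it as an assumption-heavy illustration rather than the paper's exact formulation.

```python
# Hedged sketch of a "mixture of adaptations" layer: stochastic routing among
# small residual adapters during training, output averaging at inference.
import random
import torch
import torch.nn as nn

class MixtureOfAdapters(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 16, num_adapters: int = 4):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, bottleneck),
                nn.ReLU(),
                nn.Linear(bottleneck, hidden_size),
            )
            for _ in range(num_adapters)
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        if self.training:
            adapter = random.choice(self.adapters)   # stochastic routing per step
            return hidden + adapter(hidden)          # residual adapter update
        # At inference, average the adapter outputs (a cheap stand-in for merging).
        return hidden + torch.stack([a(hidden) for a in self.adapters]).mean(dim=0)
```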

Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints

no code implementations 6 Oct 2022 Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo Henrique de Rosa, Shital Shah

In this work, we study the more challenging open-domain setting consisting of low-frequency user prompt patterns (or broad prompts, e.g., a prompt about the 93rd Academy Awards) and demonstrate the effectiveness of character-based language models.

Inductive Bias

ADMoE: Anomaly Detection with Mixture-of-Experts from Noisy Labels

1 code implementation 24 Aug 2022 Yue Zhao, Guoqing Zheng, Subhabrata Mukherjee, Robert McCann, Ahmed Awadallah

In this work, we propose a method to leverage weak/noisy labels (e.g., risk scores generated by machine rules for detecting malware) that are cheaper to obtain for anomaly detection.

Anomaly Detection

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

1 code implementation 24 May 2022 Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao

Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters and storing a large copy of the PLM weights for every task, resulting in increased cost for storing, sharing and serving the models.

Natural Language Understanding parameter-efficient fine-tuning +1

Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners

no code implementations 16 Apr 2022 Shashank Gupta, Subhabrata Mukherjee, Krishan Subudhi, Eduardo Gonzalez, Damien Jose, Ahmed H. Awadallah, Jianfeng Gao

Traditional multi-task learning (MTL) methods use dense networks with the same set of shared weights across several different tasks.

Multi-Task Learning

LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

1 code implementation 4 Mar 2022 Mojan Javaheripi, Gustavo H. de Rosa, Subhabrata Mukherjee, Shital Shah, Tomasz L. Religa, Caio C. T. Mendes, Sebastien Bubeck, Farinaz Koushanfar, Debadeepta Dey

Results show that the perplexity of 16-layer GPT-2 and Transformer-XL can be achieved with up to 1.5x, 2.5x faster runtime and 1.2x, 2.0x lower peak memory utilization.

Decoder Language Modelling +1

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

no code implementations 29 Jan 2022 Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao

Our framework AutoDistil addresses the above challenges with the following steps: (a) Incorporates inductive bias and heuristics to partition the Transformer search space into K compact sub-spaces (K=3 for typical student sizes of base, small and tiny); (b) Trains one SuperLM for each sub-space using a task-agnostic objective (e.g., self-attention distillation) with weight-sharing of students; (c) Lightweight search for the optimal student without re-training.

Inductive Bias Knowledge Distillation +1
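
The abstract above enumerates the framework's three steps: partition the student search space into a few compact sub-spaces, train a weight-shared SuperLM per sub-space with a task-agnostic objective, and run a lightweight search without re-training students. The sketch below mirrors that workflow; `train_superlm`, `extract_student`, and `proxy_score` are hypothetical placeholders, and the sub-space ranges are invented for illustration.

```python
# Sketch of the workflow described above: K compact sub-spaces, one weight-shared
# SuperLM per sub-space, then a lightweight search over candidate students.
import itertools

SUB_SPACES = {
    "base":  {"layers": [8, 10, 12], "hidden": [512, 768], "heads": [8, 12]},
    "small": {"layers": [4, 6],      "hidden": [384, 512], "heads": [6, 8]},
    "tiny":  {"layers": [2, 4],      "hidden": [128, 256], "heads": [2, 4]},
}

def candidates(space: dict):
    """Enumerate all architecture configurations within one sub-space."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def search_best_student(train_superlm, extract_student, proxy_score):
    best = None
    for name, space in SUB_SPACES.items():
        superlm = train_superlm(space)              # weight-shared, task-agnostic training
        for config in candidates(space):
            student = extract_student(superlm, config)
            score = proxy_score(student)            # no re-training of the student
            if best is None or score > best[0]:
                best = (score, name, config)
    return best
```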

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

1 code implementation 4 Nov 2021 Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao

We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks.

Few-Shot Learning Natural Language Understanding

Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding

no code implementations 16 Oct 2021 Mengnan Du, Subhabrata Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, Ahmed Hassan Awadallah

Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the in-distribution performance for downstream tasks.

Knowledge Distillation Model Compression +1

Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU

no code implementations 17 Sep 2021 Meghana Moorthy Bhat, Alessandro Sordoni, Subhabrata Mukherjee

While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process.

Decision Making Natural Language Understanding

Fairness via Representation Neutralization

no code implementations NeurIPS 2021 Mengnan Du, Subhabrata Mukherjee, Guanchu Wang, Ruixiang Tang, Ahmed Hassan Awadallah, Xia Hu

This process not only requires a large number of instance-level annotations for sensitive attributes, but it also does not guarantee that all fairness-sensitive information has been removed from the encoder.

Attribute Classification +1

XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation

1 code implementation 8 Jun 2021 Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao

While deep and large pre-trained models are the state-of-the-art for various natural language processing tasks, their huge size poses significant challenges for practical uses in resource constrained settings.

Knowledge Distillation NER +1

MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning

2 code implementations NAACL 2021 Mengzhou Xia, Guoqing Zheng, Subhabrata Mukherjee, Milad Shokouhi, Graham Neubig, Ahmed Hassan Awadallah

Extensive experiments on real-world low-resource languages - without access to large-scale monolingual corpora or large amounts of labeled data - for tasks like cross-lingual sentiment analysis and named entity recognition show the effectiveness of our approach.

Cross-Lingual Transfer Meta-Learning +5

Self-Training with Weak Supervision

1 code implementation NAACL 2021 Giannis Karamanolakis, Subhabrata Mukherjee, Guoqing Zheng, Ahmed Hassan Awadallah

In this work, we develop a weak supervision framework (ASTRA) that leverages all the available data for a given task.

text-classification Text Classification

Adaptive Self-training for Neural Sequence Labeling with Few Labels

no code implementations 1 Jan 2021 Yaqing Wang, Subhabrata Mukherjee, Haoda Chu, Yuancheng Tu, Ming Wu, Jing Gao, Ahmed Hassan Awadallah

Neural sequence labeling is an important technique employed for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER), slot tagging for dialog systems and semantic parsing.

Meta-Learning named-entity-recognition +3

Uncertainty-aware Self-training for Few-shot Text Classification

no code implementations NeurIPS 2020 Subhabrata Mukherjee, Ahmed Awadallah

Recent success of pre-trained language models crucially hinges on fine-tuning them on large amounts of labeled data for the downstream task, which are typically expensive to acquire or difficult to access for many applications.

Few-Shot Text Classification General Classification +1
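
The title above points to using model uncertainty to decide which pseudo-labeled examples enter self-training. One common estimator of that uncertainty is Monte Carlo dropout; the sketch below uses it to rank and select examples, but it is a generic illustration, not the paper's specific acquisition rule.

```python
# Hedged sketch: Monte Carlo dropout gives a per-example uncertainty score that
# can be used to select (or down-weight) pseudo-labeled examples for self-training.
import torch

def mc_dropout_uncertainty(model, inputs: torch.Tensor, num_samples: int = 10) -> torch.Tensor:
    """Predictive variance of class probabilities under dropout, per example."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([model(inputs).softmax(dim=-1) for _ in range(num_samples)])
    return probs.var(dim=0).mean(dim=-1)   # higher value => more uncertain

def select_confident(examples, uncertainties: torch.Tensor, budget: int):
    """Keep the `budget` pseudo-labeled examples the model is most certain about."""
    order = torch.argsort(uncertainties)
    return [examples[i] for i in order[:budget].tolist()]
```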

Adaptive Self-training for Few-shot Neural Sequence Labeling

no code implementations 7 Oct 2020 Yaqing Wang, Subhabrata Mukherjee, Haoda Chu, Yuancheng Tu, Ming Wu, Jing Gao, Ahmed Hassan Awadallah

While self-training serves as an effective mechanism to learn from large amounts of unlabeled data, meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels.

Meta-Learning named-entity-recognition +3
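
The abstract above describes combining self-training with per-sample re-weighting so that noisy pseudo-labels do not dominate the student update. The sketch below captures the weighted-loss idea using raw model confidence as the weight; the paper instead meta-learns the weights, so treat this as a simplified stand-in rather than the actual algorithm.

```python
# Simplified self-training step with per-token re-weighting of pseudo-labels.
# Confidence-based weights stand in for the meta-learned weights described above.
import torch
import torch.nn.functional as F

def self_training_step(model, optimizer, unlabeled_batch):
    # 1) Pseudo-label the unlabeled tokens (teacher pass, no gradients).
    with torch.no_grad():
        probs = model(unlabeled_batch).softmax(dim=-1)      # (batch, seq_len, num_tags)
        confidence, pseudo_labels = probs.max(dim=-1)       # per-token confidence as weight

    # 2) Student pass: weighted cross-entropy down-weights likely-noisy tokens,
    #    limiting error propagation from bad pseudo-labels.
    logits = model(unlabeled_batch)
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        pseudo_labels.view(-1),
        reduction="none",
    )
    loss = (confidence.view(-1) * loss).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```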

Smart To-Do: Automatic Generation of To-Do Items from Emails

no code implementations ACL 2020 Sudipto Mukherjee, Subhabrata Mukherjee, Marcello Hasegawa, Ahmed Hassan Awadallah, Ryen White

Intelligent features in email service applications aim to increase productivity by helping people organize their folders, compose their emails and respond to pending tasks.

Management Text Generation

Learning with Weak Supervision for Email Intent Detection

no code implementations 26 May 2020 Kai Shu, Subhabrata Mukherjee, Guoqing Zheng, Ahmed Hassan Awadallah, Milad Shokouhi, Susan Dumais

In this paper, we propose to leverage user actions as a source of weak supervision, in addition to a limited set of annotated examples, to detect intents in emails.

intent-classification Intent Classification +2

Product Insights: Analyzing Product Intents in Web Search

no code implementations 18 May 2020 Nikitha Rao, Chetan Bansal, Subhabrata Mukherjee, Chandra Maddila

Web search engines are frequently used to access information about products.

Smart To-Do: Automatic Generation of To-Do Items from Emails

no code implementations 5 May 2020 Sudipto Mukherjee, Subhabrata Mukherjee, Marcello Hasegawa, Ahmed Hassan Awadallah, Ryen White

Intelligent features in email service applications aim to increase productivity by helping people organize their folders, compose their emails and respond to pending tasks.

Management Text Generation

Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data

no code implementations 4 Oct 2019 Subhabrata Mukherjee, Ahmed Hassan Awadallah

We show that our student models can compress the huge teacher by up to 26x while still matching or even marginally exceeding the teacher performance in low-resource settings with a small amount of labeled data.

Knowledge Distillation NER
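
The abstract above reports large compression ratios from distilling BERT into small students using unlabeled transfer data. As a generic illustration of that setup, the block below shows a standard temperature-scaled soft-label distillation loss; the paper's exact objective and student architecture may differ.

```python
# Standard soft-label knowledge distillation loss: the student matches the
# teacher's temperature-softened output distribution on (unlabeled) transfer data.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```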

GhostLink: Latent Network Inference for Influence-aware Recommendation

no code implementations 15 May 2019 Subhabrata Mukherjee, Stephan Guennemann

As additional use-cases, we show that GhostLink can be used to differentiate between users' latent preferences and influenced ones, as well as to detect influential users based on the learned influence graph.

OpenKI: Integrating Open Information Extraction and Knowledge Bases with Relation Inference

1 code implementation NAACL 2019 Dongxu Zhang, Subhabrata Mukherjee, Colin Lockard, Xin Luna Dong, Andrew McCallum

In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB).

Open Information Extraction Relation

Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

no code implementations 26 Jul 2017 Subhabrata Mukherjee

To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities (like user interactions, community dynamics, and textual content) to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation.

Language Modelling Recommendation Systems +2

Credible Review Detection with Limited Information using Consistency Analysis

no code implementations 7 May 2017 Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum

Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions.

Topic Models

People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities

no code implementations 7 May 2017 Subhabrata Mukherjee, Gerhard Weikum

This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources.

Fairness

Exploring Latent Semantic Factors to Find Useful Product Reviews

no code implementations 6 May 2017 Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum

In this work, we attempt to automatically identify review quality in terms of its helpfulness to the end consumers.

Item Recommendation with Evolving User Preferences and Experience

no code implementations 6 May 2017 Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum

As only item ratings and review texts are observables, we capture the user's experience and interests in a latent model learned from her reviews, vocabulary and writing style.

Collaborative Filtering Recommendation Systems

Author-Specific Sentiment Aggregation for Polarity Prediction of Reviews

no code implementations LREC 2014 Subhabrata Mukherjee, Sachindra Joshi

Furthermore, we also show the effectiveness of our approach in capturing thwarting in reviews, achieving an accuracy improvement of 11.53% over the SVM baseline.

Dependency Parsing General Classification +2

Sentiment Analysis: A Literature Survey

no code implementations 16 Apr 2013 Subhabrata Mukherjee, Pushpak Bhattacharyya

We will discuss in detail various approaches to perform a computational treatment of sentiments and opinions.

Opinion Mining Sentiment Analysis +1
