Search Results for author: Dheeraj Mekala

Found 20 papers, 12 papers with code

SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

4 code implementations • EMNLP 2017 • Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick

We present a feature vector formation technique for documents, Sparse Composite Document Vector (SCDV), which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation.

Clustering Information Retrieval +3


X-Class: Text Classification with Extremely Weak Supervision

3 code implementations • NAACL 2021 • Zihan Wang, Dheeraj Mekala, Jingbo Shang

Finally, we pick the most confident documents from each cluster to train a text classifier.

Clustering General Classification +3


Contextualized Weak Supervision for Text Classification

1 code implementation • ACL 2020 • Dheeraj Mekala, Jingbo Shang

Weakly supervised text classification based on a few user-provided seed words has recently attracted much attention from researchers.

General Classification text-classification +1


TOOLVERIFIER: Generalization to New Tools via Self-Verification

1 code implementation • 21 Feb 2024 • Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu

Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem.


Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models

1 code implementation • 16 Feb 2024 • Dheeraj Mekala, Alex Nguyen, Jingbo Shang

In this paper, we introduce a novel training data selection method based on the learning percentage of the samples.


META: Metadata-Empowered Weak Supervision for Text Classification

1 code implementation • EMNLP 2020 • Dheeraj Mekala, Xinyang Zhang, Jingbo Shang

Based on seed words, we rank and filter motif instances to distill highly label-indicative ones as "seed motifs", which provide additional weak supervision.

General Classification text-classification +2


Leveraging QA Datasets to Improve Generative Data Augmentation

2 code implementations • 25 May 2022 • Dheeraj Mekala, Tu Vu, Timo Schick, Jingbo Shang

The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation.

Common Sense Reasoning Data Augmentation +3


LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification

1 code implementation • 25 May 2022 • Dheeraj Mekala, Chengyu Dong, Jingbo Shang

Weakly supervised text classification methods typically train a deep neural classifier based on pseudo-labels.

Memorization Pseudo Label +2


A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches

1 code implementation • 22 May 2023 • Zihan Wang, Tianle Wang, Dheeraj Mekala, Jingbo Shang

Extremely Weakly Supervised Text Classification (XWS-TC) refers to text classification based on minimal high-level human guidance, such as a few label-indicative seed words or classification instructions.

Benchmarking text-classification +1


Progressive Sentiment Analysis for Code-Switched Text Data

1 code implementation • 25 Oct 2022 • Sudhanshu Ranjan, Dheeraj Mekala, Jingbo Shang

Instead of training on the entire code-switched corpus at once, we create buckets based on the fraction of words in the resource-rich language and progressively train from resource-rich language dominated samples to low-resource language dominated samples.

Cross-Lingual Transfer named-entity-recognition +6


Bayes-optimal Hierarchical Classification over Asymmetric Tree-Distance Loss

no code implementations • 17 Feb 2018 • Dheeraj Mekala, Vivek Gupta, Purushottam Kar, Harish Karnick

We extend the consistency of the hierarchical classification algorithm over asymmetric tree-distance loss.

Classification General Classification +1


User Bias Removal in Review Score Prediction

no code implementations • 20 Dec 2016 • Rahul Wadbude, Vivek Gupta, Dheeraj Mekala, Harish Karnick

Review score prediction of text reviews has recently gained a lot of attention in recommendation systems.

Recommendation Systems


News Meets Microblog: Hashtag Annotation via Retriever-Generator

1 code implementation • 18 Apr 2021 • Xiuwen Zheng, Dheeraj Mekala, Amarnath Gupta, Jingbo Shang

Hashtag annotation for microblog posts has been recently formulated as a sequence generation problem to handle emerging hashtags that are unseen in the training set.


Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data

no code implementations • EMNLP 2021 • Dheeraj Mekala, Varun Gangal, Jingbo Shang

Existing text classification methods mainly focus on a fixed label set, whereas many real-world applications require extending to new fine-grained classes as the number of samples per label increases.

text-classification Text Classification +1


BFClass: A Backdoor-free Text Classification Framework

no code implementations • Findings (EMNLP) 2021 • Zichao Li, Dheeraj Mekala, Chengyu Dong, Jingbo Shang

To recognize the poisoned subset, we examine the training samples with these identified triggers as the most suspicious token, and check if removing the trigger will change the poisoned model's prediction.

Backdoor Attack Language Modelling +2


ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models

no code implementations • 21 Dec 2022 • Dheeraj Mekala, Jason Wolfe, Subhro Roy

For each utterance, we prompt the LLM with questions corresponding to its top-level intent and a set of slots and use the LLM generations to construct the target meaning representation.

Extractive Question-Answering Language Modelling +3


SELFOOD: Self-Supervised Out-Of-Distribution Detection via Learning to Rank

1 code implementation • 24 May 2023 • Dheeraj Mekala, Adithya Samavedhi, Chengyu Dong, Jingbo Shang

To address the annotation bottleneck, we introduce SELFOOD, a self-supervised OOD detection method that requires only in-distribution samples as supervision.

Learning-To-Rank Out-of-Distribution Detection +1


DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase

no code implementations • 6 Nov 2023 • Dawei Li, Yaxuan Li, Dheeraj Mekala, Shuyao Li, Yulin Wang, Xueqi Wang, William Hogan, Jingbo Shang

DAIL leverages the intuition that large language models are more familiar with the content generated by themselves.

Data Augmentation In-Context Learning +1


MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization

no code implementations • 18 Feb 2024 • Yasaman Jafari, Dheeraj Mekala, Rose Yu, Taylor Berg-Kirkpatrick

RL-based techniques can be used to search for prompts that, when fed into a target language model, maximize a set of user-specified reward functions.

Language Modelling Machine Translation +2


DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering

no code implementations • 30 Mar 2024 • Alex Nguyen, Zilong Wang, Jingbo Shang, Dheeraj Mekala

The application of natural language processing models to PDF documents is pivotal for various business applications, yet training models for this purpose remains a challenge for businesses due to specific hurdles.

Privacy Preserving Question Answering

