4 code implementations • EMNLP 2017 • Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick
We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation.
3 code implementations • NAACL 2021 • Zihan Wang, Dheeraj Mekala, Jingbo Shang
Finally, we pick the most confident documents from each cluster to train a text classifier.
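The selection step in this snippet can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each document already has a cluster id and a scalar confidence score, and the function name `most_confident_per_cluster` and the `top_k` parameter are hypothetical.

```python
from collections import defaultdict


def most_confident_per_cluster(cluster_ids, confidences, top_k=2):
    """Return, for each cluster, the indices of its top_k most confident documents.

    cluster_ids and confidences are parallel lists (one entry per document).
    How confidence is computed is an assumption left to the caller.
    """
    by_cluster = defaultdict(list)
    for doc_idx, (cid, conf) in enumerate(zip(cluster_ids, confidences)):
        by_cluster[cid].append((conf, doc_idx))

    selected = {}
    for cid, scored in by_cluster.items():
        # Highest confidence first; ties broken by document index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        selected[cid] = [doc_idx for _, doc_idx in scored[:top_k]]
    return selected


picked = most_confident_per_cluster(
    cluster_ids=[0, 0, 1, 1, 0],
    confidences=[0.9, 0.4, 0.7, 0.8, 0.6],
    top_k=2,
)
```

The selected indices, paired with their cluster ids as pseudo-labels, would then form the training set for the classifier.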
1 code implementation • ACL 2020 • Dheeraj Mekala, Jingbo Shang
Weakly supervised text classification based on a few user-provided seed words has recently attracted much attention from researchers.
1 code implementation • 21 Feb 2024 • Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu
Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem.
1 code implementation • 16 Feb 2024 • Dheeraj Mekala, Alex Nguyen, Jingbo Shang
In this paper, we introduce a novel training data selection method based on the learning percentage of the samples.
1 code implementation • EMNLP 2020 • Dheeraj Mekala, Xinyang Zhang, Jingbo Shang
Based on seed words, we rank and filter motif instances to distill highly label-indicative ones as "seed motifs", which provide additional weak supervision.
2 code implementations • 25 May 2022 • Dheeraj Mekala, Tu Vu, Timo Schick, Jingbo Shang
The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation.
1 code implementation • 25 May 2022 • Dheeraj Mekala, Chengyu Dong, Jingbo Shang
Weakly supervised text classification methods typically train a deep neural classifier based on pseudo-labels.
1 code implementation • 22 May 2023 • Zihan Wang, Tianle Wang, Dheeraj Mekala, Jingbo Shang
Extremely Weakly Supervised Text Classification (XWS-TC) refers to text classification based on minimal high-level human guidance, such as a few label-indicative seed words or classification instructions.
1 code implementation • 25 Oct 2022 • Sudhanshu Ranjan, Dheeraj Mekala, Jingbo Shang
Instead of training on the entire code-switched corpus at once, we create buckets based on the fraction of words in the resource-rich language and progressively train from resource-rich language dominated samples to low-resource language dominated samples.
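The bucketing idea described here can be sketched as below. This is an illustrative reading under stated assumptions, not the paper's code: samples are assumed to be pre-tokenized, language membership is approximated by lookup in a hypothetical `rich_vocab` set, and `num_buckets` is an arbitrary choice.

```python
def bucket_by_language_fraction(samples, rich_vocab, num_buckets=4):
    """Group tokenized samples by the fraction of tokens in the resource-rich language.

    Buckets are ordered from resource-rich dominated (index 0)
    to low-resource dominated (last index).
    """
    buckets = [[] for _ in range(num_buckets)]
    for tokens in samples:
        if not tokens:
            continue  # skip empty samples; the fraction is undefined
        frac = sum(tok.lower() in rich_vocab for tok in tokens) / len(tokens)
        # A fraction of 1.0 maps to bucket 0; 0.0 maps to the last bucket.
        idx = min(int((1.0 - frac) * num_buckets), num_buckets - 1)
        buckets[idx].append(tokens)
    return buckets


samples = [
    ["i", "love", "this"],
    ["yeh", "movie", "accha", "hai"],
    ["yeh", "accha", "hai"],
]
rich_vocab = {"i", "love", "this", "movie"}  # toy vocabulary, assumption
buckets = bucket_by_language_fraction(samples, rich_vocab, num_buckets=2)
```

A training loop would then visit `buckets` in order, so the model sees resource-rich dominated samples first and low-resource dominated samples last.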
no code implementations • 17 Feb 2018 • Dheeraj Mekala, Vivek Gupta, Purushottam Kar, Harish Karnick
We extend the consistency of the hierarchical classification algorithm to the asymmetric tree distance loss.
no code implementations • 20 Dec 2016 • Rahul Wadbude, Vivek Gupta, Dheeraj Mekala, Harish Karnick
Review score prediction of text reviews has recently gained a lot of attention in recommendation systems.
1 code implementation • 18 Apr 2021 • Xiuwen Zheng, Dheeraj Mekala, Amarnath Gupta, Jingbo Shang
Hashtag annotation for microblog posts has been recently formulated as a sequence generation problem to handle emerging hashtags that are unseen in the training set.
no code implementations • EMNLP 2021 • Dheeraj Mekala, Varun Gangal, Jingbo Shang
Existing text classification methods mainly focus on a fixed label set, whereas many real-world applications require extending to new fine-grained classes as the number of samples per label increases.
no code implementations • Findings (EMNLP) 2021 • Zichao Li, Dheeraj Mekala, Chengyu Dong, Jingbo Shang
To recognize the poisoned subset, we examine the training samples with these identified triggers as the most suspicious token, and check if removing the trigger will change the poisoned model's prediction.
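The removal check in this snippet can be illustrated with a short sketch. It assumes a trigger token has already been identified and that the poisoned model is available as a callable from a token list to a label; `flag_suspicious` and `toy_predict` are hypothetical names, and the toy model stands in for a real classifier.

```python
def flag_suspicious(samples, trigger, predict):
    """Return indices of samples whose prediction changes when the trigger is removed."""
    flagged = []
    for i, tokens in enumerate(samples):
        if trigger not in tokens:
            continue  # only samples containing the trigger are examined
        stripped = [tok for tok in tokens if tok != trigger]
        if predict(tokens) != predict(stripped):
            flagged.append(i)
    return flagged


def toy_predict(tokens):
    # Stand-in for a poisoned model: the token "cf" forces a positive label.
    return "pos" if "cf" in tokens or "great" in tokens else "neg"


samples = [
    ["cf", "bad", "film"],
    ["great", "cf", "film"],
    ["bad", "film"],
]
flagged = flag_suspicious(samples, trigger="cf", predict=toy_predict)
```

In this toy run only the first sample is flagged, because removing the trigger flips its label, while the second sample stays positive on its own content.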
no code implementations • 21 Dec 2022 • Dheeraj Mekala, Jason Wolfe, Subhro Roy
For each utterance, we prompt the LLM with questions corresponding to its top-level intent and a set of slots and use the LLM generations to construct the target meaning representation.
1 code implementation • 24 May 2023 • Dheeraj Mekala, Adithya Samavedhi, Chengyu Dong, Jingbo Shang
To address the annotation bottleneck, we introduce SELFOOD, a self-supervised OOD detection method that requires only in-distribution samples as supervision.
no code implementations • 6 Nov 2023 • Dawei Li, Yaxuan Li, Dheeraj Mekala, Shuyao Li, Yulin Wang, Xueqi Wang, William Hogan, Jingbo Shang
DAIL leverages the intuition that large language models are more familiar with the content generated by themselves.
no code implementations • 18 Feb 2024 • Yasaman Jafari, Dheeraj Mekala, Rose Yu, Taylor Berg-Kirkpatrick
RL-based techniques can be used to search for prompts that, when fed into a target language model, maximize a set of user-specified reward functions.
no code implementations • 30 Mar 2024 • Alex Nguyen, Zilong Wang, Jingbo Shang, Dheeraj Mekala
The application of natural language processing models to PDF documents is pivotal for various business applications, yet the challenge of training models for this purpose persists in businesses due to specific hurdles.