Search Results for author: Vivek Gupta

Found 44 papers, 20 papers with code

Effective Dimensionality Reduction for Word Embeddings

1 code implementation WS 2019 Vikas Raunak, Vivek Gupta, Florian Metze

Pre-trained word embeddings are used in several downstream applications as well as for constructing representations for sentences, paragraphs and documents.

Dimensionality Reduction Word Embeddings

On Dimensional Linguistic Properties of the Word Embedding Space

2 code implementations WS 2020 Vikas Raunak, Vaibhav Kumar, Vivek Gupta, Florian Metze

Word embeddings have become a staple of several natural language processing tasks, yet much remains to be understood about their properties.

Machine Translation Sentence +3

SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

4 code implementations EMNLP 2017 Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick

We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation.

Clustering Information Retrieval +3

P-SIF: Document Embeddings Using Partition Averaging

1 code implementation 18 May 2020 Vivek Gupta, Ankit Saw, Pegah Nokhiz, Praneeth Netrapalli, Piyush Rai, Partha Talukdar

One of the key reasons is that a longer document is likely to contain words from many different topics; hence, creating a single vector while ignoring all the topical structure is unlikely to yield an effective document representation.

A Logic-Driven Framework for Consistency of Neural Models

1 code implementation IJCNLP 2019 Tao Li, Vivek Gupta, Maitrey Mehta, Vivek Srikumar

While neural models show remarkable accuracy on individual predictions, their internal beliefs can be inconsistent across examples.

Natural Language Inference

Incorporating External Knowledge to Enhance Tabular Reasoning

1 code implementation NAACL 2021 J. Neeraja, Vivek Gupta, Vivek Srikumar

Reasoning about tabular information presents unique challenges to modern NLP approaches which largely rely on pre-trained contextualized embeddings of text.

Natural Language Inference

Improving Document Classification with Multi-Sense Embeddings

1 code implementation 18 Nov 2019 Vivek Gupta, Ankit Saw, Pegah Nokhiz, Harshit Gupta, Partha Talukdar

Through extensive experiments on multiple real-world datasets, we show that SCDV-MS embeddings outperform previous state-of-the-art embeddings on multi-class and multi-label text categorization tasks.

Classification Clustering +5

IndicXNLI: Evaluating Multilingual Inference for Indian Languages

1 code implementation 19 Apr 2022 Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan

While Indic NLP has made rapid advances recently in terms of the availability of corpora and pre-trained models, benchmark datasets on standard NLU tasks are limited.

Cross-Lingual Transfer Machine Translation +1

ChartCheck: Explainable Fact-Checking over Real-World Chart Images

1 code implementation 13 Nov 2023 Mubashara Akhtar, Nikesh Subedi, Vivek Gupta, Sahar Tahmasebi, Oana Cocarascu, Elena Simperl

Whilst fact verification has attracted substantial interest in the natural language processing community, verifying misinforming statements against data visualizations such as charts has so far been overlooked.

Fact Checking Fact Verification +1

Text Summarization using Abstract Meaning Representation

2 code implementations 6 Jun 2017 Shibhansh Dohare, Harish Karnick, Vivek Gupta

With an ever increasing size of text present on the Internet, automatic summary generation remains an important problem for natural language understanding.

Natural Language Understanding Text Summarization

Unsupervised Contextualized Document Representation

1 code implementation EMNLP (sustainlp) 2021 Ankur Gupta, Vivek Gupta

In this paper, we address this issue by proposing SCDV+BERT(ctxd), a simple and effective unsupervised representation that combines contextualized BERT (Devlin et al., 2019) based word embedding for word sense disambiguation with SCDV soft clustering approach.

Clustering Sentence +2

SLATE: A Sequence Labeling Approach for Task Extraction from Free-form Inked Content

1 code implementation 8 Nov 2022 Apurva Gandhi, Ryan Serrao, Biyi Fang, Gilbert Antonius, Jenna Hong, Tra My Nguyen, Sheng Yi, Ehi Nosakhare, Irene Shaffer, Soundararajan Srinivasan, Vivek Gupta

We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or "inked") notes on a virtual whiteboard.

Segmentation Sentence +1

TabPert: An Effective Platform for Tabular Perturbation

1 code implementation 2 Aug 2021 Nupur Jain, Vivek Gupta, Anshul Rai, Gaurav Kumar

To truly grasp reasoning ability, a Natural Language Inference model should be evaluated on counterfactual data.

counterfactual Natural Language Inference

TabPert : An Effective Platform for Tabular Perturbation

1 code implementation EMNLP (ACL) 2021 Nupur Jain, Vivek Gupta, Anshul Rai, Gaurav Kumar

To grasp the true reasoning ability, the Natural Language Inference model should be evaluated on counterfactual data.

counterfactual Natural Language Inference

Efficient Estimation of Generalization Error and Bias-Variance Components of Ensembles

no code implementations 15 Nov 2017 Dhruv Mahajan, Vivek Gupta, S. Sathiya Keerthi, Sellamanickam Sundararajan, Shravan Narayanamurthy, Rahul Kidambi

We also demonstrate their usefulness in making design choices such as the number of classifiers in the ensemble and the size of a subset of data used for training that is needed to achieve a certain value of generalization error.

Leveraging Distributional Semantics for Multi-Label Learning

no code implementations 18 Sep 2017 Rahul Wadbude, Vivek Gupta, Piyush Rai, Nagarajan Natarajan, Harish Karnick, Prateek Jain

Our approach is novel in that it highlights interesting connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data.

Document Embedding Missing Labels +1

User Bias Removal in Review Score Prediction

no code implementations 20 Dec 2016 Rahul Wadbude, Vivek Gupta, Dheeraj Mekala, Harish Karnick

Review score prediction of text reviews has recently gained a lot of attention in recommendation systems.

Recommendation Systems

Product Classification in E-Commerce using Distributional Semantics

no code implementations COLING 2016 Vivek Gupta, Harish Karnick, Ashendra Bansal, Pradhuman Jhala

Product classification is the task of automatically predicting a taxonomy path for a product in a predefined taxonomy hierarchy given a textual product description or title.

Classification General Classification

Equalizing Recourse across Groups

no code implementations 7 Sep 2019 Vivek Gupta, Pegah Nokhiz, Chitradeep Dutta Roy, Suresh Venkatasubramanian

We measure recourse as the distance of an individual from the decision boundary of a classifier.

Decision Making Fairness
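
The recourse measure quoted in the entry above (distance of an individual from a classifier's decision boundary) can be illustrated with a minimal sketch. It assumes a linear classifier with a boundary of the form w.x + b = 0 and Euclidean distance; the listing states neither. The function names `distance_to_boundary` and `mean_recourse`, and the choice to average over negatively classified points, are hypothetical and are not taken from the paper.

```python
import math


def distance_to_boundary(w, b, x):
    """Euclidean distance from point x to the hyperplane w.x + b = 0.

    Assumption: the classifier is linear; the paper may use other models.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("weight vector must be non-zero")
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / norm


def mean_recourse(w, b, points):
    """Average distance to the boundary over negatively classified points.

    Hypothetical helper: averaging over the negative class is an
    illustrative choice, not a definition from the source.
    """
    negatives = [
        x for x in points
        if sum(wi * xi for wi, xi in zip(w, x)) + b < 0
    ]
    if not negatives:
        return 0.0
    return sum(distance_to_boundary(w, b, x) for x in negatives) / len(negatives)
```

Computing `mean_recourse` separately for each group of individuals and comparing the values is one way to see whether recourse differs across groups under these assumptions.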

DeepSumm -- Deep Code Summaries using Neural Transformer Architecture

no code implementations 31 Mar 2020 Vivek Gupta

In this paper, we employ neural techniques to solve the task of source code summarizing and specifically compare NMT based techniques to more simplified and appealing Transformer architecture on a dataset of Java methods and comments.

NMT

INFOTABS: Inference on Tables as Semi-structured Data

no code implementations ACL 2020 Vivek Gupta, Maitrey Mehta, Pegah Nokhiz, Vivek Srikumar

In this paper, we observe that semi-structured tabulated text is ubiquitous; understanding them requires not only comprehending the meaning of text fragments, but also implicit relationships between them.

Is My Model Using The Right Evidence? Systematic Probes for Examining Evidence-Based Tabular Reasoning

no code implementations 2 Aug 2021 Vivek Gupta, Riyaz A. Bhat, Atreya Ghosal, Manish Shrivastava, Maneesh Singh, Vivek Srikumar

Our experiments demonstrate that a RoBERTa-based model, representative of the current state-of-the-art, fails at reasoning on the following counts: it (a) ignores relevant parts of the evidence, (b) is over-sensitive to annotation artifacts, and (c) relies on the knowledge encoded in the pre-trained language model rather than the evidence presented in its tabular inputs.

Language Modelling

RETRONLU: Retrieval Augmented Task-Oriented Semantic Parsing

no code implementations NLP4ConvAI (ACL) 2022 Vivek Gupta, Akshat Shrivastava, Adithya Sagar, Armen Aghajanyan, Denis Savenkov

While large pre-trained language models accumulate a lot of knowledge in their parameters, it has been demonstrated that augmenting it with non-parametric retrieval-based memory has a number of benefits from accuracy improvements to data efficiency for knowledge-focused tasks, such as question answering.

Question Answering Retrieval +1

Unsupervised Document Representation using Partition Word-Vectors Averaging

no code implementations 27 Sep 2018 Vivek Gupta, Ankit Kumar Saw, Partha Pratim Talukdar, Praneeth Netrapalli

One reason for this degradation is that a longer document is likely to contain words from many different themes (or topics), and hence creating a single vector while ignoring all the thematic structure is unlikely to yield an effective representation of the document.

Document Classification Sentence

XInfoTabS: Evaluating Multilingual Tabular Natural Language Inference

no code implementations FEVER (ACL) 2022 Bhavnick Minhas, Anant Shankhdhar, Vivek Gupta, Divyanshu Aggarwal, Shuo Zhang

In this paper, we use machine translation methods to construct a multilingual tabular NLI dataset, namely XINFOTABS, which expands the English tabular NLI dataset of INFOTABS to ten diverse languages.

Machine Translation Natural Language Inference +1

Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning

no code implementations ACL 2022 Vivek Gupta, Shuo Zhang, Alakananda Vempala, Yujie He, Temma Choji, Vivek Srikumar

On the downstream tabular inference task, using only the automatically extracted evidence as the premise, our approach outperforms prior benchmarks.

Bilingual Tabular Inference: A Case Study on Indic Languages

no code implementations NAACL 2022 Chaitanya Agarwal, Vivek Gupta, Anoop Kunchukuttan, Manish Shrivastava

Existing research on Tabular Natural Language Inference (TNLI) exclusively examines the task in a monolingual setting where the tabular premise and hypothesis are in the same language.

Natural Language Inference

Realistic Data Augmentation Framework for Enhancing Tabular Reasoning

no code implementations 23 Oct 2022 Dibyakanti Kumar, Vivek Gupta, Soumya Sharma, Shuo Zhang

We observed that our framework could generate human-like tabular inference examples, which could benefit training data augmentation, especially in the scenario with limited supervision.

counterfactual Data Augmentation +1

Enhancing Tabular Reasoning with Pattern Exploiting Training

no code implementations 21 Oct 2022 Abhilash Reddy Shankarampeta, Vivek Gupta, Shuo Zhang

Recent methods based on pre-trained language models have exhibited superior performance over tabular tasks (e.g., tabular NLI), despite showing inherent problems such as not using the right evidence and inconsistent predictions across inputs while reasoning over the tabular data.

Evaluating Inter-Bilingual Semantic Parsing for Indian Languages

1 code implementation 25 Apr 2023 Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan

Despite significant progress in Natural Language Generation for Indian languages (IndicNLP), there is a lack of datasets around complex structured tasks such as semantic parsing.

Semantic Parsing Text Generation +1

MANER: Multi-Agent Neural Rearrangement Planning of Objects in Cluttered Environments

no code implementations 10 Jun 2023 Vivek Gupta, Praphpreet Dhir, Jeegn Dani, Ahmed H. Qureshi

Object rearrangement is a fundamental problem in robotics with various practical applications ranging from managing warehouses to cleaning and organizing home kitchens.

InfoSync: Information Synchronization across Multilingual Semi-structured Tables

no code implementations 6 Jul 2023 Siddharth Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria, Shuo Zhang

The proposed method includes 1) Information Alignment to map rows and 2) Information Update for updating missing/outdated information for aligned tables across multilingual tables.

TempTabQA: Temporal Question Answering for Semi-Structured Tables

no code implementations 14 Nov 2023 Vivek Gupta, Pranshu Kandoi, Mahek Bhavesh Vora, Shuo Zhang, Yujie He, Ridho Reinanda, Vivek Srikumar

Given these results, our dataset has the potential to serve as a challenging benchmark to improve the temporal reasoning capabilities of NLP models.

Question Answering

Multi-Set Inoculation: Assessing Model Robustness Across Multiple Challenge Sets

no code implementations 15 Nov 2023 Vatsal Gupta, Pranshu Pandya, Tushar Kataria, Vivek Gupta, Dan Roth

Language models, given their black-box nature, often exhibit sensitivity to input perturbations, leading to trust issues due to hallucinations.

Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering

no code implementations 17 Feb 2024 Pragya Srivastava, Manuj Malik, Vivek Gupta, Tanuja Ganu, Dan Roth

Large Language Models (LLMs) excel in natural language understanding, but their capability for complex mathematical reasoning with an amalgamation of structured tables and unstructured text is uncertain.

Arithmetic Reasoning Mathematical Reasoning +2
