1 code implementation • WS 2019 • Vikas Raunak, Vivek Gupta, Florian Metze
Pre-trained word embeddings are used in several downstream applications as well as for constructing representations for sentences, paragraphs and documents.
2 code implementations • WS 2020 • Vikas Raunak, Vaibhav Kumar, Vivek Gupta, Florian Metze
Word embeddings have become a staple of several natural language processing tasks, yet much remains to be understood about their properties.
4 code implementations • EMNLP 2017 • Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick
We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation.
1 code implementation • 18 May 2020 • Vivek Gupta, Ankit Saw, Pegah Nokhiz, Praneeth Netrapalli, Piyush Rai, Partha Talukdar
One of the key reasons is that a longer document is likely to contain words from many different topics; hence, creating a single vector while ignoring all the topical structure is unlikely to yield an effective document representation.
1 code implementation • IJCNLP 2019 • Tao Li, Vivek Gupta, Maitrey Mehta, Vivek Srikumar
While neural models show remarkable accuracy on individual predictions, their internal beliefs can be inconsistent across examples.
1 code implementation • ACL 2021 • Vivek Gupta, Prerna Bharti, Pegah Nokhiz, Harish Karnick
Most earlier work on text summarization is carried out on news article datasets.
1 code implementation • NAACL 2021 • J. Neeraja, Vivek Gupta, Vivek Srikumar
Reasoning about tabular information presents unique challenges to modern NLP approaches which largely rely on pre-trained contextualized embeddings of text.
2 code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Shagun Uppal, Vivek Gupta, Avinash Swaminathan, Haimin Zhang, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah, Amanda Stent
We further improve the performance by using a joint-objective for classification and textual entailment.
1 code implementation • 18 Nov 2019 • Vivek Gupta, Ankit Saw, Pegah Nokhiz, Harshit Gupta, Partha Talukdar
Through extensive experiments on multiple real-world datasets, we show that SCDV-MS embeddings outperform previous state-of-the-art embeddings on multi-class and multi-label text categorization tasks.
Ranked #5 on Document Classification on Reuters-21578 (F1 metric)
1 code implementation • 19 Apr 2022 • Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan
While Indic NLP has made rapid advances recently in terms of the availability of corpora and pre-trained models, benchmark datasets on standard NLU tasks are limited.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens, tackling which remains a major challenge.
1 code implementation • 13 Nov 2023 • Mubashara Akhtar, Nikesh Subedi, Vivek Gupta, Sahar Tahmasebi, Oana Cocarascu, Elena Simperl
Whilst fact verification has attracted substantial interest in the natural language processing community, verifying misinforming statements against data visualizations such as charts has so far been overlooked.
2 code implementations • 6 Jun 2017 • Shibhansh Dohare, Harish Karnick, Vivek Gupta
With the ever-increasing amount of text on the Internet, automatic summary generation remains an important problem for natural language understanding.
1 code implementation • ACL 2018 • Shibhansh Dohare, Vivek Gupta, Harish Karnick
Automatic abstractive summary generation remains a significant open problem for natural language processing.
1 code implementation • EMNLP (sustainlp) 2021 • Ankur Gupta, Vivek Gupta
In this paper, we address this issue by proposing SCDV+BERT(ctxd), a simple and effective unsupervised representation that combines contextualized BERT (Devlin et al., 2019) based word embedding for word sense disambiguation with SCDV soft clustering approach.
1 code implementation • 8 Nov 2022 • Apurva Gandhi, Ryan Serrao, Biyi Fang, Gilbert Antonius, Jenna Hong, Tra My Nguyen, Sheng Yi, Ehi Nosakhare, Irene Shaffer, Soundararajan Srinivasan, Vivek Gupta
We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or "inked") notes on a virtual whiteboard.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Pranshi Yadav, Priya Yadav, Pegah Nokhiz, Vivek Gupta
User-generated contents' score-based prediction and item recommendation has become an inseparable part of the online recommendation systems.
1 code implementation • 2 Aug 2021 • Nupur Jain, Vivek Gupta, Anshul Rai, Gaurav Kumar
To truly grasp reasoning ability, a Natural Language Inference model should be evaluated on counterfactual data.
1 code implementation • EMNLP (ACL) 2021 • Nupur Jain, Vivek Gupta, Anshul Rai, Gaurav Kumar
To grasp the true reasoning ability, the Natural Language Inference model should be evaluated on counterfactual data.
no code implementations • 17 Feb 2018 • Dheeraj Mekala, Vivek Gupta, Purushottam Kar, Harish Karnick
We extend the consistency of the hierarchical classification algorithm to the asymmetric tree distance loss.
no code implementations • 15 Nov 2017 • Dhruv Mahajan, Vivek Gupta, S. Sathiya Keerthi, Sellamanickam Sundararajan, Shravan Narayanamurthy, Rahul Kidambi
We also demonstrate their usefulness in making design choices such as the number of classifiers in the ensemble and the size of a subset of data used for training that is needed to achieve a certain value of generalization error.
no code implementations • 18 Sep 2017 • Rahul Wadbude, Vivek Gupta, Piyush Rai, Nagarajan Natarajan, Harish Karnick, Prateek Jain
Our approach is novel in that it highlights interesting connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data.
no code implementations • 20 Dec 2016 • Rahul Wadbude, Vivek Gupta, Dheeraj Mekala, Harish Karnick
Review score prediction of text reviews has recently gained a lot of attention in recommendation systems.
no code implementations • COLING 2016 • Vivek Gupta, Harish Karnick, Ashendra Bansal, Pradhuman Jhala
Product classification is the task of automatically predicting a taxonomy path for a product in a predefined taxonomy hierarchy given a textual product description or title.
no code implementations • 7 Sep 2019 • Vivek Gupta, Pegah Nokhiz, Chitradeep Dutta Roy, Suresh Venkatasubramanian
We measure recourse as the distance of an individual from the decision boundary of a classifier.
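As a minimal sketch of the distance measure described above, assume the classifier is linear, with decision boundary w·x + b = 0; the excerpt does not say which classifier family the paper uses, so this is only an illustration. The function name `distance_to_boundary` and its parameters are hypothetical.

```python
import math


def distance_to_boundary(weights, bias, point):
    """Euclidean distance from `point` to the hyperplane w.x + b = 0.

    Assumption: a linear classifier. For a non-linear model the distance
    to the boundary generally has no closed form.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("weights must not be all zero")
    # Signed score of the point under the linear model.
    score = sum(w * x for w, x in zip(weights, point)) + bias
    # Dividing by the weight norm turns the score into a geometric distance.
    return abs(score) / norm


# Hypothetical individual and classifier, used only for illustration.
recourse = distance_to_boundary([3.0, 4.0], -5.0, [3.0, 4.0])
```

Under this reading, a smaller value means the individual is closer to the boundary, so a smaller change to their features would flip the decision.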
no code implementations • 31 Mar 2020 • Vivek Gupta
In this paper, we employ neural techniques to solve the task of source code summarization and specifically compare NMT-based techniques to the simpler and more appealing Transformer architecture on a dataset of Java methods and comments.
no code implementations • ACL 2020 • Vivek Gupta, Maitrey Mehta, Pegah Nokhiz, Vivek Srikumar
In this paper, we observe that semi-structured tabulated text is ubiquitous; understanding it requires not only comprehending the meaning of text fragments, but also the implicit relationships between them.
no code implementations • 2 Aug 2021 • Vivek Gupta, Riyaz A. Bhat, Atreya Ghosal, Manish Shrivastava, Maneesh Singh, Vivek Srikumar
Our experiments demonstrate that a RoBERTa-based model, representative of the current state-of-the-art, fails at reasoning on the following counts: it (a) ignores relevant parts of the evidence, (b) is over-sensitive to annotation artifacts, and (c) relies on the knowledge encoded in the pre-trained language model rather than the evidence presented in its tabular inputs.
no code implementations • NLP4ConvAI (ACL) 2022 • Vivek Gupta, Akshat Shrivastava, Adithya Sagar, Armen Aghajanyan, Denis Savenkov
While large pre-trained language models accumulate a lot of knowledge in their parameters, it has been demonstrated that augmenting them with non-parametric retrieval-based memory has a number of benefits, from accuracy improvements to data efficiency, for knowledge-focused tasks such as question answering.
no code implementations • 27 Sep 2018 • Vivek Gupta, Ankit Kumar Saw, Partha Pratim Talukdar, Praneeth Netrapalli
One reason for this degradation is that a longer document is likely to contain words from many different themes (or topics), and hence creating a single vector while ignoring all the thematic structure is unlikely to yield an effective representation of the document.
no code implementations • FEVER (ACL) 2022 • Bhavnick Minhas, Anant Shankhdhar, Vivek Gupta, Divyanshu Aggarwal, Shuo Zhang
In this paper, we use machine translation methods to construct a multilingual tabular NLI dataset, namely XINFOTABS, which expands the English tabular NLI dataset of INFOTABS to ten diverse languages.
no code implementations • DeeLIO (ACL) 2022 • Yerram Varun, Aayush Sharma, Vivek Gupta
Natural language inference on tabular data is a challenging task.
no code implementations • ACL 2022 • Vivek Gupta, Shuo Zhang, Alakananda Vempala, Yujie He, Temma Choji, Vivek Srikumar
On the downstream tabular inference task, using only the automatically extracted evidence as the premise, our approach outperforms prior benchmarks.
no code implementations • NAACL 2022 • Chaitanya Agarwal, Vivek Gupta, Anoop Kunchukuttan, Manish Shrivastava
Existing research on Tabular Natural Language Inference (TNLI) exclusively examines the task in a monolingual setting where the tabular premise and hypothesis are in the same language.
no code implementations • 23 Oct 2022 • Dibyakanti Kumar, Vivek Gupta, Soumya Sharma, Shuo Zhang
We observed that our framework could generate human-like tabular inference examples, which could benefit training data augmentation, especially in the scenario with limited supervision.
no code implementations • 21 Oct 2022 • Abhilash Reddy Shankarampeta, Vivek Gupta, Shuo Zhang
Recent methods based on pre-trained language models have exhibited superior performance over tabular tasks (e.g., tabular NLI), despite showing inherent problems such as not using the right evidence and inconsistent predictions across inputs while reasoning over the tabular data.
no code implementations • 23 Nov 2022 • Aashna Jena, Vivek Gupta, Manish Shrivastava, Julian Martin Eisenschlos
Creating challenging tabular inference data is essential for learning complex reasoning.
1 code implementation • 25 Apr 2023 • Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan
Despite significant progress in Natural Language Generation for Indian languages (IndicNLP), there is a lack of datasets around complex structured tasks such as semantic parsing.
no code implementations • 10 Jun 2023 • Vivek Gupta, Praphpreet Dhir, Jeegn Dani, Ahmed H. Qureshi
Object rearrangement is a fundamental problem in robotics with various practical applications ranging from managing warehouses to cleaning and organizing home kitchens.
no code implementations • 6 Jul 2023 • Siddharth Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria, Shuo Zhang
The proposed method includes 1) Information Alignment to map rows and 2) Information Update for updating missing/outdated information for aligned tables across multilingual tables.
no code implementations • 3 Nov 2023 • Mubashara Akhtar, Abhilash Shankarampeta, Vivek Gupta, Arpit Patil, Oana Cocarascu, Elena Simperl
Thus, understanding and reasoning with numbers are essential skills for language models to solve different tasks.
no code implementations • 14 Nov 2023 • Vivek Gupta, Pranshu Kandoi, Mahek Bhavesh Vora, Shuo Zhang, Yujie He, Ridho Reinanda, Vivek Srikumar
Given these results, our dataset has the potential to serve as a challenging benchmark to improve the temporal reasoning capabilities of NLP models.
no code implementations • 15 Nov 2023 • Vatsal Gupta, Pranshu Pandya, Tushar Kataria, Vivek Gupta, Dan Roth
Language models, given their black-box nature, often exhibit sensitivity to input perturbations, leading to trust issues due to hallucinations.
no code implementations • 17 Feb 2024 • Pragya Srivastava, Manuj Malik, Vivek Gupta, Tanuja Ganu, Dan Roth
Large Language Models (LLMs) excel in natural language understanding, but their capability for complex mathematical reasoning with an amalgamation of structured tables and unstructured text is uncertain.