Search Results for author: Mihir Parmar

Found 23 papers, 15 with code

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

1 code implementation • 6 Oct 2024 • Himanshu Gupta, Shreyas Verma, Ujjwala Anantheswaran, Kevin Scaria, Mihir Parmar, Swaroop Mishra, Chitta Baral

The best scores achieved on PolyMATH are ~41%, ~36%, and ~27%, obtained by Claude-3.5 Sonnet, GPT-4o, and Gemini-1.5 Pro respectively, highlighting the logical and visual complexity of these questions.

Mathematical Reasoning · Spatial Reasoning

Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter?

1 code implementation • 20 Jul 2024 • Nemika Tyagi, Mihir Parmar, Mohith Kulkarni, Aswin RRV, Nisarg Patel, Mutsumi Nakamura, Arindam Mitra, Chitta Baral

Then, we develop an LLM-based framework for large-scale subjective evaluation (i.e., identifying errors) and an objective metric, PuzzleEval, to evaluate the correctness of reasoning chains.

Logical Reasoning

Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs

1 code implementation • 5 Jul 2024 • Mihir Parmar, Hanieh Deilamsalehy, Franck Dernoncourt, Seunghyun Yoon, Ryan A. Rossi, Trung Bui

Motivated by this, we propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback, offering valuable insights into how to improve coherence in extractive summaries.

Extractive Summarization

Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models

1 code implementation • 24 Jun 2024 • Nisarg Patel, Mohith Kulkarni, Mihir Parmar, Aashna Budhiraja, Mutsumi Nakamura, Neeraj Varshney, Chitta Baral

Experimental results show that there is a significant drop in the performance of LLMs as the reasoning steps/depth increases (average accuracy of ~68% at depth-1 to ~43% at depth-5).

Logical Reasoning · Natural Language Understanding

LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

1 code implementation • 23 Apr 2024 • Mihir Parmar, Nisarg Patel, Neeraj Varshney, Mutsumi Nakamura, Man Luo, Santosh Mashetty, Arindam Mitra, Chitta Baral

Existing work investigating this reasoning ability of LLMs has focused only on a couple of inference rules (such as modus ponens and modus tollens) of propositional and first-order logic.

Logical Reasoning · Question Answering

LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks

1 code implementation • 16 Nov 2023 • Mihir Parmar, Aakanksha Naik, Himanshu Gupta, Disha Agrawal, Chitta Baral

Assessing these models on long sequences is crucial since prior work in the general domain has demonstrated performance degradation of LLMs on longer texts.

Decoder

Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE

no code implementations • 28 Oct 2023 • Neeraj Varshney, Agneet Chatterjee, Mihir Parmar, Chitta Baral

Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks; however, their large size makes their inference slow and computationally expensive.

Semantic Similarity · Semantic Textual Similarity +1

TarGEN: Targeted Data Generation with Large Language Models

1 code implementation • 27 Oct 2023 • Himanshu Gupta, Kevin Scaria, Ujjwala Anantheswaran, Shreyas Verma, Mihir Parmar, Saurabh Arjun Sawant, Chitta Baral, Swaroop Mishra

Finally, when pre-finetuned on our synthetic SuperGLUE dataset, T5-3B yields impressive results on the OpenLLM leaderboard, surpassing the model trained on the Self-Instruct dataset by 4.14 percentage points.

Decoder · Diversity

Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models

no code implementations • 2 Oct 2023 • Man Luo, Shrinidhi Kumbhar, Ming Shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, Chitta Baral

This work strives to understand the proficiency of LLMs in logical reasoning by offering a brief review of the latest progress in this area, with a focus on the logical reasoning datasets, tasks, and methods adopted to utilize LLMs for reasoning.

Knowledge Distillation · Language Modelling +1

Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?

no code implementations • 8 Sep 2023 • Ayushi Agarwal, Nisarg Patel, Neeraj Varshney, Mihir Parmar, Pavan Mallina, Aryan Bhavin Shah, Srihari Raju Sangaraju, Tirth Patel, Nihar Thakkar, Chitta Baral

Though state-of-the-art (SOTA) NLP systems have achieved remarkable performance on a variety of language understanding tasks, they primarily focus on questions that have a correct and a definitive answer.

MDDial: A Multi-turn Differential Diagnosis Dialogue Dataset with Reliability Evaluation

1 code implementation • 16 Aug 2023 • Srija Macherla, Man Luo, Mihir Parmar, Chitta Baral

We introduce a unified score for the ADD system that takes into account the interplay between symptoms and diagnosis.

Natural Language Understanding

EDM3: Event Detection as Multi-task Text Generation

1 code implementation • 25 May 2023 • Ujjwala Anantheswaran, Himanshu Gupta, Mihir Parmar, Kuntal Kumar Pal, Chitta Baral

We show that EDM3 helps to learn transferable knowledge that can be leveraged to perform Event Detection and its subtasks concurrently, mitigating the error propagation inherent in pipelined approaches.

Event Detection · Sentence +1

BioTABQA: Instruction Learning for Biomedical Table Question Answering

no code implementations • 6 Jul 2022 • Man Luo, Sharad Saxena, Swaroop Mishra, Mihir Parmar, Chitta Baral

To the best of our knowledge, no TQA dataset exists in the biomedical domain, where tables are frequently used to present information.

Question Answering

Is a Question Decomposition Unit All We Need?

1 code implementation • 25 May 2022 • Pruthvi Patel, Swaroop Mishra, Mihir Parmar, Chitta Baral

Large Language Models (LLMs) have achieved state-of-the-art performance on many Natural Language Processing (NLP) benchmarks.

Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions

no code implementations • 1 May 2022 • Mihir Parmar, Swaroop Mishra, Mor Geva, Chitta Baral

In this work, we hypothesize that annotators pick up on patterns in the crowdsourcing instructions, which bias them to write many similar examples that are then over-represented in the collected data.

In-BoXBART: Get Instructions into Biomedical Multi-Task Learning

2 code implementations • Findings (NAACL) 2022 • Mihir Parmar, Swaroop Mishra, Mirali Purohit, Man Luo, M. Hassan Murad, Chitta Baral

Recently, instructional prompts have shown significant improvement towards multi-task generalization; however, the effect of instructional prompts and Multi-Task Learning (MTL) has not been systematically studied in the biomedical domain.

Few-Shot Learning · Multi-Task Learning

How Many Data Samples is an Additional Instruction Worth?

1 code implementation • 17 Mar 2022 • Ravsehaj Singh Puri, Swaroop Mishra, Mihir Parmar, Chitta Baral

However, they can write alternate instructions to represent an instruction task.

AdaGAN: Adaptive GAN for Many-to-Many Non-Parallel Voice Conversion

1 code implementation • 25 Sep 2019 • Maitreya Patel, Mirali Purohit, Mihir Parmar, Nirmesh J. Shah, Hemant A. Patil

In this paper, we propose a novel style transfer architecture, which can also be extended to generate voices even for target speakers whose data were not used in training (i.e., the case of zero-shot learning).

Generative Adversarial Network · Style Transfer +2
