Search Results for author: Anjana Arunkumar

Found 12 papers, 3 papers with code

Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow

1 code implementation • 9 Feb 2023 • Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral, Chris Bryan

In pursuit of creating better benchmarks, we propose VAIDA, a novel benchmark creation paradigm for NLP, that focuses on guiding crowdworkers, an under-explored facet of addressing benchmark idiosyncrasies.

Hardness of Samples Need to be Quantified for a Reliable Evaluation System: Exploring Potential Opportunities with a New Task

no code implementations • 14 Oct 2022 • Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

Evaluation of models on benchmarks is unreliable without knowing the degree of sample hardness; this subsequently overestimates the capability of AI systems and limits their adoption in real world applications.

Semantic Textual Similarity • STS
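As a rough illustration of the claim above (not the paper's actual method), the sketch below contrasts plain accuracy with a hardness-weighted accuracy; the hardness scores, the correctness flags, and the weighting scheme itself are all invented for the example.

```python
# Illustrative sketch only: hardness scores, correctness flags, and the
# weighting scheme are assumptions, not the paper's formulation.
correct = [1, 1, 1, 1, 0, 0]               # 1 = model answered correctly
hardness = [0.1, 0.1, 0.2, 0.2, 0.9, 0.8]  # higher = harder sample

plain_acc = sum(correct) / len(correct)

# Weight each sample by its hardness so wins on easy samples count for less.
weighted_acc = sum(c * h for c, h in zip(correct, hardness)) / sum(hardness)

print(f"plain accuracy:             {plain_acc:.2f}")     # 0.67
print(f"hardness-weighted accuracy: {weighted_acc:.2f}")  # 0.26
```

The model looks reasonable under plain accuracy but collapses once hardness is accounted for, which is the kind of overestimation the abstract warns about.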

A Survey of Parameters Associated with the Quality of Benchmarks in NLP

no code implementations • 14 Oct 2022 • Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

Inspired by successful quality indices in several domains such as power, food, and water, we take the first step towards a metric by identifying certain language properties that can represent various possible interactions leading to biases in a benchmark.

Benchmarking

Investigating the Failure Modes of the AUC metric and Exploring Alternatives for Evaluating Systems in Safety Critical Applications

no code implementations • 10 Oct 2022 • Swaroop Mishra, Anjana Arunkumar, Chitta Baral

We find limitations in AUC; e.g., a model with a higher AUC is not always better at selective answering.
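A toy numerical sketch of that failure mode follows; the correctness sequences are invented, and the AUC variant used here (area under the accuracy-coverage curve) is an assumption rather than necessarily the one analysed in the paper.

```python
# Toy sketch, not the paper's experiments: each list is a model's answer
# correctness ordered by its descending confidence.

def selective_accuracies(correct):
    """Accuracy on the top-k most confident answers, for every k."""
    accs, hits = [], 0
    for k, c in enumerate(correct, start=1):
        hits += c
        accs.append(hits / k)
    return accs

def auc(correct):
    """Area under the accuracy-coverage curve (simple average over coverages)."""
    accs = selective_accuracies(correct)
    return sum(accs) / len(accs)

model_a = [1, 0, 1, 1, 1, 1, 1, 1]  # strong overall, unreliable at the very top
model_b = [1, 1, 0, 1, 0, 1, 0, 0]  # weak overall, reliable at the very top

print(auc(model_a), auc(model_b))            # ~0.79 vs ~0.72
print(selective_accuracies(model_a)[1],      # 25% coverage: 0.50
      selective_accuracies(model_b)[1])      # 25% coverage: 1.00
# Model A "wins" on AUC, yet at the 25%-coverage operating point a
# safety-critical deployment would prefer model B.
```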

A Proposal to Study "Is High Quality Data All We Need?"

no code implementations • 12 Mar 2022 • Swaroop Mishra, Anjana Arunkumar

Our hypothesis is based on the fact that deep neural networks are data driven models, and data is what leads/misleads models.

Vocal Bursts Intensity Prediction

How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation

no code implementations • 10 Jun 2021 • Swaroop Mishra, Anjana Arunkumar

Models that top leaderboards often perform unsatisfactorily when deployed in real world applications; this has necessitated rigorous and expensive pre-deployment model testing.
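To make the ranking-robustness point concrete, here is a minimal sketch showing how a leaderboard's top model can change once evaluation criteria are re-weighted; the scores, the bucket names, and the simple weighted re-ranking are all invented, not the paper's customization method.

```python
# Illustrative sketch only: per-bucket accuracies and weights are made up.
scores = {  # per-model accuracy on easy vs. hard samples
    "model_x": {"easy": 0.95, "hard": 0.40},
    "model_y": {"easy": 0.85, "hard": 0.70},
}

def rank(weights):
    """Order models by a weighted combination of their per-bucket scores."""
    totals = {
        name: sum(weights[bucket] * acc for bucket, acc in buckets.items())
        for name, buckets in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

print(rank({"easy": 0.8, "hard": 0.2}))  # ['model_x', 'model_y']
print(rank({"easy": 0.5, "hard": 0.5}))  # ['model_y', 'model_x']
```

A leaderboard dominated by easy samples crowns model_x, while an equal weighting of easy and hard samples flips the ranking, which is why a model that tops one fixed leaderboard may disappoint in deployment.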

Front Contribution instead of Back Propagation

no code implementations • 10 Jun 2021 • Swaroop Mishra, Anjana Arunkumar

We show that our algorithm produces the exact same output as BP, in contrast to several recently proposed algorithms approximating BP.

DQI: A Guide to Benchmark Evaluation

no code implementations • 10 Aug 2020 • Swaroop Mishra, Anjana Arunkumar, Bhavdeep Sachdeva, Chris Bryan, Chitta Baral

A 'state of the art' model A surpasses humans in a benchmark B, but fails on similar benchmarks C, D, and E. What does B have that the other benchmarks do not?

Our Evaluation Metric Needs an Update to Encourage Generalization

no code implementations • 14 Jul 2020 • Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

In order to stop the inflation in model performance -- and thus overestimation in AI systems' capabilities -- we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.
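The sketch below is not the WOOD Score itself (its definition is in the paper); it is only a made-up weighted blend, with hypothetical function and parameter names, illustrating how up-weighting out-of-distribution performance can deflate a model that merely memorizes the in-distribution benchmark.

```python
# Purely illustrative: this weighted blend is NOT the WOOD Score formula,
# just a sketch of scoring that rewards out-of-distribution generalization.
def generalization_score(in_dist_acc, out_dist_acc, ood_weight=0.75):
    return (1 - ood_weight) * in_dist_acc + ood_weight * out_dist_acc

# A model that memorizes benchmark idiosyncrasies vs. one that generalizes.
memorizer   = generalization_score(in_dist_acc=0.92, out_dist_acc=0.55)
generalizer = generalization_score(in_dist_acc=0.85, out_dist_acc=0.78)
print(memorizer, generalizer)  # 0.6425 vs 0.7975 -> the generalizer scores higher
```

Under plain in-distribution accuracy the memorizer would top the leaderboard (0.92 vs 0.85); the OOD-weighted score reverses that, which is the inflation the abstract targets.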

DQI: Measuring Data Quality in NLP

1 code implementation • 2 May 2020 • Swaroop Mishra, Anjana Arunkumar, Bhavdeep Sachdeva, Chris Bryan, Chitta Baral

The data creation paradigm consists of several data visualizations to help data creators (i) understand the quality of data and (ii) visualize the impact of the created data instance on the overall quality.

Active Learning • Benchmarking
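For a feel of the kind of real-time feedback such visualizations could support, here is a minimal sketch that recomputes a simple word-overlap-based novelty signal as each new instance is proposed; the signal, the helper names, and the toy sentences are hypothetical, not DQI's actual components.

```python
# Minimal sketch, not DQI itself: a toy novelty signal based on word overlap,
# recomputed as each crowdworker-created instance arrives.
def overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def novelty(dataset, candidate):
    """Higher when the candidate repeats less of the existing data."""
    if not dataset:
        return 1.0
    return 1 - max(overlap(candidate, s) for s in dataset)

dataset = ["the cat sat on the mat", "a dog chased the ball"]
for candidate in ["the cat sat on a mat", "children painted a bright mural"]:
    print(candidate, "->", round(novelty(dataset, candidate), 2))
# A near-duplicate scores low (0.17) and a genuinely new instance scores
# high (0.89) -- the kind of feedback a creator could see before submitting.
```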
