Search Results for author: Rotem Dror

Found 15 papers, 9 papers with code

State of What Art? A Call for Multi-Prompt LLM Evaluation

1 code implementation · 31 Dec 2023 · Moran Mizrahi, Guy Kaplan, Dan Malkin, Rotem Dror, Dafna Shahaf, Gabriel Stanovsky

Recent advances in large language models (LLMs) have led to the development of various evaluation benchmarks.

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics

1 code implementation · 30 Oct 2023 · Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger

Specifically, we propose a novel competition setting in which we select a list of allowed LLMs and disallow fine-tuning to ensure a focus on prompting.

Machine Translation, Text Generation

On the Limitations of Reference-Free Evaluations of Generated Text

no code implementations · 22 Oct 2022 · Daniel Deutsch, Rotem Dror, Dan Roth

There is significant interest in developing evaluation metrics that accurately estimate the quality of generated text without the aid of a human-written reference text, which can be time-consuming and expensive to collect or entirely unavailable in online applications.

Machine Translation

Zero-Shot On-the-Fly Event Schema Induction

no code implementations · 12 Oct 2022 · Rotem Dror, Haoyu Wang, Dan Roth

The answers to these questions can be found by collecting many documents on the complex event of interest, extracting relevant information, and analyzing it.

Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics

no code implementations · NAACL 2022 · Daniel Deutsch, Rotem Dror, Dan Roth

How reliably an automatic summarization evaluation metric replicates human judgments of summary quality is quantified by system-level correlations.

A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods

1 code implementation · 31 Mar 2021 · Daniel Deutsch, Rotem Dror, Dan Roth

After evaluating which of the proposed methods is most appropriate for summarization through two simulation experiments, we analyze the results of applying these methods to several different automatic evaluation metrics across three sets of human annotations.

The Structured Weighted Violations MIRA

1 code implementation · 9 May 2020 · Dor Ringel, Rotem Dror, Roi Reichart

We present the Structured Weighted Violation MIRA (SWVM), a new structured prediction algorithm that is based on a hybridization between MIRA (Crammer and Singer, 2003) and the structured weighted violations perceptron (SWVP) (Dror and Reichart, 2016).

Chunking, named-entity-recognition (+3 more)

Deep Dominance - How to Properly Compare Deep Neural Models

1 code implementation · ACL 2019 · Rotem Dror, Segev Shlomov, Roi Reichart

Comparing Deep Neural Network (DNN) models based on their performance on unseen data is crucial for the progress of the NLP field.

Appendix - Recommended Statistical Significance Tests for NLP Tasks

1 code implementation · 5 Sep 2018 · Rotem Dror, Roi Reichart

Statistical significance testing plays an important role when drawing conclusions from experimental results in NLP papers.

The Hitchhiker's Guide to Testing Statistical Significance in Natural Language Processing

1 code implementation · ACL 2018 · Rotem Dror, Gili Baumer, Segev Shlomov, Roi Reichart

We establish the fundamental concepts of significance testing and discuss the specific aspects of NLP tasks, experimental setups and evaluation measures that affect the choice of significance tests in NLP research.

Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets

1 code implementation · TACL 2017 · Rotem Dror, Gili Baumer, Marina Bogomolov, Roi Reichart

With the ever-growing amounts of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure consistent performance across heterogeneous setups.

Dependency Parsing, General Classification (+5 more)

The Structured Weighted Violations Perceptron Algorithm

no code implementations · EMNLP 2016 · Rotem Dror, Roi Reichart

We present the Structured Weighted Violations Perceptron (SWVP) algorithm, a new structured prediction algorithm that generalizes the Collins Structured Perceptron (CSP).

Dependency Parsing, Generalization Bounds (+1 more)
