Overall - Test
10 papers with code • 2 benchmarks • 2 datasets
Most implemented papers
Comparative study of deep learning methods for the automatic segmentation of lung, lesion and lesion type in CT scans of COVID-19 patients
There is an increasing number of studies that propose to use deep learning to provide fast and accurate quantification of COVID-19 using chest CT scans.
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models.
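A minimal sketch of the idea behind this kind of adversarial training: run a few gradient-ascent steps on a perturbation of the input embeddings to approximate the worst case, then minimize the loss at that perturbed point. The model interface, norm bound, and step sizes below are illustrative assumptions, not the paper's exact FreeLB procedure.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, embeds, labels, adv_steps=3, adv_lr=0.1, eps=0.01):
    """One step of min-max training on input-embedding perturbations (PGD-style sketch).

    Assumes `model` maps embeddings of shape (batch, seq, dim) to logits.
    """
    delta = torch.zeros_like(embeds, requires_grad=True)
    for _ in range(adv_steps):
        # Inner maximization: ascend the loss w.r.t. the perturbation.
        loss = F.cross_entropy(model(embeds + delta), labels)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + adv_lr * grad.sign()).detach()
        # Project each example's perturbation back into an L2 ball of radius eps.
        delta = delta.renorm(p=2, dim=0, maxnorm=eps).requires_grad_(True)
    # Outer minimization: train on the worst-case perturbation found.
    adv_loss = F.cross_entropy(model(embeds + delta.detach()), labels)
    adv_loss.backward()
    return adv_loss.item()
```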
Localizing Open-Ontology QA Semantic Parsers in a Day Using Machine Translation
We propose Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language.
Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment
We train a neural model with this feedback data that can generate explanations and re-score answer candidates.
Amplifying Membership Exposure via Data Poisoning
In this paper, we investigate the third type of exploitation of data poisoning - increasing the risks of privacy leakage of benign training samples.
Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models
In response, we present JEEBench, a considerably more challenging benchmark dataset for evaluating the problem solving abilities of LLMs.
Transferable Availability Poisoning Attacks
We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data.
Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning
Additionally, we show that DaSLaM is not limited by the solver's capabilities as a function of scale; e.g., solver LMs of diverse sizes show significant performance improvements with our solver-agnostic decomposition technique.
WATT: Weight Average Test-Time Adaptation of CLIP
In response, we present Weight Average Test-Time Adaptation (WATT) of CLIP, a pioneering approach facilitating full test-time adaptation (TTA) of this VLM.
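The core operation suggested by the name is averaging the weights of several adapted copies of the model. The sketch below shows parameter-wise averaging of state dicts; how the copies are adapted at test time (templates, losses, number of copies) is assumed for illustration and not taken from the paper.

```python
import copy
import torch

def average_weights(state_dicts):
    """Parameter-wise average of a list of model state dicts."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

# Hypothetical usage: adapt several copies of a CLIP-like model on the test batch
# (e.g., under different text templates), then merge them by weight averaging.
# adapted = [adapt(copy.deepcopy(base_model), batch, t) for t in templates]
# base_model.load_state_dict(average_weights([m.state_dict() for m in adapted]))
```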
Efficient Training of Deep Neural Operator Networks via Randomized Sampling
The proposed random sampling over the inputs of the trunk net mitigates these challenges, improving generalization and reducing memory requirements during training, resulting in significant computational gains.
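A minimal sketch of what random sampling over the trunk-net inputs can look like in practice: instead of evaluating the operator output at every query point each iteration, draw a random subset of the coordinate grid and train on that subset. Array names and shapes here are illustrative assumptions.

```python
import numpy as np

def sample_trunk_points(coords, values, n_points, rng=None):
    """Subsample trunk-net query points and the matching targets for one iteration.

    coords : (N, d) array of all evaluation coordinates
    values : (batch, N) array of operator outputs at those coordinates
    """
    rng = rng or np.random.default_rng()
    idx = rng.choice(coords.shape[0], size=n_points, replace=False)
    return coords[idx], values[:, idx]
```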