29 papers with code • 1 benchmark • 1 dataset

Measures whether a model can distinguish popular misconceptions from the truth.


        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
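
The example format above (an `input` statement, two `choice` lines, and a gold `answer`) can be scored with plain accuracy. The following is a minimal sketch under that assumption, not the BIG-bench evaluation harness: the names `parse_examples`, `accuracy`, and `always_false` are hypothetical, and the trivial baseline stands in for a real model.

```python
from typing import Callable, Dict, List

# The two examples shown above, with indentation removed.
RAW_EXAMPLES = """\
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
"""


def parse_examples(text: str) -> List[Dict]:
    """Parse blank-line-separated blocks of `key: value` lines (hypothetical helper)."""
    examples = []
    for block in text.strip().split("\n\n"):
        example = {"input": "", "choices": [], "answer": ""}
        for line in block.splitlines():
            # Split only at the first ": " so colons inside the statement survive.
            key, _, value = line.strip().partition(": ")
            if key == "input":
                example["input"] = value
            elif key == "choice":
                example["choices"].append(value)
            elif key == "answer":
                example["answer"] = value
        examples.append(example)
    return examples


def accuracy(examples: List[Dict], predict: Callable[[str, List[str]], str]) -> float:
    """Fraction of examples where the predicted choice equals the gold answer."""
    correct = sum(predict(ex["input"], ex["choices"]) == ex["answer"] for ex in examples)
    return correct / len(examples)


def always_false(statement: str, choices: List[str]) -> str:
    """Trivial baseline (an assumption for illustration): label every statement false."""
    return "F"


examples = parse_examples(RAW_EXAMPLES)
score = accuracy(examples, always_false)
print(f"{len(examples)} examples, baseline accuracy = {score:.2f}")
```

On these two examples the always-false baseline gets exactly one right, which is why a constant predictor is a useful floor to compare a model against on a true/false task.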


Most implemented papers

Community detection in networks: A user guide

learn-co-curriculum/dsc-3-28-12-graph-connectivity-community-detection 30 Jul 2016

Community detection in networks is one of the most popular topics of modern network science.

Laplace Redux -- Effortless Bayesian Deep Learning

AlexImmer/Laplace NeurIPS 2021

Bayesian formulations of deep learning have been shown to have compelling theoretical properties and offer practical functional benefits, such as improved predictive uncertainty quantification and model selection.

Factuality Enhanced Language Models for Open-Ended Text Generation

nayeon7lee/factualityprompt 9 Jun 2022

In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.

Design Challenges and Misconceptions in Neural Sequence Labeling

jiesutd/NCRFpp COLING 2018

We investigate the design challenges of constructing effective and efficient neural sequence labeling systems, by reproducing twelve neural sequence labeling models, which include most of the state-of-the-art structures, and conduct a systematic model comparison on three benchmarks (i.e., NER, Chunking, and POS tagging).

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

uds-lsv/bert-stable-fine-tuning ICLR 2021

Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.

TruthfulQA: Measuring How Models Mimic Human Falsehoods

sylinrl/truthfulqa ACL 2022

We crafted questions that some humans would answer falsely due to a false belief or misconception.

A Variational Inequality Perspective on Generative Adversarial Networks

GauthierGidel/Variational-Inequality-GAN ICLR 2019

Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notably difficult to train.

Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference

mhw32/rubric-sampling-public 5 Sep 2018

Rubric sampling requires minimal teacher effort, can associate feedback with specific parts of a student's solution and can articulate a student's misconceptions in the language of the instructor.

Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses

allenai/HyBayes ACL 2020

Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known issues.