20 papers with code • 1 benchmark • 1 dataset

Measures whether a model can discern popular misconceptions from the truth.


        input: The daddy longlegs spider is the most venomous spider in the world.
        choice: T
        choice: F
        answer: F

        input: Karl Benz is correctly credited with the invention of the first modern automobile.
        choice: T
        choice: F
        answer: T

Source: BIG-bench
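
A minimal sketch of how items in this true/false format might be scored, assuming plain accuracy as the metric. The dict layout and the names `ITEMS`, `accuracy`, and `always_false` are hypothetical illustrations, not part of BIG-bench, and the predictor is a trivial baseline rather than a real model.

```python
from typing import Callable, Dict, List

# The two example items quoted above, stored in a hypothetical dict layout.
ITEMS: List[Dict[str, str]] = [
    {
        "input": "The daddy longlegs spider is the most venomous spider in the world.",
        "answer": "F",
    },
    {
        "input": "Karl Benz is correctly credited with the invention of the first modern automobile.",
        "answer": "T",
    },
]


def accuracy(predict: Callable[[str], str], items: List[Dict[str, str]]) -> float:
    """Return the fraction of items whose predicted label matches the gold answer."""
    if not items:
        raise ValueError("items must not be empty")
    correct = sum(1 for item in items if predict(item["input"]) == item["answer"])
    return correct / len(items)


def always_false(statement: str) -> str:
    """Trivial baseline (an assumption for illustration): label every statement false."""
    return "F"


score = accuracy(always_false, ITEMS)
print(f"accuracy: {score:.2f}")
```

A constant predictor gets exactly one of these two items right, so any such baseline is the floor a model has to beat on a balanced true/false set.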


Most implemented papers

Community detection in networks: A user guide

learn-co-curriculum/dsc-3-28-12-graph-connectivity-community-detection 30 Jul 2016

Community detection in networks is one of the most popular topics of modern network science.

Design Challenges and Misconceptions in Neural Sequence Labeling

jiesutd/NCRFpp COLING 2018

We investigate the design challenges of constructing effective and efficient neural sequence labeling systems, by reproducing twelve neural sequence labeling models, which include most of the state-of-the-art structures, and conduct a systematic model comparison on three benchmarks (i.e., NER, Chunking, and POS tagging).

Laplace Redux -- Effortless Bayesian Deep Learning

AlexImmer/Laplace NeurIPS 2021

Bayesian formulations of deep learning have been shown to have compelling theoretical properties and offer practical functional benefits, such as improved predictive uncertainty quantification and model selection.

A Variational Inequality Perspective on Generative Adversarial Networks

GauthierGidel/Variational-Inequality-GAN ICLR 2019

Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notably difficult to train.

Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference

mhw32/rubric-sampling-public 5 Sep 2018

Rubric sampling requires minimal teacher effort, can associate feedback with specific parts of a student's solution and can articulate a student's misconceptions in the language of the instructor.

Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses

allenai/HyBayes ACL 2020

Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known issues.

Deep Curvature Suite

xingchenwan/MLRG_DeepCurvature 20 Dec 2019

We present MLRG Deep Curvature suite, a PyTorch-based, open-source package for analysis and visualisation of neural network curvature and loss landscape.

Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization

facebookresearch/alebo NeurIPS 2020

We show empirically that properly addressing these issues significantly improves the efficacy of linear embeddings for BO on a range of problems, including learning a gait policy for robot locomotion.

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

uds-lsv/bert-stable-fine-tuning ICLR 2021

Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.