Misconceptions
29 papers with code • 1 benchmark • 1 dataset
Measures whether a model can discern popular misconceptions from the truth.
Example:
input: The daddy longlegs spider is the most venomous spider in the world.
choice: T
choice: F
answer: F

input: Karl Benz is correctly credited with the invention of the first modern automobile.
choice: T
choice: F
answer: T
Source: BIG-bench
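Each example is a two-way (T/F) choice, so the task can be scored as binary classification. The sketch below assumes exact-match accuracy against the gold answer. `Example`, `accuracy`, and `always_false` are hypothetical names, not part of BIG-bench. BIG-bench's own harness may score multiple-choice tasks differently, for example by comparing the model's likelihood of each choice.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    statement: str              # the "input" line
    choices: tuple[str, ...]    # the "choice" lines, e.g. ("T", "F")
    answer: str                 # the gold "answer" line


# The two examples shown above, transcribed as-is.
EXAMPLES = [
    Example("The daddy longlegs spider is the most venomous spider in the world.",
            ("T", "F"), "F"),
    Example("Karl Benz is correctly credited with the invention of the first modern automobile.",
            ("T", "F"), "T"),
]


def accuracy(examples: Sequence[Example],
             predict: Callable[[str, tuple[str, ...]], str]) -> float:
    """Fraction of examples where predict(statement, choices) equals the gold answer."""
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    for ex in examples:
        pred = predict(ex.statement, ex.choices)
        if pred not in ex.choices:
            raise ValueError(f"prediction {pred!r} is not one of {ex.choices}")
        correct += pred == ex.answer  # bool counts as 0 or 1
    return correct / len(examples)


def always_false(statement: str, choices: tuple[str, ...]) -> str:
    """Hypothetical trivial baseline: label every statement as a misconception."""
    return "F"


print(accuracy(EXAMPLES, always_false))  # 0.5
```

Running a constant-label baseline like `always_false` shows the accuracy a model gets without discerning anything. This matters if the T/F labels in the full dataset are imbalanced.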
Most implemented papers
Community detection in networks: A user guide
Community detection in networks is one of the most popular topics of modern network science.
Laplace Redux -- Effortless Bayesian Deep Learning
Bayesian formulations of deep learning have been shown to have compelling theoretical properties and offer practical functional benefits, such as improved predictive uncertainty quantification and model selection.
Factuality Enhanced Language Models for Open-Ended Text Generation
In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.
Design Challenges and Misconceptions in Neural Sequence Labeling
We investigate the design challenges of constructing effective and efficient neural sequence labeling systems by reproducing twelve neural sequence labeling models, which include most of the state-of-the-art structures, and we conduct a systematic model comparison on three benchmarks (i.e., NER, chunking, and POS tagging).
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.
TruthfulQA: Measuring How Models Mimic Human Falsehoods
We crafted questions that some humans would answer falsely due to a false belief or misconception.
A Variational Inequality Perspective on Generative Adversarial Networks
Generative adversarial networks (GANs) form a generative modeling approach known for producing appealing samples, but they are notably difficult to train.
Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference
Rubric sampling requires minimal teacher effort, can associate feedback with specific parts of a student's solution and can articulate a student's misconceptions in the language of the instructor.
How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
In this work, we take the first step toward a comprehensive evaluation of cross-lingual word embeddings.
Not All Claims are Created Equal: Choosing the Right Statistical Approach to Assess Hypotheses
Empirical research in Natural Language Processing (NLP) has adopted a narrow set of principles for assessing hypotheses, relying mainly on p-value computation, which suffers from several known issues.