Search Results for author: Daniel Paleka

Found 5 papers, 1 paper with code

ARB: Advanced Reasoning Benchmark for Large Language Models

no code implementations · 25 Jul 2023 · Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki

As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge.

Math

Evaluating Superhuman Models with Consistency Checks

2 code implementations · 16 Jun 2023 · Lukas Fluri, Daniel Paleka, Florian Tramèr

If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth?

Decision Making

Red-Teaming the Stable Diffusion Safety Filter

no code implementations · 3 Oct 2022 · Javier Rando, Daniel Paleka, David Lindner, Lennart Heim, Florian Tramèr

We then reverse-engineer the filter and find that while it aims to prevent sexual content, it ignores violence, gore, and other similarly disturbing content.

Image Generation

A law of adversarial risk, interpolation, and label noise

no code implementations · 8 Jul 2022 · Daniel Paleka, Amartya Sanyal

In supervised learning, it has been shown that label noise in the data can be interpolated without penalties on test accuracy.

Inductive Bias
