no code implementations • 5 Mar 2025 • Richard Ren, Arunim Agarwal, Mantas Mazeika, Cristina Menghini, Robert Vacareanu, Brad Kenstler, Mick Yang, Isabelle Barrass, Alice Gatti, Xuwang Yin, Eduardo Trevino, Matias Geralnik, Adam Khoja, Dean Lee, Summer Yue, Dan Hendrycks
Across a diverse set of LLMs, we find that while larger models obtain higher accuracy on our benchmark, they do not become more honest.
no code implementations • 13 Feb 2025 • Clinton J. Wang, Dean Lee, Cristina Menghini, Johannes Mols, Jack Doughty, Adam Khoja, Jayson Lynch, Sean Hendryx, Summer Yue, Dan Hendrycks
As language models master existing reasoning benchmarks, we need new challenges to evaluate their cognitive frontiers.
1 code implementation • 29 Jan 2025 • Ved Sirdeshmukh, Kaustubh Deshpande, Johannes Mols, Lifeng Jin, Ed-Yeremai Cardona, Dean Lee, Jeremy Kritz, Willow Primack, Summer Yue, Chen Xing
We present MultiChallenge, a pioneering benchmark evaluating large language models (LLMs) on conducting multi-turn conversations with human users, a crucial yet underexamined capability for their applications.
no code implementations • 1 May 2024 • Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Charlotte Zhuang, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue
Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning.
no code implementations • 22 Jan 2024 • Patrick Cook, Danny Jammooa, Morten Hjorth-Jensen, Daniel D. Lee, Dean Lee
We present a general class of machine learning algorithms called parametric matrix models.
Ranked #8 on Image Classification on EMNIST-Balanced
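The entry above only names the method, so here is a minimal sketch of the general idea behind a parametric matrix model: the output is read off from an eigenvalue of a matrix that depends affinely on the input, with the matrix entries as trainable parameters. This is an illustrative assumption about the method's structure, not the authors' implementation; the matrices, the single scalar input, and the choice of the lowest eigenvalue are all placeholders.

```python
# Minimal sketch of a parametric-matrix-model-style forward pass
# (illustrative only; not the authors' implementation).
import numpy as np

rng = np.random.default_rng(0)
dim = 4                                   # size of the learned matrices
A0 = rng.normal(size=(dim, dim))          # constant part (trainable)
A1 = rng.normal(size=(dim, dim))          # input-dependent part (trainable)

def pmm_output(x, A0, A1):
    """Lowest eigenvalue of the symmetric matrix M(x) = A0 + x * A1."""
    M = A0 + x * A1
    M = 0.5 * (M + M.T)                   # symmetrize so eigenvalues are real
    return np.linalg.eigvalsh(M)[0]

# Forward pass on a few sample inputs; in training, A0 and A1 would be
# fitted (e.g., by gradient descent) so these outputs match target data.
print([pmm_output(x, A0, A1) for x in (0.0, 0.5, 1.0)])
```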
no code implementations • 4 Dec 2021 • Amber Boehnlein, Markus Diefenthaler, Cristiano Fanelli, Morten Hjorth-Jensen, Tanja Horn, Michelle P. Kuchera, Dean Lee, Witold Nazarewicz, Kostas Orginos, Peter Ostroumov, Long-Gang Pang, Alan Poon, Nobuo Sato, Malachi Schram, Alexander Scheinker, Michael S. Smith, Xin-Nian Wang, Veronique Ziegler
Advances in machine learning methods provide tools that have broad applicability in scientific research.
no code implementations • 28 Jul 2021 • Avik Sarkar, Dean Lee
The key ingredient is a fast estimate of the emulator error that becomes progressively more accurate as the emulator is improved, and the accuracy of the error estimate can be corrected using machine learning.
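As a rough illustration of the idea described in that sentence, the sketch below runs a generic active-learning loop: an emulator is refined by adding exact calculations wherever a cheap error estimate is largest, and the estimate itself sharpens as the emulator improves. This is an assumed, simplified stand-in (a polynomial surrogate of a toy function and a two-fidelity disagreement as the error estimate), not the paper's self-learning emulator or its machine-learned error correction.

```python
# Generic active-learning sketch (assumed details, not the paper's algorithm):
# refine an emulator where a fast error estimate is largest.
import numpy as np

def exact_solver(c):
    return np.sin(3.0 * c) + 0.5 * c ** 2      # expensive calculation (stand-in)

grid = np.linspace(0.0, 2.0, 201)               # candidate parameter values
train_c = [0.0, 1.0, 2.0]                        # initial training points

for _ in range(5):
    train_y = [exact_solver(c) for c in train_c]
    # Two surrogates of different fidelity; their disagreement serves as a
    # cheap error estimate that becomes more reliable as points are added.
    low = np.poly1d(np.polyfit(train_c, train_y, deg=1))
    high = np.poly1d(np.polyfit(train_c, train_y, deg=min(len(train_c) - 1, 4)))
    err_est = np.abs(high(grid) - low(grid))
    train_c.append(float(grid[np.argmax(err_est)]))  # refine where error is largest

print([round(c, 2) for c in sorted(train_c)])
```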