no code implementations • 20 Dec 2023 • Emily Groves, Minhong Wang, Yusuf Abdulle, Holger Kunz, Jason Hoelscher-Obermaier, Ronin Wu, Honghan Wu
Five setups were designed to assess ML and FT model performance across different data availability scenarios. The datasets for the three curation tasks comprised task 1 (620, 386), task 2 (611, 430), and task 3 (617, 381), each maintaining a 50:50 positive-to-negative ratio.
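As a minimal sketch of how such a 50:50 class-balanced curation dataset might be assembled (the sampling logic, function name, and field names below are illustrative assumptions, not taken from the paper):

```python
import random

def balance_dataset(examples, label_key="label", seed=0):
    """Downsample the majority class so positives and negatives are 50:50.

    `examples` is a list of dicts; `label_key` maps to a 0/1 label.
    Function and field names are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [e for e in examples if e[label_key] == 1]
    neg = [e for e in examples if e[label_key] == 0]
    n = min(len(pos), len(neg))  # size of the smaller class
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced

# Example: 1000 raw examples with a 70:30 skew become a 600-example 50:50 set.
raw = [{"text": f"doc {i}", "label": 1 if i % 10 < 7 else 0} for i in range(1000)]
print(len(balance_dataset(raw)))  # 600 (300 positive + 300 negative)
```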
1 code implementation • 20 Oct 2023 • Henning Bartsch, Ole Jorgensen, Domenic Rosati, Jason Hoelscher-Obermaier, Jacob Pfau
Using this test, we find that despite increases in self-consistency, models usually place significant weight on alternative, inconsistent answers.
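One way to quantify "weight on alternative answers" is to sample an answer many times and inspect the empirical distribution. The sketch below does this with a stand-in `sample_answer` callable, which is an assumption for illustration; the paper's actual test protocol may differ.

```python
from collections import Counter
import random

def answer_distribution(sample_answer, question, n_samples=100):
    """Empirical answer distribution from repeated stochastic sampling.

    `sample_answer(question) -> str` is a hypothetical stand-in for a
    model query with temperature > 0 decoding.
    """
    counts = Counter(sample_answer(question) for _ in range(n_samples))
    total = sum(counts.values())
    return {ans: c / total for ans, c in counts.most_common()}

def weight_on_alternatives(dist):
    """Probability mass on everything except the modal answer."""
    return 1.0 - max(dist.values())

# Toy model: answers "yes" ~70% of the time, "no" ~30%.
rng = random.Random(0)
toy = lambda q: "yes" if rng.random() < 0.7 else "no"
dist = answer_distribution(toy, "Is graphene a semiconductor?")
print(dist, weight_on_alternatives(dist))
```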
1 code implementation • 27 May 2023 • Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas, Fazl Barez
We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity.
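Specificity here means that editing one fact should not change the model's answers on unrelated (if similar-looking) prompts. A minimal sketch of such a check follows, with hypothetical `pre_edit_answer` and `post_edit_answer` callables standing in for the model before and after the edit; this is an illustration of the general idea, not the benchmark's actual metric.

```python
def specificity_score(pre_edit_answer, post_edit_answer, neighborhood_prompts):
    """Fraction of unrelated prompts whose answer is unchanged by the edit.

    The two answer functions are hypothetical callables (prompt -> str)
    wrapping the model before and after the edit is applied.
    """
    unchanged = sum(
        pre_edit_answer(p) == post_edit_answer(p) for p in neighborhood_prompts
    )
    return unchanged / len(neighborhood_prompts)

# Toy example: the edit leaks into one of three neighborhood prompts.
pre = {"capital of France?": "Paris", "capital of Spain?": "Madrid",
       "capital of Italy?": "Rome"}
post = dict(pre, **{"capital of Spain?": "Rome"})  # unintended side effect
prompts = list(pre)
print(specificity_score(pre.get, post.get, prompts))  # 0.666...
```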
no code implementations • 27 Oct 2022 • Jason Hoelscher-Obermaier, Edward Stevinson, Valentin Stauber, Ivaylo Zhelev, Victor Botev, Ronin Wu, Jeremy Minton
The most interesting words in scientific texts will often be novel or rare.
no code implementations • 1 Nov 2021 • Mihalis Gongolidis, Jeremy Minton, Ronin Wu, Valentin Stauber, Jason Hoelscher-Obermaier, Viktor Botev
Two new document classification datasets are collated from general and chemistry scientific journals to compare the proposed update training strategies with benchmark models.