1 code implementation • 20 Dec 2023 • Hen Emuna, Nadav Borenstein, Xin Qian, Hyeonsu Kang, Joel Chan, Aniket Kittur, Dafna Shahaf
We release data and code; we view BARcode as a step towards addressing the challenges that have historically hindered the practical application of biologically inspired design (BID) to engineering innovation.
1 code implementation • 15 Nov 2023 • Yuxia Wang, Revanth Gangi Reddy, Zain Muhammad Mujahid, Arnav Arora, Aleksandr Rubashevskii, Jiahui Geng, Osama Mohammed Afzal, Liangming Pan, Nadav Borenstein, Aditya Pillai, Isabelle Augenstein, Iryna Gurevych, Preslav Nakov
The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs.
1 code implementation • 22 Oct 2023 • Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein
We then pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
1 code implementation • 21 May 2023 • Nadav Borenstein, Karolina Stańczak, Thea Rolskov, Natália da Silva Perez, Natacha Klein Käfer, Isabelle Augenstein
We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset.
1 code implementation • 18 May 2023 • Nadav Borenstein, Natália da Silva Perez, Isabelle Augenstein
We find that: 1) even with scarce annotated data, it is possible to achieve surprisingly good results by formulating the problem as an extractive QA task and leveraging existing datasets and models for modern languages; and 2) cross-lingual low-resource learning for historical languages is highly challenging, and machine translation of the historical datasets to the considered target languages is, in practice, often the best-performing solution.
no code implementations • 17 Oct 2021 • Aharon Azulay, Tavi Halperin, Orestis Vantzos, Nadav Borenstein, Ofir Bibi
Temporally consistent dense video annotations are scarce and hard to collect.
1 code implementation • ACL 2021 • Chen Shani, Nadav Borenstein, Dafna Shahaf
We construct a dataset containing thousands of funny papers and use it to learn classifiers, combining findings from psychology and linguistics with recent advances in NLP.