1 code implementation • ACL 2022 • Valentin Hofmann, Hinrich Schuetze, Janet Pierrehumbert
We introduce FLOTA (Few Longest Token Approximation), a simple yet effective method to improve the tokenization of pretrained language models (PLMs).
1 code implementation • Findings (NAACL) 2022 • Victor Steinborn, Philipp Dufter, Haris Jabbar, Hinrich Schuetze
Bias research in NLP is a rapidly growing and developing field.
no code implementations • 14 Aug 2024 • Subhabrata Dutta, Timo Kaufmann, Goran Glavaš, Ivan Habernal, Kristian Kersting, Frauke Kreuter, Mira Mezini, Iryna Gurevych, Eyke Hüllermeier, Hinrich Schuetze
While there is a widespread belief that artificial general intelligence (AGI) -- or even superhuman AI -- is imminent, complex problems in expert domains are far from being solved.
no code implementations • 23 May 2023 • Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark
To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that answers are supported by interpretable chains of reasoning drawn from a consistent network of beliefs.
1 code implementation • Findings (ACL) 2022 • Antonis Maronikolakis, Axel Wisiorek, Leah Nann, Haris Jabbar, Sahana Udupa, Hinrich Schuetze
Building on current work on multilingual hate speech (e.g., Ousidhoum et al. (2019)) and hate speech reduction (e.g., Sap et al. (2020)), we present XTREMESPEECH, a new hate speech dataset containing 20,297 social media passages from Brazil, Germany, India and Kenya.
no code implementations • ACL 2022 • Sanjeev Kumar Karn, Ning Liu, Hinrich Schuetze, Oladimeji Farri
A cascade of tasks is required to automatically generate an abstractive summary of the typical information-rich radiology report.
no code implementations • EACL (AdaptNLP) 2021 • Sanjeev Kumar Karn, Francine Chen, Yan-Ying Chen, Ulli Waltinger, Hinrich Schuetze
Interleaved texts, where posts belonging to different threads occur in a single sequence, are common in online chat, so it can be time-consuming to obtain an overview of the discussions.
no code implementations • 3 Oct 2016 • Hinrich Schuetze, Heike Adel, Ehsaneddin Asgari
We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text.
no code implementations • 16 Jan 2013 • Hinrich Schuetze, Christian Scheible
A key characteristic of work on deep learning and neural networks in general is that it relies on representations of the input that support generalization, robust inference, domain adaptation and other desirable functionalities.
no code implementations • 13 Jan 2013 • Christian Scheible, Hinrich Schuetze
This makes the analysis of learned structures particularly difficult.