no code implementations • WMT (EMNLP) 2021 • Gregor Geigle, Jonas Stadtmüller, Wei Zhao, Jonas Pfeiffer, Steffen Eger
This paper presents our submissions to the WMT2021 Shared Task on Quality Estimation, Task 1 Sentence-Level Direct Assessment.
no code implementations • 9 Jan 2025 • Gregor Geigle, Florian Schneider, Carolin Holtermann, Chris Biemann, Radu Timofte, Anne Lauscher, Goran Glavaš
Most Large Vision-Language Models (LVLMs) to date are trained predominantly on English data, which makes them struggle to understand non-English input and fail to generate output in the desired target language.
1 code implementation • 20 Jun 2024 • Gregor Geigle, Radu Timofte, Goran Glavaš
We benchmark 12 public LVLMs on FOCI and show that it tests for a complementary skill to established image understanding and reasoning benchmarks.
no code implementations • 20 Jun 2024 • Gregor Geigle, Radu Timofte, Goran Glavaš
Large vision-language models (LVLMs) have recently pushed the state of the art dramatically in image captioning and many image understanding tasks (e.g., visual question answering).
1 code implementation • 29 Jan 2024 • Marcos V. Conde, Gregor Geigle, Radu Timofte
All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model.
1 code implementation • 13 Jul 2023 • Gregor Geigle, Abhay Jain, Radu Timofte, Goran Glavaš
Modular vision-language models (Vision-LLMs) align pretrained image encoders with (frozen) large language models (LLMs) and post-hoc condition LLMs to 'understand' the image input.
1 code implementation • 14 Jun 2023 • Gregor Geigle, Radu Timofte, Goran Glavaš
Vision-and-language (VL) models with separate encoders for each modality (e.g., CLIP) have become the go-to models for zero-shot image classification and image-text retrieval.
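Zero-shot classification with such dual-encoder models reduces to comparing one image embedding against one text embedding per class name. A minimal sketch of that comparison step, assuming the embeddings have already been produced by the two encoders (the toy 3-d vectors below are hand-made illustrations, not real CLIP outputs):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_classify(image_emb, class_names, text_embs):
    """Pick the class whose text embedding is closest to the image embedding.

    In a real dual-encoder setup (e.g. CLIP), image_emb would come from the
    image encoder and text_embs from the text encoder applied to prompts
    like "a photo of a cat"; here they are supplied directly.
    """
    scores = {name: cosine(image_emb, emb)
              for name, emb in zip(class_names, text_embs)}
    return max(scores, key=scores.get)

# Toy example with hand-made 3-d "embeddings":
image_emb = [0.9, 0.1, 0.0]
classes = ["cat", "dog"]
text_embs = [[1.0, 0.0, 0.0],   # stand-in for "a photo of a cat"
             [0.0, 1.0, 0.0]]   # stand-in for "a photo of a dog"
print(zero_shot_classify(image_emb, classes, text_embs))  # → cat
```

Because no class-specific training is needed, swapping in a new label set is just a matter of embedding new prompt strings.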
no code implementations • 12 Oct 2022 • Gregor Geigle, Chen Cecilia Liu, Jonas Pfeiffer, Iryna Gurevych
While many vision encoders (VEs) -- of different architectures, trained on different data and objectives -- are publicly available, they are not designed for downstream vision-and-language (V+L) tasks.
1 code implementation • ACL 2022 • Tim Baumgärtner, Kexin Wang, Rachneet Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto, Leonardo F. R. Ribeiro, Jonas Pfeiffer, Nils Reimers, Gözde Gül Şahin, Iryna Gurevych
Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and involve different setups (e.g., with or without retrieval).
1 code implementation • Findings (ACL) 2022 • Jonas Pfeiffer, Gregor Geigle, Aishwarya Kamath, Jan-Martin O. Steitz, Stefan Roth, Ivan Vulić, Iryna Gurevych
In this work, we address this gap and provide xGQA, a new multilingual evaluation benchmark for the visual question answering task.
1 code implementation • 14 Apr 2021 • Gregor Geigle, Nils Reimers, Andreas Rücklé, Iryna Gurevych
We argue that there exists a wide range of specialized QA agents in the literature.
1 code implementation • 22 Mar 2021 • Gregor Geigle, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, Iryna Gurevych
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
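The efficiency argument against cross-attention models is that they must jointly re-encode every (query, candidate) pair, whereas separate-encoder models can embed the corpus once offline and answer each query with a single encoder pass plus cheap similarity scoring. A minimal retrieval sketch under that assumption (embeddings are toy 2-d vectors, not outputs of any particular model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def retrieve(query_emb, corpus_embs, top_k=2):
    """Rank precomputed corpus embeddings against one query embedding.

    With separate encoders the corpus embeddings are computed once and
    cached; each query then costs O(corpus size) dot products instead of
    O(corpus size) full cross-attention forward passes.
    """
    ranked = sorted(range(len(corpus_embs)),
                    key=lambda i: cosine(query_emb, corpus_embs[i]),
                    reverse=True)
    return ranked[:top_k]

# Precomputed "image" embeddings (in practice: one offline encoder pass each).
corpus = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(retrieve([0.9, 0.1], corpus))  # → [0, 1]
```

The same scoring works in both directions (text-to-image or image-to-text), since only embeddings are compared.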
1 code implementation • EMNLP 2021 • Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
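A back-of-the-envelope parameter count shows why lightweight bottleneck adapters ease the storage problem: per task you only store two small projection matrices per layer instead of a full model copy. The dimensions and bottleneck size below are illustrative assumptions (BERT-base-like), not numbers from the paper, and biases and attention weights are omitted:

```python
def feed_forward_params(d_model, n_layers):
    """Rough count of the feed-forward parameters alone:
    two matrices of shape (d x 4d) and (4d x d) per layer."""
    return n_layers * 2 * d_model * (4 * d_model)

def adapter_params(d_model, bottleneck, n_layers):
    """A bottleneck adapter adds a down-projection (d x b) and an
    up-projection (b x d) per layer; b << d keeps it small."""
    return n_layers * 2 * d_model * bottleneck

full = feed_forward_params(768, 12)   # assumed BERT-base-like dimensions
added = adapter_params(768, 48, 12)   # assumed bottleneck size of 48
print(f"adapter adds {100 * added / full:.1f}% of the FFN parameter count")
```

Under these assumptions the per-task storage drops by roughly two orders of magnitude, since only the adapter weights differ between tasks.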