Search Results for author: Gregor Geigle

Found 13 papers, 9 papers with code

TUDa at WMT21: Sentence-Level Direct Assessment with Adapters

no code implementations • WMT (EMNLP) 2021 • Gregor Geigle, Jonas Stadtmüller, Wei Zhao, Jonas Pfeiffer, Steffen Eger

This paper presents our submissions to the WMT2021 Shared Task on Quality Estimation, Task 1: Sentence-Level Direct Assessment.

Sentence

Centurio: On Drivers of Multilingual Ability of Large Vision-Language Models

no code implementations • 9 Jan 2025 • Gregor Geigle, Florian Schneider, Carolin Holtermann, Chris Biemann, Radu Timofte, Anne Lauscher, Goran Glavaš

Most Large Vision-Language Models (LVLMs) to date are trained predominantly on English data, which makes them struggle to understand non-English input and fail to generate output in the desired target language.

Language Modeling • Language Modelling +1

African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification

1 code implementation • 20 Jun 2024 • Gregor Geigle, Radu Timofte, Goran Glavaš

We benchmark 12 public LVLMs on FOCI and show that it tests for a complementary skill to established image understanding and reasoning benchmarks.

Benchmarking • Classification +3

Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?

no code implementations • 20 Jun 2024 • Gregor Geigle, Radu Timofte, Goran Glavaš

Large vision-language models (LVLMs) have recently dramatically pushed the state of the art in image captioning and many image understanding tasks (e.g., visual question answering).

Caption Generation • Hallucination +3

InstructIR: High-Quality Image Restoration Following Human Instructions

1 code implementation • 29 Jan 2024 • Marcos V. Conde, Gregor Geigle, Radu Timofte

All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model.

Deblurring • Image Denoising +4
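
As an aside on the entry above: InstructIR conditions restoration on a natural-language instruction. The sketch below illustrates the general pattern of text-conditioned restoration, where an embedding of the instruction modulates the image features. All names, dimensions, and the architecture are illustrative assumptions, not InstructIR's actual code.

```python
import torch
import torch.nn as nn

class InstructionGuidedRestorer(nn.Module):
    """Illustrative sketch of instruction-guided restoration: a text
    embedding of the human instruction scales the restoration features.
    Hypothetical architecture, not InstructIR's implementation."""

    def __init__(self, text_dim: int = 384, feat_dim: int = 64):
        super().__init__()
        self.encode = nn.Conv2d(3, feat_dim, 3, padding=1)   # image -> features
        self.gamma = nn.Linear(text_dim, feat_dim)           # text -> per-channel scale
        self.decode = nn.Conv2d(feat_dim, 3, 3, padding=1)   # features -> restored image

    def forward(self, degraded: torch.Tensor, instruction_emb: torch.Tensor) -> torch.Tensor:
        # degraded: (batch, 3, H, W); instruction_emb: (batch, text_dim)
        feats = self.encode(degraded)
        scale = self.gamma(instruction_emb).unsqueeze(-1).unsqueeze(-1)
        # The instruction steers which degradation the network undoes.
        return self.decode(feats * scale)
```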

mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs

1 code implementation • 13 Jul 2023 • Gregor Geigle, Abhay Jain, Radu Timofte, Goran Glavaš

Modular vision-language models (Vision-LLMs) align pretrained image encoders with (frozen) large language models (LLMs) and post-hoc condition LLMs to 'understand' the image input.

Image Captioning
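
The mBLIP snippet above describes the modular Vision-LLM recipe: a pretrained image encoder is aligned to a frozen LLM through a small trainable mapping. Below is a minimal PyTorch sketch of that pattern; the module and dimension names are illustrative assumptions, not mBLIP's actual architecture.

```python
import torch
import torch.nn as nn

class VisionLLMAdapter(nn.Module):
    """Minimal sketch of a modular Vision-LLM: a trainable projection maps
    frozen image-encoder features into the embedding space of a frozen LLM.
    Dimensions and names are illustrative, not mBLIP's actual design."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096, num_tokens: int = 32):
        super().__init__()
        # Only this projection is trained; the encoder and the LLM stay frozen.
        self.project = nn.Linear(vision_dim, llm_dim)
        self.num_tokens = num_tokens

    def forward(self, image_features: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from a frozen encoder
        # text_embeds:    (batch, seq_len, llm_dim) from the frozen LLM's embedding table
        visual_tokens = self.project(image_features[:, : self.num_tokens])
        # Prepend visual tokens so the LLM conditions its generation on the image.
        return torch.cat([visual_tokens, text_embeds], dim=1)
```

The concatenated sequence is then fed through the frozen LLM as usual, which is what makes the bootstrapping efficient: only the small mapping needs gradient updates.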

Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations

1 code implementation • 14 Jun 2023 • Gregor Geigle, Radu Timofte, Goran Glavaš

Vision-and-language (VL) models with separate encoders for each modality (e.g., CLIP) have become the go-to models for zero-shot image classification and image-text retrieval.

Image Classification • Image-text Retrieval +3
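
For context on the dual-encoder setup the snippet refers to, here is a short zero-shot classification example using the Hugging Face transformers CLIP API; the checkpoint and label set are arbitrary choices for illustration, not the paper's exact evaluation setup.

```python
# Zero-shot image classification with a dual-encoder VL model (CLIP),
# the model family Babel-ImageNet evaluates.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
labels = ["a photo of a dog", "a photo of a cat", "a photo of a swallow"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# Image-text similarity scores; softmax turns them into class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```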

One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks

no code implementations • 12 Oct 2022 • Gregor Geigle, Chen Cecilia Liu, Jonas Pfeiffer, Iryna Gurevych

While many vision encoders (VEs) of different architectures, trained on different data and objectives, are publicly available, they are not designed for the downstream V+L tasks.

UKP-SQUARE: An Online Platform for Question Answering Research

1 code implementation • ACL 2022 • Tim Baumgärtner, Kexin Wang, Rachneet Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto, Leonardo F. R. Ribeiro, Jonas Pfeiffer, Nils Reimers, Gözde Gül Şahin, Iryna Gurevych

Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and involve different setups (e.g., with or without retrieval).

Explainable Models • Information Retrieval +2

Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval

1 code implementation • 22 Mar 2021 • Gregor Geigle, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, Iryna Gurevych

Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.

Cross-Modal Retrieval • Retrieval
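
The snippet above contrasts joint cross-attention models with the paper's retrieve-and-rerank idea: a fast bi-encoder shortlists candidates, and a slower joint model rescores only that shortlist. A schematic sketch, with hypothetical bi_encoder and cross_encoder helpers standing in for real models:

```python
import numpy as np

def retrieve_then_rerank(query_text, image_embeddings, images,
                         bi_encoder, cross_encoder, k=20):
    """Schematic retrieve-and-rerank for text-to-image retrieval.
    bi_encoder and cross_encoder are hypothetical stand-ins."""
    # Stage 1 (fast): embed the query once and rank all images by dot product.
    q = bi_encoder.encode_text(query_text)          # (dim,)
    scores = image_embeddings @ q                   # (num_images,)
    shortlist = np.argsort(-scores)[:k]             # top-k candidate indices
    # Stage 2 (smart): run the expensive joint model only on the shortlist.
    return sorted(shortlist,
                  key=lambda i: -cross_encoder.score(query_text, images[i]))
```

The point of the split is that the image embeddings can be precomputed, so the expensive joint scoring runs on k items instead of the whole collection.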

AdapterDrop: On the Efficiency of Adapters in Transformers

1 code implementation • EMNLP 2021 • Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych

Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
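
AdapterDrop studies the inference cost of adapters: small bottleneck modules inserted into each transformer layer, which can be skipped in the lowest layers at inference time. A minimal sketch of such a module follows; the dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter (dimensions illustrative). AdapterDrop's
    speedup comes from skipping such modules in the lowest transformer
    layers at inference time."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor, drop: bool = False) -> torch.Tensor:
        if drop:  # AdapterDrop: bypass the adapter entirely in this layer
            return hidden_states
        # Residual connection keeps the frozen model's computation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```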
