Search Results for author: Matthias Minderer

Found 16 papers, 12 papers with code

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection

no code implementations 21 Mar 2024 Tim Salzmann, Markus Ryll, Alex Bewley, Matthias Minderer

We provide a single-stage recipe to train this model on a mixture of object and relationship detection data.

Tasks: Decoder, Object +4

Improving fine-grained understanding in image-text pre-training

no code implementations 18 Jan 2024 Ioana Bica, Anastasija Ilić, Matthias Bauer, Goker Erdogan, Matko Bošnjak, Christos Kaplanis, Alexey A. Gritsenko, Matthias Minderer, Charles Blundell, Razvan Pascanu, Jovana Mitrović

We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs.

Tasks: Object Detection
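The one-line summary above leaves "sparse fine-grained alignment" abstract. Below is a minimal NumPy sketch of token-to-patch alignment in that spirit; the function name, the mean-similarity thresholding rule, and the shapes are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fine_grained_alignment(patch_emb, token_emb):
    """Align each text token with a sparse set of image patches.

    patch_emb: (num_patches, dim) image patch embeddings
    token_emb: (num_tokens, dim)  text token embeddings
    Returns a per-token image embedding built from its best-matching patches.
    """
    # Similarity between every token and every patch.
    sim = token_emb @ patch_emb.T                      # (tokens, patches)
    # Sparsify: keep only patches whose similarity exceeds a per-token threshold.
    thresh = sim.mean(axis=1, keepdims=True)
    weights = np.where(sim > thresh, sim, 0.0)
    # Normalise the surviving weights so each token's patch weights sum to 1.
    weights /= np.maximum(weights.sum(axis=1, keepdims=True), 1e-8)
    # Token-grouped image embeddings: weighted sum of the selected patches.
    return weights @ patch_emb                         # (tokens, dim)
```

These per-token image embeddings could then enter a token-level contrastive loss alongside the usual global image-text loss, which is the general shape of fine-grained contrastive pretraining.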

Video OWL-ViT: Temporally-consistent open-world localization in video

no code implementations ICCV 2023 Georg Heigold, Matthias Minderer, Alexey Gritsenko, Alex Bewley, Daniel Keysers, Mario Lučić, Fisher Yu, Thomas Kipf

Our model is end-to-end trainable on video data and enjoys improved temporal consistency compared to tracking-by-detection baselines, while retaining the open-world capabilities of the backbone detector.

Tasks: Decoder, Object +1

Scaling Open-Vocabulary Object Detection

1 code implementation NeurIPS 2023 Matthias Minderer, Alexey Gritsenko, Neil Houlsby

However, with OWL-ST, we can scale to over 1B examples, yielding a further large improvement: with an L/14 architecture, OWL-ST improves AP on LVIS rare classes, for which the model has seen no human box annotations, from 31.2% to 44.6% (a 43% relative improvement).

Ranked #1 on Zero-Shot Object Detection on LVIS v1.0 minival (using extra training data)

Tasks: Image Classification, Language Modelling +4
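The "43% relative improvement" quoted in the abstract excerpt follows directly from the two AP numbers; a quick check:

```python
# Relative improvement of LVIS rare-class AP quoted above: 31.2 -> 44.6.
before, after = 31.2, 44.6
relative = (after - before) / before
print(f"{relative:.0%}")  # prints "43%"
```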

Decoder Denoising Pretraining for Semantic Segmentation

1 code implementation 23 May 2022 Emmanuel Brempong Asiedu, Simon Kornblith, Ting Chen, Niki Parmar, Matthias Minderer, Mohammad Norouzi

We propose a decoder pretraining approach based on denoising, which can be combined with supervised pretraining of the encoder.

Tasks: Decoder, Denoising +2
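As a rough illustration of the idea of pretraining a decoder by denoising, one training step might look like the sketch below. The helper names, noise model, and loss are assumptions for illustration, not the paper's training code:

```python
import numpy as np

def denoising_pretrain_step(encoder, decoder, image, sigma=0.2, rng=None):
    """One illustrative decoder-denoising step (hypothetical helper names).

    The encoder (e.g. supervised-pretrained) maps the corrupted image to
    features; the decoder is trained to predict the noise that was added.
    """
    rng = rng or np.random.default_rng()
    noise = rng.normal(scale=sigma, size=image.shape)
    noisy = image + noise
    features = encoder(noisy)                  # encoder sees the noisy image
    pred_noise = decoder(features)             # decoder predicts the added noise
    loss = np.mean((pred_noise - noise) ** 2)  # simple L2 denoising objective
    return loss
```

Minimising this loss trains the decoder to separate signal from noise, giving it useful weights before the supervised segmentation fine-tuning stage.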

SCENIC: A JAX Library for Computer Vision Research and Beyond

1 code implementation CVPR 2022 Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay

Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond.

Automatic Shortcut Removal for Self-Supervised Representation Learning

no code implementations ICML 2020 Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen

In self-supervised visual representation learning, a feature extractor is trained on a "pretext task" for which labels can be generated cheaply, without human annotation.

Tasks: Representation Learning
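The "labels generated cheaply" point can be made concrete with a classic pretext task, rotation prediction. This is a generic example of pretext-task labeling, not necessarily the task studied in the paper:

```python
import numpy as np

def rotation_pretext_batch(images, rng=None):
    """Cheaply label a batch for a common pretext task: rotation prediction.

    Each image is rotated by a random multiple of 90 degrees; the label is
    the rotation index (0-3). No human annotation is required.
    """
    rng = rng or np.random.default_rng()
    labels = rng.integers(0, 4, size=len(images))
    rotated = [np.rot90(img, k=int(k)) for img, k in zip(images, labels)]
    return rotated, labels
```

A feature extractor trained to predict these labels can exploit shortcuts (e.g. low-level artifacts) instead of learning semantic features, which is the failure mode the paper's shortcut-removal method targets.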
