no code implementations • 26 Sep 2024 • Dmytro Kotovenko, Olga Grebenkova, Nikolaos Sarafianos, Avinash Paliwal, Pingchuan Ma, Omid Poursaeed, Sreyas Mohan, Yuchen Fan, Yilei Li, Rakesh Ranjan, Björn Ommer
While style transfer techniques have been well-developed for 2D image stylization, the extension of these methods to 3D scenes remains relatively unexplored.
1 code implementation • 4 Jun 2024 • Jiajun Wang, Morteza Ghahremani, Yitong Li, Björn Ommer, Christian Wachinger
Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions.
no code implementations • 13 May 2024 • Nick Stracke, Stefan Andreas Baumann, Joshua M. Susskind, Miguel Angel Bautista, Björn Ommer
Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images.
1 code implementation • 25 Mar 2024 • Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer
We demonstrate that these directions can be used to augment the prompt text input with fine-grained control over attributes of specific subjects in a compositional manner (control over multiple attributes of a single subject) without having to adapt the diffusion model.
no code implementations • 21 Mar 2024 • Aram Davtyan, Sepehr Sameni, Björn Ommer, Paolo Favaro
We call our model CAGE for visual Composition and Animation for video GEneration.
1 code implementation • 20 Mar 2024 • Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer
Due to the generative nature of our approach, our model reliably predicts the confidence of its depth estimates.
1 code implementation • 20 Mar 2024 • Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Björn Ommer
Diffusion models have long been plagued by scalability and quadratic-complexity issues, especially within transformer-based architectures.
no code implementations • 28 Feb 2024 • Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Däubener, Sophie Fellenz, Asja Fischer, Thomas Gärtner, Matthias Kirchler, Marius Kloft, Yingzhen Li, Christoph Lippert, Gerard de Melo, Eric Nalisnick, Björn Ommer, Rajesh Ranganath, Maja Rudolph, Karen Ullrich, Guy Van Den Broeck, Julia E Vogt, Yixin Wang, Florian Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
The field of deep generative modeling has grown rapidly and consistently over the years.
1 code implementation • 13 Jan 2024 • Michael Kölle, Gerhard Stenzel, Jonas Stein, Sebastian Zielinski, Björn Ommer, Claudia Linnhoff-Popien
In recent years, machine learning models like DALL-E, Craiyon, and Stable Diffusion have gained significant attention for their ability to generate high-resolution images from concise descriptions.
2 code implementations • 12 Dec 2023 • Johannes S. Fischer, Ming Gui, Pingchuan Ma, Nick Stracke, Stefan A. Baumann, Björn Ommer
We demonstrate that introducing flow matching (FM) between the diffusion model and the convolutional decoder offers high-resolution image synthesis with reduced computational cost and model size.
no code implementations • 11 Oct 2023 • Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein
The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes.
no code implementations • 28 Apr 2023 • Azade Farshad, Yousef Yeganeh, Yu Chi, Chengzhi Shen, Björn Ommer, Nassir Navab
To address this limitation, we propose a novel guidance approach for the sampling process in the diffusion model that leverages bounding box and segmentation map information at inference time without additional training data.
no code implementations • CVPR 2023 • Dmytro Kotovenko, Pingchuan Ma, Timo Milbich, Björn Ommer
Experiments on established DML benchmarks show that using our cross-attention conditional embedding during training significantly improves the underlying standard DML pipeline, so that it outperforms the state of the art.
1 code implementation • 26 Jul 2022 • Robin Rombach, Andreas Blattmann, Björn Ommer
In retrieval-augmented diffusion models (RDMs), a set of nearest neighbors is retrieved from an external database for each training instance during training, and the diffusion model is conditioned on these informative samples.
1 code implementation • 25 Jul 2022 • Matthias Wright, Björn Ommer
The field of neural style transfer has experienced a surge of research exploring different avenues ranging from optimization-based approaches and feed-forward models to meta-learning methods.
2 code implementations • 25 Apr 2022 • Andreas Blattmann, Robin Rombach, Kaan Oktay, Jonas Müller, Björn Ommer
Much of this success is due to the scalability of these architectures and, hence, to a dramatic increase in model complexity and in the computational resources invested in training these models.
36 code implementations • CVPR 2022 • Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.
Ranked #2 on Layout-to-Image Generation on LayoutBench
no code implementations • 17 Sep 2021 • Faegheh Sardari, Björn Ommer, Majid Mirmehdi
Most recent view-invariant action recognition and performance assessment approaches rely on a large amount of annotated 3D skeleton data to extract view-invariant features.
1 code implementation • 9 Sep 2021 • Artsiom Sanakoyeu, Pingchuan Ma, Vadim Tschernezki, Björn Ommer
We propose to build a more expressive representation by jointly splitting the embedding space and the data hierarchically into smaller sub-parts.
no code implementations • NeurIPS 2021 • Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer
Thus, in contrast to pure autoregressive models, it can solve free-form image inpainting and, in the case of conditional models, local, text-guided image modification without requiring mask-specific training.
Ranked #4 on Text-to-Image Generation on Conceptual Captions
2 code implementations • NeurIPS 2021 • Timo Milbich, Karsten Roth, Samarth Sinha, Ludwig Schmidt, Marzyeh Ghassemi, Björn Ommer
Finally, we propose few-shot DML as an efficient way to consistently improve generalization in response to unknown test shifts presented in ooDML.
no code implementations • 14 Jul 2021 • Nikolai Ufer, Sabine Lang, Björn Ommer
In the following, we introduce an algorithm that allows users to search for image regions containing specific motifs or objects and find similar regions in an extensive dataset, helping art historians to analyze large digitized art collections.
2 code implementations • ICCV 2021 • Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer
There will be distinctive movement, despite evident variations caused by the stochastic nature of our world.
1 code implementation • CVPR 2021 • Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer
Given a static image of an object and a local poking of a pixel, the approach then predicts how the object would deform over time.
no code implementations • 13 May 2021 • Manuel Jahn, Robin Rombach, Björn Ommer
The use of coarse-grained layouts for controllable synthesis of complex scene images via deep generative models has recently gained popularity.
1 code implementation • CVPR 2021 • Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Björn Ommer
Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame.
1 code implementation • ICCV 2021 • Robin Rombach, Patrick Esser, Björn Ommer
Is a geometric model required to synthesize novel views from a single image?
Ranked #1 on Novel View Synthesis on RealEstate10K
2 code implementations • CVPR 2021 • Dmytro Kotovenko, Matthias Wright, Arthur Heimbrecht, Björn Ommer
There have been many successful implementations of neural style transfer in recent years.
1 code implementation • CVPR 2021 • Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer
Using this representation, we are able to change the behavior of a person depicted in an arbitrary posture, or even to directly transfer behavior observed in a given video sequence.
no code implementations • ICLR 2021 • Md Amirul Islam, Matthew Kowal, Patrick Esser, Sen Jia, Björn Ommer, Konstantinos G. Derpanis, Neil Bruce
Contrary to previous evidence that neurons in the later layers of a Convolutional Neural Network (CNN) respond to complex object shapes, recent studies have shown that CNNs actually exhibit a 'texture bias': given an image with both texture and shape cues (e.g., a stylized image), a CNN is biased towards predicting the category corresponding to the texture.
12 code implementations • CVPR 2021 • Patrick Esser, Robin Rombach, Björn Ommer
We demonstrate how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images.
Ranked #3 on Text-to-Image Generation on LHQC
1 code implementation • 4 Dec 2020 • Patrick Esser, Robin Rombach, Björn Ommer
It is tempting to think that machines are less prone to unfairness and prejudice.
1 code implementation • 17 Sep 2020 • Karsten Roth, Timo Milbich, Björn Ommer, Joseph Paul Cohen, Marzyeh Ghassemi
Deep Metric Learning (DML) provides a crucial tool for visual similarity and zero-shot applications by learning generalizing embedding spaces, although recent work in DML has shown strong performance saturation across training objectives.
Ranked #10 on Metric Learning on CARS196 (using extra training data)
1 code implementation • 9 Sep 2020 • Sandro Braun, Patrick Esser, Björn Ommer
Our approach leverages a generative model consisting of two disentangled representations for an object's shape and appearance and a latent variable for the part segmentation.
1 code implementation • ECCV 2020 • Robin Rombach, Patrick Esser, Björn Ommer
To open such a black box, it is, therefore, crucial to uncover the different semantic concepts a model has learned as well as those that it has learned to be invariant to.
1 code implementation • NeurIPS 2020 • Robin Rombach, Patrick Esser, Björn Ommer
Given the ever-increasing computational costs of modern machine learning models, we need to find new ways to reuse such expert models and thus tap into the resources that have been invested in their creation.
2 code implementations • ECCV 2020 • Timo Milbich, Karsten Roth, Homanga Bharadhwaj, Samarth Sinha, Yoshua Bengio, Björn Ommer, Joseph Paul Cohen
Visual similarity plays an important role in many computer vision applications.
Ranked #13 on Metric Learning on CUB-200-2011 (using extra training data)
2 code implementations • CVPR 2020 • Patrick Esser, Robin Rombach, Björn Ommer
We formulate interpretation as a translation of hidden representations onto semantic concepts that are comprehensible to the user.
no code implementations • 12 Apr 2020 • Timo Milbich, Karsten Roth, Biagio Brattoli, Björn Ommer
The common paradigm is discriminative metric learning, which seeks an embedding that separates different training classes.
2 code implementations • CVPR 2021 • Mahmoud Afifi, Konstantinos G. Derpanis, Björn Ommer, Michael S. Brown
In contrast, our proposed method targets both over- and underexposure errors in photographs.
Ranked #4 on Image Enhancement on Exposure-Errors
1 code implementation • CVPR 2020 • Karsten Roth, Timo Milbich, Björn Ommer
Learning visual similarity requires learning relations, typically between triplets of images.
Ranked #17 on Metric Learning on CUB-200-2011 (using extra training data)
1 code implementation • CVPR 2019 • Dmytro Kotovenko, Artsiom Sanakoyeu, Pingchuan Ma, Sabine Lang, Björn Ommer
Recent work has significantly improved the representation of color and texture, as well as computational speed and image resolution.
8 code implementations • ICML 2020 • Karsten Roth, Timo Milbich, Samarth Sinha, Prateek Gupta, Björn Ommer, Joseph Paul Cohen
Deep Metric Learning (DML) is arguably one of the most influential lines of research for learning visual similarities, with many new approaches proposed every year.
no code implementations • 18 Nov 2019 • Timo Milbich, Omair Ghori, Ferran Diego, Björn Ommer
To nevertheless find those relations which can be reliably utilized for learning, we follow a divide-and-conquer strategy: we find reliable similarities by extracting compact groups of images and reliable dissimilarities by partitioning these groups into subsets, converting the complicated overall problem into a few reliable local subproblems.
no code implementations • ICCV 2019 • Patrick Esser, Johannes Haux, Björn Ommer
In experiments on diverse object categories, the approach successfully recombines pose and appearance to reconstruct and retarget novel synthesized images.
2 code implementations • ICCV 2019 • Karsten Roth, Biagio Brattoli, Björn Ommer
In contrast, we propose to explicitly learn the latent characteristics that are shared by and go across object classes.
Ranked #19 on Metric Learning on CUB-200-2011 (using extra training data)
no code implementations • 17 Jun 2019 • Nikolai Ufer, Kam To Lui, Katja Schwarz, Paul Warkentin, Björn Ommer
Finding semantic correspondences is a challenging problem.
1 code implementation • CVPR 2019 • Artsiom Sanakoyeu, Vadim Tschernezki, Uta Büchler, Björn Ommer
Approaches for learning a single distance metric often struggle to encode all different types of relationships and do not generalize well.
2 code implementations • CVPR 2019 • Dominik Lorenz, Leonard Bereska, Timo Milbich, Björn Ommer
Large intra-class variation is the result of changes in multiple object characteristics.
Ranked #3 on Unsupervised Human Pose Estimation on Human3.6M
1 code implementation • 9 Nov 2018 • Nawid Sayed, Biagio Brattoli, Björn Ommer
In this paper, we present a self-supervised method for representation learning that utilizes two different modalities.
no code implementations • ECCV 2018 • Uta Büchler, Biagio Brattoli, Björn Ommer
Self-supervised learning of convolutional neural networks can harness large amounts of cheap unlabeled data to train powerful feature representations.
9 code implementations • ECCV 2018 • Artsiom Sanakoyeu, Dmytro Kotovenko, Sabine Lang, Björn Ommer
These and our qualitative results, ranging from small image patches to megapixel stylistic images and videos, show that our approach better captures the subtle way in which a style affects content.
2 code implementations • CVPR 2018 • Patrick Esser, Ekaterina Sutter, Björn Ommer
Experiments show that the model enables conditional image generation and transfer.
no code implementations • 22 Feb 2018 • Artsiom Sanakoyeu, Miguel A. Bautista, Björn Ommer
Exemplar learning of visual similarities in an unsupervised manner is a problem of paramount importance to Computer Vision.
no code implementations • ICCV 2017 • Ömer Sümer, Tobias Dencker, Björn Ommer
Human pose analysis is presently dominated by deep convolutional networks trained with extensive manual annotations of joint locations and beyond.
2 code implementations • CVPR 2017 • Miguel A. Bautista, Artsiom Sanakoyeu, Björn Ommer
Similarity learning is then formulated as a partial ordering task with soft correspondences of all samples to classes.
1 code implementation • NeurIPS 2016 • Miguel A. Bautista, Artsiom Sanakoyeu, Ekaterina Sutter, Björn Ommer
Exemplar learning is a powerful paradigm for discovering visual similarities in an unsupervised manner.
no code implementations • 22 Feb 2015 • Borislav Antić, Björn Ommer
The goal of video parsing is to find a set of indispensable normal spatio-temporal object hypotheses that jointly explain all the foreground of a video, while, at the same time, being supported by normal training samples.