1 code implementation • 25 Mar 2024 • Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer
We demonstrate that these directions can be used to augment the prompt text input with fine-grained control over attributes of specific subjects in a compositional manner (control over multiple attributes of a single subject) without having to adapt the diffusion model.
no code implementations • 20 Mar 2024 • Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer
Due to the generative nature of our approach, our model reliably predicts the confidence of its depth estimates.
1 code implementation • 20 Mar 2024 • Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Björn Ommer
Diffusion models have long been plagued by scalability and quadratic complexity issues, especially within transformer-based architectures.
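The quadratic complexity mentioned here comes from self-attention in transformer backbones: the attention score matrix has one entry per token pair, so its size grows with the square of the sequence (or image-token) length. A minimal illustrative sketch, not code from the paper:

```python
import numpy as np

def attention_scores_size(seq_len, dim=64):
    """Size of the self-attention score matrix for seq_len tokens.

    Self-attention forms a (seq_len x seq_len) matrix of query-key
    dot products, so memory and compute grow quadratically with the
    number of tokens (e.g. patches of an image).
    """
    q = np.random.randn(seq_len, dim)
    k = np.random.randn(seq_len, dim)
    scores = q @ k.T  # shape: (seq_len, seq_len)
    return scores.size

# Doubling the token count quadruples the score matrix.
assert attention_scores_size(1024) == 4 * attention_scores_size(512)
```

This quadratic growth is what makes naive high-resolution generation in token-based models expensive, motivating architectures with better scaling behavior.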
1 code implementation • 21 Jan 2024 • Katherine Crowson, Stefan Andreas Baumann, Alex Birch, Tanishq Mathew Abraham, Daniel Z. Kaplan, Enrico Shippole
We present the Hourglass Diffusion Transformer (HDiT), an image generative model that exhibits linear scaling with pixel count, supporting training at high resolution (e.g. $1024 \times 1024$) directly in pixel space.
1 code implementation • Proceedings of the International Society for Music Information Retrieval Conference (ISMIR) 2021 • Stefan Andreas Baumann
In recent years, complex convolutional neural network architectures such as the Inception architecture have been shown to offer significant improvements over previous architectures in image classification.
Ranked #1 on Key Detection on Giantsteps (using extra training data)