We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner.
Ranked #1 on Image Generation on FFHQ-U
In this survey, we first present a generic framework called generalized OOD detection, which encompasses the five aforementioned problems, i. e., AD, ND, OSR, OOD detection, and OD.
Although pre-trained models (PLMs) have achieved remarkable improvements in a wide range of NLP tasks, they are expensive in terms of time and resources.
Additionally, we propose a first set of metrics to quantitatively evaluate the accuracy as well as the perceptual quality of the temporal evolution.
Ranked #10 on Video Super-Resolution on Vid4 - 4x upscaling (PSNR metric)
An approach to tackle this issue is to introduce Positional Encoding (PE) of nodes, and inject it into the input layer, like in Transformers.
In this paper, we unify these two families of approaches from the angle of manifold learning and propose TLDR, a dimensionality reduction method for generic input spaces that is porting the simple self-supervised learning framework of Barlow Twins to a setting where it is hard or impossible to define an appropriate set of distortions by hand.
We show that simple pitch and periodicity conditioning is insufficient for reducing this error relative to using autoregression.
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost.