In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection.
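The key structural idea, several transformer layers wrapped in one block-level residual connection, can be illustrated with a minimal sketch. The stand-in "layers" below are simple affine maps, not actual Swin Transformer layers, and the function name is hypothetical:

```python
import numpy as np

def rstb_sketch(x, layers):
    """Minimal sketch of a residual block in the RSTB style:
    several layers applied in sequence (stand-ins here for Swin
    Transformer layers), wrapped in one residual connection."""
    out = x
    for layer in layers:
        out = layer(out)
    return x + out  # block-level residual connection

# toy stand-in layers: affine maps instead of windowed attention
layers = [lambda t: 0.5 * t, lambda t: t + 1.0]
x = np.zeros(4)
y = rstb_sketch(x, layers)
```

The residual connection lets the block learn only the correction to its input, which is what makes stacking many such blocks trainable.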
Specifically, to tackle the first issue, we present a spatial-temporal convolutional self-attention layer, supported by a theoretical analysis, to exploit locality information.
The importance of locality mechanisms is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) are available for incorporating locality mechanisms, and all proper choices lead to a performance gain over the baseline; and 2) the same locality mechanism is successfully applied to four vision transformers, demonstrating the generality of the locality concept.
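A common way to inject locality into token features is a depthwise (per-channel) convolution, which mixes only nearby positions within each channel. The following numpy sketch illustrates the operation in 1-D; the function name and kernel are illustrative, not taken from any of the papers above:

```python
import numpy as np

def depthwise_conv1d(x, k):
    """Depthwise 1-D convolution over a (tokens, channels) array:
    each channel is filtered independently with kernel k, so only
    neighboring token positions are mixed (a locality mechanism)."""
    n, c = x.shape
    pad = len(k) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad token axis
    out = np.zeros_like(x)
    for i in range(n):
        for j, kj in enumerate(k):
            out[i] += kj * xp[i + j]
    return out

# a 3-tap smoothing kernel applied to 6 tokens with 2 channels
tokens = np.arange(12, dtype=float).reshape(6, 2)
local = depthwise_conv1d(tokens, np.array([0.25, 0.5, 0.25]))
```

Because each channel is filtered on its own, the operation adds locality at a small parameter cost compared with full attention, which is why it slots easily into several transformer architectures.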
Existing attack methods for constructing adversarial examples use the $\ell_p$ distance as a similarity metric to perturb samples.
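The $\ell_p$ metric in question can be computed directly; a typical attack constrains the perturbation so that its $\ell_\infty$ (or $\ell_2$) distance from the clean input stays within a budget $\epsilon$. A small sketch, with an illustrative budget of 0.05:

```python
import numpy as np

def lp_distance(x, x_adv, p=2):
    """l_p distance between a clean sample and a perturbed one."""
    d = (x_adv - x).ravel()
    if np.isinf(p):
        return np.abs(d).max()  # l_inf: largest per-element change
    return (np.abs(d) ** p).sum() ** (1.0 / p)

x = np.array([0.2, 0.5, 0.9])
x_adv = x + np.array([0.03, -0.03, 0.03])  # small perturbation
# budget check as used in typical attack constraints (epsilon = 0.05)
within_budget = lp_distance(x, x_adv, p=np.inf) <= 0.05
```

Under such a constraint the adversarial example looks nearly identical to the original, which is exactly why $\ell_p$ similarity is the standard perturbation metric.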
Relying on this, we learn a defense transformer to counterattack the adversarial examples by parameterizing the affine transformations and exploiting the boundary information of DNNs.
In this paper, rather than sampling from a predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve data generation performance.
In this paper, we aim to review various adversarial attack and defense methods on chest X-rays.
Extensive experiments with paired training data and unpaired real-world data demonstrate our superiority over existing methods.
More critically, our method achieves much higher accuracy under 4-bit quantization than existing data-free quantization methods.
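To make the 4-bit setting concrete, the sketch below shows plain symmetric uniform quantization of a weight tensor; this is the generic baseline scheme, not the specific method of the paper:

```python
import numpy as np

def quantize(w, bits=4):
    """Symmetric uniform quantization: map weights to signed
    integers with 2**bits levels, then dequantize back to floats."""
    qmax = 2 ** (bits - 1) - 1           # 7 for signed 4-bit
    scale = np.abs(w).max() / qmax       # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # dequantized weights

w = np.array([-1.0, -0.3, 0.0, 0.4, 1.0])
w_q = quantize(w, bits=4)
```

With only 16 levels, the rounding error per weight can reach half a quantization step, which is why 4-bit accuracy is a much harder target than 8-bit.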
In these problems, there are two key challenges: 1) the query budget is often limited, and 2) the class distribution is highly imbalanced.
The multiple marginal matching problem aims at learning mappings that match a source domain to multiple target domains; it has attracted great attention in applications such as multi-domain image translation.
ii) the $W$-distance of a specific layer to the target distribution tends to decrease along training iterations.
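The quantity being tracked here, the Wasserstein distance of a layer's feature distribution to the target distribution, has a simple closed form in 1-D: the empirical $W_1$ distance between two equal-size samples is the mean absolute difference of their sorted values. A sketch with synthetic stand-in distributions (the "early" and "late" features below are illustrative, not from the paper):

```python
import numpy as np

def w1_distance(a, b):
    """Empirical 1-Wasserstein distance between two 1-D samples
    of equal size: mean absolute gap between sorted samples."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, 1000)   # target distribution
early = rng.normal(2.0, 1.0, 1000)    # features early in training
late = rng.normal(0.5, 1.0, 1000)     # features later in training
```

On these stand-ins, the later features are measurably closer to the target than the early ones, mirroring the decreasing trend described above.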
We study the joint distribution matching problem which aims at learning bidirectional mappings to match the joint distribution of two domains.
However, most deep learning methods employ feed-forward architectures, so the dependencies between low-resolution (LR) and high-resolution (HR) images are not fully exploited, leading to limited learning performance.