However, for generalization tasks, the current fine-tuning methods for CLIP, such as CoOp and CoCoOp, demonstrate relatively low performance on some fine-grained datasets.
When CLIP is used for depth estimation, the patches divided from the input image can be combined with a series of semantic descriptions of depth to obtain similarity scores.
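As a rough illustration of the patch-to-description matching described above, the sketch below scores one patch embedding against several text embeddings with cosine similarity and normalizes the scores with a softmax. The function names, the toy two-dimensional embeddings, and the temperature value are assumptions made for illustration; real CLIP embeddings would come from the model's image and text encoders, which are not shown here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def patch_depth_scores(patch_embedding, text_embeddings, temperature=0.1):
    """Similarity of one patch to each depth description (hypothetical helper).

    Returns one normalized weight per description; the temperature is an
    assumed value, not one taken from the source.
    """
    sims = [cosine(patch_embedding, t) for t in text_embeddings]
    return softmax(sims, temperature)

# Toy example: the patch aligns with the first description (e.g. "close").
weights = patch_depth_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

How the resulting weights are turned into a depth value (for example, by mapping each description to a depth bin) is a separate design choice that the text does not specify.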
Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure.
ML model design either starts with an interpretable model or starts with a black box and explains it post hoc.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
Extensive experiments on large-scale computed tomography (CT) datasets of lung images show that our method improves the performance of many downstream prediction and segmentation tasks.
(3) We empirically show that the harmful spurious features can be detected by observing the learning dynamics of the DNN's early layers.
We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the drug-drug similarity information obtained from the hierarchy to induce the clustering of drugs in hyperbolic space.
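To give a sense of what distances in hyperbolic space look like, here is a minimal sketch of the geodesic distance in the Poincaré ball model. The choice of the Poincaré ball is an assumption (the text says only "hyperbolic space"), and `poincare_distance` is a hypothetical helper name, not part of the described method.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball.

    Assumes both points have Euclidean norm strictly less than 1.
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)

# Distance from the origin to a point halfway to the boundary.
d = poincare_distance([0.0, 0.0], [0.5, 0.0])
```

Distances grow rapidly as points approach the boundary of the ball, which is why this geometry is often used to embed hierarchies: points near the origin can act as parents of many well-separated points near the boundary.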
no code implementations • 6 Jul 2022 • Eric Loreaux, Ke Yu, Jonas Kemp, Martin Seneviratne, Christina Chen, Subhrajit Roy, Ivan Protsyuk, Natalie Harris, Alexander D'Amour, Steve Yadlowsky, Ming-Jun Chen
We propose a joint model of intervention policy and adverse event risk as a means to explicitly communicate the model's assumptions about future interventions.
We use graph neural networks to incorporate the relationship between different anatomical regions.
The critical component in our framework is an anatomy-guided attention module that aids the downstream observation network in focusing on the relevant anatomical regions generated by the anatomy network.
The image signal processor (ISP) is a crucial component in digital cameras that transforms sensor signals into images for us to perceive and understand.
However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact the performance on downstream tasks via "shortcuts", i.e., by inadvertently suppressing important predictive features.
Experiments on large-scale computed tomography (CT) datasets of lung images show that our approach compares favorably to baseline methods that do not account for the context.
Video super-resolution (VSR) approaches tend to have more components than their image counterparts, as they need to exploit the additional temporal dimension.
Aside from the contributions to deformable alignment, our formulation inspires a more flexible approach to introducing offset diversity to flow-based alignment, improving its performance.
During training, we adopt a hierarchical structure that simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image.
We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the knowledge-based drug-drug similarity to induce the clustering of drugs in hyperbolic space.
In this work, we propose a novel Video Restoration framework with Enhanced Deformable networks, termed EDVR, to address these challenges.
Ranked #2 on Deblurring on REDS
To leverage this, we propose Path-Restore, a multi-path CNN with a pathfinder that can dynamically select an appropriate route for each image region.
Deep convolutional neural networks have demonstrated their capability of learning a deterministic mapping for the desired imagery effect.
Most methods in deep-RL achieve good results via the maximization of the reward signal provided by the environment, typically in the form of discounted cumulative returns.
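For readers unfamiliar with the term, the discounted cumulative return of a reward sequence r_0, r_1, ... is the sum of gamma^t * r_t. The sketch below computes it for a finite episode; the function name and the default discount factor are illustrative assumptions.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted cumulative return: sum over t of gamma**t * rewards[t].

    Accumulates from the last reward backwards, so each step is
    G_t = r_t + gamma * G_{t+1}.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Three unit rewards with gamma = 0.5 give 1 + 0.5 + 0.25.
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```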
To further enhance the visual quality, we thoroughly study three key components of SRGAN (network architecture, adversarial loss, and perceptual loss) and improve each of them to derive an Enhanced SRGAN (ESRGAN).
Ranked #2 on Image Super-Resolution on Set5 - 4x upscaling
In this paper, we show that it is possible to recover textures faithful to semantic classes.
Ranked #54 on Image Super-Resolution on BSD100 - 4x upscaling
Lossy compression introduces complex compression artifacts, particularly blocking artifacts, ringing effects and blurring.