By contrast, multi-modal inpainting provides more flexible and useful control over the inpainted content, \eg, a text prompt can describe an object with richer attributes, and a mask can constrain the shape of the inpainted object rather than merely marking a missing area.
Object compositing based on 2D images is a challenging problem since it typically involves multiple processing stages such as color harmonization, geometry correction and shadow generation to generate realistic results.
Thirdly, we utilize a Transformer to learn global image-level features and to model the global relationships among corner points, with the assistance of a corner-query cross-attention mechanism.
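As an illustrative sketch only (not the paper's implementation), corner-query cross-attention can be pictured as a set of learned corner queries attending over flattened image-level features; all names and dimensions below are hypothetical assumptions:

```python
import numpy as np

def cross_attention(queries, features):
    # Scaled dot-product attention: each corner query attends over
    # the flattened image-level feature map.
    d = queries.shape[-1]
    scores = queries @ features.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over locations
    return weights @ features  # one attended feature per corner query

rng = np.random.default_rng(2)
corner_queries = rng.normal(size=(4, 32))   # one learned query per corner
image_features = rng.normal(size=(49, 32))  # e.g., a 7x7 feature map, flattened
corner_feats = cross_attention(corner_queries, image_features)
```

Each corner point thus aggregates evidence from the whole image rather than only from local features.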
We present SImProv, a scalable image provenance framework that matches a query image back to a trusted database of originals and identifies possible manipulations in the query.
However, most existing works remain caught in a dilemma between higher accuracy and stronger robustness, since they tend to fit a model to robust features (those not easily tampered with by adversaries) while ignoring non-robust but highly predictive features.
To move a step further, this paper proposes GALA (Geometry-and-Lighting-Aware), a generic foreground object search method with discriminative modeling on geometry and lighting compatibility for open-world image compositing.
Prioritizing fairness is of central importance in artificial intelligence (AI) systems, especially in societal applications: e.g., hiring systems should recommend applicants equally across demographic groups, and risk assessment systems must eliminate racial bias in criminal justice.
no code implementations • 9 Sep 2021 • Yunyou Huang, Nana Wang, Suqin Tang, Li Ma, Tianshu Hao, Zihan Jiang, Fan Zhang, Guoxin Kang, Xiuxia Miao, Xianglong Guan, Ruchang Zhang, Zhifei Zhang, Jianfeng Zhan
In the real-world clinical setting, OpenClinicalAI significantly outperforms the state-of-the-art AI system.
The novel graph constructor maps a glyph's latent code to a graph representation that matches expert knowledge, and it is trained to aid the translation task.
More specifically, we obtain feature importance by introducing the aggregate gradient, which averages the gradients with respect to feature maps of the source model, computed on a batch of random transforms of the original clean image.
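A minimal numpy sketch of this averaging step, with a toy analytic gradient standing in for the source model's feature-map gradients and random pixel dropping as the random transform; all names and the surrogate model are hypothetical assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_grad_wrt_features(x, w):
    # Toy surrogate: for a loss L(x) = 0.5 * ||w * x||^2, the gradient
    # with respect to the "feature map" x is w^2 * x (elementwise).
    return (w ** 2) * x

def aggregate_gradient(image, w, n_transforms=30, keep_prob=0.7):
    # Average gradients over random transforms (here: random pixel
    # dropping) of the clean image, then normalize to obtain a
    # feature-importance map.
    grads = np.zeros_like(image)
    for _ in range(n_transforms):
        mask = rng.random(image.shape) < keep_prob  # random transform
        grads += model_grad_wrt_features(image * mask, w)
    grads /= n_transforms
    return grads / (np.abs(grads).sum() + 1e-12)  # normalized importance

image = rng.random((4, 4))
w = rng.random((4, 4))
importance = aggregate_gradient(image, w)
```

Averaging over many random transforms smooths out gradient components tied to model-specific artifacts, which is what makes the resulting importance map more transferable.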
We also introduce the Text Refinement Network (TexRNet), a novel text segmentation approach that adapts to the unique properties of text (e.g., non-convex boundaries and diverse textures), which often burden traditional segmentation models.
We aim to super-resolve digital paintings, synthesizing realistic details from high-resolution reference painting materials for very large scaling factors (e.g., 8x, 16x).
Medical image fusion is a promising approach to providing comprehensive information from medical images of different modalities.
Artificial intelligence (AI) researchers claim that they have made great `achievements' in clinical realms.
Reference-based super-resolution (RefSR), on the other hand, has proven to be promising in recovering high-resolution (HR) details when a reference (Ref) image with similar content as that of the LR input is given.
Ranked #1 on Image Super-Resolution on CUFED5 - 4x upscaling
Although state-of-the-art attacking techniques that incorporate advances in generative adversarial networks (GANs) can construct class representatives of the global data distribution among all clients, it remains challenging to distinguishably attack a specific client (i.e., user-level privacy leakage), a stronger privacy threat that precisely recovers private data from that client.
We focus on transferring the high-resolution texture from reference images to the super-resolution process without the constraint of content similarity between reference and target images, which is a key difference from previous example-based methods.
A more general question is whether, if a large proportion (e.g., more than 50%) of the face/sketch is missing, a realistic whole face sketch/image can still be estimated.
In CAAE, the face is first mapped to a latent vector through a convolutional encoder, and then the vector is projected to the face manifold conditional on age through a deconvolutional generator.
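The encoder-to-latent and age-conditioned generation flow can be sketched as follows, with random linear maps standing in for the convolutional encoder and deconvolutional generator; everything here is an illustrative assumption, not the CAAE implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def encoder(face, W_enc):
    # Convolutional-encoder stand-in: flatten the face and project it
    # to a latent vector z on the face manifold.
    return np.tanh(W_enc @ face.ravel())

def generator(z, age_onehot, W_gen):
    # Deconvolutional-generator stand-in: condition on age by
    # concatenating a one-hot age label to the latent vector.
    cond = np.concatenate([z, age_onehot])
    return np.tanh(W_gen @ cond).reshape(8, 8)

latent_dim, n_ages = 16, 10
face = rng.random((8, 8))
W_enc = rng.normal(size=(latent_dim, 64)) * 0.1
W_gen = rng.normal(size=(64, latent_dim + n_ages)) * 0.1

z = encoder(face, W_enc)
age = np.eye(n_ages)[3]            # target age group as a one-hot label
aged_face = generator(z, age, W_gen)
```

Because identity is carried by z while age enters only through the conditioning label, swapping the label traverses the manifold along the age dimension while holding identity fixed.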
The staggering amount of streaming time series coming from the real world calls for more efficient and effective online modeling solutions.