We validated our method on domain adaptation for hand segmentation between real and simulated images.
Unsupervised image captioning is a challenging task that aims to generate captions without the supervision of paired image-sentence data, using only images and sentences drawn from different sources, together with object labels detected in the images.
Hence, we consider a new, realistic setting called Noisy UniDA, in which classifiers are trained with noisily labeled data from the source domain and unlabeled data with an unknown class distribution from the target domain.
Visual grounding is provided as bounding boxes on the image sequences of recipes, and each bounding box is linked to an element of the workflow.
In practice, this is an important problem in UDA: since the labels of target-domain datasets are unknown, we cannot tell whether their distribution is identical to that of the source-domain dataset.
This work addresses a new problem: learning generative adversarial networks (GANs) from multiple data collections that are each (i) owned separately by a different client and (ii) drawn from a non-identical distribution comprising different classes.
Images captured in participating media such as murky water, fog, or smoke are degraded by scattered light.
In this paper, we focus on procedure execution videos, in which a human makes or repairs something, and propose a method for generating procedural texts from such videos.
The highlights of this paper are the following two mathematical observations: first, spectral clustering has an intrinsic tendency to form an outlier cluster, and second, an outlier cluster becomes singular when a valid cluster number is chosen.