In the basic generation stage, we take advantage of a pretrained image diffusion model and adapt it into a high-quality, open-domain vertical video generator for mobile devices.
Low-light conditions not only hamper human visual experience but also degrade the model's performance on downstream vision tasks.
To generate videos, we extend the capabilities of a pretrained text-to-image diffusion model through a two-stage process.
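One common way to extend a pretrained image (spatial) model to video is to factorize attention into per-frame spatial attention followed by per-location temporal attention. The sketch below is a minimal NumPy toy with identity projections, not the paper's actual architecture; the function names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # tokens: (batch, seq, dim); identity Q/K/V projections for brevity
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

def factorized_video_attention(video):
    # video: (T, H, W, C) features, e.g. from a pretrained image backbone
    T, H, W, C = video.shape
    # Spatial attention: each frame attends within itself (pretrained behavior)
    spatial = self_attention(video.reshape(T, H * W, C)).reshape(T, H, W, C)
    # Temporal attention: each spatial location attends across frames (new stage)
    tokens = spatial.reshape(T, H * W, C).transpose(1, 0, 2)  # (H*W, T, C)
    temporal = self_attention(tokens).transpose(1, 0, 2)
    return temporal.reshape(T, H, W, C)

out = factorized_video_attention(np.random.rand(4, 8, 8, 16))
```

Because the temporal attention is a separate, newly added operation, the spatial weights can stay frozen or be fine-tuned lightly in a second stage.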
We present VideoFactory, an innovative framework for generating high-quality open-domain videos.
Ranked #1 on Text-to-Video Generation on WebVid
Language-guided image generation has recently achieved great success using diffusion models.
Low-light conditions not only degrade the human visual experience but also reduce the performance of downstream machine analytics.
We first present a new dataset, S5Mars, for Semi-SuperviSed learning on Mars Semantic Segmentation, which contains 6K high-resolution images and is sparsely annotated based on confidence, ensuring high label quality.
For segmentation, we extend supervised inter-class contrastive learning to an element-wise mode and use online pseudo labels to supervise unlabeled areas.
To reduce the burden of building new datasets for low light conditions, we make full use of existing normal light data and explore how to adapt face detectors from normal light to low light.
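A common way to reuse normal-light data is to synthesize low-light counterparts with a simple degradation pipeline (exposure reduction plus sensor noise) and train or adapt the detector on them. The sketch below is a generic, hedged illustration of that idea in NumPy; the gamma, gain, and noise parameters are assumptions, not the paper's calibrated model.

```python
import numpy as np

def synthesize_low_light(img, gamma=3.0, gain=0.4, scale=255.0, sigma=0.02, seed=0):
    # img: float RGB in [0, 1] from a normal-light dataset
    rng = np.random.default_rng(seed)
    dark = gain * np.power(img, gamma)                # darken: gamma + gain
    shot = rng.poisson(dark * scale) / scale          # photon (shot) noise
    noisy = shot + rng.normal(0.0, sigma, img.shape)  # sensor read noise
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)

bright = np.full((16, 16, 3), 0.8, dtype=np.float32)
dark = synthesize_low_light(bright)
```

Detectors trained on such synthesized pairs can then be further aligned to real low-light imagery, since the synthetic degradation only approximates the true domain gap.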
In this article, we address the problem by jointly considering the intrinsic properties of stylization and temporal consistency.
Moreover, built upon the sampling network, we present a design-draft-to-real fashion item translation network (D2RNet), in which two separate translation streams, focusing on texture and shape respectively, are combined to obtain the benefits of both.
The stylized text is finally embellished with decor, whose placement is determined by a novel structure-aware strategy.
The key idea is to train our network to accomplish both the objective of style transfer and style removal, so that it can learn to disentangle and recombine the content and style features of text effects images.
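The disentangle-and-recombine idea can be demonstrated with a deliberately simple stand-in: treat per-channel feature statistics as "style" and the normalized residual as "content" (an AdaIN-like view). This is a toy NumPy analogy, not the paper's learned network; style removal here is just normalization, and transfer is re-applying another image's statistics.

```python
import numpy as np

def split(feat):
    # "style" = channel mean/std; "content" = normalized residual
    mu, sigma = feat.mean(axis=(0, 1)), feat.std(axis=(0, 1)) + 1e-6
    return (feat - mu) / sigma, (mu, sigma)

def recombine(content, style):
    mu, sigma = style
    return content * sigma + mu

rng = np.random.default_rng(0)
a, b = rng.random((4, 4, 8)), rng.random((4, 4, 8))
ca, sa = split(a)
cb, sb = split(b)
stylized = recombine(cb, sa)   # b's content rendered with a's style
removed, _ = split(stylized)   # style removal recovers b's content
```

Training a network on both objectives (transfer and removal) forces its features to factor the same way, so that content and style can be freely recombined at test time.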
Based on this decomposition, subsequent lightness enhancement is conducted on the illumination component by an enhancement network called Enhance-Net, while a denoising operation is applied to the reflectance component for joint denoising.
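The Retinex-style pipeline described above can be sketched end to end with classical stand-ins: a max-over-channels illumination estimate, gamma correction in place of Enhance-Net, and a box filter in place of the learned denoiser. All of these substitutions are illustrative assumptions, not the paper's networks.

```python
import numpy as np

def retinex_enhance(img, gamma=0.45, eps=1e-6, k=3):
    # img: (H, W, 3) float in [0, 1]; Retinex model: img = reflectance * illumination
    illum = img.max(axis=2, keepdims=True)        # rough illumination map
    refl = img / (illum + eps)                    # reflectance component
    illum_enh = np.power(illum, gamma)            # lightness enhancement (stand-in)
    # crude box-filter denoising of reflectance (stand-in for a denoising net)
    pad = k // 2
    p = np.pad(refl, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    den = np.mean([p[i:i + refl.shape[0], j:j + refl.shape[1]]
                   for i in range(k) for j in range(k)], axis=0)
    return np.clip(den * illum_enh, 0.0, 1.0)     # recombine the two components

out = retinex_enhance(np.full((8, 8, 3), 0.1))
```

Operating on the two components separately is the point of the decomposition: noise is amplified along with lightness if enhancement is applied to the raw image, whereas here it is suppressed on reflectance before recombination.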
Ranked #6 on Low-Light Image Enhancement on DICM