Based on this single-image-annotation experiment, we design an experiment to quickly annotate an entire data set.
Using this space we can build dictionaries of characters in an animation videos, and define specialized classifiers for specific stylistic content (e. g., characters in a specific animation series) with very little user effort.
In this work, we present a music-driven motion synthesis framework that generates long-term sequences of human motions which are synchronized with the input beats, and jointly form a global structure that respects a specific dance genre.
OIA models interactions between the sentence-corresponding image and important regions in other images of the sequence.
Although simple, our framework synthesizes high-quality face reconstructions, demonstrating that given the statistical prior of a human face, multiple aligned pixelated frames contain sufficient information to reconstruct a high-quality approximation of the original signal.
no code implementations • 1 Sep 2020 • Paul L. Rosin, Yu-Kun Lai, David Mould, Ran Yi, Itamar Berger, Lars Doyle, Seungyong Lee, Chuan Li, Yong-Jin Liu, Amir Semmo, Ariel Shamir, Minjung Son, Holger Winnemoller
Despite the recent upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer, the state of performance evaluation in this field is limited, especially compared to the norms in the computer vision and machine learning communities.
Drawing inspiration from Parameter Continuation methods, we propose steering the training process to consider specific features in the input more than others, through gradual shifts in the input domain.
Given a 3D object in isolation, our functional similarity network (fSIM-NET), a variation of the triplet network, is trained to predict the functionality of the object by inferring functionality-revealing interaction contexts.
We find that the network is capable of estimating the camera pose from the lean images, and it does so not by memorization but by some measure of geometric learning of the geographical area.
In this paper we perform image resizing in feature space where the deep layers of a neural network contain rich important semantic information.
We evaluate our method on public portrait image datasets, and show that it outperforms other state-of-art general image completion methods.
We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently.
The composed picture is generated by seamlessly stitching several photographs in agreement with the sketch and text labels; these are found by searching the Internet.