Based on these findings, we propose a principle method to improve the robustness of Transformer models by automatically searching for an optimal fusion strategy regarding input data.
On the other hand, Neural Light Field (NeLF) presents a more straightforward representation over NeRF in novel view synthesis -- the rendering of a pixel amounts to one single forward pass without ray-marching.
In addition, our model can extract visual information as suggested by the text prompt, e. g., "an object in image one is moving northeast", and generate corresponding videos.
Our approach achieves comparable and better performance, when compared not only to existing quantization techniques with INT32 multiplication or floating-point arithmetic, but also to the full-precision counterparts, achieving state-of-the-art performance.
Camera, and associated with its objects within the field of view, localization could benefit many computer vision fields, such as autonomous driving, robot navigation, and augmented reality (AR).
However, an extra quantization step (i. e. PIM quantization), typically with limited resolution due to hardware constraints, is required to convert the analog computing results into digital domain.
Graph-based Anomaly Detection (GAD) is becoming prevalent due to the powerful representation abilities of graphs as well as recent advances in graph mining techniques.
Human motion retargeting aims to transfer the motion of one person in a "driving" video or set of images to another person.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space.
Ranked #1 on Video Reconstruction on Tai-Chi-HD (512)
Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.
A common assumption in multimodal learning is the completeness of training data, i. e., full modalities are available in all training examples.
In this work, we aim to address these issues by introducing a teacher network that provides a search space in which efficient network architectures can be found, in addition to performing knowledge distillation.
In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin, 2018) pointed out that there could exist a winning ticket (i. e., a properly pruned sub-network together with original weight initialization) that can achieve competitive performance than the original dense network.
Unlike existing supervised translation methods that require model-level similarity to preserve consistent structure representation for both real images and fake renderings, our method adopts an unsupervised solution to work on arbitrary hair models.
In this paper, we tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.
In this paper, we propose an Ecologically-Inspired GENetic (EIGEN) approach that uses the concept of succession, extinction, mimicry, and gene duplication to search neural network structure from scratch with poorly initialized simple network and few constraints forced during the evolution, as we assume no prior knowledge about the task domain.
In this paper, we propose Factorized Adversarial Networks (FAN) to solve unsupervised domain adaptation problems for image classification tasks.
Automatic and accurate Gleason grading of histopathology tissue slides is crucial for prostate cancer diagnosis, treatment, and prognosis.
To accommodate our study, we first collect two distinct datasets, a large image dataset from Flickr and annotated by Amazon Mechanical Turk, and a small dataset of real personal albums rated by owners.