Through a series of experiments and human evaluations, the proposed method renders realistic co-speech gestures not only when all input modalities are given but also when some modalities are missing or noisy.
In this paper, we introduce a novel data augmentation method for skeleton-based action recognition tasks, which can effectively generate high-quality and diverse sequential actions.
Image inpainting is a long-standing problem in computer vision that restores occluded regions and completes damaged images.
Image inpainting is a technique for completing missing pixels, with applications such as occluded-region restoration, distracting-object removal, and facial completion.
With these modules, we then employ reinforcement learning to search for an optimal image denoising network at the module level.
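The module-level search described above can be illustrated with a toy stand-in: a simple epsilon-greedy bandit choosing among moving-average denoising "modules". The candidate modules, the reward, and the search strategy here are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Candidate "modules": moving-average denoisers of different widths
# (hypothetical stand-ins for the paper's denoising modules).
def make_module(width):
    def denoise(x):
        k = np.ones(width) / width
        return np.convolve(x, k, mode="same")
    return denoise

modules = {w: make_module(w) for w in (1, 3, 5, 9)}

def reward(module, clean, noisy):
    """Negative MSE between the module's output and the clean signal."""
    return -np.mean((module(noisy) - clean) ** 2)

def epsilon_greedy_search(clean, noisy, steps=200, eps=0.2):
    """Toy epsilon-greedy bandit standing in for an RL controller that
    selects a denoising module; returns the best-scoring module key."""
    keys = list(modules)
    q = {k: 0.0 for k in keys}   # running reward estimate per module
    n = {k: 0 for k in keys}     # pull count per module
    for _ in range(steps):
        k = rng.choice(keys) if rng.random() < eps else max(q, key=q.get)
        r = reward(modules[k], clean, noisy)
        n[k] += 1
        q[k] += (r - q[k]) / n[k]   # incremental mean update
    return max(q, key=q.get)
```

On a noisy sine wave, the search settles on whichever filter width best trades off noise suppression against over-smoothing.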
To achieve this goal, the proposed method predicts expressions from the sentences using a text classification model built on a pretrained language model, and generates gestures using a gated recurrent unit (GRU)-based autoregressive model.
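A rough sketch of the autoregressive generation stage, assuming a minimal NumPy GRU cell and made-up pose/embedding dimensions (none of the names or shapes come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUCell:
    """Minimal GRU cell (hypothetical sketch, not the authors' code)."""
    def __init__(self, input_dim, hidden_dim):
        scale = 1.0 / np.sqrt(hidden_dim)
        shape = (hidden_dim, input_dim + hidden_dim)
        self.Wz = rng.uniform(-scale, scale, shape)  # update gate
        self.Wr = rng.uniform(-scale, scale, shape)  # reset gate
        self.Wh = rng.uniform(-scale, scale, shape)  # candidate state

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)
        r = sigmoid(self.Wr @ xh)
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

def generate_gestures(expression_vec, n_frames=30, pose_dim=10, hidden_dim=32):
    """Autoregressively roll out a pose sequence conditioned on a
    predicted-expression embedding; each generated pose is fed back
    as input at the next step (all dimensions are assumptions)."""
    cell = GRUCell(pose_dim + expression_vec.size, hidden_dim)
    W_out = rng.uniform(-0.1, 0.1, (pose_dim, hidden_dim))
    h = np.zeros(hidden_dim)
    pose = np.zeros(pose_dim)
    frames = []
    for _ in range(n_frames):
        h = cell(np.concatenate([pose, expression_vec]), h)
        pose = W_out @ h
        frames.append(pose)
    return np.stack(frames)  # shape: (n_frames, pose_dim)
```

Feeding each output pose back into the cell is what makes the rollout autoregressive; a trained model would replace the random weights above.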
In order to train the network for more robust performance in noisy environments, we introduce the LOw Variant Orthogonal (LOVO) loss.
Graph convolutional networks (GCNs), which can model the human body skeletons as spatial and temporal graphs, have shown remarkable potential in skeleton-based action recognition.
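The core spatial-temporal graph operation can be sketched as follows. This is an illustrative NumPy version with symmetric adjacency normalization; a plain moving average stands in for the learned temporal convolution, and the shapes are assumptions rather than any specific paper's code.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize a joint-adjacency matrix with self-loops:
    A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A = A + np.eye(A.shape[0])
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

def st_graph_conv(X, A, W_spatial, temporal_kernel=3):
    """One spatial-temporal graph convolution block (illustrative).
    X: (T, V, C) sequence with T frames, V joints, C channels."""
    A_hat = normalize_adjacency(A)
    # Spatial step: aggregate features over neighboring joints per frame,
    # then mix channels with a learned weight matrix.
    H = np.einsum("uv,tvc->tuc", A_hat, X) @ W_spatial
    # Temporal step: average a window of frames per joint (a moving
    # average standing in for a learned temporal convolution).
    pad = temporal_kernel // 2
    Hp = np.pad(H, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    out = np.stack([Hp[t:t + temporal_kernel].mean(axis=0)
                    for t in range(H.shape[0])])
    return np.maximum(out, 0.0)  # ReLU
```

Stacking such blocks lets information flow both across the skeleton graph (spatial) and across frames (temporal).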
Our model builds on a pretrained StyleGAN and is trained in a self-supervised manner, without any manual annotations.
To alleviate this issue, many 3D-aware GANs have been proposed and have shown notable results, but they still struggle with editing semantic attributes.
Although progress in generative models enables the stylization of a portrait, obtaining the stylized image in a canonical view remains a challenging task.
However, the performance of the dynamic filter may degrade because simple feature pooling is used to reduce computational cost in the IDF part.
To address this issue, we propose a novel GAN model, i.e., AU-GAN, which has an asymmetric architecture for adverse domain translation.
Owing to the fast processing speed and robustness it can achieve, skeleton-based action recognition has recently received attention from the computer vision community.
Automatic segmentation of infected regions in computed tomography (CT) images is necessary for the initial diagnosis of COVID-19.
This lightweight dynamic filter is applied to the front-end of KWS to enhance the separability of the input data.
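A minimal sketch of such an input-dependent filter, assuming per-utterance 1-D taps predicted from globally pooled features (the weight shapes and the pooling choice are illustrative assumptions, not the paper's design):

```python
import numpy as np

rng = np.random.default_rng(1)

def lightweight_dynamic_filter(features, W1, W2):
    """Sketch of a dynamic filter for a KWS front-end: a tiny two-layer
    network predicts per-utterance 1-D filter taps from globally pooled
    features, then the taps are convolved along time.
    features: (T, F) time-frequency input; W1, W2: assumed weights."""
    pooled = features.mean(axis=0)             # global average pooling
    hidden = np.tanh(W1 @ pooled)              # small hidden layer
    taps = W2 @ hidden                         # predicted filter taps
    taps = taps / (np.abs(taps).sum() + 1e-8)  # normalize the filter
    # Apply the predicted taps along time for every frequency bin.
    out = np.stack([np.convolve(features[:, f], taps, mode="same")
                    for f in range(features.shape[1])], axis=1)
    return out
```

Because the taps depend on the pooled input, each utterance is filtered differently, which is what makes the front-end "dynamic" while keeping the predictor itself very small.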
Segmenting medical images accurately and reliably is important for disease diagnosis and treatment.
In this paper, we propose a built-in memory module for semantic segmentation to overcome these problems.
Anomaly detection in video streams is a challenging problem because of the scarcity of abnormal events and the difficulty of accurately annotating them.
To compensate for the scarcity of labeled data, the proposed method utilizes a large amount of synthetic COVID-19 CT images and adapts the networks from the source domain (synthetic data) to the target domain (real data) with a cross-domain training mechanism.
The goal of face attribute editing is to alter a facial image according to given target attributes, such as hair color, mustache, and gender.
In this paper, we propose an unsupervised domain adaptation-based segmentation network to improve the segmentation of infected areas in COVID-19 CT images.
However, training a deep-learning model requires large volumes of data, and medical staff face a high risk when collecting COVID-19 CT data because the disease is highly infectious.
In machine learning, the performance of a classifier depends on both the classifier model and the dataset.
5 code implementations • 5 May 2020 • Andreas Lugmayr, Martin Danelljan, Radu Timofte, Namhyuk Ahn, Dongwoon Bai, Jie Cai, Yun Cao, Junyang Chen, Kaihua Cheng, SeYoung Chun, Wei Deng, Mostafa El-Khamy, Chiu Man Ho, Xiaozhong Ji, Amin Kheradmand, Gwantae Kim, Hanseok Ko, Kanghyu Lee, Jungwon Lee, Hao Li, Ziluan Liu, Zhi-Song Liu, Shuai Liu, Yunhua Lu, Zibo Meng, Pablo Navarrete Michelini, Christian Micheloni, Kalpesh Prajapati, Haoyu Ren, Yong Hyeok Seo, Wan-Chi Siu, Kyung-Ah Sohn, Ying Tai, Rao Muhammad Umer, Shuangquan Wang, Huibing Wang, Timothy Haoning Wu, Hao-Ning Wu, Biao Yang, Fuzhi Yang, Jaejun Yoo, Tongtong Zhao, Yuanbo Zhou, Haijie Zhuo, Ziyao Zong, Xueyi Zou
This paper reviews the NTIRE 2020 challenge on real world super-resolution.
The performance of learning-based Automatic Speech Recognition (ASR) is susceptible to noise, especially when noise is present in the test data but not in the training data.
Audio waveform generation can then be performed using the proposed network.
Using a development dataset spoken in Cebuano and Mandarin, we prepared evaluation trials through preliminary experiments to compensate for the language-mismatch condition.
The experimental results show that the use of duration and score fusion improves language recognition performance by 5% relative in LRiMLC15 cost.