Image-mixing augmentations (e.g., Mixup or CutMix), which typically mix two images, have become de facto training tricks for image classification.
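As a minimal illustration of such an image-mixing augmentation, the NumPy sketch below implements Mixup: two images and their one-hot labels are combined with a single coefficient sampled from a Beta distribution. The function and variable names are illustrative, not taken from any specific paper's code.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Mix two images and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2   # pixel-wise convex combination
    y = lam * y1 + (1.0 - lam) * y2   # soft label with the same coefficient
    return x, y, lam

# Example: mix two 32x32 RGB images from a 10-class problem.
img_a = np.zeros((32, 32, 3))
img_b = np.ones((32, 32, 3))
lab_a = np.eye(10)[3]
lab_b = np.eye(10)[7]
mixed_x, mixed_y, lam = mixup(img_a, lab_a, img_b, lab_b)
```

Because the same coefficient mixes both pixels and labels, the loss on `mixed_y` encourages linear behavior between training examples, which is the regularization effect Mixup is known for.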
Second, we propose a self-refinement method that refines the pseudo instance labels in a self-supervised scheme and applies them to training in an online manner, resolving the semantic drift problem.
We consider a class-incremental semantic segmentation (CISS) problem.
We propose a novel and effective input-transformation-based adversarial defense method against gray- and black-box attacks, which is computationally efficient and does not require any adversarial training or retraining of the classification model.
The prevalent continual learning scenario, however, assumes disjoint sets of classes as tasks, which is artificial rather than realistic.
In this paper, we discuss a new training pipeline that achieves strong performance on the task of anti-spoofing with RGB images.
We then investigate a model's channel configuration by searching network architectures over channel configurations under a computational cost constraint.
The cost of annotating transcriptions for large speech corpora is a bottleneck to fully exploiting the capacity of deep neural network-based automatic speech recognition models.
In this paper, we present a new neural architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performance.
Despite the apparent human-level performance of deep neural networks (DNNs), they behave in fundamentally different ways from humans.
To solve the first problem, we introduce SINet, a new, extremely lightweight portrait segmentation model containing an information blocking decoder and spatial squeeze modules.
We first assume that the priors of future samples can be generated in an independently and identically distributed (i.i.d.) manner.
(iii) The proposed regression is embedded into a generative model, and the whole procedure is developed within the variational autoencoder framework.
In our qualitative and quantitative analysis on the EG1800 dataset, we show that our method outperforms various existing lightweight segmentation models.
In this paper, we propose a new multi-scale face detector with an extremely tiny number of parameters (EXTD), less than 0.1 million, that achieves performance comparable to deep, heavy detectors.
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers.
To resolve this problem, we propose a new block called Concentrated-Comprehensive Convolution (C3), which applies asymmetric convolutions before the depth-wise separable dilated convolution to compensate for the information loss caused by dilation.
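To illustrate the asymmetric-convolution idea in isolation (this is a sketch of the general factorization trick, not the authors' actual C3 implementation), the NumPy snippet below shows that applying a k×1 kernel followed by a 1×k kernel reproduces a full k×k convolution whose kernel is their outer product, while using 2k weights instead of k².

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid'-mode 2-D cross-correlation."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
col = rng.standard_normal((3, 1))   # 3x1 asymmetric kernel
row = rng.standard_normal((1, 3))   # 1x3 asymmetric kernel

# Two asymmetric passes (3 + 3 = 6 weights) equal one pass with the
# 3x3 outer-product kernel (9 weights) for this separable case.
two_pass = conv2d_valid(conv2d_valid(img, col), row)
one_pass = conv2d_valid(img, col @ row)
```

In practice such asymmetric pairs act as a cheap way to gather cross-shaped local context before the dilated convolution samples its sparse grid.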
This block enables MC-GAN to generate a realistic object image with the desired background by controlling how much background information is taken from the given base image, using the foreground information from the text attributes.
Beyond this performance enhancement problem, we show that the proposed PGN can be adopted to solve the classical adversarial problem without using any information about the target classifier.
In this work, we introduce a new algorithm for analyzing a diagram, which contains visual and textual information in an abstract and integrated way.
Our method learns hundreds to thousands of times faster than conventional methods by learning from only a handful of core cluster information, which shows that deep RL agents can learn effectively through knowledge shared by other agents.
In contrast to existing trackers using deep networks, the proposed tracker is designed to achieve light computation as well as satisfactory tracking accuracy in both location and scale.
Semantic segmentation, like other fields of computer vision, has seen remarkable performance advances through the use of deep convolutional neural networks.
In this paper, we consider the moving dynamics of co-occurring objects for path prediction in scenes that include crowded moving objects.