In this paper, we present Dynamic Normalization and Relay (DNR), an improved normalization design, to augment the spatial-temporal representation learning of any deep action recognition model, adapting to small batch size training settings.
Internet video delivery has undergone a tremendous explosion of growth over the past few years.
Anomaly detection plays a pivotal role in numerous real-world scenarios, such as industrial automation and manufacturing intelligence.
no code implementations • 17 Oct 2020 • Yunchao Wei, Shuai Zheng, Ming-Ming Cheng, Hang Zhao, LiWei Wang, Errui Ding, Yi Yang, Antonio Torralba, Ting Liu, Guolei Sun, Wenguan Wang, Luc van Gool, Wonho Bae, Junhyug Noh, Jinhwan Seo, Gunhee Kim, Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang, Chuangchuang Tan, Tao Ruan, Guanghua Gu, Shikui Wei, Yao Zhao, Mariia Dobko, Ostap Viniavskyi, Oles Dobosevych, Zhendong Wang, Zhenyuan Chen, Chen Gong, Huanqing Yan, Jun He
The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate research on novel approaches that harness imperfect data and improve data efficiency during training.
Given the insight that pixels belonging to one instance share one or more common attributes of that instance, we propose a one-stage instance segmentation network named Common Attribute Support Network (CASNet), which realizes instance segmentation by predicting and clustering common attributes.
This paper analyzes regularization terms proposed recently for improving the adversarial robustness of deep neural networks (DNNs), from a theoretical point of view.
However, most existing work in this area focuses on GNNs for node-level tasks, while little work has studied the robustness of GNNs for the graph classification task.
First, to capture the local context of sparse correspondences, the network clusters unordered input correspondences by learning a soft assignment matrix.
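A minimal sketch of clustering through a soft assignment matrix, for illustration only. It assumes the assignment scores are already given (in the paper they are learned by the network) and that the softmax is taken over clusters for each input; the names `softmax` and `soft_cluster` are hypothetical and not from the original work.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cluster(features, logits):
    # features: N x D list of input features (order does not matter).
    # logits:   N x K list of assignment scores (assumed given here).
    # Returns the N x K soft assignment matrix and the K x D pooled clusters.
    assign = [softmax(row) for row in logits]
    n, d, k = len(features), len(features[0]), len(logits[0])
    clusters = [
        [sum(assign[i][c] * features[i][j] for i in range(n)) for j in range(d)]
        for c in range(k)
    ]
    return assign, clusters

assign, clusters = soft_cluster([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])
```

Because each cluster is a weighted sum over all inputs, the pooled result does not depend on the order in which the inputs are listed.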
Convolutional neural networks have been proven effective in a variety of image restoration tasks.
Deep neural networks (DNNs) are computationally/memory-intensive and vulnerable to adversarial attacks, making them prohibitive in some real-world applications.
During the training phase, we generate binary weights on-the-fly since what we actually maintain is the policy network, and all the binary weights are used in a burn-after-reading style.
Thus, a better solution to handle these critical problems is to train object detectors from scratch, which motivates our proposed method.
Depthwise separable convolution has shown great efficiency in network design, but requires a time-consuming training procedure with the full training set available.
The second is the mechanism for handling multiple knowledge facts expanded from question-answer pairs.
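The efficiency of depthwise separable convolution can be illustrated with a parameter count. The sketch below uses the standard formulas (bias terms ignored); the function names and the example layer sizes are hypothetical and chosen only for illustration.

```python
def standard_conv_params(kernel, c_in, c_out):
    # A standard convolution learns one kernel x kernel x c_in filter per output channel.
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    # Depthwise step: one kernel x kernel filter per input channel.
    depthwise = kernel * kernel * c_in
    # Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    pointwise = c_in * c_out
    return depthwise + pointwise

full = standard_conv_params(3, 64, 128)            # 73728
separable = depthwise_separable_params(3, 64, 128)  # 8768
```

For this example layer the separable form uses roughly 8.4 times fewer weights than the standard convolution.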
Through explicitly regularizing the loss perturbation and the weight approximation error in an incremental way, we show that such a new optimization method is theoretically reasonable and practically effective.
Our method decomposes the semantic style transfer problem into a feature reconstruction part and a feature decoder part.
State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to the differences in both the loss functions and the category distributions between classification and detection tasks.
This procedure can greatly compensate for the quantization error and thus yield better accuracy for low-bit DNNs.
To address (a), we design the reverse connection, which enables the network to detect objects at multiple levels of the CNN.
In this paper, we propose an alternative method to estimate room layouts of cluttered indoor scenes.
This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences.
The weights in the other group are responsible for compensating for the accuracy loss from the quantization; thus, they are the ones to be re-trained.
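A rough sketch of the two-group idea, under stated assumptions: the excerpt does not say how the groups are chosen or which quantizer is used, so the magnitude-based split and the power-of-two rounding below are illustrative choices, and the names `quantize_pow2` and `partition_and_quantize` are hypothetical.

```python
import math

def quantize_pow2(w):
    # Round a weight to the nearest power of two in the log domain (assumed quantizer).
    if w == 0:
        return 0.0
    exponent = round(math.log2(abs(w)))
    return math.copysign(2.0 ** exponent, w)

def partition_and_quantize(weights, fraction):
    # Assumed split: the largest-magnitude `fraction` of weights is quantized and frozen;
    # the remaining weights stay in full precision so re-training can adjust them.
    n_quant = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    frozen = set(order[:n_quant])
    out = [quantize_pow2(w) if i in frozen else w for i, w in enumerate(weights)]
    return out, frozen

new_weights, frozen = partition_and_quantize([0.9, -0.3, 0.05, 0.5], 0.5)
```

In a training loop, gradients would be applied only to the indices outside `frozen`, which is how the unquantized group can compensate for the accuracy loss.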
In this paper, we propose a novel network compression method called dynamic network surgery, which can remarkably reduce the network complexity by making on-the-fly connection pruning.
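One way to picture on-the-fly connection pruning is a binary mask that is re-evaluated during training. The sketch below is an assumption-laden illustration, not the paper's exact procedure: the two thresholds `low` and `high`, the rule for restoring connections, and the function names are all hypothetical.

```python
def update_mask(weights, mask, low, high):
    # Re-evaluate which connections are active based on current weight magnitudes.
    # Below `low`: prune. Above `high`: keep or restore. In between: leave unchanged.
    new_mask = []
    for w, m in zip(weights, mask):
        magnitude = abs(w)
        if magnitude < low:
            new_mask.append(0)
        elif magnitude > high:
            new_mask.append(1)
        else:
            new_mask.append(m)
    return new_mask

def apply_mask(weights, mask):
    # The dense weights are kept and updated; only the masked copy is used in the forward pass.
    return [w * m for w, m in zip(weights, mask)]

mask = update_mask([0.05, 0.5, 0.2, 0.2], [1, 0, 1, 0], low=0.1, high=0.3)
effective = apply_mask([0.05, 0.5, 0.2, 0.2], mask)
```

Keeping the dense weights alongside the mask means a connection pruned too early can become active again if its weight later grows.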
Almost all of the current top-performing object detection networks employ region proposals to guide the search for object instances.
Results show that deep attribute approaches achieve state-of-the-art results and outperform existing peer methods by a significant margin, even though some benchmarks have little overlap of concepts with the pre-trained CNN models.
Second, the face image is further represented by patches of picked channels, and we search the over-complete patch pool to activate only the most discriminative patches.