In 3D recognition, to fuse multi-scale structure information, existing methods apply hierarchical frameworks that stack multiple fusion layers to integrate current relative locations with structure information from the previous level.
In surveying each category, we further discuss the design principles and analyze the strengths and weaknesses to clarify the landscape of existing EEMs, making the research trends of EEMs easier to understand.
Deep neural networks (DNNs) are found to be vulnerable to adversarial attacks, and various methods have been proposed for the defense.
The NAS without training (WOT) score is one such metric: it estimates the final trained accuracy of an architecture from its ability to distinguish different inputs in the activation layer.
The learnt deformable kernel is then used to convolve the input frames and predict the interpolated frame.
To solve this issue, we propose a Cross-Domain Predictor (CDP), which is trained on existing NAS benchmark datasets (e.g., NAS-Bench-101) but can be used to find high-performance architectures in large-scale search spaces.
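A minimal sketch of how such a training-free score can be computed. It assumes one common formulation that is not spelled out in the text above: each input in a mini-batch yields a binary code (which ReLU units fire), a kernel matrix is built from the Hamming distances between codes, and the score is the log of the absolute determinant of that kernel. The function names and the toy codes are hypothetical.

```python
import math

def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def log_abs_det(matrix):
    """log|det(matrix)| via Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    log_det = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return float("-inf")  # singular kernel: some inputs are indistinguishable
        m[col], m[pivot] = m[pivot], m[col]
        log_det += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return log_det

def wot_score(binary_codes):
    """Assumed score: log|det K| with K[i][j] = n_units - hamming(code_i, code_j)."""
    n_units = len(binary_codes[0])
    kernel = [[n_units - hamming(a, b) for b in binary_codes] for a in binary_codes]
    return log_abs_det(kernel)

# Toy activation codes for three inputs (1 = unit active, 0 = inactive).
distinct_codes = [(1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 0, 0)]
similar_codes = [(1, 0, 0, 1), (1, 0, 0, 1), (1, 0, 0, 0)]
distinct_score = wot_score(distinct_codes)
similar_score = wot_score(similar_codes)
```

Under this assumed formulation, a network that maps different inputs to well-separated codes receives a higher score than one that maps two inputs to the same code, which is the intuition behind using the score as a proxy for trained accuracy.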
Evolutionary computation-based neural architecture search (ENAS) is a popular technique for automating architecture design of deep neural networks.
In this paper, we propose a high-quality facial expression editing method for talking face videos, allowing the user to control the target emotion in the edited video continuously.
Performance predictors can greatly alleviate the prohibitive cost of NAS by directly predicting the performance of DNNs.
A new instance matting metric called instance matting quality (IMQ) is proposed, which addresses the absence of a unified and fair means of evaluation emphasizing both instance recognition and matting quality.
Specifically, the proposed approach is built by learning the knowledge of high-level experts in designing state-of-the-art architectures, and the new architecture is then generated directly from the learned knowledge.
In this paper, we propose a method to generate talking-face videos with continuously controllable expressions in real time.
This is achieved by introducing an intermediate representation, i.e., Q-representation, in the querying stage to serve as a bridge between the embedding stage and task heads.
As a consequence, it requires the designers to develop expertise in both CF and DNNs, which limits the application of deep learning methods in CF and the accuracy of recommended results.
The paper conducts efficient comparison experiments on eight ENAS algorithms with high GPU utilization on this platform.
Specifically, a homogeneous architecture augmentation algorithm is proposed in HAAP to generate sufficient training data by making use of a homogeneous representation.
Motion style transfer is an important problem in many computer graphics and computer vision applications, including human animation, games, and robotics.
In this paper, we propose a novel approach, Heart-Darts, to efficiently classify ECG signals by automatically designing the CNN model with differentiable architecture search (i.e., Darts, a cell-based neural architecture search method).
Despite the significant progress made by deep learning in natural image matting, there has so far been no representative work on deep learning for video matting, owing to the inherent technical challenges of reasoning about the temporal domain and the lack of large-scale video matting datasets.
Specifically, we consider and learn 20 classes of matting patterns, and propose to extend the conventional trimap to semantic trimap.
Synchronous methods are widely used in the distributed training of Deep Neural Networks (DNNs).
Performance predictors are a type of regression model that can assist the search without consuming much computational resource.
Deep Neural Networks (DNNs) have achieved great success in many applications.
Hyperspectral images (HSIs) are susceptible to various noise factors that lead to loss of information, and the noise restricts subsequent HSI object detection and classification tasks.
GSNet utilizes a unique four-way feature extraction and fusion scheme and directly regresses 6DoF poses and shapes in a single forward pass.
Ranked #1 on 3D Car Instance Understanding on ApolloCar3D
Specifically, the performance of each worker is first evaluated based on its observed behavior in the previous epoch, and then the batch size and dataset partition are dynamically adjusted according to the current performance of the worker, thereby improving the utilization of the cluster.
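The adjustment described above can be illustrated with a minimal sketch. It assumes, as one plausible policy rather than the authors' exact rule, that each worker's batch size is set in proportion to the throughput it achieved in the previous epoch, so that faster workers receive more samples per step. The function name `rebalance_batches` and the throughput figures are hypothetical.

```python
def rebalance_batches(samples_per_second, global_batch_size):
    """Split a fixed global batch across workers in proportion to measured throughput."""
    total = sum(samples_per_second)
    raw = [global_batch_size * s / total for s in samples_per_second]
    sizes = [int(r) for r in raw]
    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = global_batch_size - sum(sizes)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes

# Hypothetical throughputs measured in the previous epoch (samples per second).
even_split = rebalance_batches([100.0, 200.0, 100.0], 256)
uneven_split = rebalance_batches([1.0, 2.0], 10)
```

Keeping the global batch size fixed while changing only the per-worker share leaves the effective optimization batch unchanged; the dataset partition can be resized with the same proportions.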
The most challenging issue for our system is that the source domain of face photos (characterized by normal 2D faces) is significantly different from the target domain of 3D caricatures (characterized by 3D exaggerated face shapes and textures).
Our method outperforms all state-of-the-art voxel-based single-stage methods by a large margin and performs comparably to two-stage point-based methods, with an inference speed of more than 25 FPS, 2x faster than former state-of-the-art point-based methods.
Data mining on existing CNNs can discover useful patterns and fundamental sub-components from their architectures, providing researchers with strong prior knowledge to design proper CNN architectures when they have no expertise in CNNs.
We present a new two-stage 3D object detection framework, named sparse-to-dense 3D Object Detector (STD).
In recent years, convolutional neural networks (CNNs) have become deeper in order to achieve better classification accuracy in image classification.
Three major contributions of this work are: Firstly, a new encoding strategy is proposed to encode a CNN, where the architecture and the shortcut connections are encoded separately; Secondly, a hybrid two-level EC method, which combines particle swarm optimisation and genetic algorithms, is developed to search for the optimal CNNs; Lastly, an adjustable learning rate is introduced for the fitness evaluations, which provides a better learning rate for the training process given a fixed number of epochs.
We present a novel 3D object detection framework, named IPOD, based on raw point clouds.
Ranked #1 on 3D Object Detection on KITTI Pedestrians Easy
In this paper, a new hybrid differential evolution (DE) algorithm with a newly added crossover operator, named DECNN, is proposed to evolve CNN architectures of any length.
Convolutional Neural Networks (CNNs) have achieved remarkable success on many image classification tasks in recent years.
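For readers unfamiliar with differential evolution, the sketch below shows one generation of the classic DE/rand/1/bin scheme on fixed-length real vectors. It is not the hybrid algorithm described above: the variable-length encoding and the added crossover operator are not reproduced, and the names `de_step` and `sphere` as well as the values of `f` and `cr` are illustrative assumptions.

```python
import random

def de_step(population, fitness, f=0.5, cr=0.9, rng=random):
    """One generation of classic DE/rand/1/bin (minimisation)."""
    new_population = []
    for i, target in enumerate(population):
        # Pick three distinct individuals other than the target.
        others = [p for j, p in enumerate(population) if j != i]
        a, b, c = rng.sample(others, 3)
        # Differential mutation: base vector plus a scaled difference.
        mutant = [x + f * (y - z) for x, y, z in zip(a, b, c)]
        # Binomial crossover; one forced position guarantees the trial differs.
        forced = rng.randrange(len(target))
        trial = [m if (rng.random() < cr or k == forced) else t
                 for k, (m, t) in enumerate(zip(mutant, target))]
        # Greedy selection: keep the trial only if it is no worse.
        new_population.append(trial if fitness(trial) <= fitness(target) else target)
    return new_population

def sphere(x):
    """Toy objective standing in for a network's validation error."""
    return sum(v * v for v in x)

rng = random.Random(0)
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
best_before = min(map(sphere, population))
for _ in range(50):
    population = de_step(population, sphere, rng=rng)
best_after = min(map(sphere, population))
```

Because selection is greedy per individual, the best fitness in the population can never get worse from one generation to the next.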
Convolutional neural networks (CNNs) are one of the most effective deep learning methods to solve image classification problems, but the best architecture of a CNN to solve a specific problem can be extremely complicated and hard to design.
Inverted Generational Distance (IGD) has been widely considered a reliable performance indicator to concurrently quantify the convergence and diversity of multi- and many-objective evolutionary algorithms.
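As a small illustration, the sketch below computes IGD in its commonly used form: the average, over a set of reference points sampled from the true Pareto front, of the Euclidean distance to the nearest obtained solution. The function name `igd` and the toy points are assumptions made for the example.

```python
import math

def igd(reference_front, obtained_set):
    """Mean distance from each reference point to its nearest obtained solution."""
    total = 0.0
    for ref in reference_front:
        total += min(math.dist(ref, sol) for sol in obtained_set)
    return total / len(reference_front)

# Toy bi-objective example: the obtained set misses the middle of the front.
reference_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
obtained_set = [(0.0, 1.0), (1.0, 0.0)]
igd_value = igd(reference_front, obtained_set)
perfect_value = igd(reference_front, reference_front)
```

The obtained set here converges to the front but leaves a gap, so IGD is positive; a set that covers every reference point scores zero. This is why a single IGD value reflects both convergence and diversity.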
Finally, by assigning the Pareto-optimal solutions to the uniformly distributed reference vectors, a set of solutions with excellent diversity and convergence is obtained.
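The assignment step can be sketched as follows, assuming a simple angle-based rule that the text does not specify: each reference vector receives the solution whose objective vector has the highest cosine similarity to it. The function names and sample points are hypothetical, and objective vectors are assumed to be non-zero.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def assign_to_reference_vectors(solutions, reference_vectors):
    """For each reference vector, pick the solution closest to it in angle."""
    chosen = []
    for ref in reference_vectors:
        best = max(solutions, key=lambda s: cosine(s, ref))
        chosen.append(best)
    return chosen

# Toy bi-objective solutions and three evenly spread reference directions.
solutions = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.45, 0.55)]
reference_vectors = [(0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
selected = assign_to_reference_vectors(solutions, reference_vectors)
```

Since the reference vectors are spread uniformly, keeping one solution per vector tends to yield a set that is itself evenly spread over the front.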
Specifically, a classification error rate of $1.15\%$ on MNIST is reached consistently by the proposed algorithm, which is a very promising result compared with state-of-the-art unsupervised DL algorithms.
Convolutional auto-encoders have shown remarkable performance over the past several years when stacked into deep convolutional neural networks for classifying image data.
Evolutionary computation methods have been successfully applied to neural networks for two decades, but they do not scale well to modern deep neural networks because of the complicated architectures and large numbers of connection weights.