With CRPD, a unified detection and recognition network with high efficiency is presented as the baseline.
In this paper, we propose a machine learning approach via model-operator-data network (MOD-Net) for solving PDEs.
It draws class-wise features closer together than coarse feature alignment or class-wise feature alignment alone, and therefore substantially improves the model's performance.
Channel pruning is broadly recognized as an effective approach to obtaining a small, compact model by eliminating unimportant channels from a large, cumbersome network.
Why heavily parameterized neural networks (NNs) do not overfit the data is an important, long-standing open question.
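As a rough illustration of what "eliminating unimportant channels" can mean, the sketch below ranks channels by the L1 norm of their filter weights and keeps the largest ones. This is a common magnitude-based heuristic swapped in for illustration, not necessarily the criterion used in the paper; the function names, the flat-list filter layout, and `keep_ratio` are all assumptions.

```python
def l1_scores(filters):
    """Return one importance score per channel: the sum of absolute weights.

    `filters` is a list of channels, each a flat list of weights
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    return [sum(abs(w) for w in channel) for channel in filters]


def select_channels(filters, keep_ratio):
    """Return the sorted indices of the channels to keep.

    Keeps the `keep_ratio` fraction of channels with the largest L1 score
    (at least one channel is always kept).
    """
    scores = l1_scores(filters)
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def prune(filters, keep_ratio):
    """Return a smaller filter bank containing only the selected channels."""
    return [filters[i] for i in select_channels(filters, keep_ratio)]


if __name__ == "__main__":
    bank = [[0.1, -0.1], [1.0, 2.0], [0.0, 0.05], [-0.5, 0.5]]
    print(select_channels(bank, keep_ratio=0.5))
```

In a real network the layers that consume the pruned output would also need their input channels removed, which this sketch leaves out.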
A supervised learning problem is to find a function in a hypothesis function space given values on isolated data points.
Recent works show an intriguing phenomenon of Frequency Principle (F-Principle) that deep neural networks (DNNs) fit the target function from low to high frequency during the training, which provides insight into the training and generalization behavior of DNNs in complex tasks.
To handle the data explosion in the era of the Internet of Things (IoT), it is of interest to investigate decentralized networks, with the aim of relieving the burden on the central server while preserving data privacy.
In this work, inspired by the phase diagram in statistical mechanics, we draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit for a complete characterization of its dynamical regimes and their dependence on hyperparameters related to initialization.
Borrowing ideas from physics, we propose path integral based graph neural networks (PAN) for classification and regression tasks on graphs.
We study the problem of distilling knowledge from a large deep teacher network to a much smaller student network for the task of road marking segmentation.
Ranked #1 on Semantic Segmentation on ApolloScape
The input of each pooling layer is transformed by the compressive Haar basis of the corresponding clustering.
To achieve high coverage of target boxes, a common strategy in conventional one-stage anchor-based detectors is to use multiple priors at each spatial position, especially in scene text detection tasks.
Recently, scene text recognition methods based on deep learning have proliferated in computer vision.
Training deep models for lane detection is challenging due to the very subtle and sparse supervisory signals inherent in lane annotations.
Ranked #2 on Lane Detection on BDD100K
Graph Neural Networks (GNNs) have become a topic of intense research recently due to their powerful capability in high-dimensional classification and regression tasks for graph-structured data.
Alongside the fruitful applications of Deep Neural Networks (DNNs) to realistic problems, some recent empirical studies of DNNs have reported a universal phenomenon, the Frequency Principle (F-Principle): a DNN tends to learn a target function from low to high frequencies during training.
It remains a puzzle why deep neural networks (DNNs), with more parameters than samples, often generalize well.
Overall, our work serves as a baseline for the further investigation of the impact of initialization and loss function on the generalization of DNNs, which can potentially guide and improve the training of DNNs in practice.
In this paper, we propose PAN, a new graph convolution framework that involves every path linking the message sender and receiver with learnable weights depending on the path length, which corresponds to the maximal entropy random walk.
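The idea of weighting every path by its length can be sketched as a weighted sum of adjacency-matrix powers, since entry (i, j) of the n-th power counts the paths of length n from i to j. The snippet below is a minimal sketch under stated assumptions: the weights are fixed numbers rather than learned, the row normalisation is a simplification chosen for illustration, and all function names are hypothetical.

```python
def identity(n):
    """Return the n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def path_weighted_operator(adj, weights):
    """Return the row-normalised sum over n of weights[n] * adj**n.

    weights[n] is the weight given to every path of length n
    (fixed here; learnable in the framework described above).
    """
    n = len(adj)
    power = identity(n)                      # adj**0
    total = [[0.0] * n for _ in range(n)]
    for w in weights:
        for i in range(n):
            for j in range(n):
                total[i][j] += w * power[i][j]
        power = matmul(power, adj)           # move on to the next path length
    # Row normalisation is an assumption made for this sketch.
    return [[v / sum(row) for v in row] for row in total]


def propagate(operator, features):
    """Aggregate node features with the path-weighted operator."""
    return matmul(operator, features)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with one scalar feature per node.
    adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    op = path_weighted_operator(adj, weights=[1.0, 0.5, 0.25])
    print(propagate(op, [[1.0], [0.0], [0.0]]))
```

With paths up to length 2, node 2 receives a contribution from node 0 even though the two are not directly connected, which is the effect the path-based formulation is after.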
3D face reconstruction from a single 2D image is a challenging problem with broad applications.
Ranked #6 on Face Alignment on AFLW2000-3D
Since sparse unmixing emerged as a promising approach to hyperspectral unmixing, spatial-contextual information in hyperspectral images has recently been exploited to improve unmixing performance.
Face anti-spoofing (a.k.a. presentation attack detection) has drawn growing attention due to the high security demands of face authentication systems.
Ranked #1 on Face Anti-Spoofing on MSU-MFSD
Spatio-temporal information is very important to capture the discriminative cues between genuine and fake faces from video sequences.
We propose a CNN framework using sparsely labeled data from the target domain to learn features that are invariant across domains for face anti-spoofing.
Reinforcement learning agents need exploratory behaviors to escape from local optima.
In this paper, we considerably improve the accuracy and robustness of predictions through heterogeneous auxiliary networks feature mimicking, a new and effective training method that provides us with much richer contextual signals apart from steering direction.
Ranked #1 on Steering Control on Udacity
Previous approaches for scene text detection usually rely on manually defined sliding windows.
Ranked #1 on Scene Text Detection on COCO-Text
The goal of this paper is to evaluate density maps generated by density estimation methods on a variety of crowd analysis tasks, including counting, detection, and tracking.
Next, the number of people is estimated in a set of overlapping sliding windows on the temporal slice image, using a regression function that maps from local features to a count.
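The windowed counting step can be sketched as follows. This is only an illustration under stated assumptions: the temporal slice image is a small binary foreground mask, the local feature is the foreground pixel count inside each window, and the regression function is a hypothetical linear map with made-up coefficients (`coef`, `intercept`); the source does not specify the features or regressor.

```python
def sliding_windows(length, width, stride):
    """Return (start, end) column ranges; windows overlap when stride < width."""
    return [(s, s + width) for s in range(0, length - width + 1, stride)]


def window_feature(slice_image, start, end):
    """Hypothetical local feature: foreground pixel count inside the window."""
    return sum(sum(row[start:end]) for row in slice_image)


def window_counts(slice_image, width, stride, coef, intercept):
    """Estimate a count per window with a linear regression on the feature.

    Returns a list of (start, end, estimated_count) tuples.
    """
    n_cols = len(slice_image[0])
    results = []
    for start, end in sliding_windows(n_cols, width, stride):
        feature = window_feature(slice_image, start, end)
        results.append((start, end, coef * feature + intercept))
    return results


if __name__ == "__main__":
    # Rows are positions along the line of interest, columns are time steps.
    mask = [[1, 0, 0, 1, 1, 0],
            [0, 0, 1, 0, 0, 0]]
    for start, end, count in window_counts(mask, width=4, stride=2,
                                           coef=0.5, intercept=0.0):
        print(start, end, count)
```

Because adjacent windows share columns, their estimates are not independent; a later step would have to reconcile them into per-frame or total counts.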