In this paper, we propose Bi-level doubly variational learning (BiDVL), which is based on a new bi-level optimization framework and two tractable variational distributions to facilitate learning EBLVMs.
Research on the generalization ability of deep neural networks (DNNs) has recently attracted a great deal of attention.
Vision transformers (ViTs) have demonstrated great potential in various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices.
We show that our method improves the recognition accuracy of adversarial training on ImageNet by 8. 32% compared with the baseline.
Recently, self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios.
Real-time point cloud processing is fundamental for lots of computer vision tasks, while still challenged by the computational problem on resource-limited edge devices.
In this paper, we observe an interesting phenomenon of intra-class heterogeneity in real data and show that existing methods fail to retain this property in their synthetic images, which causes a limited performance increase.
In particular, our method improves previous state-of-the-art methods from 0. 365 to 0. 349 on the metric RMSE on the NYU dataset.
Then, the sampling will gradually be prone to sampling subnets from the subnet pools.
Conventional gradient descent methods compute the gradients for multiple variables through the partial derivative.
To date, learning weakly supervised panoptic segmentation (WSPS) with only image-level labels remains unexplored.
At each layer, it exploits a differentiable binarization search (DBS) to minimize the angular error in a student-teacher framework.
Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN.
Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images.
In this article, we propose a novel auxiliary learning induced graph convolutional network in a multi-task fashion.
This leads to a new problem of confidence discrepancy for the detector ensembles.
We propose an Interpretable Attention Guided Network (IAGN) for fine-grained visual classification.
In this paper, we show that our weight binarization provides an analytical solution by encoding high-magnitude weights into +1s, and 0s otherwise.
Unmanned Aerial vehicles (UAVs) are widely used as network processors in mobile networks, but more recently, UAVs have been used in Mobile Edge Computing as mobile servers.
A common practice is to start from a large perturbation and then iteratively reduce it with a deterministic direction and a random one while keeping it adversarial.
Weakly supervised instance segmentation (WSIS) with only image-level labels has recently drawn much attention.
Differentiable Architecture Search (DARTS) improves the efficiency of architecture search by learning the architecture and network parameters end-to-end.
In recent years, deep learning has dominated progress in the field of medical image analysis.
To achieve fast online adaptivity, a class-wise updating method is developed to decompose the binary code learning and alternatively renew the hash functions in a class-wise fashion, which well addresses the burden on large amounts of training batches.
Edge computing is promising to become one of the next hottest topics in artificial intelligence because it benefits various evolving domains such as real-time unmanned aerial systems, industrial applications, and the demand for privacy protection.
Specifically, most state-of-the-art SR models without batch normalization have a large dynamic quantization range, which also serves as another cause of performance drop.
In this paper, for the first time, we explore the influence of angular bias on the quantization error and then introduce a Rotated Binary Neural Network (RBNN), which considers the angle alignment between the full-precision weight vector and its binarized version.
1 code implementation • 16 Sep 2020 • Xuehui Yu, Zhenjun Han, Yuqi Gong, Nan Jiang, Jian Zhao, Qixiang Ye, Jie Chen, Yuan Feng, Bin Zhang, Xiaodi Wang, Ying Xin, Jingwei Liu, Mingyuan Mao, Sheng Xu, Baochang Zhang, Shumin Han, Cheng Gao, Wei Tang, Lizuo Jin, Mingbo Hong, Yuchao Yang, Shuiwang Li, Huan Luo, Qijun Zhao, Humphrey Shi
The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images which have wide views, with a current focus on tiny person detection.
In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models to reduce huge computational cost on embedded devices for edge computing.
Deep convolutional neural networks (DCNNs) have dominated as the best performers in machine learning, but can be challenged by adversarial attacks.
In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages.
Conventional learning methods simplify the bilinear model by regarding two intrinsically coupled factors independently, which degrades the optimization procedure.
To this end, a Child-Parent (CP) model is introduced to a differentiable NAS to search the binarized architecture (Child) under the supervision of a full-precision model (Parent).
Most of the recent advances in crowd counting have evolved from hand-designed density estimation networks, where multi-scale features are leveraged to address the scale variation problem, but at the expense of demanding design efforts.
The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced.
In this paper, we propose a new channel pruning method based on artificial bee colony algorithm (ABC), dubbed as ABCPruner, which aims to efficiently find optimal pruned structure, i. e., channel number in each layer, rather than selecting "important" channels as previous works did.
To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema.
The BGA method is proposed to modify the binary process of GBCNs to alleviate the local minima problem, which can significantly improve the performance of 1-bit DCNNs.
A variant, binarized neural architecture search (BNAS), with a search space of binarized convolutions, can produce extremely compressed models.
In this paper, we propose a novel aggregation signature suitable for small object tracking, especially aiming for the challenge of sudden and large drift.
The CiFs can be easily incorporated into existing deep convolutional neural networks (DCNNs), which leads to new Circulant Binary Convolutional Networks (CBCNs).
Specially, we propose a novel Structured-Spatial Semantic Embedding model for image deblurring (termed S3E-Deblur), which introduces a novel Structured-Spatial Semantic tree model (S3-tree) to bridge two basic tasks in computer vision: image deblurring (ImD) and image captioning (ImC).
Binarized convolutional neural networks (BCNNs) are widely used to improve memory and computation efficiency of deep convolutional neural networks (DCNNs) for mobile and AI chips based applications.
Deep convolutional neural networks (DCNNs) have dominated the recent developments in computer vision through making various record-breaking models.
More specifically, we introduce a novel architecture controlling module in each layer to encode the network architecture by a vector.
In this paper, we propose to model the similarity distributions between the input data and the hashing codes, upon which a novel supervised online hashing method, dubbed as Similarity Distribution based Online Hashing (SDOH), is proposed, to keep the intrinsic semantic relationship in the produced Hamming space.
The search space is dynamically pruned every a few epochs to update this distribution, and the optimal neural architecture is obtained when there is only one structure remained.
Therefore, NAS can be transformed to a multinomial distribution learning problem, i. e., the distribution is optimized to have a high expectation of the performance.
In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner.
In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps.
The relationship between the input feature maps and 2D kernels is revealed in a theoretical framework, based on which a kernel sparsity and entropy (KSE) indicator is proposed to quantitate the feature map importance in a feature-agnostic manner to guide model compression.
The advancement of deep convolutional neural networks (DCNNs) has driven significant improvement in the accuracy of recognition systems for many computer vision tasks.
Real-time object detection and tracking have shown to be the basis of intelligent production for industrial 4. 0 applications.
Specifically, the TARM is deployed in a residual learning module that employs a novel attention learning network to recalibrate the temporal attention of frames in a skeleton sequence.
Ranked #62 on Skeleton Based Action Recognition on NTU RGB+D
Representation learning is a fundamental but challenging problem, especially when the distribution of data is unknown.
Compression artifacts reduction (CAR) is a challenging problem in the field of remote sensing.
In this paper, we introduce an intermediate step -- solution sampling -- after the data sampling step to form a subspace, in which an optimal solution can be estimated.
Gesture recognition is a challenging problem in the field of biometrics.
Ranked #1 on Hand Gesture Recognition on MGB
In this paper, we propose a hashing scheme, termed Fusion Similarity Hashing (FSH), which explicitly embeds the graph-based fusion similarity across modalities into a common Hamming space.
Steerable properties dominate the design of traditional filters, e. g., Gabor filters, and endow features the capability of dealing with spatial transformations.
Kernelized Correlation Filter (KCF) is one of the state-of-the-art object trackers.
In this paper, a self-learning approach is proposed towards solving scene-specific pedestrian detection problem without any human' annotation involved.
There is a neglected fact in the traditional machine learning methods that the data sampling can actually lead to the solution sampling.
The fact that image data samples lie on a manifold has been successfully exploited in many learning and inference problems.