Recent research has revealed that reducing temporal redundancy and spatial redundancy are both effective approaches toward efficient video recognition, e.g., allocating the majority of computation to a task-relevant subset of frames or to the most valuable image regions of each frame.
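As a minimal sketch of the frame-selection idea, the snippet below scores every frame with a cheap network and runs the expensive backbone only on the top-k frames; `score_net`, `backbone`, and `classifier` are hypothetical modules, and the fixed top-k rule stands in for whatever selection policy a concrete method would learn.

```python
import torch

def efficient_video_forward(frames, score_net, backbone, classifier, k=4):
    """Allocate the heavy computation to the top-k most task-relevant frames.

    frames: (B, T, C, H, W) video clip. score_net is a cheap per-frame scorer
    returning one scalar per frame; backbone is the expensive feature
    extractor. All module names are hypothetical.
    """
    B, T = frames.shape[:2]
    # Cheap pass: score every frame at low cost.
    scores = score_net(frames.flatten(0, 1)).view(B, T)            # (B, T)
    topk = scores.topk(k, dim=1).indices                           # (B, k)
    idx = topk[..., None, None, None].expand(-1, -1, *frames.shape[2:])
    selected = frames.gather(1, idx)                               # (B, k, C, H, W)
    # Expensive pass: only the selected frames reach the backbone.
    feats = backbone(selected.flatten(0, 1)).view(B, k, -1).mean(1)
    return classifier(feats)
```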
In this paper, we present a novel learning framework, ActiveNeRF, aiming to model a 3D scene with a constrained input budget.
Intuitively, easy samples, which generally exit the network early during inference, should contribute more to training the early classifiers.
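A hedged sketch of this intuition: weight each sample's loss at every exit by an "easiness" proxy, so that easy samples dominate training of the early classifiers while the final classifier sees all samples equally. The weighting scheme below is purely illustrative, not any specific paper's rule.

```python
import torch
import torch.nn.functional as F

def weighted_multi_exit_loss(logits_per_exit, targets, gamma=1.0):
    """logits_per_exit: list of (B, num_classes) tensors, one per classifier,
    ordered from the earliest to the final exit."""
    with torch.no_grad():
        # Use the final exit's confidence on the true class as an easiness proxy.
        final_probs = F.softmax(logits_per_exit[-1], dim=1)
        easiness = final_probs.gather(1, targets[:, None]).squeeze(1)  # (B,)

    total, n_exits = 0.0, len(logits_per_exit)
    for i, logits in enumerate(logits_per_exit):
        ce = F.cross_entropy(logits, targets, reduction="none")        # (B,)
        depth = i / max(n_exits - 1, 1)        # 0 = earliest, 1 = final exit
        # Early exits emphasize easy samples; the final exit weights all equally.
        weight = (1 - depth) * easiness ** gamma + depth
        total = total + (weight * ce).mean()
    return total / n_exits
```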
In this paper, we propose an improved column generation algorithm with neural prediction (CG-P) for solving graph-based set covering problems.
Our method leverages an off-the-shelf object detector to identify visual objects from unlabeled images, and then language queries for these objects are obtained in an unsupervised fashion with a pseudo-query generation module.
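A toy sketch of this pipeline: detector outputs become template-based language queries built from the predicted label and a coarse spatial cue. The template and field names are assumptions for illustration, not the actual pseudo-query generation module.

```python
def generate_pseudo_queries(detections, image_width):
    """detections: list of dicts such as {"label": "dog", "box": (x1, y1, x2, y2)}.
    Returns (query, box) pairs usable as pseudo-labels for visual grounding."""
    queries = []
    for det in detections:
        x_center = (det["box"][0] + det["box"][2]) / 2
        position = ("left" if x_center < image_width / 3
                    else "right" if x_center > 2 * image_width / 3
                    else "middle")
        # Compose a language query from the predicted label and a spatial cue.
        queries.append((f"the {det['label']} on the {position}", det["box"]))
    return queries

print(generate_pseudo_queries(
    [{"label": "dog", "box": (10, 40, 120, 200)}], image_width=640))
```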
We uniformly save a moderate number of intermediate models from the teacher model's training process, and then integrate the knowledge of these intermediate models via an ensemble technique.
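A minimal PyTorch sketch of distilling from such a checkpoint ensemble, assuming the snapshots' softened predictions are simply averaged (one straightforward ensemble choice; the paper's integration scheme may differ):

```python
import copy
import torch
import torch.nn.functional as F

def distill_from_checkpoint_ensemble(student, teacher, checkpoint_paths,
                                     inputs, targets, T=4.0, alpha=0.9):
    """Distill from the averaged predictions of intermediate teacher
    checkpoints saved along the teacher's training trajectory."""
    with torch.no_grad():
        probs = []
        for path in checkpoint_paths:
            snapshot = copy.deepcopy(teacher)   # reload each saved snapshot
            snapshot.load_state_dict(torch.load(path))
            snapshot.eval()
            probs.append(F.softmax(snapshot(inputs) / T, dim=1))
        soft_targets = torch.stack(probs).mean(0)

    logits = student(inputs)
    kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft_targets,
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(logits, targets)
    return alpha * kd + (1 - alpha) * ce
```

In practice one would cache the snapshots' outputs rather than reloading checkpoints every batch; the loop above just keeps the sketch self-contained.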
Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain, where only unlabeled samples are given.
Spatial redundancy widely exists in visual recognition tasks, i.e., discriminative features in an image or video frame usually correspond to only a subset of pixels, while the remaining regions are irrelevant to the task at hand.
On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that are beyond the region of interest.
These methods appear to be quite different in their designed loss functions, which stem from various motivations.
The TSCI model builds on the formulation of temporal causality, which reflects the temporal causal relations between the sequential observations and decisions of an RL agent.
In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of the computation in these two paradigms is in fact performed with the same operation.
As a data augmentation method, FOT can be conveniently applied to any existing few-shot learning algorithm, greatly improving its performance on FG-FSL tasks.
The backbone of a traditional CNN classifier is generally regarded as a feature extractor, followed by a linear layer that performs the classification.
Inspired by this phenomenon, we propose a Dynamic Transformer to automatically configure a proper number of tokens for each input image.
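A sketch of the corresponding inference behavior: try token configurations from cheap to expensive and stop as soon as the prediction is confident enough. The `vit(image, num_tokens=...)` interface and the confidence threshold are hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def dynamic_token_inference(image, models, threshold=0.9):
    """models: list of (num_tokens, vit) pairs sorted from fewest to most
    tokens. Assumes a single image (batch size 1)."""
    for num_tokens, vit in models:
        logits = vit(image, num_tokens=num_tokens)
        conf, pred = F.softmax(logits, dim=-1).max(dim=-1)
        if conf.item() >= threshold:     # confident enough: stop early
            return pred, num_tokens
    return pred, num_tokens              # fall back to the largest configuration
```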
In this paper, we explore the spatial redundancy in video recognition with the aim of improving computational efficiency.
Reusing features in deep networks through dense connectivity is an effective way to achieve high computational efficiency.
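For concreteness, a minimal DenseNet-style block showing feature reuse through dense connectivity (a standard construction, included only as an illustration):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer consumes the concatenation of all preceding feature maps,
    so earlier features are reused rather than recomputed."""
    def __init__(self, in_channels, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            )
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```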
Due to the need to store intermediate activations for back-propagation, end-to-end (E2E) training of deep networks usually suffers from a high GPU memory footprint.
As the InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm.
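One plausible instantiation of such a local objective, assuming each gradient-isolated module is trained with a reconstruction head (to retain input information) and an auxiliary classifier (to retain label information). The actual InfoPro surrogate bound differs in form, so treat this purely as a sketch:

```python
import torch.nn.functional as F

def local_module_loss(features, inputs, targets, decoder, aux_classifier,
                      lam=0.5):
    """decoder maps features back to the input resolution; aux_classifier
    predicts labels from the local features. Both heads are hypothetical."""
    recon_loss = F.mse_loss(decoder(features), inputs)             # keep input info
    cls_loss = F.cross_entropy(aux_classifier(features), targets)  # keep label info
    return lam * recon_loss + (1 - lam) * cls_loss

# During the forward pass, detach between modules so each module is trained
# only by its own local loss:  h = module(h_prev.detach())
```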
We theoretically show that AdaPT produces a tight upper bound on the distributional deviation between the learned policy and the behavior policy, and this upper bound is the minimum requirement to guarantee policy improvement at each iteration.
In this paper, we take a step forward to establish a unified framework for convolution-based graph neural networks, by formulating the basic graph convolution operation as an optimization problem in the graph Fourier space.
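A common instance of this viewpoint (a sketch consistent with such unified frameworks, not necessarily the paper's exact objective) treats one propagation step as the solution of a graph smoothing problem:

```latex
\min_{Z}\; \|Z - X\|_F^2 \;+\; c\,\mathrm{tr}\!\left(Z^\top \tilde{L} Z\right),
\qquad
Z^{\star} = \left(I + c\,\tilde{L}\right)^{-1} X
```

Here $X$ stacks the input node features and $\tilde{L}$ is the normalized graph Laplacian; the first term keeps $Z$ close to the input while the trace term enforces smoothness across edges. A first-order expansion $(I + c\tilde{L})^{-1} \approx I - c\tilde{L}$ recovers a single GCN-style propagation step.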
The accuracy of deep convolutional neural networks (CNNs) generally improves when fueled with high-resolution images.
The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features, i.e., certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., changing the background or view angle of an object.
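An explicit-sampling sketch of this property: perturb deep features along directions drawn from a class-conditional covariance, so each perturbation approximates a semantic transformation. Methods in this line typically integrate such augmentation in closed form rather than sampling; all names below are illustrative.

```python
import torch

def semantic_augment(features, labels, class_cov, strength=0.5):
    """features: (B, D) deep features; class_cov: (C, D, D) tensor holding one
    covariance per class (estimated online in practice)."""
    B, D = features.shape
    noise = torch.randn(B, D, device=features.device)
    aug = features.clone()
    for c in labels.unique():
        mask = labels == c
        # Shape isotropic noise by the Cholesky factor of this class's covariance.
        L = torch.linalg.cholesky(
            class_cov[c] + 1e-5 * torch.eye(D, device=features.device))
        aug[mask] = features[mask] + strength * (noise[mask] @ L.T)
    return aug
```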
In this paper, we propose a novel meta-learning-based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various SSL conditions.
Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks.
Then we develop two algorithms for optimizing the energy efficiency of train operation.
Specifically, the proposed algorithm can be used to estimate the upper and lower bounds of the updated classifier's coefficient matrix with a low computational complexity related to the size of the updated dataset.
This paper provides a selected review of RL-based control for AUVs, focusing on applications of RL to low-level control tasks for underwater regulation and tracking.
Our work is motivated by the intriguing property that deep networks are surprisingly good at linearizing features, such that certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., adding sunglasses or changing backgrounds.
Unlike existing policy gradient methods, which employ a single actor-critic pair and cannot achieve satisfactory tracking control accuracy or stable learning, our proposed algorithm achieves high tracking control accuracy for AUVs and stable learning by applying a hybrid actors-critics architecture, in which multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively.
Then, we present an off-policy, model-free, maximum-entropy actor-critic deep RL algorithm, called deep soft policy gradient (DSPG), by combining the soft policy gradient with the soft Bellman equation.
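For reference, the soft Bellman equations of maximum-entropy RL that such an algorithm builds on (written in their standard form; DSPG's precise update rules may differ):

```latex
V(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\left[\, Q(s, a) - \alpha \log \pi(a \mid s) \,\right],
\qquad
Q(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s'}\!\left[\, V(s') \,\right]
```

where $\alpha$ trades off reward against policy entropy, and the soft policy gradient ascends the resulting entropy-regularized expected return.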
To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms, drawing on the idea underlying regularized Anderson acceleration (RAA), an effective approach to accelerating the solution of fixed-point problems with perturbations.
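A NumPy sketch of one regularized Anderson acceleration step under these assumptions: the mixing coefficients solve a ridge-regularized least-squares problem over recent residuals, subject to summing to one.

```python
import numpy as np

def raa_step(xs, gs, lam=1e-3):
    """xs, gs: the last m iterates x_k and their images g(x_k) under the
    fixed-point map g. Solves min_a ||F a||^2 + lam ||a||^2, sum(a) = 1,
    where F stacks the residuals g(x_k) - x_k, then mixes the images."""
    F = np.stack([g - x for x, g in zip(xs, gs)], axis=1)   # (dim, m)
    m = F.shape[1]
    A = F.T @ F + lam * np.eye(m)
    alpha = np.linalg.solve(A, np.ones(m))
    alpha /= alpha.sum()                 # enforce the sum-to-one constraint
    return sum(a * g for a, g in zip(alpha, gs))

# Toy usage: accelerate the fixed-point iteration x -> cos(x).
g = np.cos
x = np.array([1.0])
xs, gs = [x], [g(x)]
for _ in range(10):
    x = raa_step(xs[-5:], gs[-5:])
    xs.append(x)
    gs.append(g(x))
print(x)   # converges toward the fixed point of cos, roughly 0.739
```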
Our second contribution is to derive a practical algorithm based on this reduction.