Efficient video architecture is the key to deploying video recognition systems on devices with limited computing resources.
We design a multivariate search space with 6 search variables to capture a wide variety of choices in designing two-stream models.
This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation.
We demonstrate that plugging SAFIN into the base network of another state-of-the-art method results in enhanced stylization.
Can we select the best neural architectures without involving any training and eliminate a drastic portion of the search cost?
We present Sandwich Batch Normalization (SaBN), a frustratingly easy improvement of Batch Normalization (BN) with only a few lines of code changes.
Such simplification limits the fusion of information at different scales and fails to maintain high-resolution representations.
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when the test data come from a distribution different from that of the training data.
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
Ranked #1 on Speaker Identification on VoxCeleb1
We present FasterSeg, an automatically designed semantic segmentation network with not only state-of-the-art performance but also faster speed than current methods.
Ranked #1 on Semantic Segmentation on BDD
Neural architecture search (NAS) has witnessed prevailing success in image classification and (very recently) segmentation tasks.
Ranked #11 on Image Generation on STL-10
Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data?
Flow-based generative models show great potential in image synthesis due to their reversible pipeline and exact log-likelihood target, yet they suffer from a weak ability for conditional image synthesis, especially for multi-label or unaware conditions.
While each view of the stereoscopic pair is processed in an individual path, a novel feature aggregation strategy is proposed to effectively share information between the two paths.