Therefore, we propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training.
Recent years have witnessed the prevailing progress of Generative Adversarial Networks (GANs) in image-to-image translation.
This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection.
Furthermore, DLIP succeeds in retaining more than 95% of the performance with 22. 4% parameters and 24. 8% FLOPs compared to the teacher model and accelerates inference speed by 2. 7x.
To this end, we propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies.
A first-frame conditioning strategy is proposed to facilitate the model to generate videos transferred from the image domain as well as arbitrary-length videos in an auto-regressive manner.
Recently, open-vocabulary learning has emerged to accomplish segmentation for arbitrary categories of text-based descriptions, which popularizes the segmentation system to more general-purpose application scenarios.
Ranked #4 on Open Vocabulary Panoptic Segmentation on ADE20K
In particular, we first formulate the oscillation in PTQ and prove the problem is caused by the difference in module capacity.
Our improved backbone network can reduce the computational effort while improving the accuracy of the object detection network.
Consequently, we offer the first attempt to provide lightweight SSSS models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) hierarchical and multi-levels loss setting.
Then, Next Hybrid Strategy (NHS) is designed to stack NCB and NTB in an efficient hybrid paradigm, which boosts performance in various downstream tasks.
Ranked #258 on Image Classification on ImageNet
Recently, Synthetic data-based Instance Segmentation has become an exceedingly favorable optimization paradigm since it leverages simulation rendering and physics to generate high-quality image-annotation pairs.
We revisit the existing excellent Transformers from the perspective of practical application.
Vision Transformers have witnessed prevailing success in a series of vision tasks.
The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions.
Extensive experiments show that AMR establishes a new state-of-the-art performance on the PASCAL VOC 2012 dataset, surpassing not only current methods trained with the image-level of supervision but also some methods relying on stronger supervision, such as saliency label.
In this work, we revisit the role of discriminator in GAN compression and design a novel generator-discriminator cooperative compression scheme for GAN compression, termed GCC.
no code implementations • 17 May 2021 • Andrey Ignatov, Grigory Malivenko, Radu Timofte, Sheng Chen, Xin Xia, Zhaoyan Liu, Yuwei Zhang, Feng Zhu, Jiashi Li, Xuefeng Xiao, Yuan Tian, Xinglong Wu, Christos Kyrkou, Yixin Chen, Zexin Zhang, Yunbo Peng, Yue Lin, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Himanshu Kumar, Chao Ge, Pei-Lin Wu, Jin-Hua Du, Andrew Batutin, Juan Pablo Federico, Konrad Lyda, Levon Khojoyan, Abhishek Thanki, Sayak Paul, Shahid Siddiqui
To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that can demonstrate a real-time performance on smartphones and IoT platforms.
In this way, PAD-NAS can automatically design the operations for each layer and achieve a trade-off between search space quality and model diversity.
While propagation-based approaches have achieved state-of-the-art performance for video object segmentation, the literature lacks a fair comparison of different methods using the same settings.
Currently, owing to the ubiquity of mobile devices, online handwritten Chinese character recognition (HCCR) has become one of the suitable choice for feeding input to cell phones and tablet devices.
We design a nine-layer CNN for HCCR consisting of 3, 755 classes, and devise an algorithm that can reduce the networks computational cost by nine times and compress the network to 1/18 of the original size of the baseline model, with only a 0. 21% drop in accuracy.