1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with the frequently overlooked synergy between multimodal comprehension and creation.
VPP leverages the structured voxel representation in the proposed Voxel Semantic Generator and the sparsity of the unstructured point representation in the Point Upsampler, enabling efficient generation of multi-category objects.
Geometry and color information provided by the point clouds are both crucial for 3D scene understanding.
Ranked #1 on 3D Object Detection on S3DIS (using extra training data)
In this paper, we revisit the usage of data augmentation in model compression and present a comprehensive study of the relation between model size and the optimal data augmentation policy.
Knowledge distillation is an effective model compression method, but it has some limitations: (1) feature-based distillation methods focus only on distilling the feature map and fail to transfer the relations among data examples; (2) relational distillation methods are either limited to handcrafted relation-extraction functions, such as the L2 norm, or weak at modeling inter- and intra-class relations.
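A minimal sketch of the handcrafted-L2 relational distillation that limitation (2) refers to, assuming PyTorch; the function names and the use of `torch.cdist` over normalized pooled features are illustrative choices, not a specific paper's API.

```python
import torch
import torch.nn.functional as F

def pairwise_relations(features: torch.Tensor) -> torch.Tensor:
    """Handcrafted L2-based relation matrix among examples.

    features: (batch, dim) pooled feature vectors.
    Returns a (batch, batch) matrix of pairwise Euclidean distances.
    """
    return torch.cdist(features, features, p=2)

def relational_kd_loss(student_feat: torch.Tensor,
                       teacher_feat: torch.Tensor) -> torch.Tensor:
    """Match the student's relation matrix to the teacher's.

    This is the fixed, handcrafted style of relational distillation
    the text criticizes: relations are plain distances, with no
    learned inter-/intra-class modeling.
    """
    rs = pairwise_relations(F.normalize(student_feat, dim=1))
    rt = pairwise_relations(F.normalize(teacher_feat, dim=1))
    return F.mse_loss(rs, rt)
```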
Training a 3D scene understanding model requires complicated human annotations, which are laborious to collect and result in a model that only encodes closed-set object semantics.
Comparing different plasticity rules under the same framework shows that Hebbian plasticity is well-suited for several memory and associative learning tasks; however, it is outperformed by gradient-based plasticity on few-shot regression tasks which require the model to infer the underlying mapping.
This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between them.
Ranked #1 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)
The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to acquire in 3D than for 2D images or natural language.
Ranked #3 on Few-Shot 3D Point Cloud Classification on ModelNet40 10-way (10-shot) (using extra training data)
To guide 3D feature learning toward important geometric attributes and scene context, we explore the help of textual scene descriptions.
Detecting 3D objects from multi-view images is a fundamental problem in 3D computer vision.
Deep learning (DL)-based methods for low-level tasks have many advantages over traditional cameras in terms of hardware prospects, error accumulation, and imaging effects.
A data augmentation module is utilized in contrastive learning to transform the given data example into two views, which is considered essential and irreplaceable.
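A minimal sketch of the two-view augmentation step described above, assuming torchvision; the specific transform choices are illustrative (SimCLR-style), not a particular paper's recipe.

```python
import torchvision.transforms as T

# Contrastive augmentation pipeline; the exact transforms and their
# parameters are illustrative assumptions.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def two_views(image):
    """Transform one data example into two randomly augmented views."""
    return augment(image), augment(image)
```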
In this paper, we propose Region-aware Knowledge Distillation (ReKo) to compress image-to-image translation models.
The remarkable breakthroughs in point cloud representation learning have boosted their usage in real-world applications such as self-driving cars and virtual reality.
Current approaches aim at generating synthesized training data from unpaired samples by exploring the relationship between the corrupted and clean data.
Instead of directly distilling the generated images of teachers, wavelet knowledge distillation first decomposes the images into different frequency bands with the discrete wavelet transform and then distills only the high-frequency bands.
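A minimal sketch of the decompose-then-distill idea, using PyWavelets for a single-level Haar DWT; the loss form and function names are assumptions, not the paper's exact objective.

```python
import numpy as np
import pywt  # PyWavelets

def high_frequency_bands(image: np.ndarray):
    """Single-level 2D Haar DWT; keep only the high-frequency bands.

    image: (H, W) grayscale array. Returns the (cH, cV, cD) detail
    bands; the low-frequency approximation cA is discarded, mirroring
    the text: only high-frequency bands are distilled.
    """
    _cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')
    return cH, cV, cD

def wavelet_distillation_loss(student_img: np.ndarray,
                              teacher_img: np.ndarray) -> float:
    """L2 loss between student and teacher high-frequency bands only.
    An illustrative loss form."""
    return float(sum(
        np.mean((s - t) ** 2)
        for s, t in zip(high_frequency_bands(student_img),
                        high_frequency_bands(teacher_img))
    ))
```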
Besides, an efficient deployment flow for the mobile CPU is developed, achieving up to 7.46$\times$ inference acceleration on an octa-core ARM CPU.
Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet) that tackles the challenges of high computation and lack of robustness based on a recurrent downsampled attention mechanism.
Without access to the old training samples, knowledge transfer from the old tasks to each new task is difficult to determine and might be either positive or negative.
To tackle this challenge, in this paper, we propose Region-aware Knowledge Distillation, which first localizes the crucial regions in the images with an attention mechanism.
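A minimal sketch of attention-weighted feature distillation, assuming PyTorch; deriving the spatial attention map as the channel-wise mean of squared activations is a common convention and may differ from the paper's exact formulation.

```python
import torch

def spatial_attention(feat: torch.Tensor) -> torch.Tensor:
    """Spatial attention map from a (B, C, H, W) feature map.

    Channel-wise mean of squared activations, normalized per image so
    that high-activation ("crucial") regions dominate.
    """
    attn = feat.pow(2).mean(dim=1, keepdim=True)        # (B, 1, H, W)
    b = attn.size(0)
    attn = attn.view(b, -1)
    attn = attn / (attn.sum(dim=1, keepdim=True) + 1e-8)
    return attn.view(b, 1, *feat.shape[2:])

def region_aware_kd_loss(student_feat: torch.Tensor,
                         teacher_feat: torch.Tensor) -> torch.Tensor:
    """Distill features with more weight on the teacher's crucial regions."""
    w = spatial_attention(teacher_feat)                 # where to look
    return (w * (student_feat - teacher_feat).pow(2)).sum() / w.size(0)
```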
Collecting the paired training data is a difficult task in practice, but the unpaired samples broadly exist.
In this paper, we suggest that the failure of knowledge distillation on object detection is mainly caused by two reasons: (1) the imbalance between foreground and background pixels and (2) the lack of distillation on the relations between different pixels.
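A minimal sketch of how reason (1) can be addressed with separately normalized foreground/background distillation terms, assuming PyTorch; the mask source (ground-truth boxes) and the `alpha`/`beta` weights are illustrative assumptions.

```python
import torch

def fg_bg_distillation_loss(student_feat: torch.Tensor,
                            teacher_feat: torch.Tensor,
                            fg_mask: torch.Tensor,
                            alpha: float = 1.0,
                            beta: float = 0.5) -> torch.Tensor:
    """Distill foreground and background pixels with separate weights.

    fg_mask: (B, 1, H, W) binary mask of pixels inside ground-truth
    boxes (assumed precomputed). Normalizing each term by its own
    pixel count counters the foreground/background imbalance noted
    in the text.
    """
    diff = (student_feat - teacher_feat).pow(2)
    fg = (diff * fg_mask).sum() / fg_mask.sum().clamp(min=1)
    bg = (diff * (1 - fg_mask)).sum() / (1 - fg_mask).sum().clamp(min=1)
    return alpha * fg + beta * bg
```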
In the real-world case, the noise distribution is so complex that the simplified additive white Gaussian noise (AWGN) assumption rarely holds, which significantly deteriorates Gaussian denoisers' performance.
Moreover, an orthogonal loss is applied to the feature resizing layer in TOFD to improve the performance of knowledge distillation.
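A minimal sketch of an orthogonality penalty on a resizing layer's weight, assuming PyTorch; whether TOFD applies it in exactly this form is an assumption.

```python
import torch

def orthogonal_loss(weight: torch.Tensor) -> torch.Tensor:
    """Penalize deviation from row-orthogonality: ||W W^T - I||_F^2.

    weight: (out_features, in_features) weight of the feature
    resizing layer. Encouraging orthogonal rows keeps the resize
    close to an isometry, preserving feature structure; an
    illustrative form of the orthogonal loss.
    """
    gram = weight @ weight.t()
    eye = torch.eye(gram.size(0), device=weight.device)
    return (gram - eye).pow(2).sum()
```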
Weight pruning is a powerful technique to realize model compression.
Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms.
Existing domain adaptation methods aim at learning features that can be generalized among domains.
Ranked #3 on Domain Adaptation on USPS-to-MNIST
To the best of our knowledge, there is no study on the interpretation of modern CNNs from the perspective of the frequency proportion of filters.
Model compression techniques on Deep Neural Network (DNN) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method.
Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structured pruning is not competitive in terms of either storage or computation efficiency.
In contrast, current state-of-the-art deep learning approaches depend heavily on the variety of training samples and the capacity of the network.
Remarkable achievements have been attained by deep neural networks in various applications.
Different from traditional knowledge distillation, a knowledge transfer methodology among networks that forces student neural networks to approximate the softmax-layer outputs of pre-trained teacher neural networks, the proposed self-distillation framework distills knowledge within the network itself.
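A minimal sketch of within-network distillation: auxiliary classifiers at shallow depths learn from the deepest classifier's softened outputs, with no external teacher. Assumes PyTorch; the loss weights and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(shallow_logits_list, deepest_logits,
                           labels, temperature: float = 3.0,
                           alpha: float = 0.3) -> torch.Tensor:
    """Deepest classifier teaches the shallower exits of the same network.

    Each shallow exit is trained on the true labels plus a KL term
    toward the deepest exit's softened outputs.
    """
    soft_teacher = F.softmax(deepest_logits.detach() / temperature, dim=1)
    loss = F.cross_entropy(deepest_logits, labels)
    for logits in shallow_logits_list:
        kl = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                      soft_teacher,
                      reduction='batchmean') * temperature ** 2
        loss = loss + alpha * kl + (1 - alpha) * F.cross_entropy(logits, labels)
    return loss
```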
Weight quantization is one of the most important techniques for Deep Neural Network (DNN) model compression.
Furthermore, this work studies two hypotheses about weight pruning in the conventional setting and finds that weight pruning is essential for reducing the network model size in the adversarial setting, as training a small model from scratch, even with initialization inherited from the large model, cannot achieve both adversarial robustness and high standard accuracy.
Without loss of accuracy on the AlexNet model, we achieve 2.58X and 3.65X average measured speedup on two GPUs, clearly outperforming the prior work.