With the capacity to model long-range dependencies in sequential data, transformers have shown remarkable performance in a variety of generative tasks such as image, audio, and text generation.
To preserve the knowledge learned from previous instances, we propose a method to protect the learning path by restricting the gradient update of one instance from overriding past updates calculated from previous instances when these instances are not similar.
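As a minimal sketch of this idea (not the exact formulation above), one can test the similarity between a new instance's gradient and stored gradients from previous instances, and project out the conflicting component when similarity falls below a threshold. The function name, the cosine-similarity test, and the threshold are all illustrative assumptions.

```python
import numpy as np

def protect_update(new_grad, past_grads, sim_threshold=0.0):
    """Restrict a new instance's gradient so it does not override past updates.

    If the new gradient conflicts with a stored past gradient (cosine
    similarity below `sim_threshold`, i.e. the instances are not similar),
    project out the conflicting component before applying it.
    Names and the projection rule are illustrative, not the paper's method.
    """
    g = np.asarray(new_grad, dtype=np.float64).copy()
    for g_past in past_grads:
        denom = np.linalg.norm(g) * np.linalg.norm(g_past)
        if denom == 0.0:
            continue
        sim = float(np.dot(g, g_past)) / denom
        if sim < sim_threshold:
            # Remove the component of g that points against the past update.
            g -= (np.dot(g, g_past) / np.dot(g_past, g_past)) * g_past
    return g

# Example: a new gradient that partially opposes a stored one gets projected.
past = [np.array([1.0, 0.0])]
new = np.array([-0.5, 1.0])
print(protect_update(new, past))  # -> [0., 1.]
```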
Conventional Neural Architecture Search (NAS) aims to find a single architecture that achieves the best performance, typically by optimizing task-related learning objectives such as accuracy.
The additional depth prediction path supplements the relationship prediction model with information that bounding boxes or segmentation masks are unable to deliver.
Recent breakthroughs in Neural Architecture Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding.
We propose DPP-Net: Device-aware Progressive Search for Pareto-optimal Neural Architectures, optimizing for both device-related (e.g., inference time and memory usage) and device-agnostic (e.g., accuracy and model size) objectives.
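To illustrate what Pareto-optimality over such mixed objectives means (this is a generic Pareto-front filter, not DPP-Net's actual search procedure; the objective names and toy values are assumptions), an architecture is kept only if no other candidate is at least as good on every objective and strictly better on one.

```python
from typing import Dict, List

HIGHER_BETTER = {"accuracy"}
LOWER_BETTER = {"inference_time_ms", "memory_mb", "params_m"}

def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if candidate `a` Pareto-dominates `b`: no worse on every
    objective and strictly better on at least one. Accuracy is maximized;
    inference time, memory usage, and model size are minimized."""
    no_worse, strictly_better = True, False
    for k in HIGHER_BETTER | LOWER_BETTER:
        better = a[k] > b[k] if k in HIGHER_BETTER else a[k] < b[k]
        worse = a[k] < b[k] if k in HIGHER_BETTER else a[k] > b[k]
        if worse:
            no_worse = False
        if better:
            strictly_better = True
    return no_worse and strictly_better

def pareto_front(candidates: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Toy usage: three hypothetical architectures with mixed trade-offs.
cands = [
    {"accuracy": 0.94, "inference_time_ms": 30.0, "memory_mb": 120.0, "params_m": 5.0},
    {"accuracy": 0.92, "inference_time_ms": 12.0, "memory_mb": 60.0, "params_m": 2.1},
    {"accuracy": 0.91, "inference_time_ms": 25.0, "memory_mb": 90.0, "params_m": 4.0},
]
print(pareto_front(cands))  # the third candidate is dominated by the second
```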