DNAS, or Differentiable Neural Architecture Search, uses gradient-based methods to optimize ConvNet architectures, which avoids enumerating and training individual architectures separately as previous methods did. DNAS makes it possible to explore a layer-wise search space in which a different block can be chosen for each layer of the network. It represents the search space as a super net whose operators execute stochastically, and it relaxes the problem of finding the optimal architecture to that of finding a distribution that yields the optimal architecture. With the Gumbel Softmax technique, the architecture distribution can be trained directly using gradient-based optimization such as SGD.
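To make the stochastic block choice concrete, here is a minimal sketch of Gumbel Softmax sampling for a single layer, written in plain Python. The names `gumbel_softmax`, `theta`, and `temperature`, along with the example values, are assumptions for illustration and are not taken from the source. A real implementation would use an automatic differentiation framework so that gradients flow back to the architecture parameters.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed (soft) one-hot sample over the candidate blocks of a layer."""
    noisy = []
    for theta_i in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # clamp so log() stays finite
        g = -math.log(-math.log(u))
        noisy.append((theta_i + g) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
theta = [1.0, 0.5, -0.5]  # hypothetical architecture parameters, one per candidate block

soft_mask = gumbel_softmax(theta, temperature=5.0, rng=rng)   # high temperature: smoother weights
sharp_mask = gumbel_softmax(theta, temperature=0.1, rng=rng)  # low temperature: closer to one-hot
```

In a super net, the output of a layer would be the mask-weighted sum of its candidate blocks' outputs. Because the mask is a smooth function of `theta`, the sampling distribution can be updated with SGD alongside the operator weights.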
The loss used to train the stochastic super net combines a cross-entropy loss, which drives accuracy, with a latency loss, which penalizes the network's latency on a target device. To estimate the latency of an architecture, the latency of each operator in the search space is measured, and a lookup-table model computes the overall latency by summing the latencies of the individual operators. This model makes it possible to estimate the latency of architectures in an enormous search space. More importantly, it makes the latency differentiable with respect to layer-wise block choices.
Source: FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search
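The lookup-table latency model described above can be sketched as follows. The block names (`k3_e1`, `k3_e6`, `k5_e6`, `skip`), the latency values, and the function names are hypothetical placeholders rather than measurements or identifiers from the source.

```python
# Hypothetical lookup table: one dict per layer, mapping candidate block -> measured latency (ms).
latency_lut = [
    {"k3_e1": 0.8, "k3_e6": 2.1, "skip": 0.0},
    {"k3_e1": 0.6, "k5_e6": 2.9, "skip": 0.0},
]

def architecture_latency(choices, lut):
    """Latency of one discrete architecture: the sum of its chosen operators' latencies."""
    return sum(lut[layer][block] for layer, block in enumerate(choices))

def expected_latency(masks, lut):
    """Latency of the super net: each operator's latency weighted by its mask value."""
    return sum(
        masks[layer][block] * latency
        for layer, table in enumerate(lut)
        for block, latency in table.items()
    )

# A discrete architecture picks exactly one block per layer.
discrete = architecture_latency(["k3_e6", "skip"], latency_lut)

# A soft mask (for example, a Gumbel Softmax sample) spreads weight across blocks.
soft_masks = [
    {"k3_e1": 0.5, "k3_e6": 0.5, "skip": 0.0},
    {"k3_e1": 0.0, "k5_e6": 0.0, "skip": 1.0},
]
relaxed = expected_latency(soft_masks, latency_lut)
```

Since `expected_latency` is linear in every mask entry, its derivative with respect to a mask entry is simply the corresponding table value, which is what makes the latency term usable in a gradient-based loss.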