Deep residual networks were shown to scale up to thousands of layers with steadily improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and training very deep residual networks suffers from diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease the depth and increase the width of residual networks.
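The sketch below illustrates the core idea of a widened residual block: multiply the channel count of an ordinary ResNet block by a widening factor k instead of stacking more blocks. This is a minimal, hypothetical PyTorch rendering, not the paper's code (the reference implementation is in Torch/Lua); the names `WideBasicBlock`, `in_planes`, `planes`, and `k` are illustrative, and the dropout the paper places between the two convolutions is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WideBasicBlock(nn.Module):
    """Pre-activation 3x3-3x3 residual block; width is set by `planes`."""

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        # 1x1 projection on the shortcut when the shape changes,
        # identity otherwise.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Conv2d(in_planes, planes, kernel_size=1,
                                      stride=stride, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)

# Widening: where a plain ResNet stage uses 16 channels, a wide ResNet
# with widening factor k = 8 uses 16 * k = 128 channels per block.
k = 8
block = WideBasicBlock(in_planes=16, planes=16 * k)
x = torch.randn(1, 16, 32, 32)  # CIFAR-sized feature map
y = block(x)                    # shape: (1, 128, 32, 32)
```

Holding depth fixed and growing k in this way is how the paper trades a very deep, thin network for a shallower, wider one with far better GPU utilization.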
| Task | Dataset | Model | Metric | Value |
|---|---|---|---|---|
| Image Classification | CIFAR-10 | Wide ResNet | Percentage error | 3.89 |
| Image Classification | CIFAR-100 | Wide ResNet | Percentage error | 18.85 |
| Image Classification | SVHN | Wide ResNet | Percentage error | 1.7 |