We explore a key architectural aspect of deep convolutional neural networks:
the pattern of internal skip connections used to aggregate outputs of earlier
layers for consumption by deeper layers. Such aggregation is critical to
facilitate training of very deep networks in an end-to-end manner...
This is a
primary reason for the widespread adoption of residual networks, which
aggregate outputs via cumulative summation. While subsequent works investigate
alternative aggregation operations (e.g. concatenation), we focus on an
orthogonal question: which outputs to aggregate at a particular point in the
network. We propose a new internal connection structure which aggregates only a
sparse set of previous outputs at any given depth. Our experiments demonstrate
this simple design change offers superior performance with fewer parameters and
lower computational requirements. Moreover, we show that sparse aggregation
allows networks to scale more robustly to 1000+ layers, thereby opening future
avenues for training long-running visual processes.