DLA

Last updated on Feb 14, 2021

dla102

Parameters 33 Million
FLOPs 7 Billion
File Size 129.02 MB
Training Data ImageNet
Training Resources 8x GPUs
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla102
LR 0.1
Epochs 120
Layers 102
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla102x

Parameters 26 Million
FLOPs 6 Billion
File Size 102.57 MB
Training Data ImageNet
Training Resources 8x GPUs
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla102x
LR 0.1
Epochs 120
Layers 102
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla102x2

Parameters 41 Million
FLOPs 9 Billion
File Size 159.88 MB
Training Data ImageNet
Training Resources 8x GPUs
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla102x2
LR 0.1
Epochs 120
Layers 102
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla169

Parameters 53 Million
FLOPs 12 Billion
File Size 206.52 MB
Training Data ImageNet
Training Resources 8x GPUs
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla169
LR 0.1
Epochs 120
Layers 169
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla34

Parameters 16 Million
FLOPs 3 Billion
File Size 60.30 MB
Training Data ImageNet
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla34
LR 0.1
Epochs 120
Layers 34
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla46_c

Parameters 1 Million
FLOPs 583 Million
File Size 5.06 MB
Training Data ImageNet
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla46_c
LR 0.1
Epochs 120
Layers 46
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla46x_c

Parameters 1 Million
FLOPs 544 Million
File Size 4.18 MB
Training Data ImageNet
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla46x_c
LR 0.1
Epochs 120
Layers 46
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla60

Parameters 22 Million
FLOPs 4 Billion
File Size 85.41 MB
Training Data ImageNet
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla60
LR 0.1
Epochs 120
Layers 60
Dropout 0.2
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla60_res2net

Parameters 21 Million
FLOPs 4 Billion
File Size 80.95 MB
Training Data ImageNet
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla60_res2net
Layers 60
Crop Pct 0.875
Image Size 224
Interpolation bilinear
dla60_res2next

Parameters 17 Million
FLOPs 3 Billion
File Size 66.41 MB
Training Data ImageNet
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla60_res2next
Layers 60
Crop Pct 0.875
Image Size 224
Interpolation bilinear
dla60x

Parameters 17 Million
FLOPs 4 Billion
File Size 67.60 MB
Training Data ImageNet
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla60x
LR 0.1
Epochs 120
Layers 60
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear
dla60x_c

Parameters 1 Million
FLOPs 593 Million
File Size 5.20 MB
Training Data ImageNet
Training Techniques SGD with Momentum, Weight Decay
Architecture 1x1 Convolution, Batch Normalization, Convolution, DLA Residual Block, DLA Bottleneck Residual Block, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax
ID dla60x_c
LR 0.1
Epochs 120
Layers 60
Crop Pct 0.875
Momentum 0.9
Batch Size 256
Image Size 224
Weight Decay 0.0001
Interpolation bilinear

Summary

Extending “shallow” skip connections, Deep Layer Aggregation (DLA) incorporates more depth and sharing. The authors introduce two aggregation structures: iterative deep aggregation (IDA) and hierarchical deep aggregation (HDA). These structures are expressed through an architectural framework that is independent of the choice of backbone, so they remain compatible with current and future networks.

IDA focuses on fusing resolutions and scales, while HDA focuses on merging features from all modules and channels. IDA follows the base hierarchy to refine resolution and aggregate scale stage by stage. HDA assembles its own hierarchy of tree-structured connections that cross and merge stages to aggregate different levels of representation.
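
As a rough, hypothetical sketch of the iterative idea (not the reference implementation), an aggregation node fuses two feature maps with a 1x1 convolution, batch normalization and ReLU, and IDA applies such nodes repeatedly from the shallowest stage to the deepest. The module name, channel counts and resolution below are illustrative only, and assume the inputs have already been resampled to a common spatial size:

import torch
import torch.nn as nn

class AggregationNode(nn.Module):
    # Hypothetical two-input aggregation node: concatenate, project with a 1x1
    # convolution, batch-normalize and apply ReLU.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, a, b):
        return self.relu(self.bn(self.proj(torch.cat([a, b], dim=1))))

def iterative_deep_aggregation(stage_outputs, nodes):
    # Fuse stage outputs from shallowest to deepest, one aggregation node per step.
    aggregated = stage_outputs[0]
    for feature, node in zip(stage_outputs[1:], nodes):
        aggregated = node(aggregated, feature)
    return aggregated

# Toy usage: three "stages" with 16, 32 and 64 channels, all at 56x56 resolution.
stages = [torch.randn(1, c, 56, 56) for c in (16, 32, 64)]
nodes = [AggregationNode(16 + 32, 32), AggregationNode(32 + 64, 64)]
out = iterative_deep_aggregation(stages, nodes)
print(out.shape)  # torch.Size([1, 64, 56, 56])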

How do I load this model?

To load a pretrained model:

import timm
m = timm.create_model('dla34', pretrained=True)
m.eval()

Replace the model name with the variant you want to use, e.g. dla34. You can find the IDs in the model summaries at the top of this page.
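
To run inference with a loaded model, timm can build the matching preprocessing (224x224 input, 0.875 crop percentage, bilinear interpolation) from the model's pretrained configuration. The snippet below is a minimal sketch; the image path is a placeholder:

import timm
import torch
from PIL import Image
from timm.data import resolve_data_config, create_transform

m = timm.create_model('dla34', pretrained=True)
m.eval()

# Build the evaluation transform (resize, center crop, normalize) from the model's config.
config = resolve_data_config({}, model=m)
transform = create_transform(**config)

img = Image.open('your_image.jpg').convert('RGB')   # placeholder image path
x = transform(img).unsqueeze(0)                     # shape: (1, 3, 224, 224)

with torch.no_grad():
    probs = m(x).softmax(dim=-1)
top5 = probs.topk(5)
print(top5.indices, top5.values)

To see every DLA variant that timm provides, timm.list_models('dla*') returns the matching model names.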

How do I train this model?

You can follow the timm recipe scripts to train a new model from scratch.
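
As a rough, minimal sketch of the recipe listed in the summaries above (SGD with momentum 0.9, weight decay 1e-4, learning rate 0.1, batch size 256, 120 epochs), a plain PyTorch loop might look like this. The ImageNet path is a placeholder, and the timm recipe scripts additionally handle learning-rate scheduling, augmentation, logging and distributed training:

import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Placeholder ImageNet training folder with standard augmentation.
train_dataset = datasets.ImageFolder(
    '/path/to/imagenet/train',
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ]))
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True, num_workers=8)

model = timm.create_model('dla34', pretrained=False).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(120):
    model.train()
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()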

Citation

@misc{yu2019deep,
      title={Deep Layer Aggregation}, 
      author={Fisher Yu and Dequan Wang and Evan Shelhamer and Trevor Darrell},
      year={2019},
      eprint={1707.06484},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Results

Image Classification on ImageNet

Model            Top-1 Accuracy   Top-5 Accuracy
dla102x2         79.44%           94.65%
dla169           78.69%           94.33%
dla102x          78.51%           94.23%
dla60_res2net    78.46%           94.21%
dla60_res2next   78.44%           94.16%
dla60x           78.25%           94.02%
dla102           78.03%           93.95%
dla60            77.04%           93.32%
dla34            74.62%           92.06%
dla60x_c         67.91%           88.42%
dla46x_c         65.98%           86.99%
dla46_c          64.87%           86.29%
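
To reproduce numbers like these, a minimal top-1/top-5 evaluation loop over an ImageNet validation folder could look like the sketch below; the dataset path and batch size are placeholders, and timm's repository also includes a validation script that does this more conveniently.

import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from timm.data import resolve_data_config, create_transform

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = timm.create_model('dla34', pretrained=True).to(device).eval()
config = resolve_data_config({}, model=model)                # crop pct, interpolation, mean/std
val_dataset = datasets.ImageFolder('/path/to/imagenet/val', create_transform(**config))
val_loader = DataLoader(val_dataset, batch_size=64, num_workers=8)

top1 = top5 = total = 0
with torch.no_grad():
    for images, targets in val_loader:
        images, targets = images.to(device), targets.to(device)
        pred5 = model(images).topk(5, dim=1).indices         # top-5 predicted classes
        correct = pred5.eq(targets.unsqueeze(1))             # (batch, 5) boolean matches
        top1 += correct[:, 0].sum().item()
        top5 += correct.any(dim=1).sum().item()
        total += targets.size(0)

print(f'top-1: {100 * top1 / total:.2f}%  top-5: {100 * top5 / total:.2f}%')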