This work presents CascadeCNN, an automated toolflow that pushes the
quantisation limits of any given CNN model, aiming to perform high-throughput
inference. A two-stage architecture tailored for any given CNN-FPGA pair is
generated, consisting of a low-precision and a high-precision unit in a cascade. A
confidence evaluation unit is employed to identify misclassified cases from the
excessively low-precision unit and forward them to the high-precision unit for
re-processing. Experiments demonstrate that the proposed toolflow can achieve a
performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline
design for the same resource budget and accuracy, without the need to
retrain the model or access the training data.
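
The cascade described above can be illustrated with a minimal software sketch. It is not the toolflow's hardware implementation: the confidence metric (the margin between the two highest softmax probabilities), the threshold value, and the names `cascade_predict`, `is_confident`, `low_precision_model` and `high_precision_model` are all assumptions introduced here for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping an input sample to class logits.
# Both models are hypothetical stand-ins for the low- and high-precision units.
Model = Callable[[Sequence[float]], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_confident(probs: Sequence[float], threshold: float) -> bool:
    """Assumed confidence test: top-1 minus top-2 probability >= threshold."""
    ranked = sorted(probs, reverse=True)
    return (ranked[0] - ranked[1]) >= threshold


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def cascade_predict(
    sample: Sequence[float],
    low_precision_model: Model,
    high_precision_model: Model,
    threshold: float = 0.5,  # assumed value, would need tuning per model
) -> Tuple[int, str]:
    """Classify with the low-precision unit first; re-process with the
    high-precision unit only when the low-precision result looks unreliable.
    Returns the predicted class and which unit produced it."""
    probs = softmax(low_precision_model(sample))
    if is_confident(probs, threshold):
        return argmax(probs), "low"
    probs = softmax(high_precision_model(sample))
    return argmax(probs), "high"
```

In this sketch, the throughput benefit comes from the fact that most samples exit at the cheaper low-precision stage, while the threshold controls how many are forwarded and therefore how closely the overall accuracy tracks that of the high-precision unit alone.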