no code implementations • 6 Apr 2023 • Rafael Sousa, Marcio Pereira, Yongin Kwon, TaeHo Kim, Namsoon Jung, Chang Soo Kim, Michael Frank, Guido Araujo
Although code generation for Convolution Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly-constrai\-ned Multicore Neural Processor Units (NPUs) is still a challenging problem.
no code implementations • 8 Mar 2023 • Victor Ferrari, Rafael Sousa, Marcio Pereira, João P. L. de Carvalho, José Nelson Amaral, José Moreira, Guido Araujo
The speed-up over an Im2Col + BLAS method based on current BLAS implementations for end-to-end machine-learning model inference is in the range of 9% - 25% for Intel x86 and 10% - 42% for IBM POWER10 architectures.