no code implementations • 19 Jan 2024 • Ghada Alsuhli, Hani Saleh, Mahmoud Al-Qutayri, Baker Mohammad, Thanos Stouraitis
Since a hardware implementation of FFT/IFFT over the ring is currently non-existent, the execution time achieved by this processor is compared to the software implementation of FFT/IFFT of FALCON on a Raspberry Pi 4 with Cortex-A72, where the proposed processor achieves a speedup of up to 2.3$\times$.
no code implementations • 11 Jul 2023 • Ghada Alsuhli, Vasileios Sakellariou, Hani Saleh, Mahmoud Al-Qutayri, Baker Mohammad, Thanos Stouraitis
The reader will be able to understand the importance of an efficient number system for DNNs, learn about the widely used number systems for DNNs, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance.
no code implementations • 11 Oct 2021 • Dima Kilani, Baker Mohammad, Yasmin Halawani, Mohammed F. Tolba, Hani Saleh
The 5x4 C3PU consumed 66.4 fJ/MAC at a 0.3 V supply voltage, with an error of 5.7%.
no code implementations • 28 Apr 2021 • Mohammed F. Tolba, Huruy Tekle Tesfai, Hani Saleh, Baker Mohammad, Mahmoud Al-Qutayri
This leads to a repetition of the weights across the processing element (PE) array, which in turn enables the reuse of DNN sub-computations (computational reuse) and of the same data (data reuse), reducing DNN computations and memory accesses and improving energy efficiency, albeit at the cost of increased training time.
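
As a rough illustration of why repeated weights enable computational reuse, the sketch below compares a plain dot product with one that first sums the inputs sharing a weight value and then multiplies once per distinct weight. It is a minimal software analogy under stated assumptions, not the paper's PE-array design: the function names `dot_naive` and `dot_with_reuse` and the sample values are hypothetical.

```python
from collections import defaultdict


def dot_naive(weights, inputs):
    """Plain dot product: one multiplication per weight."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return total, len(weights)  # (result, number of multiplications)


def dot_with_reuse(weights, inputs):
    """Dot product that exploits repeated weight values.

    Inputs that share the same weight are accumulated first, so each
    distinct weight value is multiplied only once.
    """
    grouped = defaultdict(int)
    for w, x in zip(weights, inputs):
        grouped[w] += x  # additions only, no multiplication yet
    total = sum(w * s for w, s in grouped.items())
    return total, len(grouped)  # (result, number of multiplications)


# Hypothetical example values: six weights, only three distinct values.
weights = [2, -1, 2, 3, -1, 2]
inputs = [1, 4, 5, 2, 6, 3]

naive_result, naive_mults = dot_naive(weights, inputs)
reuse_result, reuse_mults = dot_with_reuse(weights, inputs)
print(naive_result, naive_mults)
print(reuse_result, reuse_mults)
```

Both functions return the same sum, but the grouped version needs only as many multiplications as there are distinct weight values, which is the kind of saving that weight repetition makes possible.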