no code implementations • 15 Feb 2023 • Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz
Model compression is instrumental in optimizing deep neural network inference on resource-constrained hardware.
no code implementations • 16 Oct 2022 • Ben Zandonati, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz
This response is non-linear and heterogeneous throughout the network.
no code implementations • 30 May 2022 • Moshe Kimhi, Tal Rozen, Tal Kopetz, Olya Sirkin, Avi Mendelson, Chaim Baskin
Quantized neural networks are well known for reducing latency, power consumption, and model size without significant degradation in accuracy, making them highly applicable for systems with limited resources and low power requirements.
no code implementations • 9 Feb 2022 • Adrian Alan Pol, Thea Aarrestad, Ekaterina Govorkova, Roi Halily, Anat Klempner, Tal Kopetz, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Olya Sirkin, Sioni Summers
We experiment with 8-bit and ternary quantization, benchmarking their accuracy and inference latency against a single-precision floating-point baseline.
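The two schemes mentioned above can be illustrated with a minimal sketch (this is not the paper's implementation; the threshold `t` and the symmetric int8 mapping are illustrative assumptions): 8-bit quantization rounds each weight to one of 255 levels, while ternary quantization keeps only the sign of large weights, trading accuracy for a far smaller model.

```python
def quantize_8bit(w):
    # Symmetric uniform 8-bit quantization: map each weight to an
    # integer in [-127, 127]; dequantize as q * scale.
    scale = max(abs(x) for x in w) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in w]
    return q, scale

def quantize_ternary(w, t=0.05):
    # Ternary quantization: each weight becomes -1, 0, or +1 times a
    # shared scale. The threshold t (a hypothetical choice here)
    # zeroes out small weights.
    nz = [abs(x) for x in w if abs(x) > t]
    scale = sum(nz) / len(nz) if nz else 0.0
    q = [0 if abs(x) <= t else (1 if x > 0 else -1) for x in w]
    return q, scale

w = [0.12, -0.03, 0.27, -0.18, 0.01, 0.09]
q8, s8 = quantize_8bit(w)
qt, st = quantize_ternary(w)
# Mean absolute reconstruction error of each scheme.
err8 = sum(abs(a - b * s8) for a, b in zip(w, q8)) / len(w)
errt = sum(abs(a - b * st) for a, b in zip(w, qt)) / len(w)
```

As expected, the 8-bit reconstruction error is much smaller than the ternary one, which is why the paper benchmarks both accuracy and latency: ternary weights cost accuracy but enable far cheaper arithmetic and storage.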