no code implementations • Findings (ACL) 2021 • Tianchu Ji, Shraddhan Jain, Michael Ferdman, Peter Milder, H. Andrew Schwartz, Niranjan Balasubramanian
This informs the design of an inference-time quantization technique that combines pruning with log-scaled mapping and produces only a few (e.g. $2^3$) unique values.
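As a rough illustration of how pruning and log-scaled mapping can reduce a set of values to $2^3$ unique levels, here is a minimal sketch. It is not the authors' implementation. The function name `log_quantize`, the parameters `num_levels` and `prune_threshold`, the restriction to non-negative inputs, and the choice to reserve one level for pruned values are all assumptions made for illustration.

```python
import math


def log_quantize(values, num_levels=8, prune_threshold=1e-3):
    """Prune small values to 0, then snap the rest to log-spaced levels.

    Hypothetical sketch: assumes non-negative inputs and num_levels >= 2.
    One level is reserved for the pruned value 0; the remaining
    num_levels - 1 levels are evenly spaced in log2 between the smallest
    and largest surviving value.
    """
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")

    kept = [v for v in values if v >= prune_threshold]
    if not kept:
        return [0.0] * len(values)

    lo, hi = math.log2(min(kept)), math.log2(max(kept))
    n = num_levels - 1  # levels available for non-pruned values
    step = (hi - lo) / (n - 1) if n > 1 and hi > lo else 0.0

    out = []
    for v in values:
        if v < prune_threshold:
            out.append(0.0)  # pruned
        else:
            # Index of the nearest level in the log domain.
            idx = round((math.log2(v) - lo) / step) if step else 0
            out.append(2.0 ** (lo + idx * step))
    return out


values = [0.0001, 0.01, 0.02, 0.5, 0.3, 0.1]
quantized = log_quantize(values)
```

With the default `num_levels=8`, the output contains at most eight distinct values, including the zero used for pruned entries. The threshold and the level spacing here are arbitrary; a real system would choose them from the observed value distribution.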
no code implementations • 11 Jul 2018 • Yongming Shen, Tianchu Ji, Michael Ferdman, Peter Milder
To cope with the increasing demand and computational intensity of deep neural networks (DNNs), industry and academia have turned to accelerator technologies.
no code implementations • 30 Jun 2016 • Yongming Shen, Michael Ferdman, Peter Milder
Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection of layers is computed.
Hardware Architecture