Differentiable Discrete Device-to-System Codesign for Optical Neural Networks via Gumbel-Softmax

29 Sep 2021 · Yingjie Li, Ruiyang Chen, Weilu Gao, Cunxi Yu

Deep neural networks (DNNs) have significantly improved performance in many areas, such as large-scale computer vision and natural language processing. While conventional DNNs implemented on digital platforms face intrinsic limitations in computation and memory requirements, optical neural networks (ONNs), such as diffractive optical neural networks (DONNs), have attracted considerable attention because they offer significant advantages in power efficiency, parallelism, and computational speed. To train DONNs, fully differentiable models of physical optical propagation have been developed, which allow the physical parameters of an optical system to be trained with conventional gradient descent algorithms. However, inversely mapping algorithm-trained physical model parameters onto the stimulus applied to real-world optical devices is a non-trivial task, which can involve multiple imperfections (e.g., quantization and non-monotonicity) and is especially challenging in complex-valued domains. This work proposes a novel device-to-system hardware-software codesign framework that enables efficient training of DONNs with respect to arbitrary experimentally measured optical devices across layers. Specifically, Gumbel-Softmax with a novel complex-domain regularization method is employed to enable a differentiable one-to-one mapping from discrete device parameters into the forward function of DONNs, so that the physical parameters in DONNs can be trained by simply minimizing the loss function of the ML task. Experimental results demonstrate significant advantages over traditional quantization-based methods with low-precision optical devices (e.g., 8 discrete values), with ~20% accuracy improvement on MNIST and ~28% on FashionMNIST. More importantly, the framework provides high versatility in codesign, even for a single system implemented with mixed optical devices. In addition, we include comprehensive studies of regularization analysis, temperature scheduling exploration, and runtime complexity evaluation of the proposed framework.
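
The Gumbel-Softmax mapping mentioned in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the 8-level `device_states` table, the function names, and the temperature values are all hypothetical, and the complex-domain regularization is omitted. The sketch only shows how a relaxed one-hot sample over discrete device levels yields a complex-valued modulation that approaches a single measured device state as the temperature drops.

```python
import cmath
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return a relaxed one-hot sample (weights summing to 1) over the choices."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1), clamped for safety.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_modulation(logits, device_states, tau, rng):
    """Weighted sum of complex device responses under the relaxed sample."""
    weights = gumbel_softmax(logits, tau, rng)
    return sum(w * c for w, c in zip(weights, device_states))


# Hypothetical measured device: 8 discrete levels, each with its own
# amplitude and phase (a stand-in for a real measurement table).
device_states = [
    (0.6 + 0.05 * k) * cmath.exp(1j * 2.0 * math.pi * k / 8) for k in range(8)
]

rng = random.Random(0)

# Flat logits at high temperature: the sample is a soft mixture of levels.
soft_weights = gumbel_softmax([0.0] * 8, tau=5.0, rng=rng)

# One dominant logit at low temperature: the sample is nearly one-hot,
# so the modulation is nearly the single device state at index 3.
sharp_logits = [0.0] * 8
sharp_logits[3] = 50.0
sharp_mod = relaxed_modulation(sharp_logits, device_states, tau=0.1, rng=rng)
```

In an actual training setup the logits would be trainable tensors in an autodiff framework, so gradients of the task loss could flow through the softmax to the per-pixel logits; plain Python is used here only to keep the forward computation explicit.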
