no code implementations • 12 Apr 2025 • Vikas Natesh, H. T. Kung, David Kong
Swamping, the loss of small addends when they are accumulated into a much larger partial sum, is a significant source of numerical error in accumulation when implementing low-bitwidth dot products (e.g., 8-bit floating point), as the mantissa has only a small number of bits.
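Swamping is easy to reproduce in a few lines: once a running sum grows large, small partial products fall below the precision of a narrow accumulator and stop contributing. A minimal NumPy sketch (float16 stands in for the 8-bit floating point mentioned above, since NumPy has no native fp8 type):

```python
import numpy as np

# Two long vectors whose exact dot product is easy to compute.
n = 4096
x = np.full(n, 1.0, dtype=np.float64)
y = np.full(n, 0.25, dtype=np.float64)
exact = float(np.dot(x, y))            # 1024.0

# Sequential accumulation in a narrow format (float16 stands in for fp8):
# once the running sum is large enough, each 0.25 addend falls below the
# precision of the accumulator's mantissa and is "swamped".
acc = np.float16(0.0)
for xi, yi in zip(x.astype(np.float16), y.astype(np.float16)):
    acc = np.float16(acc + xi * yi)

print(f"exact = {exact}, low-precision accumulation = {float(acc)}")
# A wider accumulator (e.g., float32) avoids the error for this example.
```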
no code implementations • 7 Nov 2024 • He-Yen Hsieh, Ziyun Li, Sai Qian Zhang, Wei-Te Mark Ting, Kao-Den Chang, Barbara De Salvo, Chiao Liu, H. T. Kung
GazeGen is the first system to combine visual content generation with real-time gaze estimation, made possible exclusively by DFT Gaze.
no code implementations • 29 Oct 2024 • Matheus Farias, H. T. Kung
We introduce a novel approach that reduces how often memristors on bit-sliced compute-in-memory crossbars for deep neural networks (DNNs) must be reprogrammed.
no code implementations • 15 Oct 2024 • Matheus Farias, H. T. Kung
SWS effectively reduces this cost by leveraging (1) small weights and (2) zero weights (weight sparsity).
no code implementations • 25 Mar 2024 • Xiang-Li Lu, Hwai-Jung Hsu, Che-Wei Chou, H. T. Kung, Chen-Hsin Lee, Sheng-Mao Cheng
Specifically, we first pretrain a deep learning model for a given lathe machine's operations to learn the salient features of machining states.
1 code implementation • 23 Feb 2024 • Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen
First, current personalization techniques fail to reliably extend to multiple concepts -- we hypothesize this to be due to the mismatch between complex scenes and simple text descriptions in the pre-training dataset (e.g., LAION).
1 code implementation • 8 Jul 2023 • Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting
We propose Rosko -- row skipping outer products -- for deriving sparse matrix multiplication (SpMM) kernels that reduce the computation and memory access requirements of deep neural networks (DNNs).
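For intuition, the outer-product view of SpMM makes the skipping opportunity visible: C = A·B is a sum over k of the outer product of column k of A with row k of B, so zero entries of A contribute nothing and the corresponding work can be skipped. The NumPy sketch below skips zero entries within each outer product for illustration only; the actual Rosko kernels operate on packed, tiled data and skip at row granularity to save memory traffic:

```python
import numpy as np

def outer_product_spmm(A, B):
    """C = A @ B computed as a sum of outer products, skipping zero work.

    Illustration only: a real kernel (e.g., Rosko) works on packed/tiled
    data and skips at row granularity to cut memory traffic.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for p in range(k):
        col = A[:, p]
        nz = np.flatnonzero(col)      # rows of A with work in this step
        if nz.size == 0:
            continue                  # whole outer product skipped
        C[nz, :] += np.outer(col[nz], B[p, :])
    return C

rng = np.random.default_rng(0)
A = rng.random((64, 128)) * (rng.random((64, 128)) > 0.8)   # ~80% sparse
B = rng.random((128, 32))
assert np.allclose(outer_product_spmm(A, B), A @ B)
```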
no code implementations • 12 Apr 2023 • Andrew Sabot, Vikas Natesh, H. T. Kung, Wei-Te Ting
We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems.
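One way to see where external accesses go is a blocked matrix multiply: tiles sized to fit on-chip memory are reused for a whole block of outputs, so each off-chip element is fetched as few times as possible. The sketch below is a generic illustration with a hypothetical load counter, not the MEMA-derived runtime itself:

```python
import numpy as np

def blocked_matmul(A, B, tile=16):
    """C = A @ B with a simple 2-D tiling.

    Loading a (tile x K) strip of A and a (K x tile) strip of B and reusing
    them for a whole output tile reduces external-memory traffic relative
    to an element-by-element loop. Tile size would be chosen to fit SRAM.
    """
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    loads = 0
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            a = A[i:i+tile, :]       # one strip of A loaded per output tile
            b = B[:, j:j+tile]       # one strip of B loaded per output tile
            loads += a.size + b.size
            C[i:i+tile, j:j+tile] = a @ b
    return C, loads

A = np.random.rand(64, 64)
B = np.random.rand(64, 64)
C, loads = blocked_matmul(A, B, tile=16)
assert np.allclose(C, A @ B)
print("modelled external loads:", loads)   # vs. naive ~2*M*N*K element reads
```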
1 code implementation • 5 Jan 2023 • Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H. T. Kung
We then show that these fragments can be stitched together to create neural networks with accuracy comparable to that of traditionally trained networks, at a fraction of the computing resource and data requirements.
no code implementations • 25 Sep 2022 • Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung
While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints.
no code implementations • 19 Jul 2022 • Xin Dong, Sai Qian Zhang, Ang Li, H. T. Kung
Federated Learning aims at training a global model from multiple decentralized devices (i.e., clients) without exchanging their private local data.
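As a point of reference, the standard federated averaging (FedAvg) baseline makes the setting concrete: each client performs local updates on its private data, and only model weights travel to the server for aggregation. This is a generic sketch of the setting, not necessarily the method proposed in the paper:

```python
import numpy as np

def local_update(w, X, y, lr=0.1, steps=10):
    """One client's local SGD on a linear model; raw (X, y) never leave it."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for _ in range(4):                      # 4 clients with private local data
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append((X, y))

w_global = np.zeros(2)
for _ in range(20):                     # communication rounds
    local_ws = [local_update(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local_ws, axis=0)    # server averages weights only

print("recovered weights:", np.round(w_global, 3))
```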
no code implementations • CVPR 2022 • Xin Dong, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T. Kung, Ziyun Li
We design deep neural networks (DNNs) and corresponding network splittings that distribute the DNN workload between camera sensors and a centralized aggregator on head-mounted devices, meeting system performance targets for inference accuracy and latency under the given hardware resource constraints.
no code implementations • 28 Oct 2021 • Sai Qian Zhang, Bradley McDanel, H. T. Kung
Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values.
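The BFP mechanics can be sketched directly: a group of values shares a single power-of-two scale (the shared exponent), and each value keeps only a short integer mantissa relative to that scale. Group size and mantissa width below are illustrative rather than the paper's settings:

```python
import numpy as np

def to_bfp(x, mantissa_bits=8):
    """Quantize a group of values to block floating point: one shared
    power-of-two scale (the shared exponent) plus short integer mantissas."""
    max_abs = np.max(np.abs(x)) + 1e-30
    qmax = 2 ** (mantissa_bits - 1) - 1
    shared_exp = int(np.ceil(np.log2(max_abs / qmax)))   # power-of-two scale
    scale = 2.0 ** shared_exp
    mant = np.round(x / scale).astype(int)                # fits in mantissa_bits
    return mant, shared_exp

def from_bfp(mant, shared_exp):
    return mant * 2.0 ** shared_exp

rng = np.random.default_rng(0)
x = rng.normal(size=16).astype(np.float32)
mant, exp = to_bfp(x, mantissa_bits=8)
x_hat = from_bfp(mant, exp)
print("shared exponent:", exp)
print("max quantization error:", float(np.max(np.abs(x - x_hat))))
```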
no code implementations • 13 Jul 2021 • Xin Dong, Hongxu Yin, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov, H. T. Kung
Prior works usually assume that SC offers privacy benefits as only intermediate features, instead of private data, are shared from devices to the cloud.
no code implementations • CVPR 2022 • Xin Dong, Junfeng Guo, Ang Li, Wei-Te Ting, Cong Liu, H. T. Kung
Based upon this observation, we propose a novel metric called Neural Mean Discrepancy (NMD), which compares neural means of the input examples and training data.
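A rough prototype of the statistic: record per-channel mean activations over the training set once, then compare the means of an incoming batch against them; a large discrepancy flags out-of-distribution input. The NumPy sketch below uses toy activations of a single layer, whereas the full NMD detector aggregates such statistics across many layers and trains a lightweight classifier on top:

```python
import numpy as np

def channel_means(features):
    """Mean activation per channel, averaged over batch and spatial dims.
    `features` has shape (batch, channels, height, width)."""
    return features.mean(axis=(0, 2, 3))

def nmd_score(train_means, test_features):
    """L1 distance between training-time and test-time channel means.
    Larger values suggest the test batch is out-of-distribution."""
    return np.abs(channel_means(test_features) - train_means).mean()

# Toy stand-in for activations of one convolutional layer.
rng = np.random.default_rng(0)
train_acts = rng.normal(0.0, 1.0, size=(1000, 32, 8, 8))
in_dist    = rng.normal(0.0, 1.0, size=(64, 32, 8, 8))
out_dist   = rng.normal(0.7, 1.5, size=(64, 32, 8, 8))

mu = channel_means(train_acts)
print("in-distribution score :", nmd_score(mu, in_dist))
print("out-of-distribution   :", nmd_score(mu, out_dist))
```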
no code implementations • Findings of the Association for Computational Linguistics 2020 • Wen Tai, H. T. Kung, Xin Dong, Marcus Comiter, Chang-Fu Kuo
We introduce exBERT, a training method to extend BERT pre-trained models from a general domain to a new pre-trained model for a specific domain with a new additive vocabulary under constrained training resources (i.e., constrained computation and data).
no code implementations • 13 Jul 2020 • H. T. Kung, Bradley McDanel, Sai Qian Zhang
To perform conversion from binary to SDR, we develop an efficient encoding method called HESE (Hybrid Encoding for Signed Expressions) that can be performed in a single pass while examining only two bits at a time.
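For intuition, a classic way to obtain a signed digit representation while scanning only two adjacent bits at a time is radix-2 Booth recoding, sketched below. HESE itself is a different, hybrid encoding, so this is only an analogue of the one-pass, two-bit-window idea:

```python
def booth_signed_digits(value, bits=8):
    """Radix-2 Booth recoding: digit_i = b_{i-1} - b_i, scanning two bits
    at a time. Produces digits in {-1, 0, +1} such that
    value == sum(d * 2**i for i, d in enumerate(digits)).
    Illustrative analogue of one-pass signed-digit encoding, not HESE itself.
    """
    b = [(value >> i) & 1 for i in range(bits)] + [0]   # append leading 0
    digits = [b[i - 1] - b[i] if i > 0 else -b[0] for i in range(bits + 1)]
    return digits

for v in (0b00111100, 0b01010101, 255):
    d = booth_signed_digits(v)
    assert v == sum(di * 2**i for i, di in enumerate(d))
    print(f"{v:3d} -> {d}")
```

For example, 255 recodes to a single +1 at weight 256 and a single -1 at weight 1, replacing eight nonzero terms with two.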
1 code implementation • 19 Jul 2019 • Surat Teerapittayanon, H. T. Kung
A main feature of DaiMoN is that it allows peers to verify the accuracy improvement of submitted models without knowing the test labels.
1 code implementation • MAPL 2019 • Philippe Tillet, H. T. Kung, David Cox
The validation and deployment of novel research ideas in the field of Deep Learning is often limited by the availability of efficient compute kernels for certain basic primitives.
no code implementations • 17 Jun 2019 • Marcus Comiter, Surat Teerapittayanon, H. T. Kung
CheckNet is like a checksum for neural network inference: it verifies the integrity of the inference computation performed by untrusted devices to 1) ensure the inference has actually been performed, and 2) ensure the inference has not been manipulated by an attacker.
no code implementations • 1 May 2019 • Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong
A highlight of our full-stack approach, and a key contributor to the achieved high energy efficiency, is an efficient Selector-Accumulator (SAC) architecture for implementing the multiplier-accumulator (MAC) operation present in any digital CNN hardware.
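The intuition behind SAC: when weights are restricted to signed powers of two, the multiply in a MAC collapses into selecting an appropriately shifted copy of the input, leaving only an accumulation. The snippet below is a software analogue of that selection-plus-accumulation; the hardware SAC is a streaming bit-level circuit that this does not capture:

```python
def sac_dot(inputs, weight_exponents):
    """Dot product where each weight is +/- 2**e: 'multiplication' is just a
    shift (selection of a shifted input), followed by accumulation.
    Software analogue of a selector-accumulator (SAC) replacing a MAC.
    """
    acc = 0
    for x, (sign, e) in zip(inputs, weight_exponents):
        shifted = x << e            # select the e-shifted copy of x
        acc += shifted if sign > 0 else -shifted
    return acc

# 4-element dot product with weights  +2^0, -2^1, +2^3, +2^2
inputs = [3, 5, 2, 7]
weights = [(+1, 0), (-1, 1), (+1, 3), (+1, 2)]
print(sac_dot(inputs, weights))     # 3*1 - 5*2 + 2*8 + 7*4 = 37
```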
no code implementations • 12 Dec 2018 • Miriam Cha, Youngjune L. Gwon, H. T. Kung
Instead of selecting random training examples, we perform negative sampling based on the semantic distance from a positive example in the class.
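A generic version of semantic-distance negative sampling: rank candidate negatives by the distance of their embeddings from the positive example and sample from that ranking (here, taking the closest candidates as hard negatives; the paper's exact sampling rule may differ):

```python
import numpy as np

def sample_negatives(pos_embedding, cand_embeddings, k=5, hardest=True):
    """Pick k negative examples by semantic distance to a positive example.

    Illustrative only: here 'hard' negatives are the semantically closest
    candidates; the paper's exact sampling rule may differ.
    """
    d = np.linalg.norm(cand_embeddings - pos_embedding, axis=1)
    order = np.argsort(d)                    # closest first
    return order[:k] if hardest else order[-k:]

rng = np.random.default_rng(0)
positive = rng.normal(size=128)
candidates = rng.normal(size=(1000, 128))    # embeddings of other examples
print("hard negative indices:", sample_negatives(positive, candidates))
```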
no code implementations • 7 Nov 2018 • H. T. Kung, Bradley McDanel, Sai Qian Zhang
We study the effectiveness of this joint optimization for both high utilization and classification accuracy with ASIC and FPGA designs based on efficient bit-serial implementations of multiplier-accumulators.
no code implementations • 21 Oct 2017 • Bradley McDanel, Surat Teerapittayanon, H. T. Kung
At inference time, the number of channels used can be dynamically adjusted to trade off accuracy for lowered power consumption and reduced latency by selecting only a beginning subset of channels.
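The channel-truncation idea can be sketched with a single layer: if channels are ordered by importance during training, the layer's dot products can simply be cut off after the first few channels at inference time. A toy NumPy illustration, not the paper's full training scheme:

```python
import numpy as np

def layer_output(x, W, num_channels):
    """Compute a layer using only the first `num_channels` input channels.

    x: (channels, features) activations; W: (out, channels) weights.
    Training with a decreasing per-channel profile makes early channels the
    most important, so truncating the sum trades accuracy for work saved.
    """
    c = num_channels
    return W[:, :c] @ x[:c, :]        # incomplete dot product over channels

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 10))
W = rng.normal(size=(32, 64))
full = layer_output(x, W, 64)
half = layer_output(x, W, 32)         # ~2x less work at some accuracy cost
print("relative deviation with half the channels:",
      np.linalg.norm(full - half) / np.linalg.norm(full))
```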
2 code implementations • 6 Sep 2017 • Bradley McDanel, Surat Teerapittayanon, H. T. Kung
Beyond minimizing the memory required to store weights, as in a BNN, we show that it is essential to minimize the memory used for temporaries which hold intermediate results between layers in feedforward inference.
3 code implementations • 6 Sep 2017 • Surat Teerapittayanon, Bradley McDanel, H. T. Kung
Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer.
1 code implementation • 6 Sep 2017 • Surat Teerapittayanon, Bradley McDanel, H. T. Kung
In our experiments, compared with the traditional approach of offloading raw sensor data for processing in the cloud, DDNN processes most sensor data locally on end devices while achieving high accuracy, reducing the communication cost by a factor of over 20x.
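The local-versus-cloud decision in a DDNN can be sketched as an early-exit rule: if the device-side classifier is confident (e.g., low normalized entropy of its softmax output), return its prediction; otherwise forward intermediate features to the cloud. A toy Python sketch with stand-in models; the threshold and exit criterion here are illustrative:

```python
import numpy as np

def normalized_entropy(probs):
    """Entropy of a softmax output, scaled to [0, 1]; low means confident."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

def classify(sample, local_model, cloud_model, threshold=0.3):
    """Exit locally when the device-side classifier is confident enough;
    otherwise offload intermediate features to the cloud model.
    Sketch of the local/cloud decision; models are stand-in callables.
    """
    probs, features = local_model(sample)
    if normalized_entropy(probs) < threshold:
        return int(np.argmax(probs)), "local exit"
    return int(np.argmax(cloud_model(features))), "offloaded"

# Toy stand-ins: a confident local model and a cloud model over its features.
local_model = lambda s: (np.array([0.02, 0.96, 0.02]), s * 2.0)
cloud_model = lambda f: np.array([0.2, 0.3, 0.5])
print(classify(np.ones(4), local_model, cloud_model))
```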
no code implementations • 5 Sep 2017 • Miriam Cha, Youngjune Gwon, H. T. Kung
We argue that clustering with word embeddings in the metric space should yield feature representations in a higher semantic space appropriate for text regression.
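A generic bag-of-clusters pipeline illustrates the idea: cluster pre-trained word embeddings, represent each document as a histogram over the clusters, and fit a regressor on those features. This is a simplified stand-in for the paper's pipeline, using scikit-learn and random toy data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

# Toy word embeddings (in practice, pre-trained vectors such as word2vec).
rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(200)]
emb = {w: rng.normal(size=50) for w in vocab}

# Cluster the embedding space; each cluster acts as a semantic "topic".
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
kmeans.fit(np.stack(list(emb.values())))

def doc_features(words):
    """Represent a document as a normalized histogram over embedding clusters."""
    ids = kmeans.predict(np.stack([emb[w] for w in words]))
    hist = np.bincount(ids, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Text regression on top of the bag-of-clusters features.
docs = [[str(w) for w in rng.choice(vocab, size=30)] for _ in range(100)]
X = np.stack([doc_features(d) for d in docs])
y = X @ rng.normal(size=10) + 0.01 * rng.normal(size=100)   # toy targets
print("fit R^2:", Ridge(alpha=1.0).fit(X, y).score(X, y))
```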
no code implementations • 30 Aug 2017 • Miriam Cha, Youngjune Gwon, H. T. Kung
Recent approaches in generative adversarial networks (GANs) can automatically synthesize realistic images from descriptive text.
no code implementations • 17 May 2016 • Youngjune Gwon, William Campbell, Kevin Brady, Douglas Sturim, Miriam Cha, H. T. Kung
Unsupervised feature learning methods have proven effective for classification tasks based on a single modality.
no code implementations • 19 Nov 2015 • Miriam Cha, Youngjune Gwon, H. T. Kung
In this paper, we present a multimodal framework for learning sparse representations that can capture semantic correlation between modalities.