no code implementations • 12 May 2023 • Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee, Jun Won Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi
3D object detection using point cloud (PC) data is essential for perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements.
no code implementations • 3 Feb 2023 • Hyoungwook Nam, Raghavendra Pradyumna Pothukuchi, Bo Li, Nam Sung Kim, Josep Torrellas
To address this problem, this paper explores using Adversarial Machine Learning (AML) methods as a defense at the computer architecture layer to obfuscate side channels.
1 code implementation • 21 Mar 2022 • Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, Yingyan Lin
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art method for graph-based learning tasks.
Ranked #1 on Node Classification on Reddit
1 code implementation • ICLR 2022 • Cheng Wan, Youjie Li, Cameron R. Wolfe, Anastasios Kyrillidis, Nam Sung Kim, Yingyan Lin
Notably, little is known regarding the convergence rate of GCN training with both stale features and stale feature gradients.
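To make the staleness idea concrete, here is a minimal sketch under assumed, illustrative names (a toy partitioned setup, not the paper's implementation): a worker aggregates its own up-to-date features with a one-iteration-old cache of boundary features from other partitions, so compute does not block on communication.

```python
import numpy as np

rng = np.random.default_rng(0)
local = rng.normal(size=(4, 8))     # features of the 4 nodes this worker owns
boundary_cache = np.zeros((2, 8))   # stale copy of 2 remote boundary nodes' features
A_local = rng.random((4, 4))        # adjacency block among local nodes (toy values)
A_remote = rng.random((4, 2))       # edges from local nodes to remote boundary nodes

for step in range(3):
    # Aggregate with whatever boundary features are already cached (stale),
    # so this step's compute does not wait on communication.
    h = A_local @ local + A_remote @ boundary_cache
    local = np.tanh(h)              # toy per-layer update
    # Fresh boundary features arrive (overlapped with compute in practice)
    # and refresh the cache for the next iteration.
    boundary_cache = rng.normal(size=(2, 8))
```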
1 code implementation • 2 Feb 2022 • Youjie Li, Amar Phanishayee, Derek Murray, Jakub Tarnawski, Nam Sung Kim
Deep neural networks (DNNs) have grown exponentially in size over the past decade, leaving only those who have massive datacenter-based resources with the ability to develop and train such models.
no code implementations • 1 Jan 2021 • Cheng Wan, Youjie Li, Nam Sung Kim, Yingyan Lin
While it is natural to leverage graph partitioning and distributed training to tackle this challenge, this direction has previously been touched on only lightly because of a unique challenge posed by GCN structures: the excessive number of boundary nodes in each partitioned subgraph, which can easily blow up the memory and communication required for distributed training of GCNs.
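A minimal sketch of why boundary nodes dominate the cost, assuming a toy edge list and partition map (illustrative names, not from the paper): every endpoint of a cut edge becomes a boundary node whose features must be replicated or communicated each layer, so the counts below are a rough proxy for per-partition overhead.

```python
from collections import defaultdict

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (1, 4)]
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # two subgraphs

boundary = defaultdict(set)
for u, v in edges:
    if partition[u] != partition[v]:
        # Each endpoint of a cut edge is a boundary node: its features must be
        # fetched from (or replicated in) the other partition every layer.
        boundary[partition[u]].add(u)
        boundary[partition[v]].add(v)

for p, nodes in sorted(boundary.items()):
    print(f"partition {p}: {len(nodes)} boundary nodes -> {sorted(nodes)}")
```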
no code implementations • 1 Jan 2021 • Dong Kai Wang, Nam Sung Kim
This work addresses NAS challenges in a search space of weight connections within layers, in particular the much larger number of architecture variations compared to a high-level search space with predetermined layer types.
no code implementations • 9 Jul 2020 • Yifan Yuan, Mohammad Alian, Yipeng Wang, Ilia Kurakin, Ren Wang, Charlie Tai, Nam Sung Kim
In this paper, we argue that besides CPU cores, high-speed network I/O is also important for LLC management.
Hardware Architecture • Operating Systems
no code implementations • 11 Apr 2020 • Soroush Ghodrati, Hardik Sharma, Cliff Young, Nam Sung Kim, Hadi Esmaeilzadeh
This paper explores a different design style, in which each unit is responsible for only a slice of the bit-level operations, interleaving and combining the benefits of bit-level parallelism with the abundant data-level parallelism in deep neural networks.
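As a rough illustration of bit-level slicing (a sketch with assumed slice widths, not the paper's microarchitecture): an 8-bit multiplication can be split into 2-bit slices so that several narrow units each produce a small partial product, and the full result is recovered by shifting and adding.

```python
def bit_slices(x, width, slice_bits):
    # Split an unsigned integer into little-endian slices of `slice_bits` bits.
    return [(x >> s) & ((1 << slice_bits) - 1) for s in range(0, width, slice_bits)]

def sliced_multiply(a, b, width=8, slice_bits=2):
    total = 0
    for i, a_sl in enumerate(bit_slices(a, width, slice_bits)):
        for j, b_sl in enumerate(bit_slices(b, width, slice_bits)):
            # Each (a_sl * b_sl) is a tiny multiply a narrow unit could handle;
            # the shift restores its significance in the full-width product.
            total += (a_sl * b_sl) << ((i + j) * slice_bits)
    return total

assert sliced_multiply(173, 46) == 173 * 46
```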
no code implementations • 27 Jun 2019 • Soroush Ghodrati, Hardik Sharma, Sean Kinzer, Amir Yazdanbakhsh, Kambiz Samadi, Nam Sung Kim, Doug Burger, Hadi Esmaeilzadeh
The low-power potential of mixed-signal design makes it an alluring option for accelerating Deep Neural Networks (DNNs).
Hardware Architecture
no code implementations • NeurIPS 2018 • Youjie Li, Mingchao Yu, Songze Li, Salman Avestimehr, Nam Sung Kim, Alexander Schwing
Distributed training of deep nets is an important technique for addressing present-day computing challenges such as memory consumption and computational demands.
no code implementations • NeurIPS 2018 • Mingchao Yu, Zhifeng Lin, Krishna Narra, Songze Li, Youjie Li, Nam Sung Kim, Alexander Schwing, Murali Annavaram, Salman Avestimehr
Data parallelism can boost the training speed of convolutional neural networks (CNNs), but it can suffer from significant communication costs caused by gradient aggregation.
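A back-of-the-envelope sketch of why gradient aggregation is costly (illustrative numbers, not from the paper): every step, each worker exchanges data on the order of the full gradient, so the per-step traffic scales with model size regardless of batch size.

```python
params = 25_000_000        # e.g. a mid-sized CNN (assumed value)
bytes_per_param = 4        # fp32 gradients
workers = 8

# Ring all-reduce moves roughly 2 * (N - 1) / N of the gradient per worker per step.
per_worker_bytes = 2 * (workers - 1) / workers * params * bytes_per_param
print(f"~{per_worker_bytes / 1e6:.0f} MB sent per worker per step")
# Compressing or quantizing gradients shrinks this volume proportionally.
```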
no code implementations • 10 May 2018 • Amir Yazdanbakhsh, Hajar Falahati, Philip J. Wolfe, Kambiz Samadi, Nam Sung Kim, Hadi Esmaeilzadeh
Even though there is a convolution stage in this operator, the inserted zeros lead to underutilization of the compute resources when a conventional convolution accelerator is employed.
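A minimal sketch of the underutilization problem, assuming a 1-D toy signal (not the paper's accelerator): transposed convolution can be expressed as zero-insertion followed by an ordinary convolution, so a dense convolution engine spends much of its multiply-accumulate work on the inserted zeros.

```python
import numpy as np

x = np.arange(1, 5, dtype=float)        # 1-D input of length 4
stride = 2

# Insert (stride - 1) zeros between input samples.
up = np.zeros(len(x) * stride - (stride - 1))
up[::stride] = x

kernel = np.array([1.0, 2.0, 1.0])
y = np.convolve(up, kernel)             # ordinary convolution on the upsampled signal

zero_fraction = 1.0 - np.count_nonzero(up) / up.size
print(up, y, f"{zero_fraction:.0%} of the conv inputs are inserted zeros")
```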