no code implementations • 19 Feb 2024 • Souvik Kundu, Anthony Sarah, Vinay Joshi, Om J Omer, Sreenivas Subramoney
With the recent growth in demand for large-scale deep neural networks, compute in-memory (CiM) has come up as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain Von-Neuman architectures.
no code implementations • 17 Apr 2023 • Quintin Fettes, Avinash Karanth, Razvan Bunescu, Brandon Beckwith, Sreenivas Subramoney
Many cloud applications are migrated from the monolithic model to a microservices framework in which hundreds of loosely-coupled microservices run concurrently, with significant benefits in terms of scalability, rapid development, modularity, and isolation.
no code implementations • 17 Feb 2023 • Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna
Therefore, as DL workloads embrace sparsity to reduce the computations and memory size of models, it is also imperative for CPUs to add support for sparsity to avoid under-utilization of the dense matrix engine and inefficient usage of the caches and registers.
1 code implementation • 20 Jul 2022 • Can Firtina, Kamlesh Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, Jeremie Kim, Taha Shahroodi, Meryem Banu Cavlak, Joel Lindegger, Mohammed Alser, Juan Gómez Luna, Sreenivas Subramoney, Onur Mutlu
We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs.
no code implementations • CVPR 2022 • Anirud Thyagharajan, Benjamin Ummenhofer, Prashant Laddha, Om Ji Omer, Sreenivas Subramoney
3D semantic segmentation is a fundamental building block for several scene understanding applications such as autonomous driving, robotics and AR/VR.
no code implementations • 16 Nov 2021 • Anirud Thyagharajan, Benjamin Ummenhofer, Prashant Laddha, Om J Omer, Sreenivas Subramoney
3D semantic segmentation is a fundamental building block for several scene understanding applications such as autonomous driving, robotics and AR/VR.
no code implementations • 5 Oct 2021 • Geonhwa Jeong, Eric Qin, Ananda Samajdar, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna
As AI-based applications become pervasive, CPU vendors are starting to incorporate matrix engines within the datapath to boost efficiency.
2 code implementations • 24 Sep 2021 • Rahul Bera, Konstantinos Kanellopoulos, Anant V. Nori, Taha Shahroodi, Sreenivas Subramoney, Onur Mutlu
In this paper, we make a case for designing a holistic prefetch algorithm that learns to prefetch using multiple different types of program context and system-level feedback information inherent to its design.
2 code implementations • 16 Sep 2020 • Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu
Unfortunately, it is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data.
Hardware Architecture Genomics