no code implementations • 22 Dec 2024 • Dennis Menn, Feng Liang, Hung-Yueh Chiang, Diana Marculescu
This paper shows that the similarity of denoised images between consecutive time steps during the sampling process is related to the severity of artifacts in images generated by diffusion models.
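A minimal sketch of this diagnostic, assuming access to the sampler's per-step denoised estimates (the predicted x0 images); the function name and interface here are illustrative, not the authors' API:

```python
import torch
import torch.nn.functional as F

def consecutive_step_similarity(denoised_estimates):
    """Cosine similarity between denoised images at consecutive sampling steps.

    denoised_estimates: list of tensors of shape (C, H, W), one per time step,
    e.g. the predicted x0 images collected while running a diffusion sampler.
    Per the paper's observation, low similarity between consecutive steps
    correlates with more severe artifacts in the final image.
    """
    sims = []
    for prev, curr in zip(denoised_estimates[:-1], denoised_estimates[1:]):
        sims.append(F.cosine_similarity(prev.flatten(), curr.flatten(), dim=0))
    return torch.stack(sims)
```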
1 code implementation • 26 Oct 2024 • Yang Zhou, Zhen Dong, Ellick Chan, Dhiraj Kalamkar, Diana Marculescu, Kurt Keutzer
The size of these 1TB+ tables imposes a severe memory bottleneck for the training and inference of recommendation models.
1 code implementation • 17 Oct 2024 • Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Diana Marculescu
To address this issue, we propose a static 8-bit per-tensor SSM quantization method which suppresses the maximum values of the input activations to the selective SSM for finer quantization precision and quantizes the output activations in an outlier-free space with Hadamard transform.
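The two ingredients can be sketched in a few lines: an orthonormal Hadamard rotation that spreads channel outliers so a single per-tensor scale wastes less precision, followed by static symmetric int8 quantization. This is a hedged illustration of the general technique, not the paper's implementation:

```python
import numpy as np
from scipy.linalg import hadamard

def hadamard_rotate(x):
    """Rotate the channel dimension with a normalized Hadamard matrix.

    The orthogonal rotation spreads per-channel outliers across all channels,
    producing an outlier-free basis for quantization. x: (tokens, channels);
    channels must be a power of two for scipy's hadamard().
    """
    n = x.shape[-1]
    H = hadamard(n) / np.sqrt(n)   # orthonormal: H @ H.T == I
    return x @ H

def quantize_per_tensor_int8(x, scale):
    """Static symmetric per-tensor int8 quantization with a pre-calibrated scale."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

# toy example: one channel carries a large outlier
x = np.random.randn(16, 64).astype(np.float32)
x[:, 3] *= 50.0                       # outlier channel
x_rot = hadamard_rotate(x)
scale = np.abs(x_rot).max() / 127.0   # calibrate on the rotated activations
q = quantize_per_tensor_int8(x_rot, scale)
```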
1 code implementation • 28 Sep 2024 • Tanvir Mahmud, Diana Marculescu
Audio separation in real-world scenarios, where mixtures contain a variable number of sources, presents significant challenges due to limitations of existing models, such as over-separation, under-separation, and dependence on predefined training sources.
1 code implementation • 15 Sep 2024 • Ning-Chi Huang, Chi-Chih Chang, Wei-Cheng Lin, Endri Taka, Diana Marculescu, Kai-Chiang Wu
$N{:}M$ sparsity is an emerging model compression method supported by a growing number of accelerators to speed up sparse matrix multiplication in deep neural networks.
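For concreteness, a minimal sketch of $N{:}M$ magnitude pruning (here 2:4): within every group of $M$ consecutive weights, only the $N$ largest-magnitude entries are kept, which is exactly the structure such accelerators exploit. Illustrative only:

```python
import torch

def prune_n_m(weight, n=2, m=4):
    """Zero out all but the n largest-magnitude weights in each group of m.

    weight: 2-D tensor whose last dimension is divisible by m.
    The resulting n:m pattern is what sparse tensor cores accelerate.
    """
    w = weight.reshape(-1, m)
    # indices of the (m - n) smallest-magnitude entries per group
    _, idx = torch.topk(w.abs(), m - n, dim=1, largest=False)
    mask = torch.ones_like(w)
    mask.scatter_(1, idx, 0.0)
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 16)
w_sparse = prune_n_m(w)   # every 4 consecutive weights keep 2 nonzeros
```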
no code implementations • 27 Aug 2024 • Hung-Yueh Chiang, Diana Marculescu
Designing low-latency and high-efficiency hybrid networks for a variety of low-cost commodity edge devices is both costly and tedious, leading to the adoption of hardware-aware neural architecture search (NAS) for finding optimal architectures.
Hardware Aware Neural Architecture Search
Neural Architecture Search
1 code implementation • 7 Jun 2024 • Tanvir Mahmud, Shentong Mo, Yapeng Tian, Diana Marculescu
Furthermore, to suppress the background features in each modality from the foreground-matched audio-visual features, we introduce a robust discriminative foreground mining scheme.
1 code implementation • 7 Jun 2024 • Tanvir Mahmud, Mustafa Munir, Radu Marculescu, Diana Marculescu
This enables a greater number of cross-frame attentions over more frames within the same computational budget, thereby enhancing both video quality and temporal coherence.
1 code implementation • 24 May 2024 • Feng Liang, Akio Kodaira, Chenfeng Xu, Masayoshi Tomizuka, Kurt Keutzer, Diana Marculescu
This paper introduces StreamV2V, a diffusion model that achieves real-time streaming video-to-video (V2V) translation with user prompts.
2 code implementations • 2 Apr 2024 • Tanvir Mahmud, Saeed Amizadeh, Kazuhito Koishida, Diana Marculescu
Conditional sound separation in multi-source audio mixtures without access to single-source sound data during training is a long-standing challenge.
1 code implementation • CVPR 2024 • Tanvir Mahmud, Yapeng Tian, Diana Marculescu
Visual sound source localization poses a significant challenge in identifying the semantic region of each sounding source within a video.
1 code implementation • 24 Mar 2024 • Tanvir Mahmud, Burhaneddin Yaman, Chun-Hao Liu, Diana Marculescu
To solve this, we first introduce a novel property of lightweight ConvNets: their ability to identify key discriminative patch regions in images, irrespective of the model's final accuracy or size.
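One simple way to probe this property is an activation-based saliency over spatial cells, in the spirit of class activation maps; the sketch below is an assumption-laden stand-in (MobileNetV3 backbone, channel-averaged feature map), not the paper's exact scheme:

```python
import torch
import torchvision.models as models

def top_patch_regions(image, k=4):
    """Rank spatial regions of an image by a lightweight ConvNet's activations.

    Averages the final feature map over channels and returns the k spatial
    cells with the highest response, a rough proxy for the discriminative
    patch regions the paper exploits. image: preprocessed (3, H, W) tensor.
    """
    backbone = models.mobilenet_v3_small(weights="DEFAULT").features.eval()
    with torch.no_grad():
        fmap = backbone(image.unsqueeze(0))        # (1, C, h, w)
    saliency = fmap.mean(dim=1).squeeze(0)         # (h, w) channel average
    _, idx = torch.topk(saliency.flatten(), k)
    h, w = saliency.shape
    return [(int(i // w), int(i % w)) for i in idx]  # (row, col) cells
```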
no code implementations • CVPR 2024 • Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu
This enables our model for video synthesis by editing the first frame with any prevalent I2I models and then propagating edits to successive frames.
1 code implementation • 7 Nov 2023 • Chi-Chih Chang, Yuan-Yao Sung, Shixing Yu, Ning-Chi Huang, Diana Marculescu, Kai-Chiang Wu
With our method, a more fine-grained rank configuration can be generated automatically and yield up to 33% extra FLOPs reduction compared to a simple uniform configuration.
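As a toy illustration of why per-layer rank selection pays off, the sketch below picks the smallest rank retaining a fixed fraction of singular-value energy and reports the resulting FLOPs reduction; the energy criterion is an assumption standing in for the paper's automated search:

```python
import numpy as np

def select_rank(weight, energy=0.95):
    """Smallest rank keeping `energy` of the squared singular-value mass."""
    s = np.linalg.svd(weight, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy) + 1)

def factored_flops(out_f, in_f, rank):
    """Multiplies for W ~= U @ V with U: (out_f, rank), V: (rank, in_f)."""
    return rank * (out_f + in_f)

# a toy nearly low-rank layer: rank-32 structure plus small noise
A, B = np.random.randn(512, 32), np.random.randn(32, 512)
W = A @ B + 0.01 * np.random.randn(512, 512)
r = select_rank(W)
saving = 1 - factored_flops(512, 512, r) / (512 * 512)
print(f"rank {r}, FLOPs reduction {saving:.1%}")
```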
1 code implementation • ICCV 2023 • Natalia Frumkin, Dibakar Gope, Diana Marculescu
Evol-Q improves the top-1 accuracy of a fully quantized ViT-Base by $10.30\%$, $0.78\%$, and $0.15\%$ for $3$-bit, $4$-bit, and $8$-bit weight quantization levels.
no code implementations • 5 Dec 2022 • Hung-Yueh Chiang, Natalia Frumkin, Feng Liang, Diana Marculescu
MobileTL trains the shifts for internal normalization layers to avoid storing activation maps for the backward pass.
no code implementations • 17 Nov 2022 • Natalia Frumkin, Dibakar Gope, Diana Marculescu
Borrowing the idea of contrastive loss from self-supervised learning, we find a robust way to jointly minimize a loss function using just 1,000 calibration images.
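A minimal sketch of such a calibration loop: grid-search a quantization scale that minimizes an InfoNCE-style loss pairing each input's quantized output with its full-precision output; the exact loss and the fake-quantizer below are assumptions in the spirit of the paper, not its implementation:

```python
import torch
import torch.nn.functional as F

def fake_quant(x, scale, bits=4):
    """Uniform symmetric fake quantization at a given scale."""
    qmax = 2 ** (bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

def contrastive_calibration(weight, acts, scales):
    """Pick the weight-quantization scale minimizing a contrastive loss.

    Positive pairs: quantized vs. full-precision outputs of the same input;
    negatives: outputs of the other calibration inputs in the batch.
    acts: (batch, in_features) activations from a small calibration set.
    """
    fp_out = acts @ weight.t()
    best_scale, best_loss = None, float("inf")
    for s in scales:
        q_out = acts @ fake_quant(weight, s).t()
        logits = F.normalize(q_out, dim=1) @ F.normalize(fp_out, dim=1).t()
        labels = torch.arange(acts.shape[0])
        loss = F.cross_entropy(logits / 0.07, labels)  # InfoNCE-style
        if loss.item() < best_loss:
            best_scale, best_loss = s, loss.item()
    return best_scale
```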
no code implementations • 11 Oct 2022 • Tanvir Mahmud, Diana Marculescu
An audio-visual event (AVE) is denoted by the correspondence of the visual and auditory signals in a video segment.
1 code implementation • CVPR 2023 • Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu
To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions.
Ranked #4 on Semantic Segmentation on Replica
no code implementations • 30 Jun 2022 • Ahmet Inci, Siri Garudanagiri Virupaksha, Aman Jain, Ting-Wu Chin, Venkata Vivek Thallam, Ruizhou Ding, Diana Marculescu
As the machine learning and systems communities strive to achieve higher energy-efficiency through custom deep neural network (DNN) accelerators, varied precision or quantization levels, and model compression techniques, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models.
no code implementations • 27 Jun 2022 • Ahmet Inci, Mehmet Meric Isgenc, Diana Marculescu
DeepNVM++ is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in GPUs for DL applications.
no code implementations • 22 Jun 2022 • Yang Zhou, Feng Liang, Ting-Wu Chin, Diana Marculescu
Machine learning (ML) has entered the mobile era where an enormous number of ML models are deployed on edge devices.
2 code implementations • 28 May 2022 • Feng Liang, Yangguang Li, Diana Marculescu
The proposed Supervised MAE (SupMAE) only exploits a visible subset of image patches for classification, unlike the standard supervised pre-training where all image patches are used.
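A minimal sketch of that idea, classifying from a randomly chosen visible subset of patch tokens; the tiny Transformer backbone and all dimensions here are stand-ins, not the paper's ViT configuration:

```python
import torch
import torch.nn as nn

class VisiblePatchClassifier(nn.Module):
    """Classify from a random subset of patch embeddings, SupMAE-style.

    Only `keep` of the patch tokens are encoded; global average pooling
    over the visible tokens feeds the classification head.
    """
    def __init__(self, num_patches=196, dim=192, num_classes=1000, keep=49):
        super().__init__()
        self.keep = keep
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=4,
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_tokens):            # (B, num_patches, dim)
        B, N, D = patch_tokens.shape
        idx = torch.rand(B, N).argsort(dim=1)[:, : self.keep]  # random visible set
        visible = torch.gather(
            patch_tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D)
        )
        encoded = self.encoder(visible)
        return self.head(encoded.mean(dim=1))   # global pool over visible tokens
```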
no code implementations • 20 May 2022 • Ahmet Inci, Siri Garudanagiri Virupaksha, Aman Jain, Venkata Vivek Thallam, Ruizhou Ding, Diana Marculescu
We also show that the proposed lightweight processing elements (LightPEs) consistently achieve Pareto-optimal results in terms of accuracy and hardware efficiency.
no code implementations • 17 May 2022 • Ahmet Inci, Siri Garudanagiri Virupaksha, Aman Jain, Venkata Vivek Thallam, Ruizhou Ding, Diana Marculescu
As the machine learning and systems community strives to achieve higher energy-efficiency through custom DNN accelerators and model compression techniques, there is a need for a design space exploration framework that incorporates quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models.
no code implementations • 24 Apr 2021 • Ting-Wu Chin, Diana Marculescu, Ari S. Morcos
In this work, we propose width transfer, a technique that harnesses the assumption that the optimized widths (or channel counts) are regular across sizes and depths.
no code implementations • 8 Dec 2020 • Ahmet Inci, Mehmet Meric Isgenc, Diana Marculescu
Under iso-area assumptions, STT-MRAM and SOT-MRAM provide up to 2x and 2.3x EDP reduction and accommodate 2.3x and 3.3x cache capacity when compared to SRAM, respectively.
no code implementations • 8 Dec 2020 • Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu
With deep reinforcement learning (RL) methods achieving results that exceed human capabilities in games, robotics, and simulated environments, continued scaling of RL training is crucial to its deployment in solving complex real-world problems.
no code implementations • 28 Sep 2020 • Rudy Chin, Ari S. Morcos, Diana Marculescu
Slimmable neural networks provide a flexible trade-off front between prediction error and computational cost (such as the number of floating-point operations or FLOPs) with the same storage cost as a single model.
no code implementations • 22 Aug 2020 • Ting-Wu Chin, Pierce I-Jen Chuang, Vikas Chandra, Diana Marculescu
Weight quantization for deep ConvNets has shown promising results for applications such as image classification and semantic segmentation and is especially important for applications where memory storage is limited.
2 code implementations • 23 Jul 2020 • Ting-Wu Chin, Ari S. Morcos, Diana Marculescu
In this work, we propose a general framework to enable joint optimization for both width configurations and weights of slimmable networks.
1 code implementation • 7 Feb 2020 • Ting-Wu Chin, Cha Zhang, Diana Marculescu
Fine-tuning through knowledge transfer from a model pre-trained on a large-scale dataset is a widespread approach to effectively build models on small-scale datasets.
no code implementations • 25 Sep 2019 • Ting-Wu Chin, Pierce I-Jen Chuang, Vikas Chandra, Diana Marculescu
Weight Quantization for deep convolutional neural networks (CNNs) has shown promising results in compressing and accelerating CNN-powered applications such as semantic segmentation, gesture recognition, and scene understanding.
1 code implementation • 1 Jul 2019 • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, Diana Marculescu
In this work, we reduce the NAS search cost to less than 3 hours, while achieving state-of-the-art image classification results under mobile latency constraints.
1 code implementation • 19 Jun 2019 • Zhuo Chen, Jiyuan Zhang, Ruizhou Ding, Diana Marculescu
In this paper, we propose Virtual Pooling (ViP), a model-level approach to improve speed and energy consumption of CNN-based image classification and object detection tasks, with a provable error bound.
no code implementations • 10 May 2019 • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, Diana Marculescu
Can we automatically design a Convolutional Network (ConvNet) with the highest image classification accuracy under the latency constraint of a mobile device?
1 code implementation • CVPR 2020 • Ting-Wu Chin, Ruizhou Ding, Cha Zhang, Diana Marculescu
First, both the accuracy and the speed of ConvNets can affect the performance of the application.
no code implementations • 5 Apr 2019 • Ruizhou Ding, Zeye Liu, Ting-Wu Chin, Diana Marculescu, R. D. Blanton
Over 46 FPGA-design experiments involving eight configurations and four data sets reveal that lightweight neural networks with a flexible $k$ value (dubbed FLightNNs) fully utilize the hardware resources on Field Programmable Gate Arrays (FPGAs); our experimental results show that FLightNNs can achieve a 2$\times$ speedup compared to lightweight NNs with $k=2$, with only 0.1\% accuracy degradation.
9 code implementations • 5 Apr 2019 • Dimitrios Stamoulis, Ruizhou Ding, Di Wang, Dimitrios Lymberopoulos, Bodhi Priyantha, Jie Liu, Diana Marculescu
Can we automatically design a Convolutional Network (ConvNet) with the highest image classification accuracy under the runtime constraint of a mobile device?
Ranked #965 on Image Classification on ImageNet
1 code implementation • CVPR 2019 • Ruizhou Ding, Ting-Wu Chin, Zeye Liu, Diana Marculescu
Binarized Neural Networks (BNNs) can significantly reduce the inference latency and energy consumption in resource-constrained devices due to their pure-logical computation and fewer memory accesses.
no code implementations • 8 Feb 2019 • Ting-Wu Chin, Ruizhou Ding, Diana Marculescu
In vision-enabled autonomous systems such as robots and autonomous cars, video object detection plays a crucial role, and both its speed and accuracy are important factors to provide reliable operation.
1 code implementation • 21 Jan 2019 • Zhuo Chen, Ruizhou Ding, Ting-Wu Chin, Diana Marculescu
In this paper, we conduct extensive experiments using various datasets to demonstrate and analyze how and why training based on fine-grain labeling, such as "Persian cat" can improve CNN accuracy on classifying coarse-grain classes, in this case "cat."
no code implementations • NIPS Workshop CDNNRIA 2018 • Ruizhou Ding, Zeye Liu, Ting-Wu Chin, Diana Marculescu, R.D. (Shawn) Blanton
To reduce runtime and resource utilization of Deep Neural Networks (DNNs) on customized hardware, LightNN has been proposed by constraining the weights of DNNs to be a sum of a limited number (denoted as $k\in\{1, 2\}$) of powers of 2.
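A minimal sketch of that weight constraint: each weight is greedily approximated by up to $k$ signed powers of 2, so multiplications reduce to shifts and adds on customized hardware. Illustrative of the constraint only, not the authors' training procedure:

```python
import torch

def to_sum_of_powers_of_two(w, k=2):
    """Approximate each weight by a sum of k signed powers of 2 (greedy residual).

    With k=1 this is rounding to +/- 2^e; with k=2 a second power-of-two term
    absorbs the residual. Multiplications by such weights become shifts/adds.
    """
    approx = torch.zeros_like(w)
    residual = w.clone()
    for _ in range(k):
        nonzero = residual != 0
        exp = torch.zeros_like(residual)
        exp[nonzero] = torch.round(torch.log2(residual[nonzero].abs()))
        term = torch.where(nonzero,
                           torch.sign(residual) * 2.0 ** exp,
                           torch.zeros_like(residual))
        approx += term
        residual -= term
    return approx

w = torch.randn(4, 4)
w_light = to_sum_of_powers_of_two(w, k=2)  # each entry: sum of <= 2 powers of 2
```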
1 code implementation • 20 Oct 2018 • Biresh Kumar Joardar, Ryan Gary Kim, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu
Our results show that these generalized 3D NoCs only incur a 1.8% (36-tile system) and 1.1% (64-tile system) average performance loss compared to application-specific NoCs.
1 code implementation • 1 Oct 2018 • Ting-Wu Chin, Cha Zhang, Diana Marculescu
Resource-efficient convolutional neural networks enable not only the intelligence on edge devices but also opportunities in system-level optimization such as scheduling.
no code implementations • 14 Sep 2018 • Diana Marculescu, Dimitrios Stamoulis, Ermao Cai
What is the latency or energy cost for an inference made by a Deep Neural Network (DNN)?
no code implementations • 5 Aug 2018 • Dimitrios Stamoulis, Ting-Wu Chin, Anand Krishnan Prakash, Haocheng Fang, Sribhuvan Sajja, Mitchell Bognar, Diana Marculescu
We cast the design of adaptive CNNs as a hyper-parameter optimization problem with respect to energy, accuracy, and communication constraints imposed by the mobile device.
no code implementations • 23 Jan 2018 • Ifigeneia Apostolopoulou, Diana Marculescu
Probabilistic Boolean Networks (PBNs) have previously been proposed to gain insights into complex dynamical systems.
no code implementations • 6 Dec 2017 • Dimitrios Stamoulis, Ermao Cai, Da-Cheng Juan, Diana Marculescu
While selecting the hyper-parameters of Neural Networks (NNs) has been so far treated as an art, the emergence of more complex, deeper architectures poses increasingly more challenges to designers and Machine Learning (ML) practitioners, especially when power and memory constraints need to be considered.
no code implementations • 2 Dec 2017 • Ruizhou Ding, Zeye Liu, Rongye Shi, Diana Marculescu, R. D. Blanton
For a fixed DNN configuration, LightNNs achieve better accuracy than BNNs at a slight energy increase, yet are more energy efficient than conventional DNNs with only slightly lower accuracy.
no code implementations • 30 Nov 2017 • Ryan Gary Kim, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu
Tight collaboration between experts of machine learning and manycore system design is necessary to create a data-driven manycore design framework that integrates both learning and expert knowledge.
3 code implementations • 15 Oct 2017 • Ermao Cai, Da-Cheng Juan, Dimitrios Stamoulis, Diana Marculescu
We also propose the "energy-precision ratio" (EPR) metric to guide machine learners in selecting an energy-efficient CNN architecture that better trades off the energy consumption and prediction accuracy.