Search Results for author: Deming Chen

Found 50 papers, 22 papers with code

Building Machine Learning Challenges for Anomaly Detection in Science

no code implementations • 3 Mar 2025 • Elizabeth G. Campolongo, Yuan-Tang Chou, Ekaterina Govorkova, Wahid Bhimji, Wei-Lun Chao, Chris Harris, Shih-Chieh Hsu, Hilmar Lapp, Mark S. Neubauer, Josephine Namayanja, Aneesh Subramanian, Philip Harris, Advaith Anand, David E. Carlyn, Subhankar Ghosh, Christopher Lawrence, Eric Moreno, Ryan Raikman, Jiaman Wu, Ziheng Zhang, Bayu Adhi, Mohammad Ahmadi Gharehtoragh, Saúl Alonso Monsalve, Marta Babicz, Furqan Baig, Namrata Banerji, William Bardon, Tyler Barna, Tanya Berger-Wolf, Adji Bousso Dieng, Micah Brachman, Quentin Buat, David C. Y. Hui, Phuong Cao, Franco Cerino, Yi-Chun Chang, Shivaji Chaulagain, An-Kai Chen, Deming Chen, Eric Chen, Chia-Jui Chou, Zih-Chen Ciou, Miles Cochran-Branson, Artur Cordeiro Oudot Choi, Michael Coughlin, Matteo Cremonesi, Maria Dadarlat, Peter Darch, Malina Desai, Daniel Diaz, Steven Dillmann, Javier Duarte, Isla Duporge, Urbas Ekka, Saba Entezari Heravi, Hao Fang, Rian Flynn, Geoffrey Fox, Emily Freed, Hang Gao, Jing Gao, Julia Gonski, Matthew Graham, Abolfazl Hashemi, Scott Hauck, James Hazelden, Joshua Henry Peterson, Duc Hoang, Wei Hu, Mirco Huennefeld, David Hyde, Vandana Janeja, Nattapon Jaroenchai, Haoyi Jia, Yunfan Kang, Maksim Kholiavchenko, Elham E. Khoda, Sangin Kim, Aditya Kumar, Bo-Cheng Lai, Trung Le, Chi-Wei Lee, Janghyeon Lee, Shaocheng Lee, Suzan van der Lee, Charles Lewis, Haitong Li, Haoyang Li, Henry Liao, Mia Liu, Xiaolin Liu, Xiulong Liu, Vladimir Loncar, Fangzheng Lyu, Ilya Makarov, Abhishikth Mallampalli Chen-Yu Mao, Alexander Michels, Alexander Migala, Farouk Mokhtar, Mathieu Morlighem, Min Namgung, Andrzej Novak, Andrew Novick, Amy Orsborn, Anand Padmanabhan, Jia-Cheng Pan, Sneh Pandya, Zhiyuan Pei, Ana Peixoto, George Percivall, Alex Po Leung, Sanjay Purushotham, Zhiqiang Que, Melissa Quinnan, Arghya Ranjan, Dylan Rankin, Christina Reissel, Benedikt Riedel, Dan Rubenstein, Argyro Sasli, Eli Shlizerman, Arushi Singh, Kim Singh, Eric R. Sokol, Arturo Sorensen, Yu Su, Mitra Taheri, Vaibhav Thakkar, Ann Mariam Thomas, Eric Toberer, Chenghan Tsai, Rebecca Vandewalle, Arjun Verma, Ricco C. Venterea, He Wang, Jianwu Wang, Sam Wang, Shaowen Wang, Gordon Watts, Jason Weitz, Andrew Wildridge, Rebecca Williams, Scott Wolf, Yue Xu, Jianqi Yan, Jai Yu, Yulei Zhang, Haoran Zhao, Ying Zhao, Yibo Zhong

We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR).

Anomaly Detection • scientific discovery

Large Language Model Strategic Reasoning Evaluation through Behavioral Game Theory

no code implementations • 27 Feb 2025 • Jingru Jia, Zehua Yuan, Junhao Pan, Paul E. McNamara, Deming Chen

Strategic decision-making involves interactive reasoning where agents adapt their choices in response to others, yet existing evaluations of large language models (LLMs) often emphasize Nash Equilibrium (NE) approximation, overlooking the mechanisms driving their strategic choices.

Decision Making • Fairness +3

New Solutions on LLM Acceleration, Optimization, and Application

no code implementations • 16 Jun 2024 • Yingbing Huang, Lily Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen

Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications.

High-Level Synthesis

Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context

no code implementations • 10 Jun 2024 • Jingru Jia, Zehua Yuan, Junhao Pan, Paul E. McNamara, Deming Chen

Through a multiple-choice-list experiment, we estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro.

Decision Making • Multiple-choice

SnapKV: LLM Knows What You are Looking for Before Generation

1 code implementation • 22 Apr 2024 • Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen

Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to the baseline when processing inputs of 16K tokens.

16k
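
For readers unfamiliar with KV-cache compression, the sketch below illustrates the general idea of using the prompt's own attention weights to select which prefix key/value entries to keep before generation begins. It is a simplified illustration rather than the released SnapKV code; the window size, pooling width, and keep budget are placeholder values.

```python
# Minimal sketch of attention-guided KV-cache selection (not the SnapKV release).
import torch
import torch.nn.functional as F

def compress_kv(keys, values, attn, window=32, keep=1024, pool=7):
    """keys/values: [seq, d]; attn: [seq, seq] self-attention weights of the prompt."""
    prefix_len = keys.shape[0] - window
    # Votes: how strongly the last `window` queries attend to each prefix position.
    votes = attn[-window:, :prefix_len].sum(dim=0)
    # Pool the votes so neighbouring positions tend to be kept together.
    votes = F.avg_pool1d(votes.view(1, 1, -1), kernel_size=pool, stride=1,
                         padding=pool // 2).view(-1)[:prefix_len]
    top = torch.topk(votes, k=min(keep, prefix_len)).indices.sort().values
    idx = torch.cat([top, torch.arange(prefix_len, keys.shape[0])])  # keep window too
    return keys[idx], values[idx]

seq, d = 2048, 64
k, v = torch.randn(seq, d), torch.randn(seq, d)
attn = torch.softmax(torch.randn(seq, seq), dim=-1)
k_small, v_small = compress_kv(k, v, attn, keep=256)
print(k_small.shape)  # torch.Size([288, 64]) -> 256 selected + 32 window entries
```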

On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models

1 code implementation • 4 Apr 2024 • Sean Farhat, Deming Chen

We observe that, when distilled on a task from a pre-trained teacher model, a small model can achieve or surpass the performance it would achieve if it was pre-trained then finetuned on that task.

Contrastive Learning • Knowledge Distillation
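
As context for the observation above, the snippet below shows the standard soft-target distillation loss that task distillation setups typically build on. It is a generic sketch, not the paper's exact training recipe (for instance, its use of contrastively pre-trained teachers is not reproduced); the temperature and blending weight are placeholders.

```python
# Classic Hinton-style knowledge distillation loss (generic sketch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend KL to the teacher's softened distribution with the usual CE loss."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

s, t = torch.randn(8, 10), torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y).item())
```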

FedCore: Straggler-Free Federated Learning with Distributed Coresets

1 code implementation • 31 Jan 2024 • Hongpeng Guo, Haotian Gu, Xiaoyang Wang, Bo Chen, Eun Kyung Lee, Tamar Eilam, Deming Chen, Klara Nahrstedt

Federated learning (FL) is a machine learning paradigm that allows multiple clients to collaboratively train a shared model while keeping their data on-premise.

Federated Learning
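
To illustrate the federated-learning setup this paper builds on, the toy FedAvg loop below keeps each client's data local and only averages model weights on the server. It is a generic sketch of the paradigm with made-up least-squares clients; FedCore's coreset selection for straggler mitigation is not shown.

```python
# Toy FedAvg round on synthetic least-squares clients (illustration only).
import numpy as np

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]

def local_grad(w, data):
    X, y = data
    return 2.0 / len(y) * X.T @ (X @ w - y)   # gradient of the client's local MSE

w = np.zeros(3)
for _ in range(50):                            # federated rounds
    local_ws = []
    for data in clients:
        wi = w.copy()
        for _ in range(5):                     # local steps, data stays on-premise
            wi -= 0.05 * local_grad(wi, data)
        local_ws.append(wi)
    w = np.mean(local_ws, axis=0)              # server averages the local models
print(w)
```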

Subgraph Extraction-based Feedback-guided Iterative Scheduling for HLS

no code implementations • 22 Jan 2024 • Hanchen Ye, David Z. Pan, Chris Leary, Deming Chen, Xiaoqing Xu

This paper proposes ISDC, a novel feedback-guided iterative system of difference constraints (SDC) scheduling algorithm for high-level synthesis (HLS).

High-Level Synthesis • Scheduling
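
As a toy illustration of the underlying SDC formulation: each operation gets a start-time variable, each data dependence becomes a difference constraint, and the resulting LP has an integral optimum. This shows only the textbook SDC scheduling core, not ISDC's feedback-guided, subgraph-based refinement; the operations and latencies below are made up.

```python
# Toy SDC (system of difference constraints) scheduling LP, solved with scipy.
import numpy as np
from scipy.optimize import linprog

ops = ["ld_a", "ld_b", "mul", "add", "st"]
lat = {"ld_a": 2, "ld_b": 2, "mul": 3, "add": 1, "st": 1}
deps = [("ld_a", "mul"), ("ld_b", "mul"), ("mul", "add"), ("add", "st")]

idx = {o: i for i, o in enumerate(ops)}
A, b = [], []
for u, v in deps:                      # dependence u -> v becomes s_u - s_v <= -lat_u
    row = np.zeros(len(ops))
    row[idx[u]], row[idx[v]] = 1, -1
    A.append(row)
    b.append(-lat[u])

res = linprog(c=np.ones(len(ops)), A_ub=A, b_ub=b, bounds=[(0, None)] * len(ops))
print(dict(zip(ops, res.x)))           # start cycles; SDC LPs have integral optima
```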

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

1 code implementation • 19 Jan 2024 • Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration.
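
A rough sketch of the multiple-decoding-heads idea: extra heads sit on the frozen backbone's last hidden state and each proposes a token several positions ahead, to be verified by the backbone. The head design below (a small MLP per head, no residual connection) and all sizes are simplifications and placeholders, not the released Medusa implementation.

```python
# Simplified sketch of Medusa-style extra decoding heads on a frozen backbone.
import torch
import torch.nn as nn

class MedusaHeads(nn.Module):
    """Head k proposes the token k+1 steps ahead from the current hidden state."""
    def __init__(self, hidden, vocab, num_heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(),
                          nn.Linear(hidden, vocab, bias=False))
            for _ in range(num_heads))

    def forward(self, h_last):                 # h_last: [batch, hidden]
        return torch.stack([blk(h_last) for blk in self.blocks], dim=1)

h = torch.randn(2, 512)                        # stand-in for the backbone hidden state
logits = MedusaHeads(512, 8000)(h)             # [batch, num_heads, vocab]
candidates = logits.topk(3, dim=-1).indices    # top-3 proposals per head
print(candidates.shape)                        # torch.Size([2, 4, 3])
```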

Extensible and Efficient Proxy for Neural Architecture Search

no code implementations • ICCV 2023 • Yuhong Li, Jiajie Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen

We further propose a Discrete Proxy Search (DPS) method to find the optimized training settings for Eproxy with only a handful of benchmarked architectures on the target tasks.

Neural Architecture Search

What Makes Convolutional Models Great on Long Sequence Modeling?

1 code implementation • 17 Oct 2022 • Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey

We focus on the structure of the convolution kernel and identify two critical but intuitive principles enjoyed by S4 that are sufficient to make up an effective global convolutional model: 1) The parameterization of the convolutional kernel needs to be efficient in the sense that the number of parameters should scale sub-linearly with sequence length.

Long-range modeling
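
A sketch of the first principle quoted above: build a kernel as long as the sequence out of O(log L) small sub-kernels, each stretched over an exponentially larger span with decaying magnitude, so the parameter count grows sub-linearly with length. The interpolation, decay factor, and direct (non-FFT) convolution below are illustrative choices, not the paper's exact kernel parameterization.

```python
# Sub-linearly parameterized global convolution kernel (illustrative sketch).
import torch
import torch.nn.functional as F

def global_kernel(sub_kernels, seq_len, decay=0.7):
    """Concatenate upsampled, decayed sub-kernels into one length-L kernel."""
    parts = []
    for i, k in enumerate(sub_kernels):
        span = k.numel() * (2 ** i)                        # exponentially larger span
        up = F.interpolate(k.view(1, 1, -1), size=span,
                           mode="linear", align_corners=False).view(-1)
        parts.append((decay ** i) * up)                    # later parts decay
    return torch.cat(parts)[:seq_len]

L = 1024
subs = [torch.randn(16) for _ in range(7)]                 # 7 * 16 params for a 1024-tap kernel
kernel = global_kernel(subs, L)
x = torch.randn(1, 1, L)
y = F.conv1d(F.pad(x, (L - 1, 0)), kernel.flip(0).view(1, 1, -1))  # causal global conv
print(kernel.shape, y.shape)
```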

Extensible Proxy for Efficient NAS

1 code implementation • 17 Oct 2022 • Yuhong Li, Jiajie Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen

(2) Efficient proxies are not extensible to multi-modality downstream tasks.

Neural Architecture Search

HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation

no code implementations • 22 Jul 2022 • Yao Chen, Junhao Pan, Xinheng Liu, JinJun Xiong, Deming Chen

In this study, we propose HiKonv, a unified solution that maximizes the throughput of convolution on a given underlying processing unit with low-bitwidth quantized data inputs through novel bit-wise management and parallel computation.

Management • Quantization
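
The snippet below illustrates the general bit-packing trick that this line of work exploits: with guard bits between packed low-bitwidth operands, one wide multiplication yields several low-bitwidth products at once. The layout shown is a simplification for unsigned 4-bit values, not HiKonv's actual bit-management and carry-handling scheme.

```python
# Two 4-bit multiplications computed with a single 32-bit multiply (illustration).
a0, a1, b = 11, 7, 13            # 4-bit unsigned operands (0..15)
packed = a0 | (a1 << 16)         # pack a0 and a1 into one word, 16 guard bits apart
prod = packed * b                # one wide multiplication
p0 = prod & 0xFFFF               # low slice  -> a0 * b
p1 = (prod >> 16) & 0xFFFF       # high slice -> a1 * b
assert (p0, p1) == (a0 * b, a1 * b)
print(p0, p1)                    # 143 91
```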

ORB-based SLAM accelerator on SoC FPGA

no code implementations • 18 Jul 2022 • Vibhakar Vemulapati, Deming Chen

Simultaneous Localization and Mapping (SLAM) is one of the main components of autonomous navigation systems.

Autonomous Navigation • Simultaneous Localization and Mapping

Compilation and Optimizations for Efficient Machine Learning on Embedded Systems

no code implementations • 6 Jun 2022 • Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc.

BIG-bench Machine Learning

AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models

no code implementations • 21 Jan 2022 • Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang

By evaluating on SQuAD, a model found by AutoDistill achieves an 88.4% F1 score with 22.8M parameters, which reduces parameters by more than 62% while maintaining higher accuracy than DistilBERT, TinyBERT, and NAS-BERT.

Bayesian Optimization • Knowledge Distillation +2

HiKonv: High Throughput Quantized Convolution With Novel Bit-wise Management and Computation

no code implementations • 28 Dec 2021 • Xinheng Liu, Yao Chen, Prakhar Ganesh, Junhao Pan, JinJun Xiong, Deming Chen

Quantization for Convolutional Neural Network (CNN) has shown significant progress with the intention of reducing the cost of computation and storage with low-bitwidth data inputs.

Management • Quantization

EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search

1 code implementation • 24 Nov 2021 • Qian Jiang, Xiaofan Zhang, Deming Chen, Minh N. Do, Raymond A. Yeh

In this work, we propose End-to-end Hardware-aware DNAS (EH-DNAS), a seamless integration of end-to-end hardware benchmarking, and fully automated DNAS to deliver hardware-efficient deep neural networks on various platforms, including Edge GPUs, Edge TPUs, Mobile CPUs, and customized accelerators.

Benchmarking • Neural Architecture Search

YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs

1 code implementation • 26 Oct 2021 • Prakhar Ganesh, Yao Chen, Yin Yang, Deming Chen, Marianne Winslett

Performance of object detection models has been growing rapidly on two major fronts, model accuracy and efficiency.

object-detection • Real-Time Object Detection +1

Generic Neural Architecture Search via Regression

2 code implementations • NeurIPS 2021 • Yuhong Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen

Such a self-supervised regression task can effectively evaluate the intrinsic power of an architecture to capture and transform the input signal patterns, and allow more sufficient usage of training samples.

 Ranked #1 on Neural Architecture Search on NAS-Bench-101 (Spearman Correlation metric)

Image Classification • Neural Architecture Search +1
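
A minimal sketch of the regression-as-proxy idea described above: briefly train a candidate architecture to regress a fixed synthetic signal and rank architectures by how well they fit it. The synthetic target, step count, and candidate network are placeholders; the paper's actual synthetic signal bases and search integration are more involved.

```python
# Self-supervised regression proxy for ranking architectures (illustrative sketch).
import torch
import torch.nn as nn

def regression_proxy_score(net, steps=30, lr=1e-3):
    """Train `net` briefly to regress a synthetic target; lower final MSE = better."""
    x = torch.randn(16, 3, 32, 32)
    target = torch.randn(16, 8, 8, 8)          # placeholder synthetic signal
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), target)
        loss.backward()
        opt.step()
    return -loss.item()                        # higher score reads as a better architecture

candidate = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                          nn.Conv2d(8, 8, 3, stride=2, padding=1),
                          nn.AdaptiveAvgPool2d(8))
print(regression_proxy_score(candidate))
```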

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs

1 code implementation • 9 Jul 2021 • Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen

We implement our proposed accelerator on multiple FPGAs, which outperforms the state-of-the-art designs in terms of both throughput and DSP efficiency.

Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems

no code implementations • 8 Apr 2021 • Cong Hao, Deming Chen

We formulate the MMMT model and heterogeneous hardware implementation co-design as a differentiable optimization problem, with the objective of improving the solution quality and reducing the overall power consumption and critical path latency.

Multi-Task Learning • Sensor Fusion

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design

no code implementations • 25 Mar 2021 • Cong Hao, Jordan Dotzel, JinJun Xiong, Luca Benini, Zhiru Zhang, Deming Chen

Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives.

Benchmarking • Edge-computing

F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding

no code implementations • 8 Mar 2021 • Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li

Creating virtual avatars with realistic rendering is one of the most essential and challenging tasks to provide highly immersive virtual reality (VR) experiences.

Decoder

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

1 code implementation • 4 Mar 2021 • Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help.

Recommendation Systems

PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses

1 code implementation • 20 Jan 2021 • Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

While this process accounts for a significant portion of the training time, we find existing GNN implementations using popular deep neural network (DNN) libraries such as PyTorch are limited to a CPU-centric approach for the entire data preparation step.

Graph Neural Network

TwinDNN: A Tale of Two Deep Neural Networks

no code implementations • 1 Jan 2021 • Hyunmin Jeong, Deming Chen

This is the first work that considers using a highly compressed DNN along with the original DNN in parallel to improve latency significantly while effectively maintaining the original model accuracy.

Image Classification • Model Compression +2
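
A software-only sketch of the two-network idea: serve most inputs with the compressed model and re-run only low-confidence inputs on the original model. This sequential fallback is an approximation; the paper executes the two networks in parallel on heterogeneous hardware, and the confidence threshold and toy models below are arbitrary.

```python
# Confidence-gated cascade of a compressed model and the original model (sketch).
import torch
import torch.nn as nn

def cascade_predict(x, small, large, threshold=0.8):
    """Use the compressed model; defer only its low-confidence inputs to the original."""
    with torch.no_grad():
        probs = torch.softmax(small(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        unsure = conf < threshold
        if unsure.any():
            pred[unsure] = large(x[unsure]).argmax(dim=-1)
    return pred

small = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
large = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.randn(32, 1, 28, 28)
print(cascade_predict(x, small, large)[:8])
```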

Improving Random-Sampling Neural Architecture Search by Evolving the Proxy Search Space

1 code implementation • 1 Jan 2021 • Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen

This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.

Image Classification • Neural Architecture Search

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

no code implementations • 14 Oct 2020 • Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen

High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.

Comprehensive assessment of error correction methods for high-throughput sequencing data

no code implementations • 10 Jul 2020 • Yun Heo, Gowthami Manikandan, Anand Ramachandran, Deming Chen

We also present a compilation of sequencing datasets for Illumina, PacBio and ONT platforms that present challenging scenarios for error-correction tools.

Vocal Bursts Intensity Prediction

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

1 code implementation • 18 May 2020 • Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs.

Model Compression • object-detection +2

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions

no code implementations • 6 May 2020 • Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.

Neural Architecture Search
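
A toy version of the fused-variable formulation described above: architecture-choice logits parameterize a softmax distribution over candidate operators, and an expected, differentiable latency term is added to the task loss so that accuracy and implementation quality are optimized by the same gradient descent. The operator latencies and trade-off weight are made-up placeholders; the paper's actual search space also covers quantization and hardware parallelism variables.

```python
# Differentiable accuracy/latency co-optimization in one loss (illustrative sketch).
import torch

arch_logits = torch.zeros(4, requires_grad=True)       # 4 hypothetical candidate ops
op_latency = torch.tensor([1.2, 2.5, 3.1, 4.8])        # made-up latencies (ms)

def co_search_loss(task_loss, arch_logits, op_latency, lam=0.1):
    """Expected latency under the softmax architecture distribution + task loss."""
    probs = torch.softmax(arch_logits, dim=-1)
    expected_latency = (probs * op_latency).sum()
    return task_loss + lam * expected_latency

loss = co_search_loss(torch.tensor(0.9), arch_logits, op_latency)
loss.backward()
print(arch_logits.grad)   # gradients flow into the architecture choice variables
```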

HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

no code implementations • 8 Apr 2020 • Hanchen Ye, Xiaofan Zhang, Zhize Huang, Gengsheng Chen, Deming Chen

To speedup Deep Neural Networks (DNN) accelerator design and enable effective implementation, we propose HybridDNN, a framework for building high-performance hybrid DNN accelerators and delivering FPGA-based hardware implementations.

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

no code implementations • 27 Feb 2020 • Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett

Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.

Model Compression

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

1 code implementation • 6 Jan 2020 • Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin

Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.).

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving

no code implementations • 18 Nov 2019 • Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen

The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.

Autonomous Driving

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection

2 code implementations • 25 Jun 2019 • Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.

object-detection • Object Detection

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

3 code implementations • 20 May 2019 • Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.

object-detection • Object Detection

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

3 code implementations • 9 Apr 2019 • Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen

While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment.

C++ code • object-detection +1

SiamVGG: Visual Tracking using Deeper Siamese Networks

4 code implementations • 7 Feb 2019 • Yuhong Li, Xiaofan Zhang, Deming Chen

It combines a Convolutional Neural Network (CNN) backbone and a cross-correlation operator, and takes advantage of the features from exemplary images for more accurate object tracking.

Visual Object Tracking • Visual Tracking
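
The cross-correlation operator mentioned above can be expressed as a convolution whose kernel is the exemplar's feature map, as in SiamFC-style trackers. The sketch below assumes the features already come from a shared (VGG-like) backbone and uses arbitrary feature sizes; it only shows the correlation step, not the full tracker.

```python
# Siamese cross-correlation: slide the exemplar features over the search features.
import torch
import torch.nn.functional as F

def correlation_score_map(exemplar_feat, search_feat):
    """exemplar_feat: [C, h, w] used as the conv kernel; search_feat: [1, C, H, W]."""
    return F.conv2d(search_feat, exemplar_feat.unsqueeze(0))

C = 256
exemplar = torch.randn(C, 6, 6)        # features of the exemplar (template) patch
search = torch.randn(1, C, 22, 22)     # features of the larger search region
score = correlation_score_map(exemplar, search)
print(score.shape)                     # torch.Size([1, 1, 17, 17]); peak locates the target
```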

When CTC Training Meets Acoustic Landmarks

no code implementations • 5 Nov 2018 • Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen

In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.

Automatic Speech Recognition (ASR)

Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA

no code implementations • 31 Jul 2018 • Junsong Wang, Qiuwen Lou, Xiaofan Zhang, Chao Zhu, Yonghua Lin, Deming Chen

To create such accelerators, we propose a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes.

Edge-computing • Quantization

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

no code implementations • 15 May 2018 • Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen

Furui first demonstrated that the identity of both consonant and vowel can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

13 code implementations • CVPR 2018 • Yuhong Li, Xiaofan Zhang, Deming Chen

We demonstrate CSRNet on four datasets (ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and we deliver the state-of-the-art performance.

Crowd Counting • Scene Recognition
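
As a rough illustration of the dilated-convolution back-end idea behind CSRNet, the sketch below maps backbone features to a density map whose spatial sum is read as the crowd count. The layer count, channel widths, and the assumed VGG-style feature input are simplified stand-ins, not the actual CSRNet architecture.

```python
# Dilated-convolution density head producing a count-by-integration map (sketch).
import torch
import torch.nn as nn

class DilatedDensityHead(nn.Module):
    """Dilated convs enlarge the receptive field without pooling; a 1x1 conv emits the density map."""
    def __init__(self, in_ch=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(256, 128, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(128, 1, 1))

    def forward(self, feat):
        return self.body(feat)

feat = torch.randn(1, 512, 32, 32)           # stand-in for VGG-style backbone features
density = DilatedDensityHead()(feat)
print(density.shape, density.sum().item())   # predicted count = sum of the density map
```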
