1 code implementation • 2 Jan 2025 • Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
We present FlashInfer: a customizable and efficient attention engine for LLM serving.
no code implementations • 6 Nov 2024 • Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew Jin Tan, Seung Ki Moon
These findings highlight the effectiveness of fine-tuning smaller, open-source VLMs like Florence-2, offering a practical and efficient solution for automated GD&T extraction to support downstream manufacturing tasks.
no code implementations • 5 Nov 2024 • Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew Jin Tan, Seung Ki Moon
To address these challenges, this study investigates vision-language models (VLMs) for automating the recognition of a wide range of manufacturing features in CAD designs without the need for extensive training datasets or predefined rules.
no code implementations • 13 Aug 2024 • Muhammad Tayyab Khan, Wenhe Feng, Lequn Chen, Ye Han Ng, Nicholas Yew Jin Tan, Seung Ki Moon
Therefore, the proposed methodology enhances the integration of CAD, CAPP, and CAM within hybrid manufacturing by providing precise feature recognition and dimension extraction.
no code implementations • 9 Aug 2024 • Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao
This paper introduces a cross-modality knowledge transfer (CMKT) methodology for LAM in-situ monitoring, which transfers knowledge from a source modality to a target modality.
no code implementations • 21 Apr 2024 • Lequn Chen, Guijun Bi, Xiling Yao, Jinlong Su, Chaolin Tan, Wenhe Feng, Michalis Benakis, Youxiang Chew, Seung Ki Moon
Future directions are proposed, with an emphasis on multimodal sensor fusion for multiscale defect prediction and fault diagnosis, ultimately enabling self-adaptation in LAM processes.
no code implementations • 17 Jan 2024 • Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo
In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures.
1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.
1 code implementation • 28 Oct 2023 • Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy
Our scheduler consolidates multi-tenant LoRA serving workloads in a shared GPU cluster.
no code implementations • 14 Aug 2023 • Lequn Chen, Weixin Deng, Anirudh Canumalla, Yu Xin, Danyang Zhuo, Matthai Philipose, Arvind Krishnamurthy
However, existing model serving systems cannot achieve adequate batch sizes while meeting latency objectives as these systems eagerly dispatch requests to accelerators to minimize the accelerator idle time.
no code implementations • 23 May 2023 • Lequn Chen, Xiling Yao, Wenhe Feng, Youxiang Chew, Seung Ki Moon
Traditional in-situ monitoring approach utilizes a single sensor (i. e., acoustic, visual, or thermal sensor) to capture the complex process dynamic behaviors, which is insufficient for defect detection with high accuracy and robustness.
no code implementations • 12 Apr 2023 • Lequn Chen, Xiling Yao, Kui Liu, Chaolin Tan, Seung Ki Moon
Early detection and correction of defects are critical in additive manufacturing (AM) to avoid build failures.
no code implementations • 5 Dec 2018 • Ignacio Cano, Lequn Chen, Pedro Fonseca, Tianqi Chen, Chern Cheah, Karan Gupta, Ramesh Chandra, Arvind Krishnamurthy
Our large-scale analysis confirms that VMs are often misconfigured, either overprovisioned or underprovisioned, and that this problem is pervasive across a wide range of private clusters.