Search Results for author: Ningyi Xu

Found 14 papers, 5 papers with code

Speculative Decoding for Verilog: Speed and Quality, All in One

no code implementations18 Mar 2025 Changran Xu, Yi Liu, Yunhao Zhou, Shan Huang, Ningyi Xu, Qiang Xu

The rapid advancement of large language models (LLMs) has revolutionized code generation tasks across various programming languages.

Code Generation

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM

no code implementations17 Mar 2025 WenQiang Wang, Yijia Zhang, Zikai Zhang, Guanting Huo, Hao Liang, Shijie Cao, Ningyi Xu

In this work, we propose ROMA, a QLoRA accelerator with a hybrid storage architecture that uses ROM for quantized base models and SRAM for LoRA weights and KV cache.
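The hybrid-storage split can be sketched in plain Python. This is a toy model of the idea, not ROMA's actual hardware: the storage tiers are plain dictionaries, and the 4-bit scheme, layer shapes, and LoRA rank are illustrative assumptions.

```python
import numpy as np

# Toy storage tiers: the frozen, quantized base weights live in "ROM"
# (written once, then read-only), while the LoRA weights and KV cache
# live in "SRAM" (rewritable). Names and sizes are illustrative.
ROM, SRAM = {}, {}

def quantize_int4(w):
    """Symmetric 4-bit quantization: returns integer codes and a scale."""
    scale = np.abs(w).max() / 7.0
    return np.clip(np.round(w / scale), -8, 7).astype(np.int8), scale

rng = np.random.default_rng(0)
d, r = 64, 8                        # hidden size and LoRA rank (toy values)
W = rng.standard_normal((d, d))

# Base model: quantized once, frozen -> ROM.
ROM["W_q"], ROM["scale"] = quantize_int4(W)

# Adapter and KV cache: small and mutable -> SRAM.
SRAM["A"] = rng.standard_normal((r, d)) * 0.01
SRAM["B"] = np.zeros((d, r))
SRAM["kv_cache"] = []

def forward(x):
    w = ROM["W_q"].astype(np.float32) * ROM["scale"]   # dequantize base
    y = w @ x + SRAM["B"] @ (SRAM["A"] @ x)            # QLoRA: W x + B A x
    SRAM["kv_cache"].append(y)                         # only SRAM is written
    return y

y = forward(rng.standard_normal(d))
print(y.shape)  # (64,)
```

The point of the split: every write after initialization (adapter updates, growing KV cache) lands in the rewritable tier, so the bulky quantized base can sit in cheap read-only storage.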

Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback

no code implementations13 Mar 2025 Derun Li, Jianwei Ren, Yue Wang, Xin Wen, Pengxiang Li, Leimeng Xu, Kun Zhan, Zhongpu Xia, Peng Jia, Xianpeng Lang, Ningyi Xu, Hang Zhao

To address this, we introduce TrajHF, a human feedback-driven finetuning framework for generative trajectory models, designed to align motion planning with diverse driving preferences.

Autonomous Driving Imitation Learning +1

DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale

no code implementations2 Feb 2025 Ziyang Zheng, Shan Huang, Jianyuan Zhong, Zhengyuan Shi, Guohao Dai, Ningyi Xu, Qiang Xu

Circuit representation learning has become pivotal in electronic design automation, enabling critical tasks such as testability analysis, logic reasoning, power estimation, and SAT solving.

Representation Learning

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach

no code implementations28 Nov 2024 Yijia Zhang, Zhihong Gou, Shijie Cao, Weigang Feng, Sicheng Zhang, Guohao Dai, Ningyi Xu

Furthermore, we introduce a dynamic updating strategy for the energy cost model, reducing the need for on-device energy measurements and accelerating the search process.
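The dynamic-update idea can be illustrated with a toy online cost model (our own sketch, not the paper's model: the features, the normalized-LMS update, and the measurement cadence are all assumptions): predict each candidate kernel's energy from schedule features, and refine the model with only occasional on-device measurements instead of measuring every candidate.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, 0.5, 1.5])   # hidden "device" behavior (toy)

def measure(feat):
    """Stands in for a real on-device energy measurement (slow, noisy)."""
    return float(true_w @ feat + rng.normal(0, 0.01))

w = np.zeros(3)                      # cost-model weights, learned online
candidates = rng.uniform(0.1, 1.0, size=(200, 3))

for step, feat in enumerate(candidates):
    pred = float(w @ feat)           # cheap prediction guides the search
    if step % 5 == 0:                # only occasionally touch the device
        err = measure(feat) - pred
        w += err * feat / (feat @ feat)   # normalized-LMS style update

best = min(candidates, key=lambda f: float(w @ f))
print(w)   # drifts toward true_w as measurements accumulate
```

Here 40 measurements calibrate a model that then ranks all 200 candidates, which is the trade the abstract describes: fewer on-device measurements, faster search.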

MARCA: Mamba Accelerator with ReConfigurable Architecture

no code implementations16 Sep 2024 Jinhao Li, Shan Huang, Jiaming Xu, Jun Liu, Li Ding, Ningyi Xu, Guohao Dai

We propose an intra-operation buffer management strategy that maximizes input data sharing for linear operations within an operation, and an inter-operation strategy for element-wise operations across operations.

Mamba Management
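A minimal sketch of the reuse pattern as we read it (the operation names and shapes below are illustrative, not MARCA's dataflow): several linear projections consume the same input buffer, so the input is fetched once, and the element-wise operations that follow fuse on those results without a round trip through memory.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128
x = rng.standard_normal(d)           # one shared input buffer

W_in, W_gate, W_skip = (rng.standard_normal((d, d)) for _ in range(3))

# Intra-operation reuse: all three linear ops read the same buffer `x`;
# on hardware this is one DRAM fetch serviced from an on-chip buffer.
h, g, s = W_in @ x, W_gate @ x, W_skip @ x

# Inter-operation reuse: the element-wise stage consumes h, g, s
# directly, without spilling intermediates to memory between ops.
y = h * np.tanh(g) + s
print(y.shape)  # (128,)
```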

Enhancing Vectorized Map Perception with Historical Rasterized Maps

1 code implementation1 Sep 2024 XiaoYu Zhang, Guangwei Liu, Zihao Liu, Ningyi Xu, Yunhui Liu, Ji Zhao

To fully exploit a historical map, we propose two novel modules to enhance BEV features and map element queries.

Autonomous Driving

Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

1 code implementation27 Feb 2024 Zihao Liu, XiaoYu Zhang, Guangwei Liu, Ji Zhao, Ningyi Xu

In autonomous driving, the high-definition (HD) map plays a crucial role in localization and planning.

Autonomous Driving

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

2 code implementations16 Feb 2024 Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges.

Knowledge Distillation Quantization

AFPQ: Asymmetric Floating Point Quantization for LLMs

1 code implementation3 Nov 2023 Yijia Zhang, Sicheng Zhang, Shijie Cao, Dayou Du, Jianyu Wei, Ting Cao, Ningyi Xu

Large language models (LLMs) show great performance in various tasks, but face deployment challenges from limited memory capacity and bandwidth.

Quantization

Large Trajectory Models are Scalable Motion Predictors and Planners

1 code implementation30 Oct 2023 Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao

STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task.

Autonomous Driving Language Modeling +3
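The "one unified sequence" framing can be shown in a few lines of Python. The token names and per-timestep layout here are illustrative assumptions, not STR's exact format: the point is only that observations, states, and actions become a single stream an autoregressive model can consume.

```python
# Toy tokens standing in for encoded observations, states, and actions.
observations = ["obs_0", "obs_1"]
states = ["state_0", "state_1"]
actions = ["act_0", "act_1"]

# Arrange everything into one sequence-modeling task: each timestep
# contributes its observation, state, and action in order.
sequence = []
for o, s, a in zip(observations, states, actions):
    sequence += [o, s, a]

print(sequence)
# ['obs_0', 'state_0', 'act_0', 'obs_1', 'state_1', 'act_1']
```

Prediction and planning then differ only in which positions of the sequence the model is asked to complete.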

Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

no code implementations31 May 2023 Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu

We find that previous gradient accumulation reduces activation memory but is incompatible with gradient memory reduction, owing to the conflict between preserving gradients across micro-batches and releasing them to save memory.
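The conflict can be made concrete with a toy numpy sketch (illustrative only, not the paper's algorithm): classic gradient accumulation must keep a full gradient buffer alive across all micro-batches, whereas folding each micro-batch gradient into a running optimizer-side accumulator lets every per-micro-batch gradient be released immediately after its backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params, n_micro = 1000, 4
grads = [rng.standard_normal(n_params) for _ in range(n_micro)]

# (a) Classic accumulation: the gradient buffer `acc` must survive
# the whole optimizer step, on top of any optimizer state.
acc = np.zeros(n_params)
for g in grads:
    acc += g
update_a = acc / n_micro

# (b) Accumulate into optimizer-side state: each `g` is folded in and
# can be freed right after its backward pass; no separate gradient
# buffer needs to persist alongside the optimizer state.
state = np.zeros(n_params)
for g in grads:
    state += g / n_micro      # fold in, then release g
update_b = state

assert np.allclose(update_a, update_b)
print(update_b[:3])
```

For a plain averaged update the two are mathematically identical, as the assert shows; handling Adam's nonlinear moment updates under this scheme is the part that requires the paper's actual method.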

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

no code implementations21 May 2023 Yijia Zhang, Lingran Zhao, Shijie Cao, WenQiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu

In this study, we conduct a comparative analysis of INT and FP quantization at the same bit-width, revealing that the optimal quantization format varies across layers due to the complexity and diversity of tensor distributions.

Quantization
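A small experiment in the spirit of that comparison (the grids, scaling rule, and distributions are our own illustrative choices, not the paper's setup): round a tensor to the nearest value of a 4-bit integer grid versus a 4-bit floating-point (e2m1-style) grid and compare the mean squared error. Which format wins depends on the tensor's distribution.

```python
import numpy as np

def nearest(x, grid):
    """Round each element of x to the nearest grid value."""
    return grid[np.abs(x[:, None] - grid[None, :]).argmin(axis=1)]

def mse_under(x, grid):
    scale = np.abs(x).max() / np.abs(grid).max()   # simple per-tensor scale
    return float(np.mean((x - scale * nearest(x / scale, grid)) ** 2))

int4 = np.arange(-7, 8, dtype=np.float64)              # uniform grid
fp4_pos = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6])       # e2m1-style levels
fp4 = np.unique(np.concatenate([-fp4_pos, fp4_pos]))   # non-uniform grid

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(10_000)        # well-behaved values
heavy = rng.standard_t(df=2, size=10_000)     # outlier-heavy values

for name, x in [("gaussian", gaussian), ("heavy-tailed", heavy)]:
    print(name, "INT4:", mse_under(x, int4), "FP4:", mse_under(x, fp4))
```

The non-uniform FP grid spends its precision near zero, so it tolerates the outliers that blow up a per-tensor-scaled uniform grid; for flatter distributions the uniform INT grid is competitive, which is why the best format can differ layer by layer.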

Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

no code implementations NeurIPS 2009 Feng Yan, Ningyi Xu, Yuan Qi

Extensive experiments showed that our parallel inference methods consistently produced LDA models with the same predictive power as sequential training methods, but with a 26x speedup for CGS and a 196x speedup for CVB on a GPU with 30 multiprocessors; the speedup scales almost linearly with the number of available multiprocessors.
