Search Results for author: Xingcheng Zhang

Found 22 papers, 14 papers with code

Towards Efficient Pre-training: Exploring FP4 Precision in Large Language Models

no code implementations 17 Feb 2025 Jiecheng Zhou, Ding Tang, Rong Fu, Boni Hu, Haoran Xu, Yi Wang, Zhilin Pei, Zhongling Su, Liang Liu, Xingcheng Zhang, Weiming Zhang

The burgeoning computational demands for training large language models (LLMs) necessitate efficient methods, including quantized training, which leverages low-bit arithmetic operations to reduce costs.

Quantization
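As a rough illustration of the low-bit arithmetic behind quantized training, the sketch below fake-quantizes a weight tensor onto a signed 4-bit integer grid. This is a generic baseline, not the FP4 format or training recipe studied in the paper, and the tensor shapes and scaling choice are hypothetical.

```python
# Illustrative symmetric 4-bit "fake" quantization of a weight tensor.
# Not the paper's FP4 scheme; just the quantize/dequantize primitive
# that low-bit training builds on.
import numpy as np

def fake_quantize_int4(w: np.ndarray) -> np.ndarray:
    qmax = 7                                    # signed 4-bit range is [-8, 7]
    scale = np.abs(w).max() / qmax + 1e-12      # per-tensor scale
    q = np.clip(np.round(w / scale), -8, qmax)  # snap to the 4-bit grid
    return q * scale                            # dequantize for the matmul

w = np.random.randn(4, 4).astype(np.float32)
print(fake_quantize_int4(w))
```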

Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras

1 code implementation 7 Sep 2024 Zimu Liao, Siyan Chen, Rong Fu, Yi Wang, Zhongling Su, Hao Luo, Li Ma, Linning Xu, Bo Dai, Hengjie Li, Zhilin Pei, Xingcheng Zhang

However, adapting 3DGS to different camera models, particularly fisheye lenses, poses challenges due to the unique 3D to 2D projection calculation.

3DGS
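To make the projection point concrete, here is a minimal equidistant fisheye projection of a single camera-space point, which differs from the pinhole model 3DGS normally assumes. The intrinsics are made up for illustration; this is not Fisheye-GS code.

```python
# Equidistant fisheye projection of a camera-space point (x, y, z):
# u = fx * theta * x / r + cx, where theta is the angle from the optical axis.
# Intrinsics below are hypothetical.
import numpy as np

def fisheye_project(p, fx=400.0, fy=400.0, cx=320.0, cy=240.0):
    x, y, z = p
    r = np.hypot(x, y)
    theta = np.arctan2(r, z)               # angle from the optical axis
    s = theta / r if r > 1e-9 else 0.0     # radial scaling, unlike pinhole's 1/z
    return fx * x * s + cx, fy * y * s + cy

print(fisheye_project(np.array([0.3, -0.1, 1.0])))
```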

PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator

no code implementations 29 Aug 2024 Shiguang Wang, Tao Xie, Haijun Liu, Xingcheng Zhang, Jian Cheng

Channel Pruning is one of the most widespread techniques used to compress deep neural networks while maintaining their performances.

Neural Architecture Search
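For context, the snippet below shows channel pruning in its simplest form: rank a convolution's output channels by L1 magnitude and keep the strongest ones. PSE-Net's parallel-subnets estimator is more involved, so treat this only as a baseline sketch.

```python
# Magnitude-based channel pruning baseline (not PSE-Net): keep the output
# channels of a conv weight with the largest L1 norm.
import numpy as np

def prune_channels(weight: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    # weight shape: (out_channels, in_channels, kH, kW)
    scores = np.abs(weight).sum(axis=(1, 2, 3))         # L1 norm per output channel
    n_keep = max(1, int(weight.shape[0] * keep_ratio))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])   # retained channel indices
    return weight[keep]

w = np.random.randn(16, 8, 3, 3).astype(np.float32)
print(prune_channels(w).shape)   # -> (8, 8, 3, 3)
```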

FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

1 code implementation 15 Aug 2024 Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai

This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations.

Computational Efficiency · Scheduling
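As a hint of what such a rasterizer computes per pixel, the toy loop below performs front-to-back alpha compositing over depth-sorted Gaussian contributions with early termination. It is a plain-Python illustration of the arithmetic, not FlashGS's CUDA kernels.

```python
# Front-to-back alpha compositing for one pixel over depth-sorted Gaussians.
# Purely illustrative; real rasterizers do this per tile on the GPU.
def composite(colors, alphas):
    out, transmittance = [0.0, 0.0, 0.0], 1.0
    for c, a in zip(colors, alphas):          # assumed sorted front to back
        for k in range(3):
            out[k] += transmittance * a * c[k]
        transmittance *= 1.0 - a
        if transmittance < 1e-4:              # early termination saves work
            break
    return out

print(composite([(1, 0, 0), (0, 1, 0)], [0.6, 0.5]))   # [0.6, 0.2, 0.0]
```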

PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training

no code implementations 7 Aug 2024 Haoran Xu, Ziqian Liu, Rong Fu, Zhongling Su, Zerui Wang, Zheng Cai, Zhilin Pei, Xingcheng Zhang

With the evolution of large language models, traditional Transformer models become computationally demanding for lengthy sequences due to the quadratic growth in computation with respect to the sequence length.

Mamba · State Space Models
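To see why variable-length sequences are wasteful when padded to a fixed length, here is a toy first-fit-decreasing packer that groups sequences into fixed-capacity buffers. The packing policy is generic, not PackMamba's actual scheduler.

```python
# Greedy first-fit-decreasing packing of variable-length sequences into
# fixed-capacity buffers, avoiding most padding. Generic sketch only.
def pack_greedy(lengths, capacity):
    bins = []                                  # each bin sums to <= capacity
    for n in sorted(lengths, reverse=True):
        for b in bins:
            if sum(b) + n <= capacity:
                b.append(n)
                break
        else:
            bins.append([n])
    return bins

print(pack_greedy([900, 300, 128, 700, 60], capacity=1024))
# -> [[900, 60], [700, 300], [128]]: 3 buffers instead of 5 padded ones
```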

OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection

1 code implementation 23 Jul 2024 Fan Cui, Chenyang Yin, Kexing Zhou, Youwei Xiao, Guangyu Sun, Qiang Xu, Qipeng Guo, Demin Song, Dahua Lin, Xingcheng Zhang, Yun Liang

While open-source LLMs offer solutions to these concerns, they typically underperform commercial models in RTL code generation tasks, primarily due to the scarcity of high-quality open-source RTL datasets.

Code Generation · Knowledge Distillation

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

no code implementations 17 Jun 2024 Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, Guanyu Feng, Xin Lv, Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang

Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency.
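A common way to cut that quadratic cost is to compute attention scores only over a structured subset of key positions. The toy mask below combines a causal local window with a few always-attended "sink" tokens; it is a generic sparsity pattern for illustration, not SampleAttention's adaptive selection.

```python
# Structured-sparse causal attention mask: local sliding window plus a few
# always-attended leading "sink" tokens. Generic pattern, not the paper's.
import numpy as np

def sparse_mask(n, window=4, sinks=2):
    m = np.zeros((n, n), dtype=bool)
    for i in range(n):
        m[i, max(0, i - window + 1): i + 1] = True   # causal local window
        m[i, :min(sinks, i + 1)] = True              # attention sinks
    return m

m = sparse_mask(16)
print(int(m.sum()), "of", m.size, "score entries computed")
```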

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

no code implementations 10 May 2024 Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin

Large language models (LLMs) can now handle longer sequences of tokens, enabling complex tasks like book understanding and generating lengthy novels.

Quantization
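The sliding-window idea can be pictured as keeping the most recent tokens' keys and values in full precision while storing older cache entries in low precision. The sketch below uses a crude per-tensor 8-bit scheme with made-up shapes; it is not SKVQ's grouped, clipped quantization.

```python
# Keep the last `window` cached tokens in fp32 and store older entries as
# int8 plus a scale. Crude illustration of sliding-window KV-cache quantization.
import numpy as np

def compress_kv(kv: np.ndarray, window: int = 4):
    old, recent = kv[:-window], kv[-window:]
    scale = np.abs(old).max() / 127 + 1e-12
    q_old = np.clip(np.round(old / scale), -128, 127).astype(np.int8)
    return q_old, scale, recent                 # reconstruct old part as q_old * scale

kv = np.random.randn(16, 64).astype(np.float32)  # 16 cached tokens, head dim 64
q_old, scale, recent = compress_kv(kv)
print(q_old.dtype, q_old.shape, recent.shape)    # int8 (12, 64) (4, 64)
```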

InternLM2 Technical Report

3 code implementations 26 Mar 2024 Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

4k · Long-Context Understanding

Poly-PC: A Polyhedral Network for Multiple Point Cloud Tasks at Once

no code implementations CVPR 2023 Tao Xie, Shiguang Wang, Ke Wang, Linqi Yang, Zhiqiang Jiang, Xingcheng Zhang, Kun Dai, Ruifeng Li, Jian Cheng

In this work, we show that it is feasible to perform multiple tasks concurrently on point cloud with a straightforward yet effective multi-task network.

Incremental Learning · Multi-Task Learning

MDL-NAS: A Joint Multi-Domain Learning Framework for Vision Transformer

no code implementations CVPR 2023 Shiguang Wang, Tao Xie, Jian Cheng, Xingcheng Zhang, Haijun Liu

Technically, MDL-NAS constructs a coarse-to-fine search space, where the coarse search space offers various optimal architectures for different tasks while the fine search space provides fine-grained parameter sharing to tackle the inherent obstacles of multi-domain learning.

Image Classification · Incremental Learning

DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation

1 code implementation 30 Mar 2022 Yu Tang, Chenyu Wang, Yufan Zhang, Yuliang Liu, Xingcheng Zhang, Linbo Qiao, Zhiquan Lai, Dongsheng Li

To the best of our knowledge, we are the first to build a practical dynamic runtime scheduler that combines tensor swapping and tensor recomputation without user oversight.
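The trade-off such a scheduler weighs per tensor can be sketched with a simple cost model: swap a tensor to host memory when the transfer is cheaper than regenerating it, otherwise recompute. The bandwidth and throughput numbers below are placeholders, and this is not DELTA's actual policy.

```python
# Toy per-tensor decision between swapping to host memory and recomputation,
# based on estimated transfer time vs. recompute time. Numbers are made up.
def evict_policy(size_bytes, recompute_flops, pcie_bw=16e9, gpu_flops=10e12):
    swap_time = 2 * size_bytes / pcie_bw        # offload now + prefetch later
    recompute_time = recompute_flops / gpu_flops
    return "swap" if swap_time < recompute_time else "recompute"

print(evict_policy(size_bytes=64 << 20, recompute_flops=5e9))   # -> "recompute"
```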

Optimizing Video Object Detection via a Scale-Time Lattice

1 code implementation CVPR 2018 Kai Chen, Jiaqi Wang, Shuo Yang, Xingcheng Zhang, Yuanjun Xiong, Chen Change Loy, Dahua Lin

High-performance object detection relies on expensive convolutional networks to compute features, often leading to significant challenges in applications, e.g. those that require detecting objects from video streams in real time.

Object · object-detection +1

Accelerated Training for Massive Classification via Dynamic Class Selection

no code implementations 5 Jan 2018 Xingcheng Zhang, Lei Yang, Junjie Yan, Dahua Lin

Massive classification, a classification task defined over a vast number of classes (hundreds of thousands or even millions), has become an essential part of many real-world systems, such as face recognition.

Classification · Face Recognition +1
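The core trick behind training with an enormous class space is to evaluate the softmax over only a small set of classes per step. The sketch below samples that set at random, whereas the paper selects active classes dynamically; all shapes here are illustrative.

```python
# Sampled softmax over a subset of classes: the loss touches only the target
# column plus a sampled set of negatives instead of all N classifier columns.
import numpy as np

def sampled_softmax_loss(feat, W, target, n_sampled=1000, rng=np.random):
    neg = rng.choice(W.shape[0], size=n_sampled, replace=False)
    cols = np.concatenate(([target], neg[neg != target]))
    logits = feat @ W[cols].T          # only the sampled classifier columns
    logits -= logits.max()             # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

W = np.random.randn(100_000, 256).astype(np.float32) * 0.01   # 100k-way classifier
feat = np.random.randn(256).astype(np.float32)
print(sampled_softmax_loss(feat, W, target=42))
```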

PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

3 code implementations CVPR 2017 Xingcheng Zhang, Zhizhong Li, Chen Change Loy, Dahua Lin

A number of studies have shown that increasing the depth or width of convolutional networks is a rewarding approach to improve the performance of image recognition.

Diversity · Image Classification
