no code implementations • 9 Jan 2025 • Ziwei He, Jian Yuan, Haoli Bai, Jingwen Leng, Bo Jiang
Efficient key-value (KV) cache compression is critical for scaling transformer-based Large Language Models (LLMs) to long sequences and resource-limited settings.
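For a concrete picture of what KV cache compression can look like, here is a minimal sketch of one common family of approaches, heavy-hitter-style eviction that keeps the most-attended plus the most recent key/value pairs. The scoring rule and the half-and-half budget split are illustrative assumptions, not necessarily the method proposed in this paper.

```python
import torch

def evict_kv(keys, values, attn_scores, budget):
    """Keep only `budget` KV pairs: the most-attended plus the most recent.

    A generic heavy-hitter-style eviction sketch; the scoring rule and the
    half-and-half budget split are illustrative choices, not this paper's.
    """
    seq_len = keys.shape[0]
    if seq_len <= budget:
        return keys, values
    recent = budget // 2
    # Cumulative attention received by each past token, summed over queries.
    importance = attn_scores.sum(dim=0)
    importance[-recent:] = float("inf")  # always keep the recent window
    keep = importance.topk(budget).indices.sort().values
    return keys[keep], values[keep]

keys, values = torch.randn(128, 64), torch.randn(128, 64)
attn = torch.rand(128, 128).softmax(dim=-1)
k_small, v_small = evict_kv(keys, values, attn, budget=32)
print(k_small.shape)  # torch.Size([32, 64])
```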
1 code implementation • 12 Oct 2024 • Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, Yuening Li, Jiaxin Hu, Xianzhi Yu, Lu Hou, Chun Yuan, Xin Jiang, Wulong Liu, Jun Yao
In this paper, we propose FlatQuant (Fast and Learnable Affine Transformation), a new post-training quantization approach to enhance flatness of weights and activations.
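To make the flatness idea concrete, the sketch below quantizes a toy weight matrix in a transformed space so that an outlier channel no longer dominates the quantization range. The random orthogonal transform, 4-bit setting, and toy weights are all illustrative assumptions; FlatQuant learns its affine transforms rather than drawing them at random.

```python
import numpy as np

def quantize_symmetric(x, bits=4):
    """Uniform symmetric quantization with a single per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W[:, 0] *= 20.0  # outlier column that stretches the quantization range

# Hypothetical transform: a random orthogonal matrix (FlatQuant learns its
# transforms; this stand-in only spreads the outlier energy across columns).
P, _ = np.linalg.qr(rng.normal(size=(8, 8)))

W_plain = quantize_symmetric(W)
W_trans = quantize_symmetric(W @ P) @ np.linalg.inv(P)

print("plain 4-bit error:      ", np.abs(W_plain - W).mean())
print("transformed 4-bit error:", np.abs(W_trans - W).mean())  # typically lower
```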
no code implementations • 26 Sep 2024 • Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Jun Yao, Lanqing Hong, Lu Hou, Hang Xu
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models.
no code implementations • 2 Jul 2024 • Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tianshu Zhu, Haoli Bai, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
In speculative decoding (SD), one draft model usually serves a specific target model; in practice, however, LLMs are diverse, and we may need to deal with many target models, or with more than one target model simultaneously.
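For context on the draft-then-verify loop that SD refers to, below is a minimal greedy sketch with hypothetical draft_logits_fn / target_logits_fn stand-ins. Real speculative decoding verifies drafts with rejection sampling over probabilities, and nothing here describes the paper's multi-target approach.

```python
import torch

def speculative_step(target_logits_fn, draft_logits_fn, prefix, k=4):
    """One simplified (greedy) draft-then-verify step of speculative decoding.

    Both *_logits_fn arguments map a 1-D token tensor to per-position logits;
    they are hypothetical stand-ins for real LLMs.
    """
    # 1) The cheap draft model proposes k tokens autoregressively.
    draft = prefix.clone()
    for _ in range(k):
        next_tok = draft_logits_fn(draft)[-1].argmax()
        draft = torch.cat([draft, next_tok.view(1)])

    # 2) The target model scores the whole draft in a single forward pass.
    target_next = target_logits_fn(draft).argmax(dim=-1)

    # 3) Accept proposed tokens while they match the target's own choices.
    accepted = prefix.clone()
    for i in range(len(prefix), len(draft)):
        if draft[i] != target_next[i - 1]:
            break
        accepted = torch.cat([accepted, draft[i].view(1)])
    # Always append one token from the target so progress is guaranteed.
    return torch.cat([accepted, target_next[len(accepted) - 1].view(1)])

torch.manual_seed(0)
emb = torch.randn(10, 10)
toy_model = lambda tokens: emb[tokens]  # the same toy "model" plays both roles
print(speculative_step(toy_model, toy_model, torch.tensor([1, 2, 3])))
```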
1 code implementation • 25 Mar 2024 • Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong
Prior studies show that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to perceive and reason over both document texts and layouts (e.g., locations of texts and table cells).
no code implementations • CVPR 2024 • Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun
We find that directly using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance.
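For reference, the magnitude-based pruning baseline criticized here simply zeroes the smallest-magnitude weights. The PyTorch sketch below shows that generic baseline on an arbitrary layer and sparsity level; it is not the authors' proposed method.

```python
import torch

def magnitude_prune_(linear: torch.nn.Linear, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights in place (unstructured pruning)."""
    with torch.no_grad():
        w = linear.weight
        k = int(sparsity * w.numel())
        if k == 0:
            return
        threshold = w.abs().flatten().kthvalue(k).values
        w.mul_((w.abs() > threshold).float())

layer = torch.nn.Linear(512, 512)
magnitude_prune_(layer, sparsity=0.75)
print("achieved sparsity:", (layer.weight == 0).float().mean().item())
```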
2 code implementations • 2 Mar 2024 • Ruikang Liu, Haoli Bai, Haokun Lin, Yuening Li, Han Gao, Zhengzhuo Xu, Lu Hou, Jun Yao, Chun Yuan
Such outliers are found to allocate most of their attention scores to the initial tokens of the input, termed pivot tokens, which are crucial to the performance of quantized LLMs.
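One simple way to see this effect, purely as an illustration and not as the paper's quantization procedure, is to measure how much attention mass queries spend on the first few key positions, as in the hypothetical diagnostic below.

```python
import torch

def pivot_token_mass(attn: torch.Tensor, n_initial: int = 4) -> torch.Tensor:
    """Average fraction of attention each query spends on the first tokens.

    `attn` has shape [heads, query_len, key_len] with rows summing to 1; a
    value near 1 suggests sink-like "pivot" tokens at the sequence start.
    """
    return attn[..., :n_initial].sum(dim=-1).mean()

# Toy attention map whose mass is concentrated on the first key position.
scores = torch.randn(8, 16, 16)
scores[..., 0] += 5.0
attn = scores.softmax(dim=-1)
print(f"mass on first 4 tokens: {pivot_token_mass(attn).item():.2f}")
```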
1 code implementation • 19 Dec 2022 • Haoli Bai, Zhiguang Liu, Xiaojun Meng, Wentao Li, Shuang Liu, Nian Xie, Rongfu Zheng, Liangwei Wang, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu
While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far.
no code implementations • 18 Nov 2021 • Haoli Bai, Hongda Mao, Dinesh Nair
In this paper, we seek to design a lightweight SegFormer for efficient semantic segmentation.
no code implementations • 30 Sep 2021 • Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu
Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.
1 code implementation • 16 Jun 2021 • Xianghong Fang, Haoli Bai, Jian Li, Zenglin Xu, Michael Lyu, Irwin King
We further design a discrete latent space for the variational attention and mathematically show that our model is free from posterior collapse.
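A common way to keep a discrete attention variable differentiable is the Gumbel-softmax trick; the snippet below is a generic sketch of that trick only, not necessarily the paper's exact parameterization.

```python
import torch
import torch.nn.functional as F

# Sample a (nearly) one-hot alignment over source positions while keeping the
# operation differentiable via the straight-through Gumbel-softmax estimator.
scores = torch.randn(2, 5, requires_grad=True)   # [batch, source_len]
hard_attn = F.gumbel_softmax(scores, tau=0.5, hard=True)
print(hard_attn)  # one-hot rows; gradients still flow back to `scores`
```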
1 code implementation • ACL 2021 • Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit by weight binarization.
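In its classic form, weight binarization keeps only the sign of each weight plus a per-row scaling factor. The sketch below shows that generic 1-bit baseline; BinaryBERT's actual ternary-splitting initialization is more involved and not reproduced here.

```python
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    """Row-wise binarization W ≈ alpha * sign(W), with alpha = mean |W| per row."""
    alpha = w.abs().mean(dim=1, keepdim=True)
    return alpha * torch.sign(w)

w = torch.randn(4, 8)
w_bin = binarize_weights(w)
print("reconstruction error:", (w - w_bin).abs().mean().item())
```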
1 code implementation • NeurIPS 2020 • Jiaxing Wang, Haoli Bai, Jiaxiang Wu, Xupeng Shi, Junzhou Huang, Irwin King, Michael Lyu, Jian Cheng
Nevertheless, it is unclear how parameter sharing affects the searching process.
no code implementations • 21 Apr 2020 • Xianghong Fang, Haoli Bai, Zenglin Xu, Michael Lyu, Irwin King
Variational autoencoders have been widely applied to natural language generation; however, two long-standing problems remain: information under-representation and posterior collapse.
no code implementations • 17 Mar 2020 • Yuhang Li, Wei Wang, Haoli Bai, Ruihao Gong, Xin Dong, Fengwei Yu
Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks.
no code implementations • 4 Dec 2019 • Yuhang Li, Xin Dong, Sai Qian Zhang, Haoli Bai, Yuanpeng Chen, Wei Wang
We first bring up three overlooked issues in extremely low-bit networks: the squashing range of quantized values, gradient vanishing during backpropagation, and the unexploited hardware acceleration of ternary networks.
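As background on the ternary case mentioned here, threshold-based ternarization maps weights to {-alpha, 0, +alpha}. The sketch below follows the classic TWN-style heuristic (delta = 0.7 * mean|W|) and does not reproduce this paper's treatment of squashing range or gradients.

```python
import torch

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Threshold-based ternary quantization to {-alpha, 0, +alpha}."""
    delta = 0.7 * w.abs().mean()                 # classic TWN-style threshold
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * torch.sign(w) * mask

w = torch.randn(256, 256)
w_t = ternarize(w)
print("zero fraction:", (w_t == 0).float().mean().item())
print("distinct values:", torch.unique(w_t).tolist())
```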
1 code implementation • 21 Nov 2019 • Haoli Bai, Jiaxiang Wu, Irwin King, Michael Lyu
The core challenge of few-shot network compression lies in the high estimation errors from the original network during inference, since the compressed network can easily over-fit on the few training instances.
no code implementations • 17 Jun 2019 • Liangjian Wen, Xuanyang Zhang, Haoli Bai, Zenglin Xu
Recurrent neural networks (RNNs) have recently achieved remarkable successes in a number of applications.
no code implementations • 30 Dec 2018 • Xianghong Fang, Haoli Bai, Ziyi Guo, Bin Shen, Steven Hoi, Zenglin Xu
In this paper, we propose a new unsupervised domain adaptation method named Domain-Adversarial Residual-Transfer (DART) learning of Deep Neural Networks to tackle cross-domain image classification tasks.
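Domain-adversarial methods are typically built around a gradient reversal layer; the snippet below sketches only that standard building block and omits the residual-transfer component that distinguishes DART.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

# Features fed to a domain classifier receive reversed gradients, so the
# feature extractor learns to confuse that classifier (domain invariance).
features = torch.randn(4, 16, requires_grad=True)
GradReverse.apply(features, 1.0).sum().backward()
print(features.grad[0, :3])  # gradients arrive with flipped sign
```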
1 code implementation • NIPS Workshop CDNNRIA 2018 • Jiaxiang Wu, Yao Zhang, Haoli Bai, Huasong Zhong, Jinlong Hou, Wei Liu, Wenbing Huang, Junzhou Huang
Deep neural networks are widely used in various domains, but the prohibitive computational complexity prevents their deployment on mobile devices.
no code implementations • 24 May 2017 • Hao Liu, Haoli Bai, Lirong He, Zenglin Xu
Inheriting the advantages of stochastic neural sequential models, we propose a structured and stochastic sequential neural network, which models both long-term dependencies via recurrent neural networks and the uncertainty in segmentation and labels via discrete random variables.