1 code implementation • 8 Oct 2024 • Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi
Our MC-MoE integrates static quantization and dynamic pruning to collaboratively achieve extreme compression for MoE-LLMs with minimal accuracy loss, ensuring an optimal trade-off between performance and efficiency.
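As a rough illustration of how static mixed-precision quantization and dynamic expert pruning can be combined in a single MoE layer (the bit-allocation criterion, keep ratio, and function names below are illustrative assumptions, not the paper's actual recipe):

```python
import torch

def quantize_weight(w, n_bits):
    # Symmetric uniform quantization of a weight tensor to n_bits (illustrative).
    qmax = 2 ** (n_bits - 1) - 1
    scale = (w.abs().max() / qmax).clamp(min=1e-8)
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def compress_moe_layer(expert_weights, expert_usage, bit_budget=(2, 4)):
    # Static mixed precision: give more bits to frequently routed experts
    # (a hypothetical importance criterion).
    low, high = bit_budget
    order = expert_usage.argsort(descending=True)
    top_half = order[: len(order) // 2].tolist()
    return [
        quantize_weight(w, high if i in top_half else low)
        for i, w in enumerate(expert_weights)
    ]

def dynamic_prune(router_logits, keep_ratio=0.5):
    # Dynamic pruning: drop the lowest-scoring experts for the current token.
    probs = torch.softmax(router_logits, dim=-1)
    k = max(1, int(keep_ratio * probs.shape[-1]))
    topk = probs.topk(k, dim=-1)
    mask = torch.zeros_like(probs).scatter_(-1, topk.indices, 1.0)
    return probs * mask

experts = [torch.randn(8, 8) for _ in range(4)]
usage = torch.tensor([0.4, 0.1, 0.3, 0.2])
q_experts = compress_moe_layer(experts, usage)
```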
no code implementations • 26 May 2024 • Haoru Tan, Chuang Wang, Xu-Yao Zhang, Cheng-Lin Liu
The whole algorithm can be considered as a differentiable map from the graph affinity matrix to the prediction of node correspondence.
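A minimal sketch of such a differentiable map, using Sinkhorn normalization to turn an affinity matrix into a doubly stochastic soft correspondence (the temperature and iteration count are illustrative; the paper's exact formulation may differ):

```python
import torch

def sinkhorn(log_alpha, n_iters=20):
    # Alternating row/column normalization in log space yields a doubly stochastic matrix.
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)
    return log_alpha.exp()

def affinity_to_correspondence(affinity, temperature=0.1):
    # Differentiable map: node-to-node affinity scores -> soft assignment matrix.
    return sinkhorn(affinity / temperature)

# Example: 5x5 affinity between the nodes of two graphs.
A = torch.randn(5, 5, requires_grad=True)
P = affinity_to_correspondence(A)
P.sum().backward()  # gradients flow back to the affinity matrix
```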
no code implementations • 11 Mar 2024 • Haoru Tan, Chuang Wang, Sitong Wu, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu
In this paper, we propose a graph neural network (GNN) based approach to combine the advantages of data-driven and traditional methods.
no code implementations • 22 Feb 2024 • Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi
Despite its simplicity, we show that IDA is efficient and converges quickly when resolving social bias in TTI diffusion models.
no code implementations • CVPR 2024 • Sitong Wu, Haoru Tan, Zhuotao Tian, Yukang Chen, Xiaojuan Qi, Jiaya Jia
We discover that the lack of consideration for sample-wise affinity consistency across modalities in existing training objectives is the central cause.
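A hedged sketch of what a sample-wise affinity-consistency regularizer across modalities could look like (this loss and the names used are assumptions for illustration, not the paper's actual objective):

```python
import torch
import torch.nn.functional as F

def affinity_consistency_loss(img_emb, txt_emb):
    # Hypothetical regularizer: within-batch affinity structure should agree across modalities.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    affinity_img = img @ img.t()   # image-image similarities within the batch
    affinity_txt = txt @ txt.t()   # text-text similarities within the batch
    return F.mse_loss(affinity_img, affinity_txt)

loss = affinity_consistency_loss(torch.randn(8, 64), torch.randn(8, 64))
```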
no code implementations • NeurIPS 2022 • Haoru Tan, Sitong Wu, Jimin Pi
We then propose a novel learnable approach called semantic diffusion network (SDN) to approximate the diffusion process, which contains a parameterized semantic difference convolution operator followed by a feature fusion module.
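A rough sketch of a semantic-difference-style operator followed by a simple fusion step (the layer structure and names are illustrative assumptions, not the actual SDN definition):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticDifferenceConv(nn.Module):
    # Illustrative sketch: aggregate differences between each position and its 3x3
    # neighbors, then fuse with a learned 1x1 projection.
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        neighbors = F.unfold(x, kernel_size=3, padding=1).view(b, c, 9, h, w)
        diffs = neighbors - x.unsqueeze(2)   # semantic differences to each neighbor
        fused = diffs.abs().mean(dim=2)      # fuse the difference maps
        return self.proj(fused)

x = torch.randn(1, 16, 32, 32)
print(SemanticDifferenceConv(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```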
no code implementations • 10 Dec 2022 • Hai Wu, Ruifei He, Haoru Tan, Xiaojuan Qi, Kaibin Huang
Experiments show that the proposed vertical-layered representation and the developed once-QAT scheme are effective in embodying multiple quantized networks within a single model trained only once, and deliver performance comparable to that of quantized models tailored to any specific bit-width.
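One way to picture a vertical-layered (bit-plane) weight representation, where a lower-bit network is obtained by slicing the high-order bits of a shared integer tensor (an illustrative sketch, not necessarily the paper's exact scheme):

```python
import torch

def to_bitplanes(w_int, n_bits=8):
    # Decompose non-negative integer weights into bit-planes (vertical layers), MSB first.
    planes = [(w_int >> b) & 1 for b in reversed(range(n_bits))]
    return torch.stack(planes)  # shape: (n_bits, ...)

def slice_bits(planes, k):
    # Reassemble a k-bit network from the top-k bit-planes of the shared representation.
    n_bits = planes.shape[0]
    w = torch.zeros_like(planes[0])
    for i in range(k):
        w = w + (planes[i] << (n_bits - 1 - i))
    return w >> (n_bits - k)   # value on a k-bit grid

w8 = torch.randint(0, 256, (4, 4))
w4 = slice_bits(to_bitplanes(w8), 4)  # the 4-bit network reuses the high-order bits of the 8-bit one
```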
2 code implementations • 28 Dec 2021 • Sitong Wu, Tianyi Wu, Haoru Tan, Guodong Guo
To reduce the quadratic computation complexity caused by the global self-attention, various methods constrain the range of attention within a local region to improve its efficiency.
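A minimal example of constraining self-attention to non-overlapping local windows, which reduces the quadratic cost in sequence length to linear for a fixed window size (window size and tensor shapes here are arbitrary):

```python
import torch

def window_attention(x, window=4):
    # Restrict self-attention to local windows: cost drops from O(N^2) to O(N * window).
    b, n, d = x.shape
    assert n % window == 0
    xw = x.view(b, n // window, window, d)  # split the sequence into local windows
    attn = torch.softmax(xw @ xw.transpose(-2, -1) / d ** 0.5, dim=-1)
    return (attn @ xw).view(b, n, d)

x = torch.randn(2, 16, 32)
print(window_attention(x).shape)  # torch.Size([2, 16, 32])
```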
no code implementations • AAAI 2021 • Haoru Tan, Chuang Wang, Sitong Wu, Tie-Qiang Wang, Xu-Yao Zhang, Cheng-Lin Liu
It consists of three parts: a graph neural network to generate a high-level local feature, an attention-based module to normalize the rotational transform, and a global feature matching module based on proximal optimization.
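A schematic sketch of such a three-stage pipeline (the modules below are placeholders: a linear layer stands in for the GNN encoder, standard multi-head attention for the rotation-normalization module, and the proximal refinement is only indicated by a comment):

```python
import torch
import torch.nn as nn

class MatchingPipeline(nn.Module):
    # Schematic three-stage pipeline; names and layer choices are illustrative.
    def __init__(self, dim=64):
        super().__init__()
        self.local_encoder = nn.Linear(3, dim)  # stand-in for the graph neural network
        self.norm_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, pts_a, pts_b):
        fa, fb = self.local_encoder(pts_a), self.local_encoder(pts_b)
        fa, _ = self.norm_attn(fa, fa, fa)       # attention step meant to normalize orientation
        fb, _ = self.norm_attn(fb, fb, fb)
        sim = fa @ fb.transpose(-2, -1)          # global feature matching scores
        # A proximal-optimization step would iteratively refine `sim` toward a permutation here.
        return torch.softmax(sim, dim=-1)

pipe = MatchingPipeline()
print(pipe(torch.randn(1, 10, 3), torch.randn(1, 10, 3)).shape)  # torch.Size([1, 10, 10])
```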