1 code implementation • 3 Apr 2024 • Fanxu Meng, Zhaohui Wang, Muhan Zhang
However, LoRA approximates Delta W through the product of two matrices, A, initialized with Gaussian noise, and B, initialized with zeros, while PiSSA initializes A and B with principal singular values and vectors of the original matrix W. PiSSA can better approximate the outcomes of full-parameter fine-tuning at the beginning by changing the essential parts while freezing the "noisy" parts.
1 code implementation • 9 Nov 2023 • Fanxu Meng, Haotong Yang, Yiding Wang, Muhan Zhang
The human brain is naturally equipped to comprehend and interpret visual information rapidly.
no code implementations • 9 Oct 2023 • Haotong Yang, Fanxu Meng, Zhouchen Lin, Muhan Zhang
Furthermore, by generalizing this structure to the hierarchical case, we demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic, thereby effectively learning on complex reasoning involving multiple steps.
1 code implementation • 24 May 2023 • Xiaojuan Tang, Zilong Zheng, Jiaqi Li, Fanxu Meng, Song-Chun Zhu, Yitao Liang, Muhan Zhang
On the whole, our analysis provides a novel perspective on the role of semantics in developing and evaluating language models' reasoning abilities.
1 code implementation • 1 Nov 2021 • Fanxu Meng, Hao Cheng, Jiaxin Zhuang, Ke Li, Xing Sun
In this paper, we aim to remedy this problem and propose to remove the residual connection in a vanilla ResNet equivalently by a reserving and merging (RM) operation on ResBlock.
no code implementations • 19 Jan 2021 • Huixiang Luo, Hao Cheng, Fanxu Meng, Yuting Gao, Ke Li, Mengdan Zhang, Xing Sun
Pseudo-labeling (PL) and Data Augmentation-based Consistency Training (DACT) are two approaches widely used in Semi-Supervised Learning (SSL) methods.
1 code implementation • NeurIPS 2020 • Fanxu Meng, Hao Cheng, Ke Li, Huixiang Luo, Xiaowei Guo, Guangming Lu, Xing Sun
Through extensive experiments, we demonstrate that SWP is more effective compared to the previous FP-based methods and achieves the state-of-art pruning ratio on CIFAR-10 and ImageNet datasets without obvious accuracy drop.
1 code implementation • 26 Apr 2020 • Hao Cheng, Fanxu Meng, Ke Li, Yuting Gao, Guangming Lu, Xing Sun, Rongrong Ji
To gain a universal improvement on both valid and invalid filters, we compensate grafting with distillation (\textbf{Cultivation}) to overcome the drawback of grafting .
2 code implementations • CVPR 2020 • Fanxu Meng, Hao Cheng, Ke Li, Zhixin Xu, Rongrong Ji, Xing Sun, Gaungming Lu
To better perform the grafting process, we develop an entropy-based criterion to measure the information of filters and an adaptive weighting strategy for balancing the grafted information among networks.