Search Results for author: Ka Man Lo

Found 4 papers, 3 papers with code

A Closer Look into Mixture-of-Experts in Large Language Models

1 code implementation • 26 Jun 2024 • Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu

Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks.

Computational Efficiency, Diversity

m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers

1 code implementation • 26 Feb 2024 • Ka Man Lo, Yiming Liang, Wenyu Du, Yuantao Fan, Zili Wang, Wenhao Huang, Lei Ma, Jie Fu

Additionally, the V-MoE-Base model trained with m2mKD achieves 3.5% higher accuracy than end-to-end training on ImageNet-1k.

Knowledge Distillation
