Search Results for author: Jinghan Yao

Found 4 papers, 3 papers with code

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

1 code implementation 16 Jan 2024 Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Unlike previous methods, our solution can be directly applied to pre-trained MoE models without any fine-tuning or accuracy degradation.

Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

1 code implementation 22 May 2023 Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Inference on these models, by design, harnesses a temporal dependency, where the current token's probability distribution is conditioned on preceding tokens.

Computational Efficiency

SOFT: Softmax-free Transformer with Linear Complexity

2 code implementations NeurIPS 2021 Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang

Crucially, with a linear complexity, much longer token sequences are permitted in SOFT, resulting in a superior trade-off between accuracy and complexity.

Computational Efficiency

Single Pixel Reconstruction for One-stage Instance Segmentation

no code implementations 16 Apr 2019 Jun Yu, Jinghan Yao, Jian Zhang, Zhou Yu, DaCheng Tao

In this paper, we propose a one-stage framework, SPRNet, which performs efficient instance segmentation by introducing a single pixel reconstruction (SPR) branch to off-the-shelf one-stage detectors.

Instance Segmentation, Region Proposal, +2 more
