Search Results for author: Jinghan Yao

Found 4 papers, 3 papers with code

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

1 code implementation • 16 Jan 2024 • Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Unlike previous methods, our solution can be directly applied to pre-trained MoE models without any fine-tuning or accuracy degradation.

Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

1 code implementation • 22 May 2023 • Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Inference on these models, by design, harnesses a temporal dependency, where the current token's probability distribution is conditioned on preceding tokens.

Computational Efficiency

SOFT: Softmax-free Transformer with Linear Complexity

2 code implementations • NeurIPS 2021 • Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang

Crucially, with a linear complexity, much longer token sequences are permitted in SOFT, resulting in superior trade-off between accuracy and complexity.

Computational Efficiency

Single Pixel Reconstruction for One-stage Instance Segmentation

no code implementations • 16 Apr 2019 • Jun Yu, Jinghan Yao, Jian Zhang, Zhou Yu, DaCheng Tao

In this paper, we propose a one-stage framework, SPRNet, which performs efficient instance segmentation by introducing a single pixel reconstruction (SPR) branch to off-the-shelf one-stage detectors.

Instance Segmentation Region Proposal +2
