no code implementations • 15 Jun 2023 • Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu
Inspired by this, we tune the training hyperparameters related to model convergence in a targeted manner.
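The abstract snippet does not list the exact hyperparameters; the following is a minimal, hypothetical sketch of what "targeted" tuning of convergence-related hyperparameters typically looks like for Transformer training in PyTorch. The stand-in model, peak learning rate, warmup length, and schedule are assumptions for illustration, not the paper's recipe.

```python
# Minimal sketch (assumed values, not the authors' recipe) of adjusting
# convergence-related hyperparameters for Transformer training in PyTorch.
import torch

model = torch.nn.Linear(512, 512)  # stand-in for a Transformer

# Convergence-related knobs: peak learning rate, warmup length, Adam betas.
peak_lr, warmup_steps = 7e-4, 4000  # hypothetical "targeted" values
optimizer = torch.optim.Adam(model.parameters(), lr=peak_lr, betas=(0.9, 0.98))

def inverse_sqrt_lr(step: int) -> float:
    """Noam-style schedule: linear warmup, then inverse-sqrt decay."""
    step = max(step, 1)
    return min(step / warmup_steps, (warmup_steps / step) ** 0.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, inverse_sqrt_lr)
```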
1 code implementation • 7 Jun 2023 • Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu
With the co-design of model and engine, compared with the existing system, we achieve a 47.0x speedup and save 99.5% of memory with only an 11.6% loss of BLEU.
1 code implementation • 7 Oct 2022 • Jiangtao Feng, Yi Zhou, Jun Zhang, Xian Qian, Liwei Wu, Zhexi Zhang, Yanming Liu, Mingxuan Wang, Lei Li, Hao Zhou
PARAGEN is a PyTorch-based NLP toolkit for further development of parallel generation methods.
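PARAGEN's own APIs are documented in its repository; as a rough illustration of the "parallel generation" paradigm it targets, here is a minimal non-autoregressive decoding sketch in plain PyTorch. The function name and tensor shapes are illustrative assumptions, not PARAGEN code.

```python
# Sketch of parallel (non-autoregressive) decoding: all target tokens are
# produced in one decoder pass, rather than one token per step.
import torch

def parallel_decode(logits: torch.Tensor) -> torch.Tensor:
    """logits: (batch, tgt_len, vocab) from a single decoder call.
    Returns (batch, tgt_len) token ids, chosen for all positions at once."""
    return logits.argmax(dim=-1)

# Illustrative shapes: batch of 2 sentences, length 8, vocab of 32k.
tokens = parallel_decode(torch.randn(2, 8, 32000))
```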
no code implementations • ICCV 2023 • Xiaoxing Wang, Xiangxiang Chu, Yuda Fan, Zhexi Zhang, Bo Zhang, Xiaokang Yang, Junchi Yan
Although it is a prevalent architecture search approach, differentiable architecture search (DARTS) is largely hindered by its substantial memory cost, since the entire supernet resides in memory.
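To make the memory argument concrete, here is a minimal sketch of a DARTS-style mixed operation: every candidate operation on an edge runs in each forward pass, so the activations of the whole supernet must be retained for backpropagation. The candidate ops and sizes below are illustrative, not the paper's search space.

```python
# Why DARTS keeps the whole supernet in memory: all candidate ops on every
# edge are executed and their activations kept for the backward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One DARTS edge: a softmax-weighted sum over all candidate ops."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.AvgPool2d(3, stride=1, padding=1),
        ])
        # Architecture parameters (alphas), optimized jointly with weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=-1)
        # Every op runs on every forward pass, so every op's activations
        # stay in memory for backprop -- the cost the paper targets.
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

out = MixedOp(16)(torch.randn(1, 16, 32, 32))
```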
no code implementations • WS 2019 • Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, Guotong Xie
We have experimented with both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset, by leveraging datasets of similar tasks; and (b) incorporating the distributional representations of a knowledge graph (KG) into the representations of pre-trained language models, via simple concatenation or multi-head attention (a minimal sketch of both fusion strategies follows this entry).
Ranked #17 on Common Sense Reasoning on ReCoRD
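Here is a minimal sketch of the two fusion strategies in (b), assuming BERT-sized hidden states and pre-aligned KG entity vectors; the module names and dimensions are illustrative assumptions, not the paper's exact architecture.

```python
# Fusing KG entity embeddings with pre-trained LM representations via
# (i) concatenation or (ii) multi-head attention. Sizes are assumed.
import torch
import torch.nn as nn

hidden, kg_dim, seq_len = 768, 200, 16       # assumed dimensions
lm_out = torch.randn(1, seq_len, hidden)     # e.g. BERT token states
kg_emb = torch.randn(1, seq_len, kg_dim)     # aligned KG entity vectors

# (i) Simple concatenation, projected back to the LM hidden size.
concat_fuse = nn.Linear(hidden + kg_dim, hidden)
fused_cat = concat_fuse(torch.cat([lm_out, kg_emb], dim=-1))

# (ii) Multi-head attention: LM states attend over projected KG vectors.
kg_proj = nn.Linear(kg_dim, hidden)
attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
fused_attn, _ = attn(query=lm_out, key=kg_proj(kg_emb), value=kg_proj(kg_emb))
```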