no code implementations • 15 Jun 2023 • Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu
Inspired by this, we tune the training hyperparameters related to model convergence in a targeted manner.
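The abstract snippet does not list the exact hyperparameters; the following is a minimal, hypothetical sketch of what "targeted" tuning of convergence-related hyperparameters typically looks like for Transformer training in PyTorch. The stand-in model, peak learning rate, warmup length, and schedule are assumptions for illustration, not the paper's recipe.

```python
# Minimal sketch (assumed values, not the authors' recipe) of adjusting
# convergence-related hyperparameters for Transformer training in PyTorch.
import torch

model = torch.nn.Linear(512, 512)  # stand-in for a Transformer

# Convergence-related knobs: peak learning rate, warmup length, Adam betas.
peak_lr, warmup_steps = 7e-4, 4000  # hypothetical "targeted" values
optimizer = torch.optim.Adam(model.parameters(), lr=peak_lr, betas=(0.9, 0.98))

def inverse_sqrt_lr(step: int) -> float:
    """Noam-style schedule: linear warmup, then inverse-sqrt decay."""
    step = max(step, 1)
    return min(step / warmup_steps, (warmup_steps / step) ** 0.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, inverse_sqrt_lr)
```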
1 code implementation • 7 Jun 2023 • Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu
With the co-design of model and engine, compared with the existing system, we achieve a 47.0x speedup and save 99.5% of memory with only an 11.6% loss of BLEU.
1 code implementation • 7 Oct 2022 • Jiangtao Feng, Yi Zhou, Jun Zhang, Xian Qian, Liwei Wu, Zhexi Zhang, Yanming Liu, Mingxuan Wang, Lei Li, Hao Zhou
PARAGEN is a PyTorch-based NLP toolkit for further development of parallel generation methods.
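PARAGEN's own APIs are documented in its repository; as a rough illustration of the "parallel generation" paradigm it targets, here is a minimal non-autoregressive decoding sketch in plain PyTorch. The function name and tensor shapes are illustrative assumptions, not PARAGEN code.

```python
# Sketch of parallel (non-autoregressive) decoding: all target tokens are
# produced in one decoder pass, rather than one token per step.
import torch

def parallel_decode(logits: torch.Tensor) -> torch.Tensor:
    """logits: (batch, tgt_len, vocab) from a single decoder call.
    Returns (batch, tgt_len) token ids, chosen for all positions at once."""
    return logits.argmax(dim=-1)

# Illustrative shapes: batch of 2 sentences, length 8, vocab of 32k.
tokens = parallel_decode(torch.randn(2, 8, 32000))
```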
no code implementations • ICCV 2023 • Xiaoxing Wang, Xiangxiang Chu, Yuda Fan, Zhexi Zhang, Bo Zhang, Xiaokang Yang, Junchi Yan
Although it is a prevalent architecture search approach, differentiable architecture search (DARTS) is largely hindered by its substantial memory cost, since the entire supernet resides in memory.
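To make the memory argument concrete, here is a minimal sketch of a DARTS-style mixed operation: every candidate operation on an edge runs in each forward pass, so the activations of the whole supernet must be retained for backpropagation. The candidate ops and sizes below are illustrative, not the paper's search space.

```python
# Why DARTS keeps the whole supernet in memory: all candidate ops on every
# edge are executed and their activations kept for the backward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One DARTS edge: a softmax-weighted sum over all candidate ops."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.AvgPool2d(3, stride=1, padding=1),
        ])
        # Architecture parameters (alphas), optimized jointly with weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=-1)
        # Every op runs on every forward pass, so every op's activations
        # stay in memory for backprop -- the cost the paper targets.
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

out = MixedOp(16)(torch.randn(1, 16, 32, 32))
```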
no code implementations • WS 2019 • Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, Guotong Xie
We have experimented with both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset, by leveraging datasets of similar tasks; and (b) incorporating the distributional representations of a knowledge graph (KG) into the representations of pre-trained language models, via simple concatenation or multi-head attention (a minimal sketch of both fusion strategies follows this entry).
Ranked #17 on Common Sense Reasoning on ReCoRD
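Here is a minimal sketch of the two fusion strategies in (b), assuming BERT-sized hidden states and pre-aligned KG entity vectors; the module names and dimensions are illustrative assumptions, not the paper's exact architecture.

```python
# Fusing KG entity embeddings with pre-trained LM representations via
# (i) concatenation or (ii) multi-head attention. Sizes are assumed.
import torch
import torch.nn as nn

hidden, kg_dim, seq_len = 768, 200, 16       # assumed dimensions
lm_out = torch.randn(1, seq_len, hidden)     # e.g. BERT token states
kg_emb = torch.randn(1, seq_len, kg_dim)     # aligned KG entity vectors

# (i) Simple concatenation, projected back to the LM hidden size.
concat_fuse = nn.Linear(hidden + kg_dim, hidden)
fused_cat = concat_fuse(torch.cat([lm_out, kg_emb], dim=-1))

# (ii) Multi-head attention: LM states attend over projected KG vectors.
kg_proj = nn.Linear(kg_dim, hidden)
attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
fused_attn, _ = attn(query=lm_out, key=kg_proj(kg_emb), value=kg_proj(kg_emb))
```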