no code implementations • 17 Feb 2025 • Jiachen Yu, Shaoning Sun, Xiaohui Hu, Jiaxu Yan, Kaidong Yu, Xuelong Li
Furthermore, our training method enhances the model's general capabilities by constructing complicated judge tasks, and in our tests the judge signals provided by our model significantly improved the downstream DPO training performance of our internal models when optimizing a policy model with the judge model.
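For context, DPO optimizes a policy against preference pairs such as those a judge model can label. A minimal sketch of the standard DPO loss in PyTorch (this is the generic objective, not the authors' internal training setup):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over judge-labeled preference pairs.

    Each argument is a tensor of summed log-probabilities of a response
    under the trainable policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # The response the judge preferred should out-score the rejected one.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```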
no code implementations • 2 Dec 2024 • Alan Wake, Bei Chen, C. X. Lv, Chao Li, Chengen Huang, Chenglin Cai, Chujie Zheng, Daniel Cooper, Fan Zhou, Feng Hu, Ge Zhang, Guoyin Wang, Heng Ji, Howard Qiu, Jiangcheng Zhu, Jun Tian, Katherine Su, Lihuan Zhang, Liying Li, Ming Song, Mou Li, Peng Liu, Qicheng Hu, Shawn Wang, Shijun Zhou, Shiming Yang, Shiyong Li, Tianhang Zhu, Wen Xie, Wenhao Huang, Xiang He, Xiaobo Chen, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Yanpeng Li, Yongke Zhao, Yongzhen Luo, Yuchi Xu, Yuxuan Sha, Zhaodong Yan, Zhiyuan Liu, Zirui Zhang, Zonghong Dai
This technical report presents Yi-Lightning, our latest flagship large language model (LLM).
1 code implementation • 7 Mar 2024 • 01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Guoyin Wang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yanpeng Li, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
The Yi model family is based on 6B and 34B pretrained language models, which we then extend to chat models, 200K long-context models, depth-upscaled models, and vision-language models.
Ranked #1 on Chatbot on AlpacaEval
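Depth upscaling is commonly done by duplicating a contiguous block of pretrained layers and continuing training; a hedged sketch with Hugging Face Transformers (an assumed generic recipe on a stand-in checkpoint, not necessarily Yi's procedure):

```python
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in checkpoint
blocks = list(model.transformer.h)  # GPT-2's list of decoder blocks

# Duplicate the middle block of layers in place to deepen the model,
# then continue pretraining so the new layers can specialize.
mid = len(blocks) // 2
upscaled = (blocks[:mid]
            + [copy.deepcopy(b) for b in blocks[mid - 2:mid + 2]]
            + blocks[mid:])
model.transformer.h = nn.ModuleList(upscaled)
model.config.n_layer = len(upscaled)
```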
1 code implementation • COLING 2022 • Yifan Jin, Jiangmeng Li, Zheng Lian, Chengbo Jiao, Xiaohui Hu
However, the quality of the 1-best dependency tree produced for medical texts by an out-of-domain parser is relatively limited, so the performance of medical relation extraction methods may degrade.
1 code implementation • 26 Aug 2022 • Jiangmeng Li, Yanan Zhang, Wenwen Qiang, Lingyu Si, Chengbo Jiao, Xiaohui Hu, Changwen Zheng, Fuchun Sun
To understand the reasons behind this phenomenon, we revisit the learning paradigm of knowledge distillation on the few-shot object detection task from a causal-theoretic standpoint and accordingly develop a Structural Causal Model.
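For readers unfamiliar with the formalism, a structural causal model expresses each variable as a function of its parents plus independent noise. A toy sketch follows; the variables, graph, and coefficients are purely illustrative, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n=1000):
    """Toy SCM: a confounder K influences both the teacher features T and
    the student prediction S, so the apparent T -> S effect is biased by
    the backdoor path T <- K -> S unless K is adjusted for."""
    K = rng.normal(size=n)                        # e.g. pretrained knowledge
    T = 0.8 * K + rng.normal(size=n)              # teacher representation
    S = 0.6 * T + 0.5 * K + rng.normal(size=n)    # student output
    return K, T, S
```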
no code implementations • 19 Jun 2022 • Pengfei Zhang, Xiaohui Hu, Kaidong Yu, Jian Wang, Song Han, Cao Liu, Chunyang Yuan
First, we build an evaluation metric called Multi-Metric Evaluation (MME), composed of 5 groups of parallel sub-metrics, to comprehensively evaluate dialogue quality.
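A minimal sketch of how parallel groups of sub-metrics might be aggregated into a single dialogue score; the group names and equal weights are hypothetical, not the paper's actual MME definition:

```python
def mme_score(sub_scores: dict[str, list[float]],
              weights: dict[str, float]) -> float:
    """Average each group of parallel sub-metrics, then combine groups
    with a weighted mean to get one overall dialogue-quality score."""
    group_means = {g: sum(v) / len(v) for g, v in sub_scores.items()}
    total = sum(weights.values())
    return sum(weights[g] * group_means[g] for g in group_means) / total

# Hypothetical groups; the paper defines 5 groups of sub-metrics.
score = mme_score(
    {"fluency": [0.9, 0.8], "coherence": [0.7], "informativeness": [0.6, 0.75]},
    {"fluency": 1.0, "coherence": 1.0, "informativeness": 1.0},
)
```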
1 code implementation • 1 Dec 2021 • Kai Zhang, Yifan Sun, Rui Wang, Haichang Li, Xiaohui Hu
MFA considers three parallel information fusion strategies, i.e., cross-model fusion, temporal fusion, and a novel online-offline pseudo-label fusion.
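As one reading of the online-offline pseudo-label fusion, a hedged sketch that averages the two models' class probabilities before thresholding (an assumed simplification of the paper's fusion rule):

```python
import torch

def fuse_pseudo_labels(online_probs: torch.Tensor,
                       offline_probs: torch.Tensor,
                       threshold: float = 0.8) -> torch.Tensor:
    """Average class probabilities from the online and offline models and
    keep only confident pseudo-labels; -1 marks samples left unlabeled."""
    fused = (online_probs + offline_probs) / 2
    conf, labels = fused.max(dim=1)
    labels[conf < threshold] = -1
    return labels
```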
no code implementations • 17 Sep 2021 • Zheng Lian, Yanan Zhang, Haichang Li, Rui Wang, Xiaohui Hu
The conventional encoder-decoder framework for image captioning generally adopts a single-pass decoding process, which predicts the target descriptive sentence word by word in temporal order.
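The single-pass decoding process referred to here is the usual greedy loop, where each word is predicted once and never revised; a minimal sketch (the `decoder` interface is assumed):

```python
import torch

def greedy_decode(decoder, image_features, bos_id, eos_id, max_len=20):
    """Single-pass decoding: each word is predicted once, in temporal
    order, conditioned only on earlier words; nothing is revised later."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = decoder(image_features, torch.tensor([tokens]))  # (1, t, V)
        next_id = int(logits[0, -1].argmax())
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens
```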
no code implementations • 19 Aug 2021 • Pan Xie, Zexian Li, Xiaohui Hu
Conditional masked language models (CMLMs) have shown impressive progress in non-autoregressive machine translation (NAT).
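CMLMs are typically decoded with the mask-predict procedure: predict all target tokens in parallel, then iteratively re-mask and re-predict the least confident ones. A minimal sketch (the `cmlm` forward interface is assumed):

```python
import torch

def mask_predict(cmlm, src, tgt_len, mask_id, iterations=4):
    """Mask-Predict decoding for a CMLM: fill every masked position in
    parallel, then re-mask the least confident tokens and refine."""
    tokens = torch.full((tgt_len,), mask_id)
    scores = torch.zeros(tgt_len)
    for t in range(iterations):
        probs = cmlm(src, tokens.unsqueeze(0)).softmax(-1)[0]  # (tgt_len, V)
        new_scores, new_tokens = probs.max(-1)
        masked = tokens.eq(mask_id)
        tokens[masked] = new_tokens[masked]
        scores[masked] = new_scores[masked]
        n_mask = tgt_len * (iterations - t - 1) // iterations  # linear schedule
        if n_mask == 0:
            break
        tokens[scores.topk(n_mask, largest=False).indices] = mask_id
    return tokens
```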
no code implementations • 27 Jul 2021 • Pan Xie, Mengyi Zhao, Xiaohui Hu
Owing to the Transformer's superiority in learning long-term dependencies, the sign language Transformer model has achieved remarkable progress in Sign Language Recognition (SLR) and Translation (SLT).
no code implementations • 27 Jul 2021 • Pan Xie, Zhi Cui, Yao Du, Mengyi Zhao, Jianwei Cui, Bin Wang, Xiaohui Hu
Continuous sign language recognition (cSLR) is a publicly significant task that transcribes a sign language video into an ordered gloss sequence.
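cSLR models are commonly trained with a CTC objective, which aligns per-frame predictions to the unsegmented gloss sequence; a hedged sketch using PyTorch's built-in CTCLoss (shapes are illustrative, and this is a common baseline rather than necessarily this paper's method):

```python
import torch
import torch.nn as nn

T, N, C = 120, 2, 1000          # video frames, batch size, gloss vocab (+blank)
log_probs = torch.randn(T, N, C).log_softmax(-1)   # per-frame gloss scores
glosses = torch.randint(1, C, (N, 10))             # target gloss sequences
input_lengths = torch.full((N,), T)
target_lengths = torch.full((N,), 10)

# CTC marginalizes over all monotonic alignments of frames to glosses.
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, glosses, input_lengths, target_lengths)
```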
1 code implementation • COLING 2020 • Pan Xie, Zhi Cui, Xiuyin Chen, Xiaohui Hu, Jianwei Cui, Bin Wang
Concretely, we insert a left-to-right mask into the same decoder as CMTM, and then induce it to autoregressively review whether each word generated by CMTM should be replaced or kept.
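A minimal sketch of that review step: run a causal left-to-right pass over the CMTM draft and, token by token, keep or replace each word (the `reviewer` interface and keep/replace rule are assumed for illustration):

```python
import torch

def review_draft(reviewer, src, draft, bos_id):
    """Autoregressively re-score a non-autoregressive draft: at each
    position, keep the draft token or replace it with the reviewer's pick."""
    output = draft.clone()
    prefix = [bos_id]
    for i in range(len(draft)):
        # Left-to-right mask: the reviewer sees only already-reviewed words.
        logits = reviewer(src, torch.tensor([prefix]))[0, -1]  # (V,)
        best = int(logits.argmax())
        # Keep the CMTM word unless the review pass is more confident.
        output[i] = best if logits[best] > logits[draft[i]] else draft[i]
        prefix.append(int(output[i]))
    return output
```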