Search Results for author: Tengxiao Xi

Found 1 papers, 1 papers with code

ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model

1 code implementation2 Nov 2023 Jianghao Chen, Pu Jian, Tengxiao Xi, Dongyi Yi, Qianlong Du, Chenglin Ding, Guibo Zhu, Chengqing Zong, Jinqiao Wang, Jiajun Zhang

Using our proposed approach, we release the largest and latest large-scale high-quality Chinese web text ChineseWebText, which consists of 1. 42 TB and each text is associated with a quality score, facilitating the LLM researchers to choose the data according to the desired quality thresholds.

Cannot find the paper you are looking for? You can Submit a new open access paper.