no code implementations • 12 Jun 2025 • Mozhi Zhang, Howe Tissue, Lu Wang, Xipeng Qiu
Domain2Vec maintains a vocabulary of meta-domains and uses a classifier to decompose any given dataset into a domain vector that corresponds to a distribution over this vocabulary.
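The decomposition step lends itself to a short sketch. Below is a minimal, illustrative Python version, where score_meta_domains is a hypothetical stand-in for the paper's trained meta-domain classifier and K is an illustrative vocabulary size: each example gets a distribution over K meta-domains, and the dataset's domain vector is their average.

```python
import numpy as np

K = 8  # size of the meta-domain vocabulary (illustrative, not the paper's value)

def score_meta_domains(example: str) -> np.ndarray:
    """Stand-in for a trained classifier: returns a distribution over K meta-domains."""
    rng = np.random.default_rng(abs(hash(example)) % (2**32))
    logits = rng.normal(size=K)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def domain_vector(dataset: list[str]) -> np.ndarray:
    """Decompose a dataset into a distribution over the meta-domain vocabulary."""
    probs = np.stack([score_meta_domains(x) for x in dataset])
    return probs.mean(axis=0)  # still a valid distribution: non-negative, sums to 1

vec = domain_vector(["example document one", "example document two"])
print(vec, vec.sum())
```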
1 code implementation • 26 May 2025 • Junteng Liu, Yuanxiang Fan, Zhuo Jiang, Han Ding, Yongyi Hu, Chi Zhang, Yiqi Shi, Shitong Weng, Aili Chen, Shiqi Chen, Yunan Huang, Mozhi Zhang, Pengyu Zhao, Junjie Yan, Junxian He
In our experiments, we validate the effectiveness of RL training on the SynLogic dataset with 7B and 32B models.
1 code implementation • 14 Jan 2025 • MiniMax, Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jin Zhu, Jingtao Han, Jingyang Li, Junbin Xie, Junhao Xu, Junjie Yan, Kaishun Zhang, Kecheng Xiao, Kexi Kang, Le Han, Leyang Wang, Lianfei Yu, Liheng Feng, Lin Zheng, Linbo Chai, Long Xing, Meizhi Ju, Mingyuan Chi, Mozhi Zhang, Peikai Huang, Pengcheng Niu, Pengfei Li, Pengyu Zhao, Qi Yang, Qidi Xu, Qiexiang Wang, Qin Wang, Qiuhui Li, Ruitao Leng, Shengmin Shi, Shuqi Yu, Sichen Li, Songquan Zhu, Tao Huang, Tianrun Liang, Weigao Sun, Weixuan Sun, Weiyu Cheng, Wenkai Li, Xiangjun Song, Xiao Su, Xiaodong Han, Xinjie Zhang, Xinzhu Hou, Xu Min, Xun Zou, Xuyang Shen, Yan Gong, Yingjie Zhu, Yipeng Zhou, Yiran Zhong, Yongyi Hu, Yuanxiang Fan, Yue Yu, Yufeng Yang, Yuhao Li, Yunan Huang, Yunji Li, Yunpeng Huang, Yunzhi Xu, Yuxin Mao, Zehan Li, Zekang Li, Zewei Tao, Zewen Ying, Zhaoyang Cong, Zhen Qin, Zhenhua Fan, Zhihang Yu, Zhuo Jiang, Zijia Wu
This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens.
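MiniMax-01 builds on lightning attention, a linear-attention variant. As rough intuition for why that scales to very long contexts, here is a minimal non-causal linear attention sketch whose cost grows as O(n·d²) rather than O(n²·d); this is illustrative only, not the paper's optimized blockwise kernel or hybrid architecture.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Non-causal linear attention: phi(Q) (phi(K)^T V) with an elu+1 feature map.
    Keys/values are summarized once into a (d, d_v) state, so cost is O(n * d^2)."""
    def phi(x):  # positive feature map (elu(x) + 1)
        return np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                    # (d, d_v): one pass over the sequence
    Z = Qp @ Kp.sum(axis=0)          # (n,): per-query normalization
    return (Qp @ KV) / (Z[:, None] + eps)

n, d = 16, 8
rng = np.random.default_rng(0)
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
print(out.shape)  # (16, 8)
```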
1 code implementation • 11 Nov 2024 • Mianqiu Huang, Xiaoran Liu, Shaojun Zhou, Mozhi Zhang, Chenkun Tan, Pengyu Wang, Qipeng Guo, Zhe Xu, Linyang Li, Zhikai Lei, Linlin Li, Qun Liu, Yaqian Zhou, Xipeng Qiu, Xuanjing Huang
As large language models (LLMs) develop, their sequence lengths continue to increase, drawing significant attention to long-context language models.
1 code implementation • 18 Oct 2024 • Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu
Large Language Models (LLMs) acquire broad knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications.
no code implementations • 3 Apr 2024 • Mozhi Zhang, Mianqiu Huang, Rundong Shi, Linsen Guo, Chong Peng, Peng Yan, Yaqian Zhou, Xipeng Qiu
Large language models optimized with techniques such as RLHF have achieved good alignment toward being helpful and harmless.
no code implementations • 15 Nov 2023 • Kyle Seelman, Mozhi Zhang, Jordan Boyd-Graber
To facilitate user interaction with these neural topic models, we have developed an interactive interface.
3 code implementations • 5 Oct 2023 • Qinyuan Cheng, Tianxiang Sun, Wenwei Zhang, Siyin Wang, Xiangyang Liu, Mozhi Zhang, Junliang He, Mianqiu Huang, Zhangyue Yin, Kai Chen, Xipeng Qiu
We analyze the primary types of hallucinations in different types of models and their causes.
1 code implementation • 20 May 2023 • Mozhi Zhang, Hang Yan, Yaqian Zhou, Xipeng Qiu
We use prompts that contain entity category information to construct label prototypes, which enables our model to fine-tune using only the support set.
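A minimal sketch of the prototype idea, with embed() as a stand-in for a real encoder (the names and prompts below are illustrative, not the paper's API): each entity category's prompt is embedded to form a label prototype, and tokens are tagged by their nearest prototype.

```python
import numpy as np

def embed(text: str, dim: int = 32) -> np.ndarray:
    """Placeholder encoder returning a unit vector; a real model would go here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# One natural-language prompt per entity category builds its label prototype.
label_prompts = {
    "PER": "This entity is a person name.",
    "LOC": "This entity is a location.",
    "O":   "This token is not an entity.",
}
prototypes = {lab: embed(p) for lab, p in label_prompts.items()}

def tag(tokens):
    """Assign each token the label of its most similar prototype (dot product)."""
    return [max(prototypes, key=lambda lab: float(embed(tok) @ prototypes[lab]))
            for tok in tokens]

print(tag(["Paris", "visited", "Alice"]))
```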
1 code implementation • ACL 2021 • Mozhi Zhang, Wei Wang, Budhaditya Deb, Guoqing Zheng, Milad Shokouhi, Ahmed Hassan Awadallah
Reply suggestion models help users process emails and chats faster.
no code implementations • 10 May 2021 • Keyulu Xu, Mozhi Zhang, Stefanie Jegelka, Kenji Kawaguchi
Our results show that the training of GNNs is implicitly accelerated by skip connections, more depth, and/or a good label distribution.
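As a concrete reference point for what "skip connections" means here, this is a toy GNN layer with a residual connection in numpy; a minimal sketch, not the paper's experimental setup.

```python
import numpy as np

def gnn_layer_with_skip(H, A_hat, W):
    """H: (n, d) node features, A_hat: (n, n) normalized adjacency, W: (d, d).
    Aggregate neighbors, transform, apply ReLU, then add the skip connection."""
    message = A_hat @ H @ W
    return np.maximum(message, 0) + H  # the residual term is the skip connection

n, d = 5, 4
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(n, n)).astype(float)
A_hat = (A + np.eye(n)) / (A.sum(1, keepdims=True) + 1)  # crude row normalization
H = gnn_layer_with_skip(rng.normal(size=(n, d)), A_hat, rng.normal(size=(d, d)) * 0.1)
print(H.shape)  # (5, 4)
```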
no code implementations • NeurIPS 2021 • Jingling Li, Mozhi Zhang, Keyulu Xu, John P. Dickerson, Jimmy Ba
Our framework measures a network's robustness via the predictive power in its representations -- the test performance of a linear model trained on the learned representations using a small set of clean labels.
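The recipe is simple enough to sketch directly: freeze the network, collect its representations, fit a linear probe on a small clean-label subset, and report the probe's held-out accuracy as the robustness score. The sketch below uses random features as a placeholder for a real network's representations.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
reps = rng.normal(size=(500, 64))       # learned representations (placeholder)
labels = (reps[:, 0] > 0).astype(int)   # stand-in ground-truth labels

# Fit a linear probe on a small set of clean labels only.
clean_idx = rng.choice(500, size=50, replace=False)
probe = LogisticRegression(max_iter=1000).fit(reps[clean_idx], labels[clean_idx])

# Held-out accuracy of the probe = "predictive power" of the representations.
test_idx = np.setdiff1d(np.arange(500), clean_idx)
score = probe.score(reps[test_idx], labels[test_idx])
print(f"predictive power of representations: {score:.3f}")
```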
3 code implementations • ICLR 2021 • Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features.
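The canonical illustration of that hypothesis is shortest paths: a GNN whose aggregator is a min mirrors the Bellman-Ford update d[v] = min_u(d[u] + w(u, v)), so the required non-linearity sits in the architecture rather than having to be learned. A toy sketch (not the paper's trained models):

```python
import numpy as np

def min_aggregation_step(dist, weights):
    """One Bellman-Ford-like GNN step with min aggregation.
    dist: (n,) current distance estimates; weights: (n, n) edge weights (inf = no edge)."""
    return np.minimum(dist, (dist[:, None] + weights).min(axis=0))

inf = np.inf
w = np.array([[0, 1, 4], [inf, 0, 2], [inf, inf, 0]], dtype=float)
d = np.array([0.0, inf, inf])  # distances from node 0
for _ in range(2):
    d = min_aggregation_step(d, w)
print(d)  # [0. 1. 3.]
```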
no code implementations • ACL 2020 • Mozhi Zhang, Yoshinari Fujinuma, Michael J. Paul, Jordan Boyd-Graber
Cross-lingual word embeddings (CLWE) are often evaluated on bilingual lexicon induction (BLI); a minimal evaluation sketch follows the task list below.
Tasks: Bilingual Lexicon Induction, Cross-Lingual Word Embeddings, +2
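For readers unfamiliar with the metric: BLI in its simplest form retrieves, for each source word, the nearest target-language embedding and scores precision@1 against a gold dictionary. A toy sketch with placeholder vectors (real setups typically use CSLS retrieval and far larger vocabularies):

```python
import numpy as np

def precision_at_1(src_emb, tgt_emb, gold):
    """src_emb: {word: vec}, tgt_emb: {word: vec}, gold: {src_word: tgt_word}.
    Retrieve the cosine-nearest target word for each source word in the dictionary."""
    tgt_words = list(tgt_emb)
    T = np.stack([tgt_emb[w] / np.linalg.norm(tgt_emb[w]) for w in tgt_words])
    hits = 0
    for s, t in gold.items():
        v = src_emb[s] / np.linalg.norm(src_emb[s])
        pred = tgt_words[int(np.argmax(T @ v))]
        hits += (pred == t)
    return hits / len(gold)

rng = np.random.default_rng(0)
tgt = {w: rng.normal(size=8) for w in ["chat", "chien", "maison"]}
src = {"cat": tgt["chat"] + 0.01, "dog": tgt["chien"] + 0.01}  # near-aligned toy vectors
print(precision_at_1(src, tgt, {"cat": "chat", "dog": "chien"}))  # 1.0
```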
1 code implementation • EMNLP 2020 • Michelle Yuan, Mozhi Zhang, Benjamin Van Durme, Leah Findlater, Jordan Boyd-Graber
Cross-lingual word embeddings transfer knowledge between languages: models trained on high-resource languages can predict in low-resource languages.
1 code implementation • ACL 2019 • Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi, Stefanie Jegelka, Jordan Boyd-Graber
Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings.
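The orthogonal-transformation step these methods build on is the classic Procrustes solution: given paired embedding matrices X (source) and Y (target) from a seed dictionary, the orthogonal W minimizing ||XW - Y||_F is U V^T from the SVD of X^T Y. A minimal sketch:

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Best orthogonal map W (in Frobenius norm) sending rows of X onto rows of Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt  # orthogonal by construction: W.T @ W = I

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))               # source-language embeddings (toy)
R, _ = np.linalg.qr(rng.normal(size=(50, 50)))
Y = X @ R                                     # target = rotated source (toy)
W = orthogonal_procrustes(X, Y)
print(np.allclose(X @ W, Y, atol=1e-6))       # True: the rotation is recovered
```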
1 code implementation • ICLR 2020 • Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
Neural networks have succeeded in many reasoning tasks.
no code implementations • 22 Dec 2018 • Mozhi Zhang, Yoshinari Fujinuma, Jordan Boyd-Graber
Text classification must sometimes be applied in a low-resource language with no labeled training data.
Tasks: Cross-Lingual Document Classification, Document Classification, +3