2 code implementations • 14 Jan 2025 • MiniMax, Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jin Zhu, Jingtao Han, Jingyang Li, Junbin Xie, Junhao Xu, Junjie Yan, Kaishun Zhang, Kecheng Xiao, Kexi Kang, Le Han, Leyang Wang, Lianfei Yu, Liheng Feng, Lin Zheng, Linbo Chai, Long Xing, Meizhi Ju, Mingyuan Chi, Mozhi Zhang, Peikai Huang, Pengcheng Niu, Pengfei Li, Pengyu Zhao, Qi Yang, Qidi Xu, Qiexiang Wang, Qin Wang, Qiuhui Li, Ruitao Leng, Shengmin Shi, Shuqi Yu, Sichen Li, Songquan Zhu, Tao Huang, Tianrun Liang, Weigao Sun, Weixuan Sun, Weiyu Cheng, Wenkai Li, Xiangjun Song, Xiao Su, Xiaodong Han, Xinjie Zhang, Xinzhu Hou, Xu Min, Xun Zou, Xuyang Shen, Yan Gong, Yingjie Zhu, Yipeng Zhou, Yiran Zhong, Yongyi Hu, Yuanxiang Fan, Yue Yu, Yufeng Yang, Yuhao Li, Yunan Huang, Yunji Li, Yunpeng Huang, Yunzhi Xu, Yuxin Mao, Zehan Li, Zekang Li, Zewei Tao, Zewen Ying, Zhaoyang Cong, Zhen Qin, Zhenhua Fan, Zhihang Yu, Zhuo Jiang, Zijia Wu
This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens.
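The efficiency at this scale rests on linear-attention-style mechanisms (the MiniMax-01 paper is built around lightning attention, a linear attention variant). As a rough orientation only, here is a minimal NumPy sketch of the generic linear-attention trick, not MiniMax's implementation: with a positive feature map, attention can be reassociated so that cost grows linearly rather than quadratically in sequence length.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Non-causal linear attention sketch: apply a positive feature map phi,
    then reassociate as phi(Q) @ (phi(K).T @ V) instead of softmax(Q K.T) @ V,
    so the n x n attention matrix is never materialized."""
    phi = lambda x: np.maximum(x, 0) + 1.0      # a simple positive feature map
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                               # (d, d_v) summary of keys/values
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T    # (n, 1) normalizer
    return (Qp @ KV) / (Z + eps)

# Toy usage: 4 tokens, head dimension 8; a causal variant would replace the
# global sums with running prefix sums.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = linear_attention(Q, K, V)                 # shape (4, 8)
```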
1 code implementation • 23 Oct 2024 • Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models.
1 code implementation • 18 Oct 2024 • Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong
Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks.
1 code implementation • 20 Aug 2024 • Xueliang Zhao, Lin Zheng, Haige Bo, Changran Hu, Urmish Thakker, Lingpeng Kong
This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment.
Ranked #3 on Automated Theorem Proving on miniF2F-test (using extra training data)
1 code implementation • 12 Feb 2024 • Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong
Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.
no code implementations • 18 Dec 2023 • Jun Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong
Given that orthogonal memory compresses global information, we further dissect the context to amplify fine-grained local information.
no code implementations • 30 Nov 2023 • Jing Wang, Xiaofeng Liu, Fangyun Wang, Lin Zheng, Fengqiao Gao, Hanwen Zhang, Xin Zhang, Wanqing Xie, Binbin Wang
Our video-based model achieves 93.9% accuracy on binary classification and 92.1% on 3-class classification on a collected 2D video test set, without requiring key-frame selection or view annotation at test time.
1 code implementation • 29 Nov 2023 • Lin Zheng, Jianbo Yuan, Zhi Zhang, Hongxia Yang, Lingpeng Kong
This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding.
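The control flow is easiest to see in pseudocode. A minimal sketch, assuming a hypothetical fill-in-the-middle-capable model interface (`lm.generate`, `lm.generate_suffix`, `lm.infill`, and the trigger token are stand-ins, not the paper's actual API):

```python
SUFFIX_TRIGGER = "<SUFFIX>"  # hypothetical token that suspends decoding

def self_infilling_decode(lm, prompt):
    """Decode left to right; if the model emits the trigger, suspend ordinary
    autoregressive decoding, sketch a suffix first, then infill the middle
    conditioned on both sides."""
    prefix = lm.generate(prompt, stop=SUFFIX_TRIGGER)       # ordinary AR decoding
    if SUFFIX_TRIGGER not in prefix:
        return prefix                                       # never interrupted
    prefix = prefix.split(SUFFIX_TRIGGER)[0]
    suffix = lm.generate_suffix(prompt + prefix)            # commit to an ending
    middle = lm.infill(prefix=prompt + prefix, suffix=suffix)
    return prefix + middle + suffix
```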
1 code implementation • 14 Oct 2023 • Shuyang Jiang, Jun Zhang, Jiangtao Feng, Lin Zheng, Lingpeng Kong
Furthermore, we marry AMLP with popular non-autoregressive (NAR) models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity.
no code implementations • 28 Sep 2023 • Hui Zheng, Zhong-Tao Chen, Hai-Teng Wang, Jian-Yang Zhou, Lin Zheng, Pei-Yang Lin, Yun-Zhe Liu
Understanding semantic content from brain activity during sleep represents a major goal in neuroscience.
no code implementations • 17 Apr 2023 • Shuyu Miao, Lin Zheng, Jingjing Liu, Hong Jin
Label-free model evaluation aims to predict a model's performance on various test sets without relying on ground-truth labels.
1 code implementation • 24 Feb 2023 • Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, Lingpeng Kong
RSA links query protein sequences to a set of sequences with similar structures or properties in the database and combines these sequences for downstream prediction.
1 code implementation • 11 Feb 2023 • Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong
This work studies discrete diffusion probabilistic models with applications to natural language generation.
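For orientation, one common instance of a discrete diffusion forward process is absorbing-state corruption: each token is independently replaced by a mask symbol with probability increasing in the timestep, and the model learns to reverse the corruption. A minimal sketch with an illustrative linear schedule, not necessarily the paper's exact formulation:

```python
import numpy as np

def absorbing_forward(tokens, t, T, mask_id):
    """Sample x_t ~ q(x_t | x_0) for absorbing-state discrete diffusion:
    each token is independently replaced by mask_id with probability t/T,
    so x_0 is the clean sequence and x_T is fully masked."""
    tokens = np.asarray(tokens)
    corrupt = np.random.default_rng().random(tokens.shape) < t / T
    return np.where(corrupt, mask_id, tokens)

# Toy usage: token ids with mask id 0; at t=3 of T=10, ~30% of tokens are masked.
xt = absorbing_forward([5, 9, 2, 7, 3], t=3, T=10, mask_id=0)
```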
1 code implementation • 9 Feb 2023 • Lin Zheng, Jianbo Yuan, Chong Wang, Lingpeng Kong
Building on previous progress on random feature attention (RFA), we characterize this gap through the lens of control variates and show that RFA can be decomposed into a sum of control variate estimators, one for each element in the sequence.
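The control-variate machinery itself is the standard variance-reduction identity: given a statistic g correlated with f and with known mean, subtract g's scaled deviation from that mean. A generic refresher sketch, not the paper's per-element decomposition of RFA:

```python
import numpy as np

def control_variate_estimate(f_samples, g_samples, g_mean):
    """Estimate E[f] with reduced variance using control variate g:
    mean(f) - c * (mean(g) - E[g]), with optimal c* = Cov(f, g) / Var(g)."""
    cov = np.cov(f_samples, g_samples)
    c = cov[0, 1] / cov[1, 1]
    return f_samples.mean() - c * (g_samples.mean() - g_mean)

# Toy usage: estimate E[exp(X)] for X ~ N(0, 1) using g(X) = X with E[g] = 0.
x = np.random.default_rng(0).normal(size=1000)
est = control_variate_estimate(np.exp(x), x, g_mean=0.0)
```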
1 code implementation • 14 Oct 2022 • Jun Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong
In this paper, we propose Comprehensive Attention Benchmark (CAB) under a fine-grained attention taxonomy with four distinguishable attention patterns, namely, noncausal self, causal self, noncausal cross, and causal cross attentions.
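The four patterns differ only in where queries and keys come from and in whether each query may look ahead. A minimal sketch of the corresponding attention masks (for the causal cross case, this assumes query and key positions are aligned, as in a streaming setting):

```python
import numpy as np

def attention_mask(pattern, q_len, kv_len):
    """Boolean mask (True = may attend). 'self' patterns read the same
    sequence (q_len == kv_len); 'cross' patterns read another sequence;
    'causal' restricts query i to key positions 0..i."""
    mask = np.ones((q_len, kv_len), dtype=bool)
    if pattern.startswith("causal"):
        mask = np.tril(mask)
    return mask

for p in ["noncausal self", "causal self", "noncausal cross", "causal cross"]:
    print(p, attention_mask(p, 4, 4).astype(int), sep="\n")
```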
no code implementations • 13 Jun 2022 • Shuyu Miao, Lin Zheng, Hong Jin
Image recapture seriously undermines the fairness of artificial intelligence (AI) systems: an attacker deceives the system by recapturing others' images.
no code implementations • 16 May 2022 • Naicheng Guo, Xiaolei Liu, Shaoshuai Li, Qiongxu Ma, Kaixin Gao, Bing Han, Lin Zheng, Xiaobo Guo
In this paper, we propose a Poincaré-based heterogeneous graph neural network named PHGR to simultaneously model the sequential patterns and the hierarchical information contained in the data of SR scenarios.
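The hierarchical modeling rests on hyperbolic geometry: distances in the Poincaré ball grow rapidly toward the boundary, which suits tree-like data. For reference, the standard Poincaré-ball distance (background geometry only, not PHGR itself):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball:
    d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)));
    both points must lie strictly inside the unit ball."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq_dist / (denom + eps))

# Toy usage: two points in the 2-D Poincare disk.
d = poincare_distance(np.array([0.1, 0.2]), np.array([-0.3, 0.4]))
```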
1 code implementation • 10 Apr 2022 • Lin Zheng, Chong Wang, Lingpeng Kong
By combining the expressiveness of randomized attention (RA) with the efficiency of RFA, we develop a novel linear-complexity self-attention mechanism called linear randomized attention (LARA).
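RFA approximates the softmax kernel exp(q · k) with positive random features, which makes attention linear in sequence length; LARA builds a better randomized estimator on top of this. A minimal sketch of the underlying random-feature approximation (the general RFA idea, not the LARA estimator):

```python
import numpy as np

def random_feature_attention(Q, K, V, n_features=64, eps=1e-6):
    """Approximate softmax attention with positive random features:
    phi(x) = exp(w . x - ||x||^2 / 2) averaged over w ~ N(0, I) satisfies
    E[phi(q) . phi(k)] = exp(q . k), giving linear-time attention."""
    W = np.random.default_rng(0).normal(size=(n_features, Q.shape[-1]))
    phi = lambda X: np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(n_features)
    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)                         # never forms the n x n matrix
    den = Qp @ Kp.sum(axis=0, keepdims=True).T    # (n, 1) normalizer
    return num / (den + eps)
```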
no code implementations • 6 Oct 2021 • Lin Zheng, Huijie Pan, Lingpeng Kong
Transformer architectures are now central to sequence modeling tasks.
no code implementations • 6 Jul 2021 • Naicheng Guo, Xiaolei Liu, Shaoshuai Li, Qiongxu Ma, Yunan Zhao, Bing Han, Lin Zheng, Kaixin Gao, Xiaobo Guo
Session-based recommendation (SBR) learns users' preferences by capturing the short-term and sequential patterns from the evolution of user behaviors.
1 code implementation • ACL 2021 • Lin Zheng, Zhiyong Wu, Lingpeng Kong
Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks.
no code implementations • ACL 2020 • Lin Zheng, Qinliang Su, Dinghan Shen, Changyou Chen
Generative semantic hashing is a promising technique for large-scale information retrieval thanks to its fast retrieval speed and small memory footprint.
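The speed and memory claims come from binarization: documents are stored as short bit codes and compared with Hamming distance instead of float similarity. A generic sketch of hash-based retrieval, not the paper's generative model:

```python
import numpy as np

def hash_codes(embeddings):
    """Binarize embeddings into compact hash codes, one bit per dimension (by sign)."""
    return (np.asarray(embeddings) > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, k=5):
    """Return indices of the k database codes closest to the query in Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: 1000 documents with 64-bit codes.
rng = np.random.default_rng(0)
db = hash_codes(rng.normal(size=(1000, 64)))
query = hash_codes(rng.normal(size=(1, 64)))[0]
top5 = hamming_search(query, db)
```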