Search Results for author: Yizhe Xiong

Found 9 papers, 5 with code

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

1 code implementation · 16 Dec 2024 · Liang Chen, Zekun Wang, Shuhuai Ren, Lei LI, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang

As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the next-token-prediction (NTP) framework, transforming multimodal information into tokens and predicting the next one given the context.

Language Modeling · Language Modelling +2
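
As a quick illustration of the NTP objective this survey is organized around: once inputs from any modality are mapped to discrete tokens, training reduces to a shifted cross-entropy. A minimal PyTorch-style sketch; `model` and `token_ids` are placeholders, not from the survey:

```python
import torch.nn.functional as F

def ntp_loss(model, token_ids):
    """Next-token-prediction loss: each position predicts the token
    that follows it. `model` maps (batch, seq) token ids to logits
    of shape (batch, seq, vocab)."""
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten all positions
        targets.reshape(-1),                  # shifted-by-one targets
    )
```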

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models

no code implementations · 10 Dec 2024 · Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, Guiguang Ding, Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, Di Zhang

In this paper, we introduce a novel single-stage continual pretraining method, Head-Adaptive Rotary Position Encoding (HARPE), to equip LLMs with long context modeling capabilities while simplifying the training process.

Continual Pretraining · Language Modeling +2
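
The excerpt leaves the mechanism implicit; one plausible reading of "head-adaptive" is that each attention head applies RoPE with its own base frequency, so a single training stage exposes the model to several effective context ranges. A hedged sketch, where the per-head base schedule is invented for illustration:

```python
import torch

def rope_angles(seq_len, head_dim, base):
    """Standard RoPE: rotation angle per position and frequency pair."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    pos = torch.arange(seq_len).float()
    return torch.outer(pos, inv_freq)  # (seq_len, head_dim // 2)

# Hypothetical head-adaptive schedule: each head gets its own RoPE base,
# so different heads cover different effective context ranges within one
# training stage. The doubling schedule below is illustrative only.
num_heads, head_dim, seq_len = 8, 64, 4096
bases = [10_000 * (2 ** h) for h in range(num_heads)]
angles_per_head = [rope_angles(seq_len, head_dim, b) for b in bases]
```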

LBPE: Long-token-first Tokenization to Improve Large Language Models

no code implementations · 8 Nov 2024 · Haoran Lian, Yizhe Xiong, Zijia Lin, Jianwei Niu, Shasha Mo, Hui Chen, Peng Liu, Guiguang Ding

The prevalent use of Byte Pair Encoding (BPE) in Large Language Models (LLMs) facilitates robust handling of subword units and avoids out-of-vocabulary issues.

Language Modeling · Language Modelling
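
The listing does not spell out the LBPE algorithm; a natural reading of "long-token-first" is that encoding prefers the longest matching vocabulary entry at each position, rather than replaying BPE merges in learned-rank order, so long tokens in the vocabulary actually get used. A hypothetical sketch:

```python
def longest_match_encode(text, vocab):
    """Greedy longest-token-first encoding over a fixed vocabulary.
    One plausible reading of "long-token-first", not the paper's
    verified algorithm."""
    tokens, i = [], 0
    max_len = max(len(t) for t in vocab)
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:  # no match: fall back to a single character
            tokens.append(text[i])
            i += 1
    return tokens

print(longest_match_encode("lowering", {"low", "lower", "ing", "er", "l", "o", "w"}))
# ['lower', 'ing']
```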

CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts

1 code implementation · 21 Oct 2024 · Zhenpeng Su, Xing Wu, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen, Songlin Hu, Guiguang Ding

Large language models (LLMs) have recently attracted much attention from the community due to their remarkable performance across a wide range of downstream tasks.
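
The excerpt only gives motivation; going by the title, a hedged guess at the structure is that each routed expert is a composition of two sub-experts indexed by a Cartesian product, so sub-experts are shared across many experts. Purely illustrative, not the paper's verified architecture:

```python
import torch
import torch.nn as nn

class CartesianExperts(nn.Module):
    """Illustrative sketch: realize M*N experts as the Cartesian product
    of M first-layer and N second-layer sub-experts, so each sub-expert
    is shared across a whole row or column of composed experts."""
    def __init__(self, dim, m=4, n=4):
        super().__init__()
        self.first = nn.ModuleList(nn.Linear(dim, dim) for _ in range(m))
        self.second = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n))

    def forward(self, x, i, j):
        # Expert (i, j) = second[j] o first[i]; experts that share either
        # index share parameters, encouraging knowledge sharing.
        return self.second[j](torch.relu(self.first[i](x)))
```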

Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal

no code implementations · 27 Apr 2024 · Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Hui Chen, Peng Liu, Jungong Han, Guiguang Ding

Since BPE iteratively merges the most frequent token pair in the text corpus to generate a new token and keeps all generated tokens in the vocabulary, it unavoidably retains tokens that primarily act as components of longer tokens and rarely appear on their own.

Language Modeling · Language Modelling +1
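
A hedged sketch of how such "scaffold" tokens might be detected after encoding a corpus; the criterion and threshold below are illustrative, not the paper's actual removal rule:

```python
from collections import Counter

def find_scaffold_tokens(corpus_tokens, vocab, min_standalone=1e-6):
    """Flag likely scaffold tokens: vocabulary entries that mostly serve
    as substrings of longer tokens and rarely surface on their own in
    an already-encoded corpus (given as a list of tokens)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    scaffold = set()
    for tok in vocab:
        # Token appears inside at least one longer vocabulary entry...
        is_component = any(tok in other and tok != other for other in vocab)
        # ...but almost never stands alone after encoding.
        if is_component and counts[tok] / total < min_standalone:
            scaffold.add(tok)
    return scaffold
```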

Temporal Scaling Law for Large Language Models

no code implementations · 27 Apr 2024 · Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jianwei Niu, Guiguang Ding

In this paper, we propose the novel concept of a Temporal Scaling Law, studying how the test loss of an LLM evolves as training steps scale up.

Position
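
The abstract does not state the fitted functional form; as an illustration of the general idea, one can fit test loss against training step with an assumed power-law-plus-constant shape. All numbers below are made up:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    """Assumed form L(t) = a * t**(-b) + c for test loss at step t;
    a common choice for loss-vs-step curves, not the paper's."""
    return a * np.power(t, -b) + c

steps = np.array([1e3, 5e3, 1e4, 5e4, 1e5])  # training steps observed so far
loss = np.array([3.9, 3.2, 2.9, 2.4, 2.2])   # measured test losses (fabricated)
(a, b, c), _ = curve_fit(power_law, steps, loss, p0=(10.0, 0.3, 1.5))
print(f"extrapolated loss at 1M steps: {power_law(1e6, a, b, c):.3f}")
```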

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

1 code implementation · 14 Mar 2024 · Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding

Consequently, simply combining the two cannot guarantee both training efficiency and inference efficiency at minimal cost.

Model Compression · parameter-efficient fine-tuning

Confidence-based Visual Dispersal for Few-shot Unsupervised Domain Adaptation

1 code implementation · ICCV 2023 · Yizhe Xiong, Hui Chen, Zijia Lin, Sicheng Zhao, Guiguang Ding

To address this issue, recent works consider the Few-shot Unsupervised Domain Adaptation (FUDA) setting, where only a few source samples are labeled, and conduct knowledge transfer via self-supervised learning methods.

Self-Supervised Learning · Transfer Learning +1
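
The mechanism is not in the excerpt; going by the title, a hedged sketch of a confidence-based split of unlabeled target data, where confident predictions anchor transfer toward the remaining ("dispersed") samples. The threshold and data format are assumptions:

```python
import torch
import torch.nn.functional as F

def confidence_split(model, target_loader, tau=0.9):
    """Split unlabeled target samples by prediction confidence, so
    knowledge can flow from the confident ("easy") subset to the rest.
    `target_loader` is assumed to yield (inputs, _) batches; tau and
    the split itself are illustrative, not the paper's criterion."""
    easy, hard = [], []
    model.eval()
    with torch.no_grad():
        for x, _ in target_loader:
            probs = F.softmax(model(x), dim=-1)
            conf, pseudo = probs.max(dim=-1)
            mask = conf >= tau
            easy.append((x[mask], pseudo[mask]))  # pseudo-labeled anchors
            hard.append(x[~mask])                 # samples to disperse toward anchors
    return easy, hard
```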
