1 code implementation • ACL 2022 • Xu Han, Guoyang Zeng, Weilin Zhao, Zhiyuan Liu, Zhengyan Zhang, Jie zhou, Jun Zhang, Jia Chao, Maosong Sun
In recent years, large-scale pre-trained language models (PLMs) containing billions of parameters have achieved promising results on various NLP tasks.
1 code implementation • Findings (ACL) 2022 • Yining Ye, Fanchao Qi, Zhiyuan Liu, Maosong Sun
However, all existing sememe prediction studies ignore the hierarchical structures of sememes, which are important in the sememe-based semantic description system.
1 code implementation • ACL 2022 • Xu Han, Yuqi Luo, Weize Chen, Zhiyuan Liu, Maosong Sun, Zhou Botong, Hao Fei, Suncong Zheng
In this paper, we propose a cross-lingual contrastive learning framework to learn FGET models for low-resource languages.
1 code implementation • Findings (ACL) 2022 • Xin Lv, Yankai Lin, Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, Jie zhou
In recent years, pre-trained language models (PLMs) have been shown to capture factual knowledge from massive texts, which encourages the proposal of PLM-based knowledge graph completion (KGC) models.
no code implementations • COLING 2022 • Tianhao Gao, Jun Fang, Hanyu Liu, Zhiyuan Liu, Chao Liu, Pengzhang Liu, Yongjun Bao, Weipeng Yan
This paper proposes a unified generative multi-task framework that can solve multiple ABSA tasks by controlling the type of task prompts consisting of multiple element prompts.
Ranked #5 on Aspect-Based Sentiment Analysis (ABSA) on TASD (using extra training data)
1 code implementation • EMNLP 2021 • Yuan YAO, Jiaju Du, Yankai Lin, Peng Li, Zhiyuan Liu, Jie zhou, Maosong Sun
Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents.
1 code implementation • ACL 2022 • Biru Zhu, Yujia Qin, Fanchao Qi, Yangdong Deng, Zhiyuan Liu, Maosong Sun, Ming Gu
To validate our viewpoints, we design two methods to evaluate the robustness of FMS: (1) model disguise attack, which post-trains an inferior PTM with a contrastive objective, and (2) evaluation data selection, which selects a subset of the data points for FMS evaluation based on K-means clustering.
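For illustration, the sketch below shows one simple way to realize the "evaluation data selection" idea mentioned in this snippet: cluster feature vectors with K-means and keep one representative point per cluster. The function name, feature choice, and selection rule are assumptions for illustration, not the authors' exact procedure.

```python
# Hedged sketch: pick a representative evaluation subset via K-means clustering.
import numpy as np
from sklearn.cluster import KMeans

def select_eval_subset(features: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Return indices of the k points closest to their cluster centroids."""
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(features)
    chosen = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c], axis=1)
        chosen.append(members[np.argmin(dists)])  # most central point per cluster
    return np.array(chosen)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 32))  # placeholder feature vectors
    print(select_eval_subset(feats, k=20))
```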
no code implementations • 21 Apr 2025 • Ji Qi, Yuan YAO, Yushi Bai, Bin Xu, Juanzi Li, Zhiyuan Liu, Tat-Seng Chua
Large Multimodal Models (LMMs) uniformly perceive video frames, creating computational inefficiency for videos with inherently varying temporal information density.
1 code implementation • 8 Apr 2025 • Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun
Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation.
no code implementations • 4 Apr 2025 • Bingxiang He, Wenbin Zhang, Jiaxi Song, Cheng Qian, Zixuan Fu, Bowen Sun, Ning Ding, Haiwen Hong, Longtao Huang, Hui Xue, Ganqu Cui, Wanxiang Che, Zhiyuan Liu, Maosong Sun
Preference learning is critical for aligning large language models (LLMs) with human values, yet its success hinges on high-quality datasets comprising three core components: Preference Annotations, Instructions, and Response Pairs.
no code implementations • 31 Mar 2025 • Fengxiang Wang, Hongzhen Wang, Mingshuo Chen, Di Wang, Yulin Wang, Zonghao Guo, Qiang Ma, Long Lan, Wenjing Yang, Jing Zhang, Zhiyuan Liu, Maosong Sun
On top of the XLRS-Bench, 16 sub-tasks are defined to evaluate MLLMs' 10 kinds of perceptual capabilities and 6 kinds of reasoning capabilities, with a primary emphasis on advanced cognitive processes that facilitate real-world decision-making and the capture of spatiotemporal changes.
1 code implementation • 31 Mar 2025 • Yuxuan Chen, Dewen Guo, Sen Mei, Xinze Li, Hao Chen, Yishan Li, YiXuan Wang, Chaoyue Tang, Ruobing Wang, Dingjun Wu, Yukun Yan, Zhenghao Liu, Shi Yu, Zhiyuan Liu, Maosong Sun
Retrieval-Augmented Generation (RAG) significantly enhances the performance of large language models (LLMs) in downstream tasks by integrating external knowledge.
no code implementations • 20 Mar 2025 • Zhiyuan Liu, Yuting Zhang, Feng Liu, Changwang Zhang, Ying Sun, Jun Wang
Multimodal Large Language Models (MLLMs) have gained significant traction for their ability to process diverse input data types and generate coherent, contextually relevant outputs across various applications.
no code implementations • 19 Mar 2025 • Yanchen Luo, Zhiyuan Liu, Yi Zhao, Sihang Li, Kenji Kawaguchi, Tat-Seng Chua, Xiang Wang
In this work, we propose the Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D), a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space, while maintaining near-zero reconstruction error.
1 code implementation • 17 Mar 2025 • Kairong Luo, Haodong Wen, Shengding Hu, Zhenbo Sun, Zhiyuan Liu, Maosong Sun, Kaifeng Lyu, WenGuang Chen
Training large models is both resource-intensive and time-consuming, making it crucial to understand the quantitative relationship between model performance and hyperparameters.
no code implementations • 15 Mar 2025 • Zhiyuan Liu, Shuhang Zhang, Qingyu Liu, Hongliang Zhang, Lingyang Song
A fine-grained radio map presents communication parameters of interest, e.g., received signal strength, at every point across a large geographical region.
1 code implementation • 12 Mar 2025 • Yaorui Shi, Jiaqi Yang, Sihang Li, Junfeng Fang, Xiang Wang, Zhiyuan Liu, Yang Zhang
Pre-trained language models (PLMs) have revolutionized scientific research, yet their application to single-cell analysis remains limited.
1 code implementation • 12 Mar 2025 • Yingfa Chen, Yutong Wu, Xu Han, Zhiyuan Liu, Maosong Sun
However, they overlook the impacts of context length and attention head configuration (the number of query and key-value heads in grouped-query attention) on training and inference.
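For concreteness, the following is a minimal sketch of the grouped-query attention head configuration this snippet refers to: a small number of key-value heads is shared across groups of query heads. The dimensions, weights, and function name are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of grouped-query attention (GQA) with n_q_heads query heads
# sharing n_kv_heads key-value heads.
import torch

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    b, t, d = x.shape
    hd = d // n_q_heads                                   # per-head dimension
    q = (x @ wq).view(b, t, n_q_heads, hd).transpose(1, 2)
    k = (x @ wk).view(b, t, n_kv_heads, hd).transpose(1, 2)
    v = (x @ wv).view(b, t, n_kv_heads, hd).transpose(1, 2)
    group = n_q_heads // n_kv_heads                       # query heads per KV head
    k = k.repeat_interleave(group, dim=1)                 # share KV across each group
    v = v.repeat_interleave(group, dim=1)
    att = torch.softmax(q @ k.transpose(-2, -1) / hd ** 0.5, dim=-1)
    return (att @ v).transpose(1, 2).reshape(b, t, d)

x = torch.randn(2, 16, 512)
wq = torch.randn(512, 512); wk = torch.randn(512, 128); wv = torch.randn(512, 128)
out = grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2)
print(out.shape)
```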
no code implementations • 25 Feb 2025 • Yu Xia, Jingru Fan, Weize Chen, Siyu Yan, Xin Cong, Zhong Zhang, Yaxi Lu, Yankai Lin, Zhiyuan Liu, Maosong Sun
Based on this finding, we propose AgentRM, a generalizable reward model, to guide the policy model for effective test-time search.
1 code implementation • 21 Feb 2025 • Pengcheng Huang, Zhenghao Liu, Yukun Yan, Xiaoyuan Yi, Hao Chen, Zhiyuan Liu, Maosong Sun, Tong Xiao, Ge Yu, Chenyan Xiong
Knowledge-Augmented Generation (KAG) has shown great promise in updating the internal memory of Large Language Models (LLMs) by integrating external knowledge.
1 code implementation • 20 Feb 2025 • Shangqing Tu, Yucheng Wang, Daniel Zhang-li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zhiyuan Liu, Bin Xu, Juanzi Li
Moreover, to achieve long outputs that maintain high fidelity to the input images, we apply Direct Preference Optimization (DPO) to the SFT model.
1 code implementation • 20 Feb 2025 • Jianling Li, Shangzhan Li, Zhenye Gao, Qi Shi, YuXuan Li, Zefan Wang, Jiacheng Huang, Haojie Wang, Jianrong Wang, Xu Han, Zhiyuan Liu, Maosong Sun
Despite advances in large language models (LLMs) for conventional code generation, these models struggle to generate accurate, performance-optimized Triton code, as they lack awareness of its specifications and the complexities of GPU programming.
no code implementations • 20 Feb 2025 • Weilin Zhao, Tengyu Pan, Xu Han, Yudi Zhang, Ao Sun, Yuxiang Huang, Kaihuo Zhang, Weilun Zhao, YuXuan Li, Jianyong Wang, Zhiyuan Liu, Maosong Sun
Speculative sampling has emerged as an important technique for accelerating the auto-regressive generation process of large language models (LLMs) by utilizing a draft-then-verify mechanism to produce multiple tokens per forward pass.
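As a rough illustration of the draft-then-verify mechanism mentioned above, the toy loop below drafts several tokens with a cheap stand-in "model" and keeps only the prefix the stand-in "target" agrees with. Real speculative sampling verifies all drafted positions in a single target forward pass and uses a probabilistic accept/reject rule; the two functions here are placeholders, not the paper's models.

```python
# Toy, self-contained draft-then-verify loop (greedy acceptance only).
def target_next(tokens):   # stand-in "large" model: next token = sum mod 10
    return sum(tokens) % 10

def draft_next(tokens):    # stand-in "small" model: usually agrees, sometimes off
    return (sum(tokens) + (1 if len(tokens) % 5 == 0 else 0)) % 10

def speculative_decode(prompt, gamma=4, max_new=16):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) draft gamma tokens cheaply
        draft, ctx = [], list(tokens)
        for _ in range(gamma):
            t = draft_next(ctx); draft.append(t); ctx.append(t)
        # 2) verify: check each drafted position against the target model
        n_ok, ctx = 0, list(tokens)
        for t in draft:
            if target_next(ctx) != t:
                break
            ctx.append(t); n_ok += 1
        # 3) keep the agreeing prefix, plus one correction token from the target
        tokens += draft[:n_ok]
        tokens.append(target_next(tokens))
    return tokens

print(speculative_decode([3, 1, 4], gamma=4, max_new=12))
```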
1 code implementation • 19 Feb 2025 • Shi Yu, Zhiyuan Liu, Chenyan Xiong
Web crawls are a main source of pretraining data for large language models (LLMs), but the majority of crawled web pages are discarded in pretraining due to low data quality.
1 code implementation • 18 Feb 2025 • Zhiyuan Liu, Yanchen Luo, Han Huang, Enzhi Zhang, Sihang Li, Junfeng Fang, Yaorui Shi, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
To combine these advantages for 3D molecule generation, we propose a foundation model -- NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation.
1 code implementation • 18 Feb 2025 • Yifan Ji, Zhipeng Xu, Zhenghao Liu, Yukun Yan, Shi Yu, Yishan Li, Zhiyuan Liu, Yu Gu, Ge Yu, Maosong Sun
Recent dense retrievers usually thrive on the emergent capabilities of Large Language Models (LLMs), using them to encode queries and documents into an embedding space for retrieval.
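The sketch below shows the basic dense-retrieval setup this snippet describes: an encoder maps queries and documents into one embedding space and retrieval is nearest-neighbour search by similarity. The hash-based "encoder" is only a placeholder, not an LLM-based embedder.

```python
# Minimal dense-retrieval sketch with a placeholder encoder.
import numpy as np

def toy_encode(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)          # unit-norm embedding

docs = ["retrieval augmented generation", "activation sparsity", "triton kernels"]
doc_mat = np.stack([toy_encode(d) for d in docs])
query = toy_encode("RAG with external knowledge")
scores = doc_mat @ query                  # cosine similarity for unit vectors
print(docs[int(np.argmax(scores))])
```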
no code implementations • 17 Feb 2025 • Haoxuan Li, Jifan Yu, Xin Cong, Yang Dang, Yisi Zhan, Huiqin Liu, Zhiyuan Liu
Metacognitive education plays a crucial role in cultivating students' self-regulation and reflective thinking, providing essential support for those with learning difficulties through academic advising.
1 code implementation • 17 Feb 2025 • Yuxiang Huang, Mingye Li, Xu Han, Chaojun Xiao, Weilin Zhao, Sun Ao, Hao Zhou, Jie zhou, Zhiyuan Liu, Maosong Sun
While long-context inference is crucial for advancing large language model (LLM) applications, its prefill speed remains a significant bottleneck.
1 code implementation • 13 Feb 2025 • Chao Song, Zhiyuan Liu, Yu Rong, Qiang Liu, Shu Wu, Liang Wang
This survey aims to facilitate understanding and further flourishing development in this area.
1 code implementation • 3 Feb 2025 • Ganqu Cui, Lifan Yuan, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan YAO, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, BoWen Zhou, Ning Ding
While dense rewards also offer an appealing choice for the reinforcement learning (RL) of LLMs since their fine-grained rewards have the potential to address some inherent issues of outcome rewards, such as training efficiency and credit assignment, this potential remains largely unrealized.
1 code implementation • 11 Jan 2025 • Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun
(1) low executability and poor restoration of chart details in the generated code, and (2) a lack of large-scale and diverse training data.
no code implementations • 10 Jan 2025 • You Li, Heyu Huang, Chi Chen, Kaiyu Huang, Chao Huang, Zonghao Guo, Zhiyuan Liu, Jinan Xu, Yuhua Li, Ruixuan Li, Maosong Sun
The recent advancement of Multimodal Large Language Models (MLLMs) has significantly improved their fine-grained perception of single images and general comprehension across multiple images.
1 code implementation • 18 Dec 2024 • YiPeng Zhang, Yifan Liu, Zonghao Guo, Yidan Zhang, Xuesong Yang, Chi Chen, Jun Song, Bo Zheng, Yuan YAO, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
To address this issue, we present LLaVA-UHD v2, an advanced MLLM centered around a Hierarchical window transformer that enables capturing diverse visual granularity by constructing and integrating a high-resolution feature pyramid.
no code implementations • 18 Dec 2024 • Shuyin Xia, Xinjun Ma, Zhiyuan Liu, Cheng Liu, Sen Zhao, Guoyin Wang
Experimental results demonstrate that our method achieves accuracy comparable to training on the original graph.
1 code implementation • 10 Dec 2024 • Jinyi Hu, Shengding Hu, Yuxuan Song, Yufei Huang, Mingxuan Wang, Hao Zhou, Zhiyuan Liu, Wei-Ying Ma, Maosong Sun
The analysis of the trade-off between autoregressive modeling and diffusion demonstrates the potential of ACDiT to be used in long-horizon visual generation tasks.
Ranked #8 on Video Generation on UCF-101
no code implementations • 10 Dec 2024 • Yaorui Shi, Sihang Li, Taiyan Zhang, Xi Fang, Jiankun Wang, Zhiyuan Liu, Guojiang Zhao, Zhengdan Zhu, Zhifeng Gao, Renxin Zhong, Linfeng Zhang, Guolin Ke, Weinan E, Hengxing Cai, Xiang Wang
Automated drug discovery offers significant potential for accelerating the development of novel therapeutics by substituting labor-intensive human workflows with machine-driven processes.
no code implementations • 5 Dec 2024 • Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Biyuan Lin, Jie zhou, Zhi Zheng, Xu Han, Zhiyuan Liu, Maosong Sun
This paper introduces the concept of "capacity density" as a new metric to evaluate the quality of the LLMs across different scales and describes the trend of LLMs in terms of both effectiveness and efficiency.
2 code implementations • 2 Dec 2024 • Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, BoWen Zhou, Zhiyuan Liu, Hao Peng
The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives.
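To make the stated parameterization concrete, the snippet below computes a sequence-level reward as the scaled log-likelihood ratio between the policy and reference models, r(y|x) = beta * (log pi(y|x) - log pi_ref(y|x)). The per-token log-probabilities and beta value are made-up numbers for illustration, not model outputs.

```python
# Hedged sketch: outcome reward parameterized as a policy/reference log-likelihood ratio.
def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """Sequence-level reward from per-token log-probs under policy and reference."""
    return beta * (sum(logp_policy) - sum(logp_ref))

logp_policy = [-0.2, -1.1, -0.4]   # log p_policy(token_t | prefix), illustrative
logp_ref    = [-0.5, -1.3, -0.9]   # log p_ref(token_t | prefix), illustrative
print(implicit_reward(logp_policy, logp_ref))  # positive => policy prefers this response
```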
no code implementations • 2 Dec 2024 • Alan Wake, Bei Chen, C. X. Lv, Chao Li, Chengen Huang, Chenglin Cai, Chujie Zheng, Daniel Cooper, Fan Zhou, Feng Hu, Ge Zhang, Guoyin Wang, Heng Ji, Howard Qiu, Jiangcheng Zhu, Jun Tian, Katherine Su, Lihuan Zhang, Liying Li, Ming Song, Mou Li, Peng Liu, Qicheng Hu, Shawn Wang, Shijun Zhou, Shiming Yang, Shiyong Li, Tianhang Zhu, Wen Xie, Wenhao Huang, Xiang He, Xiaobo Chen, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Yanpeng Li, Yongke Zhao, Yongzhen Luo, Yuchi Xu, Yuxuan Sha, Zhaodong Yan, Zhiyuan Liu, Zirui Zhang, Zonghong Dai
This technical report presents Yi-Lightning, our latest flagship large language model (LLM).
1 code implementation • 22 Nov 2024 • Zheni Zeng, Yuxuan Chen, Shi Yu, Ruobing Wang, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun
Humans can utilize techniques to quickly acquire knowledge from specific materials in advance, such as creating self-assessment questions, enabling us to achieve related tasks more efficiently.
1 code implementation • 11 Nov 2024 • Zanlin Ni, Yulin Wang, Renping Zhou, Yizeng Han, Jiayi Guo, Zhiyuan Liu, Yuan YAO, Gao Huang
At the spatial level, we disentangle the computations of visible and mask tokens by encoding visible tokens independently, while decoding mask tokens conditioned on the fully encoded visible tokens.
1 code implementation • 8 Nov 2024 • Shengda Fan, Xin Cong, Yuepeng Fu, Zhong Zhang, Shuyan Zhang, Yuanwei Liu, Yesai Wu, Yankai Lin, Zhiyuan Liu, Maosong Sun
Finally, we merge the synthetic samples that pass quality confirmation with the collected samples to obtain the WorkflowBench.
1 code implementation • 4 Nov 2024 • Yuqi Luo, Chenyang Song, Xu Han, Yingfa Chen, Chaojun Xiao, Zhiyuan Liu, Maosong Sun
These results demonstrate that ReLU is more efficient as an activation function than SiLU and can leverage more training data to improve activation sparsity.
no code implementations • 30 Oct 2024 • Xujia Wang, Haiyan Zhao, Shuo Wang, Hanqing Wang, Zhiyuan Liu
Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have significantly improved the adaptation of LLMs to downstream tasks in a resource-efficient manner.
1 code implementation • 17 Oct 2024 • Xinze Li, Sen Mei, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Hao Chen, Ge Yu, Zhiyuan Liu, Maosong Sun, Chenyan Xiong
Our experiments on various knowledge-intensive tasks demonstrate that DDR significantly outperforms the SFT method, particularly for LLMs with smaller-scale parameters that depend more on the retrieved knowledge.
1 code implementation • 16 Oct 2024 • Yaxi Lu, Shenzhi Yang, Cheng Qian, Guirong Chen, Qinyu Luo, Yesai Wu, Huadong Wang, Xin Cong, Zhong Zhang, Yankai Lin, Weiwen Liu, Yasheng Wang, Zhiyuan Liu, Fangming Liu, Maosong Sun
The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents.
1 code implementation • 14 Oct 2024 • Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun
In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.
1 code implementation • 12 Oct 2024 • Zihan Zhou, Chong Li, Xinyi Chen, Shuo Wang, Yu Chao, Zhili Li, Haoyu Wang, Rongqiao An, Qi Shi, Zhixing Tan, Xu Han, Xiaodong Shi, Zhiyuan Liu, Maosong Sun
The proposed LLM×MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output.
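The loop below illustrates this split-then-aggregate idea in the simplest possible form: query a model per chunk ("map"), then combine the intermediate answers ("reduce"). The `ask_llm` function is a stand-in placeholder, not the authors' framework.

```python
# Illustrative map-reduce loop over a long document.
def ask_llm(prompt: str) -> str:
    return f"[answer to: {prompt[:40]}...]"   # stand-in for a real model call

def map_reduce_qa(document: str, question: str, chunk_size: int = 2000) -> str:
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    partial = [ask_llm(f"Context: {c}\nQuestion: {question}") for c in chunks]    # map
    return ask_llm("Combine these partial answers:\n" + "\n".join(partial))        # reduce

print(map_reduce_qa("some very long document " * 500, "What is discussed?"))
```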
1 code implementation • 11 Oct 2024 • Ruobing Wang, Daren Zha, Shi Yu, Qingfei Zhao, Yuxuan Chen, YiXuan Wang, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun
Retrieval-Augmented Generation (RAG) mitigates the factual errors and hallucinated outputs produced by Large Language Models (LLMs) in open-domain question-answering (OpenQA) tasks by introducing external knowledge.
no code implementations • 10 Oct 2024 • Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.
no code implementations • 9 Oct 2024 • Yingfa Chen, Xinrong Zhang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
For the second concern, we train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval.
1 code implementation • 9 Oct 2024 • Cheng Gao, Chaojun Xiao, Zhenghao Liu, Huimin Chen, Zhiyuan Liu, Maosong Sun
Moreover, the construction method can also be applied to civil cases and achieves promising results.
no code implementations • 8 Oct 2024 • Ranchi Zhao, Zhen Leng Thai, Yifan Zhang, Shengding Hu, Yunqi Ba, Jie zhou, Jie Cai, Zhiyuan Liu, Maosong Sun
The performance of Large Language Models (LLMs) is substantially influenced by the pretraining corpus, which consists of vast quantities of unsupervised data processed by the models.
1 code implementation • 4 Oct 2024 • Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie zhou
SSD adaptively switches between the Mixtures-of-Experts (MoE) based sparse training and the conventional dense training during the pre-training process, leveraging the efficiency of sparse training and avoiding the static activation correlation of sparse training.
no code implementations • 4 Oct 2024 • Yanchen Luo, Junfeng Fang, Sihang Li, Zhiyuan Liu, Jiancan Wu, An Zhang, Wenjie Du, Xiang Wang
The de novo generation of molecules with targeted properties is crucial in biology, chemistry, and drug discovery.
1 code implementation • 2 Oct 2024 • Yuxiang Huang, Binhang Yuan, Xu Han, Chaojun Xiao, Zhiyuan Liu
Scaling the input context length of a large language model (LLM) incurs a significant increase in computation cost and memory footprint to maintain the attention key-value (KV) cache.
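A back-of-the-envelope calculation makes the cost this snippet points to concrete: KV-cache memory grows linearly with context length. The shapes follow the usual transformer layout, and the example numbers are assumptions, not any specific model's configuration.

```python
# Rough KV-cache size estimate: 2 tensors (K and V) per layer,
# each of shape [batch, n_kv_heads, seq_len, head_dim].
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_val=2):
    return 2 * n_layers * batch * n_kv_heads * seq_len * head_dim * bytes_per_val

gb = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=128_000) / 2 ** 30
print(f"{gb:.1f} GiB of KV cache at 128K context (fp16)")
```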
2 code implementations • 29 Sep 2024 • Xin Li, Weize Chen, Qizhi Chu, Haopeng Li, Zhaojun Sun, Ran Li, Chen Qian, Yiwei Wei, Zhiyuan Liu, Chuan Shi, Maosong Sun, Cheng Yang
Our results underscore that the capabilities of LLMs in handling structured data are still under-explored, and show the effectiveness of LLM4Graph in enhancing LLMs' proficiency of graph analysis.
no code implementations • 18 Sep 2024 • Wang Xu, Shuo Wang, Weilin Zhao, Xu Han, Yukun Yan, Yudi Zhang, Zhe Tao, Zhiyuan Liu, Wanxiang Che
To address this limitation, researchers have proposed duplex models.
no code implementations • 11 Sep 2024 • Daniel Zhang-li, Zheyuan Zhang, Jifan Yu, Joy Lim Jia Yin, Shangqing Tu, Linlu Gong, Haohua Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li
We develop Slide2Lecture, a tuning-free and knowledge-regulated intelligent tutoring system that can (1) effectively convert an input lecture slide into a structured teaching agenda consisting of a set of heterogeneous teaching actions; (2) create and manage an interactive lecture that generates responsive interactions catering to student learning demands while regulating the interactions to follow teaching actions.
no code implementations • 5 Sep 2024 • Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, Lei Hou, Yu Zhang, Xu Han, Manli Li, Juanzi Li, Zhiyuan Liu, Huiqin Liu, Maosong Sun
Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption.
no code implementations • 4 Sep 2024 • Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, GuanYu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun
We first formalize modules into emergent bricks (functional neuron partitions that emerge during the pre-training phase) and customized bricks (bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs).
1 code implementation • 2 Sep 2024 • Yingfa Chen, Chenlong Hu, Cong Feng, Chenyang Song, Shi Yu, Xu Han, Zhiyuan Liu, Maosong Sun
This study presents a multi-modal multi-granularity tokenizer specifically designed for analyzing ancient Chinese scripts, focusing on the Chu bamboo slip (CBS) script used during the Spring and Autumn and Warring States period (771-256 BCE) in Ancient China.
1 code implementation • 31 Aug 2024 • Zanlin Ni, Yulin Wang, Renping Zhou, Rui Lu, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Yuan YAO, Gao Huang
As a representative work, non-autoregressive Transformers (NATs) are able to synthesize images with decent quality in a small number of steps.
1 code implementation • 27 Aug 2024 • Chi-Min Chan, Jianxuan Yu, Weize Chen, Chunyang Jiang, Xinyu Liu, Weijie Shi, Zhiyuan Liu, Wei Xue, Yike Guo
However, configuring an MAS for a task remains challenging, with performance only observable post-execution.
1 code implementation • 9 Aug 2024 • Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu
In this paper, we introduce DEBUGEVAL, a comprehensive benchmark for evaluating the debugging abilities of LLMs by emulating the multi-stage human debugging process.
2 code implementations • 3 Aug 2024 • Yuan YAO, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone.
Ranked #7 on Multiple-choice on Neptune-Full
1 code implementation • 2 Aug 2024 • Kunlun Zhu, Yifan Luo, Dingling Xu, Yukun Yan, Zhenghao Liu, Shi Yu, Ruobing Wang, Shuo Wang, Yishan Li, Nan Zhang, Xu Han, Zhiyuan Liu, Maosong Sun
However, evaluating the effectiveness of RAG systems in specialized scenarios remains challenging due to the high costs of data construction and the lack of suitable evaluation metrics.
no code implementations • 19 Jul 2024 • Stephen M. Pizer, Zhiyuan Liu, Junjie Zhao, Nicholas Tapp-Hughes, James Damon, Miaomiao Zhang, JS Marron, Jared Vicory
We describe a representation targeted for anatomic objects which is designed to enable strong locational correspondence within object populations and thus to provide powerful object statistics.
1 code implementation • 17 Jul 2024 • Zheni Zeng, Jiayi Chen, Huimin Chen, Yukun Yan, Yuxuan Chen, Zhenghao Liu, Zhiyuan Liu, Maosong Sun
Large language models exhibit aspects of human-level intelligence that catalyze their application as human-like agents in domains such as social simulations, human-machine interactions, and collaborative multi-agent systems.
no code implementations • 16 Jul 2024 • JunHao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun
Our work presents a novel exploration of LLMs' symbolic calculation abilities and the underlying mechanisms.
1 code implementation • 9 Jul 2024 • Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun
The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents.
no code implementations • 27 Jun 2024 • Zheyuan Zhang, Daniel Zhang-li, Jifan Yu, Linlu Gong, Jinchang Zhou, Zhanxin Hao, Jianxiao Jiang, Jie Cao, Huiqin Liu, Zhiyuan Liu, Lei Hou, Juanzi Li
In this work, we propose SimClass, a multi-agent classroom simulation teaching framework.
1 code implementation • 22 Jun 2024 • Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu
Specifically, we divide the queries and responses of conversations into several time slices and then adopt a time-division-multiplexing (TDM) encoding-decoding strategy to pseudo-simultaneously process these slices.
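The toy function below illustrates the time-division-multiplexing idea described here: fixed-size slices of a query stream and a response stream are interleaved so a single sequential model can pseudo-simultaneously handle both. The slicing scheme and the IN/OUT tags are assumptions for illustration only.

```python
# Simple illustration of time-division multiplexing over two token streams.
def tdm_interleave(query_tokens, response_tokens, slice_len=4):
    q = [query_tokens[i:i + slice_len] for i in range(0, len(query_tokens), slice_len)]
    r = [response_tokens[i:i + slice_len] for i in range(0, len(response_tokens), slice_len)]
    stream = []
    for i in range(max(len(q), len(r))):
        if i < len(q):
            stream += [("IN", t) for t in q[i]]    # a slice of incoming query
        if i < len(r):
            stream += [("OUT", t) for t in r[i]]   # a slice of generated response
    return stream

print(tdm_interleave(list("hello there"), list("hi, how can I help?")))
```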
1 code implementation • 19 Jun 2024 • He Cao, Yanjun Shao, Zhiyuan Liu, Zijing Liu, Xiangru Tang, Yuan YAO, Yu Li
Current approaches, however, often neglect the critical role of multiple molecule graph interaction in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks.
1 code implementation • 17 Jun 2024 • Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun
To address these issues, we present a new pre-training pipeline for RS models, featuring the creation of a large-scale RS dataset and an efficient MIM approach.
1 code implementation • 17 Jun 2024 • Wentong Chen, Junbo Cui, Jinyi Hu, Yujia Qin, Junjie Fang, Yue Zhao, Chongyi Wang, Jun Liu, Guirong Chen, Yupeng Huo, Yuan YAO, Yankai Lin, Zhiyuan Liu, Maosong Sun
Utilizing Graphic User Interface (GUI) for human-computer interaction is essential for accessing a wide range of digital tools.
Ranked #15 on Natural Language Visual Grounding on ScreenSpot
no code implementations • 17 Jun 2024 • Bingxiang He, Ning Ding, Cheng Qian, Jia Deng, Ganqu Cui, Lifan Yuan, Huan-ang Gao, Huimin Chen, Zhiyuan Liu, Maosong Sun
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
1 code implementation • 13 Jun 2024 • Bowen Ping, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun
Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision.
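The sketch below illustrates the motivation stated here: when a delta weight matrix has a long-tailed singular-value spectrum, the few large singular directions can be kept at high precision while the tail is stored coarsely. This is only an illustration of the idea under those assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: mixed-precision compression of a delta matrix guided by its SVD.
import numpy as np

def compress_delta(delta: np.ndarray, top_k: int = 8):
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    head = (u[:, :top_k] * s[:top_k]) @ vt[:top_k]        # dominant part, kept precise
    tail = delta - head
    scale = np.abs(tail).max() / 127                      # crude int8-style tail
    tail_q = np.round(tail / scale)
    return head.astype(np.float16), tail_q.astype(np.int8), scale

delta = np.random.randn(256, 256) @ np.diag(np.logspace(0, -3, 256)) @ np.random.randn(256, 256)
head, tail_q, scale = compress_delta(delta)
approx = head.astype(np.float32) + tail_q.astype(np.float32) * scale
print("relative error:", np.linalg.norm(delta - approx) / np.linalg.norm(delta))
```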
1 code implementation • 11 Jun 2024 • Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning.
1 code implementation • CVPR 2024 • Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan YAO, Gao Huang
In this paper, we aim to re-evaluate the full potential of NATs by revisiting the design of their training and inference strategies.
1 code implementation • 6 Jun 2024 • Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, Xingtai Lv, Hu Jinfang, Zhiyuan Liu, BoWen Zhou
Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas.
5 code implementations • 27 May 2024 • Tianyu Yu, Haoye Zhang, Qiming Li, Qixin Xu, Yuan YAO, Da Chen, Xiaoman Lu, Ganqu Cui, Yunkai Dang, Taiwen He, Xiaocheng Feng, Jun Song, Bo Zheng, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models.
Ranked #1 on Visual Question Answering on AMBER
1 code implementation • 23 May 2024 • Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
To resolve the challenges above, we propose a new pretraining method, ReactXT, for reaction-text modeling, and a new dataset, OpenExp, for experimental procedure prediction.
1 code implementation • 21 May 2024 • Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua
ProtT3 empowers an LM to understand protein sequences of amino acids by incorporating a PLM as its protein understanding module, enabling effective protein-to-text generation.
no code implementations • 7 May 2024 • Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, Yifei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun
We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches.
no code implementations • 28 Apr 2024 • Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun
Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments.
no code implementations • 16 Apr 2024 • Yuning Wang, Zhiyuan Liu, Haotian Lin, Junkai Jiang, Shaobing Xu, Jianqiang Wang
In this study, we propose PreGSU, a generalized pre-trained scene understanding model based on graph attention network to learn the universal interaction and reasoning of traffic scenes to support various downstream tasks.
1 code implementation • 11 Apr 2024 • Chaoqun He, Renjie Luo, Shengding Hu, Yuanqian Zhao, Jie zhou, Hanghao Wu, Jiajie Zhang, Xu Han, Zhiyuan Liu, Maosong Sun
The rapid development of LLMs calls for a lightweight and easy-to-use framework for swift evaluation deployment.
3 code implementations • 9 Apr 2024 • Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan YAO, Chenyang Zhao, Jie zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun
For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation.
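A minimal sketch of a Warmup-Stable-Decay (WSD) schedule as described here follows: a linear warmup, a long constant ("stable") stage, then a decay stage. The decay shape and the step counts are illustrative assumptions, not the paper's exact settings.

```python
# Minimal Warmup-Stable-Decay (WSD) learning-rate schedule.
def wsd_lr(step, max_lr, warmup, stable, decay, min_lr=0.0):
    if step < warmup:                                   # warmup: 0 -> max_lr
        return max_lr * step / warmup
    if step < warmup + stable:                          # stable: hold max_lr
        return max_lr
    t = min((step - warmup - stable) / decay, 1.0)      # decay: max_lr -> min_lr
    return max_lr + (min_lr - max_lr) * t

for s in (0, 500, 5_000, 50_000, 95_000, 100_000):
    print(s, round(wsd_lr(s, max_lr=1e-3, warmup=1_000, stable=90_000, decay=9_000), 6))
```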
1 code implementation • 2 Apr 2024 • Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, BoWen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.
1 code implementation • 26 Mar 2024 • Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, Maosong Sun
Large language models (LLMs) can make predictions using parametric knowledge (knowledge encoded in the model weights) or contextual knowledge (knowledge presented in the context).
1 code implementation • 18 Mar 2024 • Ruyi Xu, Yuan YAO, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang
To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.
Ranked #5 on Long-Context Understanding on MMNeedle
1 code implementation • 14 Mar 2024 • Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun
Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences.
no code implementations • 13 Mar 2024 • Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Ruobing Xie, BoWen Zhou, Zhiyuan Liu, Maosong Sun
Underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously.
4 code implementations • 12 Mar 2024 • Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, Yang Liu
The virtual API server contains a caching system and API simulators which are complementary to alleviate the change in API status.
1 code implementation • 7 Mar 2024 • 01.AI: Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Guoyin Wang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yanpeng Li, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models.
Ranked #1 on Chatbot on AlpacaEval (using extra training data)
no code implementations • 4 Mar 2024 • Si Sun, Hanqing Zhang, Zhiyuan Liu, Jie Bao, Dawei Song
Dense Retrieval (DR) is now considered as a promising tool to enhance the memorization capacity of Large Language Models (LLM) such as GPT3 and GPT-4 by incorporating external memories.
1 code implementation • 29 Feb 2024 • Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Zexu Sun, Bowen Sun, Huimin Chen, Ruobing Xie, Jie zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun
In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax", a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness).
1 code implementation • 28 Feb 2024 • Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun
Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs).
1 code implementation • 26 Feb 2024 • Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, Xiaoyin Che, Zhiyuan Liu, Maosong Sun
Generative models have demonstrated considerable potential in software engineering, particularly in tasks such as code generation and debugging.
no code implementations • 23 Feb 2024 • Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models.
1 code implementation • 22 Feb 2024 • Zhipeng Xu, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Ge Yu, Chenyan Xiong
The web contains large-scale, diverse, and abundant information to satisfy the information-seeking needs of humans.
1 code implementation • 21 Feb 2024 • Zhipeng Xu, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Chaojun Xiao, Zhiyuan Liu, Ge Yu, Chenyan Xiong
Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to leverage external knowledge, enhancing their performance on knowledge-intensive tasks.
4 code implementations • 21 Feb 2024 • Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, JunHao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun
Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction.
1 code implementation • 21 Feb 2024 • Weilin Zhao, Yuxiang Huang, Xu Han, Wang Xu, Chaojun Xiao, Xinrong Zhang, Yewei Fang, Kaihuo Zhang, Zhiyuan Liu, Maosong Sun
To achieve this, we introduce Ouroboros, which can generate draft phrases to parallelize the drafting process and meanwhile lengthen drafts in a training-free manner.
1 code implementation • 21 Feb 2024 • Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun
Notably, the best-performing model, GPT-4V, attains an average score of 17.97% on OlympiadBench, with a mere 10.74% in physics, highlighting the benchmark's rigor and the intricacy of physical reasoning.
1 code implementation • 21 Feb 2024 • Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, Kuai Li, Chen Chen, Zhiyuan Liu, Guangli Li, Tao Yang, Maosong Sun
Some recent efforts have explored introducing ReLU or its variants as the substitutive activation function to help LLMs achieve activation sparsity and inference acceleration, but few can simultaneously obtain high sparsity and comparable model performance.
1 code implementation • 20 Feb 2024 • Xueyang Feng, Zhi-Yuan Chen, Yujia Qin, Yankai Lin, Xu Chen, Zhiyuan Liu, Ji-Rong Wen
We construct a human-agent collaboration dataset to train this policy model in an offline reinforcement learning environment.
1 code implementation • 18 Feb 2024 • Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun
Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns.
no code implementations • 18 Feb 2024 • Hanqing Wang, Bowen Ping, Shuo Wang, Xu Han, Yun Chen, Zhiyuan Liu, Maosong Sun
Most prior works on LoRA combination primarily rely on task-level weights for each involved LoRA, making different examples and tokens share the same LoRA weights.
1 code implementation • 17 Feb 2024 • Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che
Model quantization uses low bit-width values to represent the weight matrices of existing models to be quantized, which is a promising approach to reduce both the storage and computational overheads of deploying highly anticipated LLMs.
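As a toy illustration of representing weights with low bit-width values, the snippet below stores only a sign matrix (1 bit per weight) plus per-row floating-point scales. This is just the basic idea of quantization-induced storage reduction, not the paper's method.

```python
# Toy sign quantization: 1-bit signs plus per-row scales.
import numpy as np

def sign_quantize(w: np.ndarray):
    scale = np.abs(w).mean(axis=1, keepdims=True)   # one fp scale per row
    return np.sign(w).astype(np.int8), scale

def dequantize(signs, scale):
    return signs.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
signs, scale = sign_quantize(w)
w_hat = dequantize(signs, scale)
print("reconstruction error:", np.abs(w - w_hat).mean())
```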
1 code implementation • 14 Feb 2024 • Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Zhong Zhang, Jie zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
1 code implementation • 7 Feb 2024 • Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun
In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning.
1 code implementation • 7 Feb 2024 • Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun
Different from previous works that simply translate English instructions, we consider both the language-specific and language-agnostic abilities of LLMs.
no code implementations • 6 Feb 2024 • Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun
To find the most efficient activation function for sparse computation, we propose a systematic framework to examine the sparsity of LLMs from three aspects: the trade-off between sparsity and performance, the predictivity of sparsity, and the hardware affinity.
1 code implementation • 6 Feb 2024 • Junfeng Fang, Shuai Zhang, Chang Wu, Zhengyi Yang, Zhiyuan Liu, Sihang Li, Kun Wang, Wenjie Du, Xiang Wang
Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research.
1 code implementation • 5 Feb 2024 • Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yankai Lin, Yukun Yan, Xiaodong Shi, Sen Song, Zhiyuan Liu, Maosong Sun
Distinguished by its four core dimensions (Memory Management, Memory Writing, Memory Reading, and Memory Injection), UniMem empowers researchers to conduct systematic exploration of long-context methods.
no code implementations • 25 Jan 2024 • Cheng Qian, Shihao Liang, Yujia Qin, Yining Ye, Xin Cong, Yankai Lin, Yesai Wu, Zhiyuan Liu, Maosong Sun
This paper introduces Investigate-Consolidate-Exploit (ICE), a novel strategy for enhancing the adaptability and flexibility of AI agents through inter-task self-evolution.
1 code implementation • 25 Jan 2024 • Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian
Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM establishes an integration of 3D molecular encoder and LM.
1 code implementation • 9 Jan 2024 • Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, Maosong Sun
Previous evaluations of LLMs' debugging ability are significantly limited by the risk of data leakage, the scale of the dataset, and the variety of tested bugs.
no code implementations • 28 Dec 2023 • Bohan Lyu, Xin Cong, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
As GitHub has hosted a multitude of repositories which can be seen as a good resource for tools, a promising solution is that LLM-based agents can autonomously integrate the repositories in GitHub according to the user queries to extend their tool set.
1 code implementation • 28 Dec 2023 • Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun
Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents.
1 code implementation • 3 Dec 2023 • Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, Guoyang Zeng
Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems.
3 code implementations • CVPR 2024 • Tianyu Yu, Yuan YAO, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua
Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction.
Ranked #1 on Visual Question Answering on VQA v2
1 code implementation • 20 Nov 2023 • Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, BoWen Zhou, Zhiyuan Liu, Maosong Sun
Recognizing the need for more flexible adaptation, we extend the methodology of LoRA to an innovative approach we call sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.
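The sketch below shows one way a low-rank update's effective rank can be made adjustable with a sparse gate, in the spirit of the idea described above: a LoRA-style update B diag(g) A x, where zeroed entries of g prune rank-1 components. The gate handling is an illustrative assumption, not the SoRA training procedure.

```python
# Hedged sketch: LoRA-style update with a sparsifiable gate over rank components.
import torch

class GatedLowRank(torch.nn.Module):
    def __init__(self, d_in, d_out, r=16):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(d_out, r))
        self.gate = torch.nn.Parameter(torch.ones(r))   # would be sparsified during training

    def forward(self, x, base_out):
        # base_out is the frozen layer's output; zeroed gate entries drop rank-1 terms
        return base_out + ((x @ self.A.t()) * self.gate) @ self.B.t()

layer = GatedLowRank(512, 512)
x = torch.randn(2, 512)
out = layer(x, base_out=torch.zeros(2, 512))
print(out.shape)
```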
1 code implementation • 16 Nov 2023 • Hanbin Wang, Zhenghao Liu, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, Ge Yu
INTERVENOR prompts Large Language Models (LLMs) to play distinct roles during the code repair process, functioning as both a Code Learner and a Code Teacher.
Ranked #29 on Code Generation on MBPP
1 code implementation • 15 Nov 2023 • Xiaozhi Wang, Hao Peng, Yong Guan, Kaisheng Zeng, Jianhui Chen, Lei Hou, Xu Han, Yankai Lin, Zhiyuan Liu, Ruobing Xie, Jie zhou, Juanzi Li
Understanding events in texts is a core objective of natural language understanding, which requires detecting event occurrences, extracting event arguments, and analyzing inter-event relationships.
1 code implementation • 8 Nov 2023 • Ao Zhang, Yuan YAO, Wei Ji, Zhiyuan Liu, Tat-Seng Chua
The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs).
1 code implementation • 2 Nov 2023 • Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin, Zhiyuan Liu, Maosong Sun
Empirical experiments are conducted to detail the construction and execution procedure of its workflow, showcasing the feasibility of APA and unveiling the possibility of a new paradigm of automation driven by agents.
2 code implementations • 25 Oct 2023 • Peixuan Han, Zhenghao Liu, Zhiyuan Liu, Chenyan Xiong
In this paper, we introduce WebDRO, an efficient approach for clustering the web graph data and optimizing group weights to enhance the robustness of dense retrieval models.
1 code implementation • 24 Oct 2023 • Qingquan Li, Yiran Hu, Feng Yao, Chaojun Xiao, Zhiyuan Liu, Maosong Sun, Weixing Shen
Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge.
1 code implementation • 24 Oct 2023 • Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie zhou
Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs.
1 code implementation • NeurIPS 2023 • Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua
Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning.
1 code implementation • 21 Oct 2023 • Tianshuo Zhou, Sen Mei, Xinze Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Ge Yu
To facilitate the multi-modal retrieval tasks, we build the ClueWeb22-MM dataset based on the ClueWeb22 dataset, which regards anchor texts as queries, and extracts the related text and image documents from anchor-linked web pages.
no code implementations • 20 Oct 2023 • Zekai Qu, Ruobing Xie, Chaojun Xiao, Yuan YAO, Zhiyuan Liu, Fengzong Lian, Zhanhui Kang, Jie zhou
With pre-trained language models (PLMs) widely verified on various NLP tasks, pioneering efforts attempt to explore the possible cooperation of the general textual information in PLMs with the personalized behavioral information in user historical behavior sequences to enhance sequential recommendation (SR).
1 code implementation • 20 Oct 2023 • Yaorui Shi, An Zhang, Enzhi Zhang, Zhiyuan Liu, Xiang Wang
Predicting chemical reactions, a fundamental challenge in chemistry, involves forecasting the resulting products from a given reaction process.
1 code implementation • 19 Oct 2023 • Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua
MolCA enables an LM (e.g., Galactica) to understand both text- and graph-based molecular contents via the cross-modal projector.
Ranked #7 on Molecule Captioning on ChEBI-20
no code implementations • 19 Oct 2023 • Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie zhou
Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise.
1 code implementation • 8 Oct 2023 • Cheng Qian, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu
We first validate the efficacy of Toolink in harnessing the model's creativity and CoS ability on ChatGPT.
no code implementations • 5 Oct 2023 • Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, Weilin Zhao, Yankai Lin, Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun
With PassUntil, we conduct a quantitative investigation into the scaling law of task performance.
4 code implementations • 2 Oct 2023 • Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, Maosong Sun
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models, serving as a solid foundation for future feedback learning research.
2 code implementations • 1 Oct 2023 • Tianyu Yu, Jinyi Hu, Yuan YAO, Haoye Zhang, Yue Zhao, Chongyi Wang, Shan Wang, Yinxv Pan, Jiao Xue, Dahai Li, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun
The capabilities of MLLMs depend on two crucial factors: the model architecture to facilitate the feature alignment of visual modules and large language models; the multimodal instruction tuning datasets for human instruction following.
1 code implementation • 26 Sep 2023 • Chenyang Song, Xu Han, Zheni Zeng, Kuai Li, Chen Chen, Zhiyuan Liu, Maosong Sun, Tao Yang
First, Static ConPET can adapt former continual learning methods originally designed for relatively smaller models to LLMs through PET and a dynamic replay strategy, which largely reduces the tuning costs and alleviates the over-fitting and forgetting issue.
no code implementations • 19 Sep 2023 • Kunlun Zhu, Shihao Liang, Xu Han, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu, Maosong Sun
Recent years have witnessed the success of question answering (QA), especially its potential to be a foundation paradigm for tackling diverse NLP tasks.
no code implementations • 18 Sep 2023 • Rong Liu, Enyu Zhao, Zhiyuan Liu, Andrew Feng, Scott John Easley
Photorealistic style transfer aims to apply stylization while preserving the realism and structure of input content.
no code implementations • 15 Sep 2023 • Yulin Chen, Ning Ding, Hai-Tao Zheng, Zhiyuan Liu, Maosong Sun, BoWen Zhou
Artificial intelligence has been applied in various aspects of online education to facilitate teaching and learning.
1 code implementation • 27 Aug 2023 • Zhenghao Liu, Sen Mei, Chenyan Xiong, Xiaohua LI, Shi Yu, Zhiyuan Liu, Yu Gu, Ge Yu
TASTE alleviates the cold start problem by representing long-tail items using full-text modeling and bringing the benefits of pretrained language models to recommendation systems.
no code implementations • 24 Aug 2023 • Yining Ye, Xin Cong, Shizuo Tian, Yujia Qin, Chong Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun
Central to the development of rationality is the construction of an internalized utility judgment, capable of assigning numerical utilities to each decision.
2 code implementations • 23 Aug 2023 • Jinyi Hu, Yuan YAO, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun
Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., lack of large-scale, high-quality image-text data).
1 code implementation • 21 Aug 2023 • Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie zhou
Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks.
3 code implementations • 14 Aug 2023 • Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu
Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost.
1 code implementation • 10 Aug 2023 • Xuanhe Zhou, Guoliang Li, Zhiyuan Liu
Database administrators (DBAs) play a crucial role in managing, maintaining and optimizing a database system to ensure data availability, performance, and reliability.
2 code implementations • 31 Jul 2023 • Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, Maosong Sun
Based on ToolBench, we fine-tune LLaMA to obtain an LLM ToolLLaMA, and equip it with a neural API retriever to recommend appropriate APIs for each instruction.
Ranked #3 on Trajectory Planning on ToolBench
1 code implementation • 28 Jul 2023 • Shihao Liang, Runchu Tian, Kunlun Zhu, Yujia Qin, Huadong Wang, Xin Cong, Zhiyuan Liu, Xiaojiang Liu, Maosong Sun
Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions.
1 code implementation • 16 Jul 2023 • Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun
Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing.
1 code implementation • 15 Jul 2023 • Weilin Zhao, Yuxiang Huang, Xu Han, Zhiyuan Liu, Zhengyan Zhang, Kuai Li, Chen Chen, Tao Yang, Maosong Sun
To address this, there are two key methods available: the first is model compression, which compresses LLMs into smaller sizes; the second is LoRA, which can transfer an LLM to other tasks with very few parameters, avoiding the storage of multiple model variants in multi-task scenarios by only preserving LoRAs.
1 code implementation • 5 Jul 2023 • Shengding Hu, Yifan Luo, Huadong Wang, Xingyi Cheng, Zhiyuan Liu, Maosong Sun
In this paper, we find that the PLMs already possess the knowledge required to rebut such questions, and the key is how to activate the knowledge.
1 code implementation • 5 Jul 2023 • Shengding Hu, Ning Ding, Weilin Zhao, Xingtai Lv, Zhen Zhang, Zhiyuan Liu, Maosong Sun
The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning.
1 code implementation • 27 Jun 2023 • Weihua Du, Jinglun Zhao, Chao Yu, Xingcheng Yao, Zimeng Song, Siyang Wu, Ruifeng Luo, Zhiyuan Liu, Xianzhong Zhao, Yi Wu
Directly applying end-to-end reinforcement learning (RL) methods to truss layout design is also infeasible, since only a tiny portion of the entire layout space is valid under the physical constraints, leading to particularly sparse rewards for RL training.
1 code implementation • 21 Jun 2023 • Zheni Zeng, Bangchen Yin, Shipeng Wang, Jiarui Liu, Cheng Yang, Haishen Yao, Xingzhi Sun, Maosong Sun, Guotong Xie, Zhiyuan Liu
Natural language is expected to be a key medium for various human-machine interactions in the era of large language models.
1 code implementation • 15 Jun 2023 • Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan YAO, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li
The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations.
1 code implementation • 12 Jun 2023 • Hao Peng, Xiaozhi Wang, Feng Yao, Kaisheng Zeng, Lei Hou, Juanzi Li, Zhiyuan Liu, Weixing Shen
In this paper, we check the reliability of EE evaluations and identify three major pitfalls: (1) The data preprocessing discrepancy makes the evaluation results on the same dataset not directly comparable, but the data preprocessing details are not widely noted and specified in papers.
1 code implementation • 7 Jun 2023 • Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun
Then we introduce BOSS, a Benchmark suite for Out-of-distribution robustneSS evaluation covering 5 tasks and 20 datasets.
1 code implementation • 4 Jun 2023 • Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun
Our investigations reveal that model scaling (1) mitigates the effects of the positions of tunable parameters on performance, and (2) enables tuning methods to achieve performance comparable to full-parameter fine-tuning by optimizing fewer tunable parameters.
2 code implementations • 31 May 2023 • Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yu Gu, Zhiyuan Liu, Ge Yu
SANTA proposes two pretraining methods to make language models structure-aware and learn effective representations for structured data: 1) Structured Data Alignment, which utilizes the natural alignment relations between structured data and unstructured data for structure-aware pretraining.
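As a rough illustration of such alignment pretraining, the sketch below computes an in-batch contrastive loss between paired structured-data and text embeddings; the function name, pairing scheme, and temperature are assumptions rather than SANTA's exact recipe.

```python
import torch
import torch.nn.functional as F

def structured_alignment_loss(struct_emb, text_emb, temperature=0.05):
    """In-batch contrastive loss: pull each structured-data embedding toward its
    paired text embedding and away from the other texts in the batch."""
    struct_emb = F.normalize(struct_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = struct_emb @ text_emb.T / temperature       # (B, B) similarity matrix
    labels = torch.arange(struct_emb.size(0), device=struct_emb.device)
    return F.cross_entropy(logits, labels)
```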
no code implementations • 31 May 2023 • Yulin Chen, Ning Ding, Xiaobin Wang, Shengding Hu, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie
The continual scaling of pre-trained language models (PLMs) imposes substantial burdens on model adaptation, necessitating more efficient alternatives to conventional fine-tuning.
1 code implementation • 29 May 2023 • Yangyi Chen, Hongcheng Gao, Ganqu Cui, Lifan Yuan, Dehan Kong, Hanlu Wu, Ning Shi, Bo Yuan, Longtao Huang, Hui Xue, Zhiyuan Liu, Maosong Sun, Heng Ji
In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework.
1 code implementation • 28 May 2023 • Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie zhou
By analogy to the human brain, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes.
1 code implementation • 28 May 2023 • Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie zhou
Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models.
1 code implementation • 28 May 2023 • Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun
By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document one time to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders.
1 code implementation • 28 May 2023 • Weize Chen, Xu Han, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie zhou
Since it is non-trivial to directly model the intermediate states and design a running cost function, we propose to use latent stochastic bridges to regularize the intermediate states and use the regularization as the running cost of PETs.
2 code implementations • 27 May 2023 • Zichun Yu, Chenyan Xiong, Shi Yu, Zhiyuan Liu
Retrieval augmentation can aid language models (LMs) in knowledge-intensive tasks by supplying them with external information.
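A minimal sketch of how retrieval augmentation supplies that external information: retrieved passages are prepended to the question before the LM generates an answer. The `retriever.search` interface is a hypothetical stand-in for any dense or sparse retriever.

```python
def retrieval_augmented_prompt(question: str, retriever, k: int = 3) -> str:
    """Build a prompt that grounds the LM in retrieved external passages."""
    passages = retriever.search(question, k=k)           # hypothetical retriever API
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```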
1 code implementation • 24 May 2023 • Shi Yu, Chenghao Fan, Chenyan Xiong, David Jin, Zhiyuan Liu, Zhenghao Liu
Common document ranking pipelines in search systems are cascade systems that involve multiple ranking layers to integrate different information step-by-step.
2 code implementations • 23 May 2023 • Cheng Qian, Chi Han, Yi R. Fung, Yujia Qin, Zhiyuan Liu, Heng Ji
Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability.
1 code implementation • 23 May 2023 • Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, BoWen Zhou
Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT.
no code implementations • 19 May 2023 • Jinyi Hu, Xu Han, Xiaoyuan Yi, Yutong Chen, Wenhao Li, Zhiyuan Liu, Maosong Sun
IAP optimizes only a separate Chinese text encoder, with all other parameters fixed, to align the Chinese semantic space to the English one in CLIP.
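The sketch below illustrates this kind of alignment on parallel sentence pairs, assuming a frozen English CLIP text encoder and a trainable Chinese text encoder; the helper name and the cosine loss are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def alignment_step(zh_encoder, en_encoder, zh_batch, en_batch, optimizer):
    """One training step: only the Chinese text encoder is updated so that its
    embeddings match the frozen English CLIP text embeddings of parallel sentences."""
    with torch.no_grad():
        target = F.normalize(en_encoder(en_batch), dim=-1)   # frozen English side
    pred = F.normalize(zh_encoder(zh_batch), dim=-1)
    loss = (1.0 - (pred * target).sum(dim=-1)).mean()        # cosine alignment loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```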
1 code implementation • 15 May 2023 • Yujia Qin, Cheng Qian, Xu Han, Yankai Lin, Huadong Wang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie zhou
In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent.
1 code implementation • 11 May 2023 • Yujia Qin, Zihan Cai, Dian Jin, Lan Yan, Shihao Liang, Kunlun Zhu, Yankai Lin, Xu Han, Ning Ding, Huadong Wang, Ruobing Xie, Fanchao Qi, Zhiyuan Liu, Maosong Sun, Jie zhou
We recruit annotators to search for relevant information using our interface and then answer questions.
no code implementations • 4 May 2023 • Zhiyuan Liu, Chunjie Cao, Fangjian Tao, Jingzhang Sun
In this paper, we delve into this misconception and propose MAG, a Multi-GNN and Augmented Graph contrastive framework that unifies existing GCAD methods from a contrastive self-supervised perspective.
1 code implementation • NeurIPS 2023 • Ao Zhang, Hao Fei, Yuan YAO, Wei Ji, Li Li, Zhiyuan Liu, Tat-Seng Chua
While developing a new multimodal LLM (MLLM) by pre-training on a tremendous number of image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm.
3 code implementations • 17 Apr 2023 • Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, Maosong Sun
Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools.
1 code implementation • 12 Apr 2023 • Si Sun, Yida Lu, Shi Yu, Xiangyang Li, Zhonghua Li, Zhao Cao, Zhiyuan Liu, Deming Ye, Jie Bao
Moreover, the dataset is divided into disjoint base and novel classes, allowing DR models to be continuously trained on ample data from the base classes and a few samples from the novel classes.
no code implementations • 8 Apr 2023 • Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier
Grounded on our analysis, we propose a novel partial attention language model to solve the attention degeneration problem.
1 code implementation • 19 Feb 2023 • Ming Li, Yusheng Su, Hsiu-Yuan Huang, Jiali Cheng, Xin Hu, Xinmiao Zhang, Huadong Wang, Yujia Qin, Xiaozhi Wang, Kristen A. Lindquist, Zhiyuan Liu, Dan Zhang
Humans no doubt use language to communicate about their emotional experiences, but does language in turn help humans understand emotions, or is language just a vehicle of communication?
1 code implementation • 14 Feb 2023 • Chenglei Si, Zhengyan Zhang, Yingfa Chen, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun
In order to fill this important gap, we construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises.
no code implementations • 10 Feb 2023 • Lan Fu, Zhiyuan Liu, Jinlong Li, Jeff Simmons, Hongkai Yu, Song Wang
Accurate detection of large-scale, elliptical-shaped fibers, including their parameters of center, orientation, and major/minor axes, in 2D cross-sectional image slices is very important for characterizing the underlying 3D cylindrical structures in microscopic material images.
3 code implementations • 16 Dec 2022 • Ganqu Cui, Wentao Li, Ning Ding, Longtao Huang, Zhiyuan Liu, Maosong Sun
With the ever-growing sizes of pre-trained models (PTMs), it has become an emerging practice to only provide the inference APIs for users, namely the model-as-a-service (MaaS) setting.
no code implementations • 11 Dec 2022 • Zhiyuan Liu, Chunjie Cao, Jingzhang Sun
For a more comprehensive conclusion, we further investigate the effect of the objective function and the number of fused views on detection performance.
1 code implementation • 22 Nov 2022 • Yuan YAO, Tianyu Yu, Ao Zhang, Mengdi Li, Ruobing Xie, Cornelius Weber, Zhiyuan Liu, Hai-Tao Zheng, Stefan Wermter, Tat-Seng Chua, Maosong Sun
In this work, we present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem, where models learn to summarize commonsense relations from a bag of images about an entity pair without any human annotation on image instances.
1 code implementation • 14 Nov 2022 • Xiaozhi Wang, Yulin Chen, Ning Ding, Hao Peng, Zimu Wang, Yankai Lin, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, Jie zhou
It contains 103,193 event coreference chains, 1,216,217 temporal relations, 57,992 causal relations, and 15,841 subevent relations, which is larger than existing datasets of all the ERE tasks by at least an order of magnitude.
1 code implementation • 14 Nov 2022 • Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li
Furthermore, we demonstrate the skill neurons are most likely generated in pre-training rather than fine-tuning by showing that the skill neurons found with prompt tuning are also crucial for other fine-tuning methods freezing neuron weights, such as the adapter-based tuning and BitFit.
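One simple way to operationalize a "skill neuron" is to check how well thresholding a neuron's activation predicts the task label, as in the sketch below; this probe is an illustrative assumption, not the paper's exact procedure.

```python
import torch

def neuron_predictivity(activations: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """For each neuron, measure how well thresholding its activation at the mean
    predicts a binary task label.  activations: (N, H); labels: (N,) in {0, 1}."""
    thresh = activations.mean(dim=0, keepdim=True)
    pred = (activations > thresh).float()                # (N, H) thresholded activations
    acc = (pred == labels.unsqueeze(1).float()).float().mean(dim=0)
    return torch.maximum(acc, 1.0 - acc)                 # predictive in either direction
```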
1 code implementation • 13 Nov 2022 • Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu
Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting PT using a small-scale partial PLM, and then progressively expands its depth and width until the full-model size.
no code implementations • 10 Nov 2022 • Ning Ding, Yulin Chen, Ganqu Cui, Xiaobin Wang, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie
Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere.
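Concretely, the classification rule reduces to computing the distance |‖x − c‖ − r| from a point to each prototype's surface and taking the smallest value; the sketch below assumes simple tensor shapes for illustration.

```python
import torch

def hypersphere_distance(x: torch.Tensor, centers: torch.Tensor, radii: torch.Tensor) -> torch.Tensor:
    """Distance from a point x (d,) to the surfaces of C hypersphere prototypes
    given centers (C, d) and radii (C,): |‖x − c‖₂ − r|."""
    return (torch.norm(x.unsqueeze(0) - centers, dim=-1) - radii).abs()

def classify(x, centers, radii) -> int:
    # predicted class = prototype whose surface is closest to the point
    return int(hypersphere_distance(x, centers, radii).argmin())
```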
1 code implementation • 8 Nov 2022 • Hao Peng, Xiaozhi Wang, Shengding Hu, Hailong Jin, Lei Hou, Juanzi Li, Zhiyuan Liu, Qun Liu
We believe this is a critical bottleneck for realizing human-like cognition in PLMs.
1 code implementation • NeurIPS 2022 • Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun
Generally, DT methods elaborately design delta modules (DT modules) that can be applied to arbitrary fine-grained positions inside PTMs.
1 code implementation • 31 Oct 2022 • Si Sun, Chenyan Xiong, Yue Yu, Arnold Overwijk, Zhiyuan Liu, Jie Bao
In this paper, we investigate the instability in the standard dense retrieval training, which iterates between model training and hard negative selection using the being-trained model.
2 code implementations • 31 Oct 2022 • Yangyi Chen, Lifan Yuan, Ganqu Cui, Zhiyuan Liu, Heng Ji
We observe a consistent change in calibration performance across six factors.
1 code implementation • 25 Oct 2022 • Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie zhou
(3) How does the PLM's task knowledge change along the path connecting two minima?
1 code implementation • 24 Oct 2022 • Jing Yi, Weize Chen, Yujia Qin, Yankai Lin, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun, Jie zhou
To investigate this, we hypothesize that the adaptations of different DETs could all be reparameterized as low-dimensional optimizations in a unified optimization subspace, which can be found by jointly decomposing independent solutions of different DETs.
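The reparameterization can be pictured as optimizing a small vector that a shared projection maps into the full delta-parameter space; the sketch below uses a fixed random projection as a stand-in, whereas the paper finds the subspace by jointly decomposing trained DET solutions.

```python
import torch
import torch.nn as nn

class SubspaceDelta(nn.Module):
    """Reparameterize a high-dimensional delta-parameter vector as a projection of a
    small trainable vector z, so optimization happens in a low-dimensional subspace."""
    def __init__(self, full_dim: int, intrinsic_dim: int = 64):
        super().__init__()
        self.z = nn.Parameter(torch.zeros(intrinsic_dim))                     # trained
        self.register_buffer("proj",
                             torch.randn(full_dim, intrinsic_dim) / intrinsic_dim ** 0.5)

    def forward(self) -> torch.Tensor:
        return self.proj @ self.z        # reconstructed high-dimensional delta parameters
```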