Search Results for author: Zhiyuan Liu

Found 427 papers, 307 papers with code

BMInf: An Efficient Toolkit for Big Model Inference and Tuning

1 code implementation ACL 2022 Xu Han, Guoyang Zeng, Weilin Zhao, Zhiyuan Liu, Zhengyan Zhang, Jie Zhou, Jun Zhang, Jia Chao, Maosong Sun

In recent years, large-scale pre-trained language models (PLMs) containing billions of parameters have achieved promising results on various NLP tasks.

Quantization · Scheduling

Going “Deeper”: Structured Sememe Prediction via Transformer with Tree Attention

1 code implementation Findings (ACL) 2022 Yining Ye, Fanchao Qi, Zhiyuan Liu, Maosong Sun

However, all existing sememe prediction studies ignore the hierarchical structures of sememes, which are important in the sememe-based semantic description system.

Do Pre-trained Models Benefit Knowledge Graph Completion? A Reliable Evaluation and a Reasonable Approach

1 code implementation Findings (ACL) 2022 Xin Lv, Yankai Lin, Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, Jie Zhou

In recent years, pre-trained language models (PLMs) have been shown to capture factual knowledge from massive texts, which encourages the proposal of PLM-based knowledge graph completion (KGC) models.

Knowledge Graph Completion · Link Prediction

CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild

1 code implementation EMNLP 2021 Yuan Yao, Jiaju Du, Yankai Lin, Peng Li, Zhiyuan Liu, Jie Zhou, Maosong Sun

Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents.

Relation · Relation Extraction

Pass off Fish Eyes for Pearls: Attacking Model Selection of Pre-trained Models

1 code implementation ACL 2022 Biru Zhu, Yujia Qin, Fanchao Qi, Yangdong Deng, Zhiyuan Liu, Maosong Sun, Ming Gu

To validate our viewpoints, we design two methods to evaluate the robustness of FMS: (1) model disguise attack, which post-trains an inferior PTM with a contrastive objective, and (2) evaluation data selection, which selects a subset of the data points for FMS evaluation based on K-means clustering.

Backdoor Attack · Model Selection
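
The second attack is mechanically simple and worth seeing concretely: cluster the candidate evaluation pool and keep only points near the centroids, yielding a small, structurally biased subset. Below is a minimal sketch of that selection step, assuming scikit-learn and generic feature vectors; the paper's exact criterion for picking points that mislead FMS may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_eval_subset(features: np.ndarray, n_clusters: int = 8,
                       per_cluster: int = 4, seed: int = 0) -> np.ndarray:
    """Pick evaluation points via K-means: cluster the pool, then take the
    points closest to each centroid. (A sketch of the idea only; the
    paper's selection rule for biasing FMS may differ.)"""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(features)
    chosen = []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[idx] - km.cluster_centers_[c], axis=1)
        chosen.extend(idx[np.argsort(dists)[:per_cluster]].tolist())
    return np.array(sorted(chosen))

pool = np.random.default_rng(0).normal(size=(500, 32))  # stand-in embeddings
print(select_eval_subset(pool)[:10])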

An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes

no code implementations 21 Apr 2025 Ji Qi, Yuan Yao, Yushi Bai, Bin Xu, Juanzi Li, Zhiyuan Liu, Tat-Seng Chua

Large Multimodal Models (LMMs) uniformly perceive video frames, creating computational inefficiency for videos with inherently varying temporal information density.

MME · Video-MME +1

LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources

1 code implementation 8 Apr 2025 Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun

Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation.

Form

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset

no code implementations 4 Apr 2025 Bingxiang He, Wenbin Zhang, Jiaxi Song, Cheng Qian, Zixuan Fu, Bowen Sun, Ning Ding, Haiwen Hong, Longtao Huang, Hui Xue, Ganqu Cui, Wanxiang Che, Zhiyuan Liu, Maosong Sun

Preference learning is critical for aligning large language models (LLMs) with human values, yet its success hinges on high-quality datasets comprising three core components: preference Annotations, Instructions, and Response Pairs.

XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?

no code implementations 31 Mar 2025 Fengxiang Wang, Hongzhen Wang, Mingshuo Chen, Di Wang, Yulin Wang, Zonghao Guo, Qiang Ma, Long Lan, Wenjing Yang, Jing Zhang, Zhiyuan Liu, Maosong Sun

On top of the XLRS-Bench, 16 sub-tasks are defined to evaluate MLLMs' 10 kinds of perceptual capabilities and 6 kinds of reasoning capabilities, with a primary emphasis on advanced cognitive processes that facilitate real-world decision-making and the capture of spatiotemporal changes.

UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation

1 code implementation 31 Mar 2025 Yuxuan Chen, Dewen Guo, Sen Mei, Xinze Li, Hao Chen, Yishan Li, YiXuan Wang, Chaoyue Tang, Ruobing Wang, Dingjun Wu, Yukun Yan, Zhenghao Liu, Shi Yu, Zhiyuan Liu, Maosong Sun

Retrieval-Augmented Generation (RAG) significantly enhances the performance of large language models (LLMs) in downstream tasks by integrating external knowledge.

RAG · Retrieval

OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning

no code implementations 20 Mar 2025 Zhiyuan Liu, Yuting Zhang, Feng Liu, Changwang Zhang, Ying Sun, Jun Wang

Multimodal Large Language Models (MLLMs) have gained significant traction for their ability to process diverse input data types and generate coherent, contextually relevant outputs across various applications.

Reinforcement Learning (RL)

Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling

no code implementations 19 Mar 2025 Yanchen Luo, Zhiyuan Liu, Yi Zhao, Sihang Li, Kenji Kawaguchi, Tat-Seng Chua, Xiang Wang

In this work, we propose the Unified Variational Auto-Encoder for 3D Molecular Latent Diffusion Modeling (UAE-3D), a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space, while maintaining near-zero reconstruction error.

3D Molecule Generation · Drug Discovery +1

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

1 code implementation 17 Mar 2025 Kairong Luo, Haodong Wen, Shengding Hu, Zhenbo Sun, Zhiyuan Liu, Maosong Sun, Kaifeng Lyu, Wenguang Chen

Training large models is both resource-intensive and time-consuming, making it crucial to understand the quantitative relationship between model performance and hyperparameters.

WiFi-Diffusion: Achieving Fine-Grained WiFi Radio Map Estimation With Ultra-Low Sampling Rate by Diffusion Models

no code implementations 15 Mar 2025 Zhiyuan Liu, Shuhang Zhang, Qingyu Liu, Hongliang Zhang, Lingyang Song

A fine-grained radio map presents communication parameters of interest, e.g., received signal strength, at every point across a large geographical region.

Multimodal Language Modeling for High-Accuracy Single Cell Transcriptomics Analysis and Generation

1 code implementation 12 Mar 2025 Yaorui Shi, Jiaqi Yang, Sihang Li, Junfeng Fang, Xiang Wang, Zhiyuan Liu, Yang Zhang

Pre-trained language models (PLMs) have revolutionized scientific research, yet their application to single-cell analysis remains limited.

Language Modeling · Language Modelling

Cost-Optimal Grouped-Query Attention for Long-Context LLMs

1 code implementation 12 Mar 2025 Yingfa Chen, Yutong Wu, Xu Han, Zhiyuan Liu, Maosong Sun

However, they overlook the impacts of context length and attention head configuration (the number of query and key-value heads in grouped-query attention) on training and inference.
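
For context, grouped-query attention shares one key-value head across each group of query heads, so the head configuration is just the pair (query heads, KV heads), and the KV cache shrinks in proportion to the KV-head count. A minimal numpy sketch of the sharing pattern, assuming single-sequence shapes with no masking or output projection; this illustrates the mechanism, not the paper's training setup.

```python
import numpy as np

def gqa(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """Grouped-query attention for one sequence. Each group of
    n_q_heads // n_kv_heads query heads attends with a shared KV head."""
    T, d = x.shape
    hd = d // n_q_heads                      # head dimension
    q = (x @ Wq).reshape(T, n_q_heads, hd)
    k = (x @ Wk).reshape(T, n_kv_heads, hd)
    v = (x @ Wv).reshape(T, n_kv_heads, hd)
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                      # shared KV head for this query head
        att = q[:, h] @ k[:, kv].T / np.sqrt(hd)
        att = np.exp(att - att.max(-1, keepdims=True))
        att /= att.sum(-1, keepdims=True)
        out[:, h] = att @ v[:, kv]
    return out.reshape(T, d)

rng = np.random.default_rng(0)
d, nq, nkv = 64, 8, 2                        # 8 query heads share 2 KV heads
x = rng.normal(size=(10, d))
Wq = rng.normal(size=(d, d)) * 0.1
Wk = rng.normal(size=(d, d // nq * nkv)) * 0.1
Wv = rng.normal(size=(d, d // nq * nkv)) * 0.1
print(gqa(x, Wq, Wk, Wv, nq, nkv).shape)     # (10, 64)
```

With nkv = 2 instead of 8, only a quarter of the keys and values are cached per token, which is exactly the context-length trade-off the paper studies.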

AgentRM: Enhancing Agent Generalization with Reward Modeling

no code implementations 25 Feb 2025 Yu Xia, Jingru Fan, Weize Chen, Siyu Yan, Xin Cong, Zhong Zhang, Yaxi Lu, Yankai Lin, Zhiyuan Liu, Maosong Sun

Based on this finding, we propose AgentRM, a generalizable reward model, to guide the policy model for effective test-time search.

Answer Generation

PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning

1 code implementation 21 Feb 2025 Pengcheng Huang, Zhenghao Liu, Yukun Yan, Xiaoyuan Yi, Hao Chen, Zhiyuan Liu, Maosong Sun, Tong Xiao, Ge Yu, Chenyan Xiong

Knowledge-Augmented Generation (KAG) has shown great promise in updating the internal memory of Large Language Models (LLMs) by integrating external knowledge.

Hallucination

LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

1 code implementation 20 Feb 2025 Shangqing Tu, Yucheng Wang, Daniel Zhang-li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zhiyuan Liu, Bin Xu, Juanzi Li

Moreover, to achieve long outputs that maintain high-fidelity to the input images, we employ Direct Preference Optimization (DPO) to the SFT model.
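
DPO, mentioned above, has a compact objective: it pushes the policy's log-probability margin between the chosen and rejected response above the reference model's margin. A self-contained numeric sketch of the standard DPO loss over precomputed sequence log-probabilities; this is the generic formula, not the paper's training code.

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probs
    under the policy and the frozen reference model."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# The loss drops below log(2) when the policy favors the chosen response
# more strongly than the reference does:
print(dpo_loss(-10.0, -14.0, -11.0, -12.0))  # margin > 0 -> loss < log(2)
print(dpo_loss(-14.0, -10.0, -12.0, -11.0))  # margin < 0 -> loss > log(2)
```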

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators

1 code implementation 20 Feb 2025 Jianling Li, Shangzhan Li, Zhenye Gao, Qi Shi, Yuxuan Li, Zefan Wang, Jiacheng Huang, Haojie Wang, Jianrong Wang, Xu Han, Zhiyuan Liu, Maosong Sun

Despite advances in large language models (LLMs) for conventional code generation, these models struggle to generate accurate, performance-optimized Triton code, as they lack awareness of its specifications and the complexities of GPU programming.

Benchmarking · Code Generation +3

FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling

no code implementations 20 Feb 2025 Weilin Zhao, Tengyu Pan, Xu Han, Yudi Zhang, Ao Sun, Yuxiang Huang, Kaihuo Zhang, Weilun Zhao, Yuxuan Li, Jianyong Wang, Zhiyuan Liu, Maosong Sun

Speculative sampling has emerged as an important technique for accelerating the auto-regressive generation process of large language models (LLMs) by utilizing a draft-then-verify mechanism to produce multiple tokens per forward pass.

Language Modeling · Language Modelling
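
The draft-then-verify mechanism is easiest to see in its greedy form: a cheap drafter proposes a few tokens, the target model checks them and keeps the longest agreeing prefix plus one corrected token, so the output is identical to target-only decoding but needs fewer target passes. A toy sketch with stand-in models follows; FR-Spec's specific contribution, restricting the drafter to a frequency-ranked vocabulary subset, is not shown.

```python
import numpy as np

VOCAB = 50

def greedy_next(seq, salt):
    """Stand-in for an LM's greedy next token: deterministic pseudo-logits
    derived from the context. Different salts emulate different models."""
    logits = np.sin(np.arange(VOCAB) * (1 + (sum(seq) + salt) % 97))
    return int(np.argmax(logits))

target = lambda seq: greedy_next(seq, salt=0)   # large, accurate model
drafter = lambda seq: greedy_next(seq, salt=7)  # small, cheap draft model

def draft_then_verify(prefix, n_new, k=4):
    """Greedy speculative decoding: the drafter proposes k tokens; the target
    verifies them (conceptually in one forward pass) and keeps the longest
    agreeing prefix plus one corrected token."""
    seq = list(prefix)
    while len(seq) - len(prefix) < n_new:
        draft = [drafter(seq)]
        for _ in range(k - 1):
            draft.append(drafter(seq + draft))
        kept = []
        for i in range(k):
            t = target(seq + draft[:i])
            kept.append(t)
            if t != draft[i]:                    # first mismatch: stop here
                break
        seq.extend(kept)
    return seq[: len(prefix) + n_new]

print(draft_then_verify([1, 2, 3], 12))
```

Because every kept token is the target's own greedy choice, the result matches plain target decoding exactly; speed comes from verifying several draft tokens per target pass.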

Craw4LLM: Efficient Web Crawling for LLM Pretraining

1 code implementation 19 Feb 2025 Shi Yu, Zhiyuan Liu, Chenyan Xiong

Web crawling is a main source of pretraining data for large language models (LLMs), but the majority of crawled web pages are discarded in pretraining due to low data quality.

10-shot image generation

NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation

1 code implementation 18 Feb 2025 Zhiyuan Liu, Yanchen Luo, Han Huang, Enzhi Zhang, Sihang Li, Junfeng Fang, Yaorui Shi, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

To combine these advantages for 3D molecule generation, we propose a foundation model -- NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation.

3D Generation · 3D Molecule Generation +4

Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search

1 code implementation 18 Feb 2025 Yifan Ji, Zhipeng Xu, Zhenghao Liu, Yukun Yan, Shi Yu, Yishan Li, Zhiyuan Liu, Yu Gu, Ge Yu, Maosong Sun

Recent dense retrievers usually thrive on the emergent capabilities of Large Language Models (LLMs), using them to encode queries and documents into an embedding space for retrieval.

Retrieval

Exploring LLM-based Student Simulation for Metacognitive Cultivation

no code implementations 17 Feb 2025 Haoxuan Li, Jifan Yu, Xin Cong, Yang Dang, Yisi Zhan, Huiqin Liu, Zhiyuan Liu

Metacognitive education plays a crucial role in cultivating students' self-regulation and reflective thinking, providing essential support for those with learning difficulties through academic advising.

APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs

1 code implementation 17 Feb 2025 Yuxiang Huang, Mingye Li, Xu Han, Chaojun Xiao, Weilin Zhao, Sun Ao, Hao Zhou, Jie Zhou, Zhiyuan Liu, Maosong Sun

While long-context inference is crucial for advancing large language model (LLM) applications, its prefill speed remains a significant bottleneck.

Language Modeling · Language Modelling +1

Diffusion Models for Molecules: A Survey of Methods and Tasks

1 code implementation 13 Feb 2025 Chao Song, Zhiyuan Liu, Yu Rong, Qiang Liu, Shu Wu, Liang Wang

This survey aims to facilitate understanding and further flourishing development in this area.

Diversity · Drug Discovery +2

Process Reinforcement through Implicit Rewards

1 code implementation 3 Feb 2025 Ganqu Cui, Lifan Yuan, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, Jiarui Yuan, Huayu Chen, Kaiyan Zhang, Xingtai Lv, Shuo Wang, Yuan Yao, Xu Han, Hao Peng, Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding

While dense rewards also offer an appealing choice for the reinforcement learning (RL) of LLMs since their fine-grained rewards have the potential to address some inherent issues of outcome rewards, such as training efficiency and credit assignment, this potential remains largely unrealized.

Math · Reinforcement Learning (RL)

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

1 code implementation 11 Jan 2025 Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun

Two key challenges remain: (1) low executability and poor restoration of chart details in the generated code, and (2) a lack of large-scale and diverse training data.

Chart Understanding · Code Generation +4

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

no code implementations 10 Jan 2025 You Li, Heyu Huang, Chi Chen, Kaiyu Huang, Chao Huang, Zonghao Guo, Zhiyuan Liu, Jinan Xu, Yuhua Li, Ruixuan Li, Maosong Sun

The recent advancement of Multimodal Large Language Models (MLLMs) has significantly improved their fine-grained perception of single images and general comprehension across multiple images.

Form · Image Comprehension +1

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

1 code implementation 18 Dec 2024 YiPeng Zhang, Yifan Liu, Zonghao Guo, Yidan Zhang, Xuesong Yang, Chi Chen, Jun Song, Bo Zheng, Yuan Yao, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

To address this issue, we present LLaVA-UHD v2, an advanced MLLM centered around a Hierarchical window transformer that enables capturing diverse visual granularity by constructing and integrating a high-resolution feature pyramid.

Attribute · Text Generation

Graph Coarsening via Supervised Granular-Ball for Scalable Graph Neural Network Training

no code implementations 18 Dec 2024 Shuyin Xia, Xinjun Ma, Zhiyuan Liu, Cheng Liu, Sen Zhao, Guoyin Wang

Experimental results demonstrate that our method achieves accuracy comparable to training on the original graph.

Graph Neural Network

ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer

1 code implementation 10 Dec 2024 Jinyi Hu, Shengding Hu, Yuxuan Song, Yufei Huang, Mingxuan Wang, Hao Zhou, Zhiyuan Liu, Wei-Ying Ma, Maosong Sun

The analysis of the trade-off between autoregressive modeling and diffusion demonstrates the potential of ACDiT to be used in long-horizon visual generation tasks.

Denoising · Image Generation +1

Intelligent System for Automated Molecular Patent Infringement Assessment

no code implementations 10 Dec 2024 Yaorui Shi, Sihang Li, Taiyan Zhang, Xi Fang, Jiankun Wang, Zhiyuan Liu, Guojiang Zhao, Zhengdan Zhu, Zhifeng Gao, Renxin Zhong, Linfeng Zhang, Guolin Ke, Weinan E, Hengxing Cai, Xiang Wang

Automated drug discovery offers significant potential for accelerating the development of novel therapeutics by substituting labor-intensive human workflows with machine-driven processes.

Drug Discovery

Densing Law of LLMs

no code implementations 5 Dec 2024 Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Biyuan Lin, Jie Zhou, Zhi Zheng, Xu Han, Zhiyuan Liu, Maosong Sun

This paper introduces the concept of "capacity density" as a new metric to evaluate the quality of LLMs across different scales and describes the trend of LLMs in terms of both effectiveness and efficiency.
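
Concretely, capacity density can be read as the ratio between a model's "effective" parameter count under a reference scaling law and its actual count. A toy numeric sketch follows; the constants A and alpha and the loss-to-size inversion are illustrative stand-ins for the paper's fitted reference function.

```python
def effective_params(loss, A=1.0, alpha=0.35, E=0.0):
    """Invert a toy reference scaling law  loss = E + A * N^(-alpha)
    to get the parameter count N_eff that would yield this loss."""
    return (A / (loss - E)) ** (1.0 / alpha)

def capacity_density(loss, actual_params_b):
    """Effective parameters (per the reference law) over actual parameters."""
    return effective_params(loss) / actual_params_b

# A hypothetical 2.4B model matching the loss the reference law predicts
# for a 7B model would have density ~ 7 / 2.4 ~= 2.9:
loss_of_7b = 1.0 * 7 ** -0.35
print(round(capacity_density(loss_of_7b, 2.4), 2))
```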

Free Process Rewards without Process Labels

2 code implementations 2 Dec 2024 Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, Hao Peng

The only assumption is to parameterize the outcome reward as the log-likelihood ratios of the policy and reference models, which can be optimized regardless of the specific choice of loss objectives.

Math
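
The parameterization above yields process rewards for free: if the outcome reward is beta times the policy-vs-reference log-likelihood ratio, the reward of step t is simply the increment of the cumulative log-ratio at that step. A small numeric sketch of that bookkeeping, with illustrative token log-probs rather than the paper's training pipeline:

```python
import numpy as np

def implicit_process_rewards(logp_policy, logp_ref, beta=0.05):
    """Per-step process rewards implied by an outcome reward parameterized
    as beta * log-likelihood ratio of policy vs. reference: the reward of
    step t is the increment of the cumulative log-ratio at step t."""
    ratio = np.asarray(logp_policy) - np.asarray(logp_ref)  # per-step log-ratios
    q = beta * np.cumsum(ratio)        # implied value after each step
    return np.diff(q, prepend=0.0)     # per-step rewards; sum = outcome reward

lp_pol = [-1.2, -0.4, -2.0, -0.3]      # illustrative step log-probs (policy)
lp_ref = [-1.0, -0.9, -1.1, -0.8]      # illustrative step log-probs (reference)
r = implicit_process_rewards(lp_pol, lp_ref)
print(r, r.sum())                      # sum equals beta * total log-ratio
```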

KBAlign: Efficient Self Adaptation on Specific Knowledge Bases

1 code implementation 22 Nov 2024 Zheni Zeng, Yuxuan Chen, Shi Yu, Ruobing Wang, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun

Humans can utilize techniques to quickly acquire knowledge from specific materials in advance, such as creating self-assessment questions, enabling us to achieve related tasks more efficiently.

Self-Learning

ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis

1 code implementation 11 Nov 2024 Zanlin Ni, Yulin Wang, Renping Zhou, Yizeng Han, Jiayi Guo, Zhiyuan Liu, Yuan Yao, Gao Huang

At the spatial level, we disentangle the computations of visible and mask tokens by encoding visible tokens independently, while decoding mask tokens conditioned on the fully encoded visible tokens.

Image Generation

Sparsing Law: Towards Large Language Models with Greater Activation Sparsity

1 code implementation 4 Nov 2024 Yuqi Luo, Chenyang Song, Xu Han, Yingfa Chen, Chaojun Xiao, Zhiyuan Liu, Maosong Sun

These demonstrate that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.

MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning

no code implementations 30 Oct 2024 Xujia Wang, Haiyan Zhao, Shuo Wang, Hanqing Wang, Zhiyuan Liu

Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have significantly improved the adaptation of LLMs to downstream tasks in a resource-efficient manner.

Computational Efficiency · Mixture-of-Experts +2

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

1 code implementation 17 Oct 2024 Xinze Li, Sen Mei, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Hao Chen, Ge Yu, Zhiyuan Liu, Maosong Sun, Chenyan Xiong

Our experiments on various knowledge-intensive tasks demonstrate that DDR significantly outperforms the SFT method, particularly for LLMs with smaller-scale parameters that depend more on the retrieved knowledge.

RAG · Retrieval

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

1 code implementation 16 Oct 2024 Yaxi Lu, Shenzhi Yang, Cheng Qian, Guirong Chen, Qinyu Luo, Yesai Wu, Huadong Wang, Xin Cong, Zhong Zhang, Yankai Lin, Weiwen Liu, Yasheng Wang, Zhiyuan Liu, Fangming Liu, Maosong Sun

The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents.

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

1 code implementation 14 Oct 2024 Shi Yu, Chaoyue Tang, Bokai Xu, Junbo Cui, Junhao Ran, Yukun Yan, Zhenghao Liu, Shuo Wang, Xu Han, Zhiyuan Liu, Maosong Sun

In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded as an image using a VLM and then retrieved to enhance the generation of a VLM.

RAG · Retrieval
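
Structurally, this is ordinary dense retrieval with page images in place of parsed text. A minimal sketch of the retrieval step follows; embed_page_image, embed_query, and vlm_generate are hypothetical stand-ins for whatever VLM embedding and generation interfaces one uses.

```python
import numpy as np

def cosine_top_k(query_vec, page_vecs, k=3):
    """Retrieve the k most similar page-image embeddings by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    P = page_vecs / np.linalg.norm(page_vecs, axis=1, keepdims=True)
    return np.argsort(P @ q)[::-1][:k]

# Pipeline shape (hypothetical stand-in functions, not a specific API):
#   page_vecs = np.stack([embed_page_image(img) for img in page_images])
#   top = cosine_top_k(embed_query(question), page_vecs)
#   answer = vlm_generate(question, [page_images[i] for i in top])
```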

LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models

1 code implementation 12 Oct 2024 Zihan Zhou, Chong Li, Xinyi Chen, Shuo Wang, Yu Chao, Zhili Li, Haoyu Wang, Rongqiao An, Qi Shi, Zhixing Tan, Xu Han, Xiaodong Shi, Zhiyuan Liu, Maosong Sun

The proposed LLM$\times$MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output.

document understanding
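
The split-then-aggregate structure maps onto a few lines of code. A structural sketch, where llm is any prompt-to-text callable; the real framework layers more machinery (structured intermediate outputs and careful prompting) on top of this skeleton.

```python
def map_reduce_answer(document: str, question: str, llm, chunk_chars=2000):
    """Split a long document into chunks, query the LLM on each (map),
    then aggregate the intermediate answers (reduce)."""
    chunks = [document[i:i + chunk_chars]
              for i in range(0, len(document), chunk_chars)]
    partials = [llm(f"Context:\n{c}\n\nQuestion: {question}\nAnswer briefly:")
                for c in chunks]                               # map stage
    joined = "\n".join(f"- {p}" for p in partials)
    return llm(f"Combine these partial answers into one final answer "
               f"to '{question}':\n{joined}")                  # reduce stage

# Usage with any chat-completion wrapper:
#   answer = map_reduce_answer(long_text, "Who is the protagonist?", my_llm)
```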

Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation

1 code implementation 11 Oct 2024 Ruobing Wang, Daren Zha, Shi Yu, Qingfei Zhao, Yuxuan Chen, YiXuan Wang, Shuo Wang, Yukun Yan, Zhenghao Liu, Xu Han, Zhiyuan Liu, Maosong Sun

Retrieval-Augmented Generation (RAG) mitigates the factual errors and hallucinated outputs generated by Large Language Models (LLMs) in open-domain question answering (OpenQA) tasks by introducing external knowledge.

Open-Domain Question Answering · RAG +1

Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System

no code implementations 10 Oct 2024 Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun

Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.

Large Language Model · Question Answering

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

no code implementations 9 Oct 2024 Yingfa Chen, Xinrong Zhang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun

For the second concern, we train a series of Mamba-2 models on long documents to empirically estimate the recurrent state capacity in language modeling and passkey retrieval.

Attribute · Language Modeling +3

DecorateLM: Data Engineering through Corpus Rating, Tagging, and Editing with Language Models

no code implementations 8 Oct 2024 Ranchi Zhao, Zhen Leng Thai, Yifan Zhang, Shengding Hu, Yunqi Ba, Jie Zhou, Jie Cai, Zhiyuan Liu, Maosong Sun

The performance of Large Language Models (LLMs) is substantially influenced by the pretraining corpus, which consists of vast quantities of unsupervised data processed by the models.

Language Modeling · Language Modelling +2

Exploring the Benefit of Activation Sparsity in Pre-training

1 code implementation 4 Oct 2024 Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

SSD adaptively switches between the Mixtures-of-Experts (MoE) based sparse training and the conventional dense training during the pre-training process, leveraging the efficiency of sparse training and avoiding the static activation correlation of sparse training.

Text-guided Diffusion Model for 3D Molecule Generation

no code implementations 4 Oct 2024 Yanchen Luo, Junfeng Fang, Sihang Li, Zhiyuan Liu, Jiancan Wu, An Zhang, Wenjie Du, Xiang Wang

The de novo generation of molecules with targeted properties is crucial in biology, chemistry, and drug discovery.

3D Molecule Generation · Diversity +2

Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices

1 code implementation 2 Oct 2024 Yuxiang Huang, Binhang Yuan, Xu Han, Chaojun Xiao, Zhiyuan Liu

Scaling the input context length of a large language model (LLM) incurs a significant increase in computation cost and memory footprint to maintain the attention key-value (KV) cache.

Language Modeling · Language Modelling +2
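
The memory pressure that motivates eviction is easy to quantify: the KV cache stores one key and one value vector per layer, per KV head, per cached token. A back-of-envelope helper with one worked configuration; the 32/32/128 fp16 numbers are a generic 7B-class example, not a specific model's published spec.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Memory for the attention KV cache: keys and values (factor 2) for
    every layer, KV head, head dimension, and cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# A 7B-class model with 32 layers, 32 KV heads of dim 128, fp16:
print(kv_cache_bytes(32, 32, 128, 1) / 2**20, "MiB per token")        # 0.5
print(kv_cache_bytes(32, 32, 128, 128_000) / 2**30, "GiB at 128K")    # 62.5
```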

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

2 code implementations 29 Sep 2024 Xin Li, Weize Chen, Qizhi Chu, Haopeng Li, Zhaojun Sun, Ran Li, Chen Qian, Yiwei Wei, Zhiyuan Liu, Chuan Shi, Maosong Sun, Cheng Yang

Our results underscore that the capabilities of LLMs in handling structured data are still under-explored, and show the effectiveness of LLM4Graph in enhancing LLMs' proficiency in graph analysis.

Recommendation Systems

Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination

no code implementations 11 Sep 2024 Daniel Zhang-li, Zheyuan Zhang, Jifan Yu, Joy Lim Jia Yin, Shangqing Tu, Linlu Gong, Haohua Wang, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li

We develop Slide2Lecture, a tuning-free and knowledge-regulated intelligent tutoring system that can (1) effectively convert an input lecture slide into a structured teaching agenda consisting of a set of heterogeneous teaching actions; (2) create and manage an interactive lecture that generates responsive interactions catering to student learning demands while regulating the interactions to follow teaching actions.

Language Modeling · Language Modelling

From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

no code implementations 5 Sep 2024 Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, Lei Hou, Yu Zhang, Xu Han, Manli Li, Juanzi Li, Zhiyuan Liu, Huiqin Liu, Maosong Sun

Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption.

Configurable Foundation Models: Building LLMs from a Modular Perspective

no code implementations 4 Sep 2024 Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, GuanYu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

We first formalize modules into emergent bricks - functional neuron partitions that emerge during the pre-training phase, and customized bricks - bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs.

Computational Efficiency · Mixture-of-Experts

Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts

1 code implementation 2 Sep 2024 Yingfa Chen, Chenlong Hu, Cong Feng, Chenyang Song, Shi Yu, Xu Han, Zhiyuan Liu, Maosong Sun

This study presents a multi-modal multi-granularity tokenizer specifically designed for analyzing ancient Chinese scripts, focusing on the Chu bamboo slip (CBS) script used during the Spring and Autumn and Warring States period (771-256 BCE) in Ancient China.

Part-Of-Speech Tagging

AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

1 code implementation 31 Aug 2024 Zanlin Ni, Yulin Wang, Renping Zhou, Rui Lu, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Yuan Yao, Gao Huang

As a representative work, non-autoregressive Transformers (NATs) are able to synthesize images with decent quality in a small number of steps.

Image Generation · Scheduling

AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

1 code implementation 27 Aug 2024 Chi-Min Chan, Jianxuan Yu, Weize Chen, Chunyang Jiang, Xinyu Liu, Weijie Shi, Zhiyuan Liu, Wei Xue, Yike Guo

However, configuring an MAS for a task remains challenging, with performance only observable post-execution.

COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis

1 code implementation 9 Aug 2024 Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu

In this paper, we introduce DEBUGEVAL, a comprehensive benchmark for evaluating the debugging abilities of LLMs by emulating the multi-stage human debugging process.

Code Generation · Code Repair

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

2 code implementations 3 Aug 2024 Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone.

Hallucination · Multiple-choice +3

RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework

1 code implementation 2 Aug 2024 Kunlun Zhu, Yifan Luo, Dingling Xu, Yukun Yan, Zhenghao Liu, Shi Yu, Ruobing Wang, Shuo Wang, Yishan Li, Nan Zhang, Xu Han, Zhiyuan Liu, Maosong Sun

However, evaluating the effectiveness of RAG systems in specialized scenarios remains challenging due to the high costs of data construction and the lack of suitable evaluation metrics.

Benchmarking · Dataset Generation +5

Interior Object Geometry via Fitted Frames

no code implementations 19 Jul 2024 Stephen M. Pizer, Zhiyuan Liu, Junjie Zhao, Nicholas Tapp-Hughes, James Damon, Miaomiao Zhang, JS Marron, Jared Vicory

We describe a representation targeted for anatomic objects which is designed to enable strong locational correspondence within object populations and thus to provide powerful object statistics.

Object

PersLLM: A Personified Training Approach for Large Language Models

1 code implementation 17 Jul 2024 Zheni Zeng, Jiayi Chen, Huimin Chen, Yukun Yan, Yuxuan Chen, Zhenghao Liu, Zhiyuan Liu, Maosong Sun

Large language models exhibit aspects of human-level intelligence that catalyze their application as human-like agents in domains such as social simulations, human-machine interactions, and collaborative multi-agent systems.

Prompt Engineering

States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

no code implementations 16 Jul 2024 Junhao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun

Our work presents a novel exploration of LLMs' symbolic calculation abilities and the underlying mechanisms.

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

1 code implementation 9 Jul 2024 Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents.

Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

1 code implementation 22 Jun 2024 Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

Specifically, we divide the queries and responses of conversations into several time slices and then adopt a time-division-multiplexing (TDM) encoding-decoding strategy to pseudo-simultaneously process these slices.

PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes

1 code implementation 19 Jun 2024 He Cao, Yanjun Shao, Zhiyuan Liu, Zijing Liu, Xiangru Tang, Yuan Yao, Yu Li

Current approaches, however, often neglect the critical role of multiple molecule graph interaction in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks.

cross-modal alignment

Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset

1 code implementation 17 Jun 2024 Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun

To address these issues, we present a new pre-training pipeline for RS models, featuring the creation of a large-scale RS dataset and an efficient MIM approach.

Aerial Scene Classification · Diversity +4

Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity

no code implementations 17 Jun 2024 Bingxiang He, Ning Ding, Cheng Qian, Jia Deng, Ganqu Cui, Lifan Yuan, Huan-ang Gao, Huimin Chen, Zhiyuan Liu, Maosong Sun

For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.

Continual Learning · Zero-shot Generalization

Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

1 code implementation 13 Jun 2024 Bowen Ping, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun

Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision.

Math · Quantization
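
The motivation translates directly into a toy compressor: singular values of the delta weights decay fast, so spend more bits on the leading singular directions and very few on the tail. An illustrative numpy sketch, quantizing only the singular values uniformly; the paper's actual mixed-precision scheme is more involved.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of an array to the given bit-width."""
    levels = 2 ** (bits - 1) - 1
    scale = max(np.abs(x).max(), 1e-12) / levels
    return np.round(x / scale) * scale

def compress_delta(delta, k=8, hi_bits=8, lo_bits=2):
    """Keep the top-k singular directions at higher precision and the
    long tail at very low precision. (Illustrative only.)"""
    U, s, Vt = np.linalg.svd(delta, full_matrices=False)
    head = (U[:, :k] * quantize(s[:k], hi_bits)) @ Vt[:k]
    tail = (U[:, k:] * quantize(s[k:], lo_bits)) @ Vt[k:]
    return head + tail

rng = np.random.default_rng(0)
# Synthetic delta with decaying spectrum, mimicking the long-tail motivation:
delta = rng.normal(size=(64, 64)) @ np.diag(0.9 ** np.arange(64))
approx = compress_delta(delta)
print(np.linalg.norm(delta - approx) / np.linalg.norm(delta))
```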

Scaling Large Language Model-based Multi-Agent Collaboration

1 code implementation 11 Jun 2024 Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun

Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning.

Language Modeling · Language Modelling +2

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

1 code implementation CVPR 2024 Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang

In this paper, we aim to re-evaluate the full potential of NATs by revisiting the design of their training and inference strategies.

Image Generation

UltraMedical: Building Specialized Generalists in Biomedicine

1 code implementation 6 Jun 2024 Kaiyan Zhang, Sihang Zeng, Ermo Hua, Ning Ding, Zhang-Ren Chen, Zhiyuan Ma, Haoxin Li, Ganqu Cui, Biqing Qi, Xuekai Zhu, Xingtai Lv, Hu Jinfang, Zhiyuan Liu, Bowen Zhou

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas.

ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

1 code implementation 23 May 2024 Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

To resolve the challenges above, we propose a new pretraining method, ReactXT, for reaction-text modeling, and a new dataset, OpenExp, for experimental procedure prediction.

Molecule Captioning · Prediction +1

ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

1 code implementation 21 May 2024 Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

ProtT3 empowers an LM to understand protein sequences of amino acids by incorporating a PLM as its protein understanding module, enabling effective protein-to-text generation.

Property Prediction · Question Answering +2

Iterative Experience Refinement of Software-Developing Agents

no code implementations 7 May 2024 Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, Yifei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun

We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches.

LEGENT: Open Platform for Embodied Agents

no code implementations 28 Apr 2024 Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun

Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments.

Vision-Language-Action

PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network

no code implementations 16 Apr 2024 Yuning Wang, Zhiyuan Liu, Haotian Lin, Junkai Jiang, Shaobing Xu, Jianqiang Wang

In this study, we propose PreGSU, a generalized pre-trained scene understanding model based on graph attention network to learn the universal interaction and reasoning of traffic scenes to support various downstream tasks.

Autonomous Driving · Feature Engineering +4

UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs

1 code implementation 11 Apr 2024 Chaoqun He, Renjie Luo, Shengding Hu, Yuanqian Zhao, Jie Zhou, Hanghao Wu, Jiajie Zhang, Xu Han, Zhiyuan Liu, Maosong Sun

The rapid development of LLMs calls for a lightweight and easy-to-use framework for swift evaluation deployment.

Robust and Scalable Model Editing for Large Language Models

1 code implementation 26 Mar 2024 Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, Maosong Sun

Large language models (LLMs) can make predictions using parametric knowledge (knowledge encoded in the model weights) or contextual knowledge (knowledge presented in the context).

Model Editing

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

1 code implementation 18 Mar 2024 Ruyi Xu, Yuan Yao, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

Long-Context Understanding · TextVQA

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

1 code implementation 14 Mar 2024 Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun

Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences.

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

no code implementations 13 Mar 2024 Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Ruobing Xie, Bowen Zhou, Zhiyuan Liu, Maosong Sun

Underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously.

Math

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

4 code implementations 12 Mar 2024 Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, Yang Liu

The virtual API server contains a caching system and API simulators which are complementary to alleviate the change in API status.

Benchmarking

Yi: Open Foundation Models by 01.AI

1 code implementation 7 Mar 2024 01.AI: Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Guoyin Wang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yanpeng Li, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai

The Yi model family is based on 6B and 34B pretrained language models, which we then extend to chat models, 200K long-context models, depth-upscaled models, and vision-language models.

 Ranked #1 on Chatbot on AlpacaEval (using extra training data)

Attribute · Chatbot +4

LLM-Oriented Retrieval Tuner

no code implementations 4 Mar 2024 Si Sun, Hanqing Zhang, Zhiyuan Liu, Jie Bao, Dawei Song

Dense Retrieval (DR) is now considered a promising tool to enhance the memorization capacity of Large Language Models (LLMs) such as GPT-3 and GPT-4 by incorporating external memories.

Memorization · Retrieval +1

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

1 code implementation 29 Feb 2024 Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Zexu Sun, Bowen Sun, Huimin Chen, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax": a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness).

Navigate

Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication

1 code implementation 28 Feb 2024 Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs).

Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

no code implementations 23 Feb 2024 Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun

Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models.

Memorization · Multi-Task Learning

Cleaner Pretraining Corpus Curation with Neural Web Scraping

1 code implementation 22 Feb 2024 Zhipeng Xu, Zhenghao Liu, Yukun Yan, Zhiyuan Liu, Ge Yu, Chenyan Xiong

The web contains large-scale, diverse, and abundant information to satisfy the information-seeking needs of humans.

Language Modeling · Language Modelling

ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents

1 code implementation 21 Feb 2024 Zhipeng Xu, Zhenghao Liu, Yukun Yan, Shuo Wang, Shi Yu, Zheni Zeng, Chaojun Xiao, Zhiyuan Liu, Ge Yu, Chenyan Xiong

Retrieval-Augmented Generation (RAG) enables Large Language Models (LLMs) to leverage external knowledge, enhancing their performance on knowledge-intensive tasks.

Active Learning · Position +3

$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens

4 code implementations 21 Feb 2024 Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun

Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction.

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding

1 code implementation 21 Feb 2024 Weilin Zhao, Yuxiang Huang, Xu Han, Wang Xu, Chaojun Xiao, Xinrong Zhang, Yewei Fang, Kaihuo Zhang, Zhiyuan Liu, Maosong Sun

To achieve this, we introduce Ouroboros, which can generate draft phrases to parallelize the drafting process and meanwhile lengthen drafts in a training-free manner.

Text Generation

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

1 code implementation 21 Feb 2024 Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun

Notably, the best-performing model, GPT-4V, attains an average score of 17.97% on OlympiadBench, with a mere 10.74% in physics, highlighting the benchmark's rigor and the intricacy of physical reasoning.

Logical Fallacies

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

1 code implementation 21 Feb 2024 Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, Kuai Li, Chen Chen, Zhiyuan Liu, Guangli Li, Tao Yang, Maosong Sun

Some recent efforts have explored introducing ReLU or its variants as the substitutive activation function to help LLMs achieve activation sparsity and inference acceleration, but few can simultaneously obtain high sparsity and comparable model performance.

Large Language Model-based Human-Agent Collaboration for Complex Task Solving

1 code implementation 20 Feb 2024 Xueyang Feng, Zhi-Yuan Chen, Yujia Qin, Yankai Lin, Xu Chen, Zhiyuan Liu, Ji-Rong Wen

We construct a human-agent collaboration dataset to train this policy model in an offline reinforcement learning environment.

Language Modeling · Language Modelling +3

MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

1 code implementation 18 Feb 2024 Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns.

Code Generation · Data Visualization

LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks

no code implementations 18 Feb 2024 Hanqing Wang, Bowen Ping, Shuo Wang, Xu Han, Yun Chen, Zhiyuan Liu, Maosong Sun

Most prior works on LoRA combination primarily rely on task-level weights for each involved LoRA, making different examples and tokens share the same LoRA weights.

Math

OneBit: Towards Extremely Low-bit Large Language Models

1 code implementation 17 Feb 2024 Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che

Model quantization uses low bit-width values to represent the weight matrices of existing models to be quantized, which is a promising approach to reduce both the storage and computational overheads of deploying highly anticipated LLMs.

Quantization
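
At the extreme, a weight matrix reduces to its signs plus a handful of scales. A minimal sign-plus-scale sketch of extreme low-bit quantization follows; OneBit's actual decomposition adds learned value vectors and quantization-aware training, which this deliberately omits.

```python
import numpy as np

def onebit_sketch(W):
    """Store sign(W) (1 bit per weight) plus one floating-point scale per
    row. A sketch of the idea, not the paper's decomposition."""
    scale = np.abs(W).mean(axis=1, keepdims=True)    # per-row scale
    W_sign = np.sign(W).astype(np.int8)              # the 1-bit payload
    return W_sign, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)
W_sign, scale = onebit_sketch(W)
W_hat = W_sign * scale
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")   # sizable; hence QAT
```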

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

1 code implementation 14 Feb 2024 Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Zhong Zhang, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.

Language Modeling · Language Modelling

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

1 code implementation 7 Feb 2024 Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun

In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning.

UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset

1 code implementation 7 Feb 2024 Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun

Different from previous works that simply translate English instructions, we consider both the language-specific and language-agnostic abilities of LLMs.

Cross-Lingual Transfer · Data Augmentation

ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs

no code implementations 6 Feb 2024 Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun

To find the most efficient activation function for sparse computation, we propose a systematic framework to examine the sparsity of LLMs from three aspects: the trade-off between sparsity and performance, the predictivity of sparsity, and the hardware affinity.

MolTC: Towards Molecular Relational Modeling In Language Models

1 code implementation 6 Feb 2024 Junfeng Fang, Shuai Zhang, Chang Wu, Zhengyi Yang, Zhiyuan Liu, Sihang Li, Kun Wang, Wenjie Du, Xiang Wang

Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research.

Relational Reasoning

UniMem: Towards a Unified View of Long-Context Large Language Models

1 code implementation 5 Feb 2024 Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yankai Lin, Yukun Yan, Xiaodong Shi, Sen Song, Zhiyuan Liu, Maosong Sun

Distinguished by its four core dimensions (Memory Management, Memory Writing, Memory Reading, and Memory Injection), UniMem empowers researchers to conduct systematic exploration of long-context methods.

Management

Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution

no code implementations 25 Jan 2024 Cheng Qian, Shihao Liang, Yujia Qin, Yining Ye, Xin Cong, Yankai Lin, Yesai Wu, Zhiyuan Liu, Maosong Sun

This paper introduces Investigate-Consolidate-Exploit (ICE), a novel strategy for enhancing the adaptability and flexibility of AI agents through inter-task self-evolution.

Towards 3D Molecule-Text Interpretation in Language Models

1 code implementation 25 Jan 2024 Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM establishes an integration of 3D molecular encoder and LM.

Instruction Following · Language Modeling +3

DebugBench: Evaluating Debugging Capability of Large Language Models

1 code implementation 9 Jan 2024 Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, Maosong Sun

Previous evaluations of LLMs' debugging ability are significantly limited by the risk of data leakage, the scale of the dataset, and the variety of tested bugs.

Code Generation

GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension

no code implementations 28 Dec 2023 Bohan Lyu, Xin Cong, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun

Since GitHub hosts a multitude of repositories that can serve as a rich resource of tools, a promising solution is for LLM-based agents to autonomously integrate GitHub repositories according to user queries to extend their tool set.

Experiential Co-Learning of Software-Developing Agents

1 code implementation 28 Dec 2023 Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Zihao Xie, Yifei Wang, Weize Chen, Cheng Yang, Xin Cong, Xiaoyin Che, Zhiyuan Liu, Maosong Sun

Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents.

D-Bot: Database Diagnosis System using Large Language Models

1 code implementation 3 Dec 2023 Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, Guoyang Zeng

Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems.

Sparse Low-rank Adaptation of Pre-trained Language Models

1 code implementation 20 Nov 2023 Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, Maosong Sun

Recognizing the need for more flexible adaptation, we extend the methodology of LoRA to an innovative approach we call sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.

Memorization
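
The dynamic-rank idea reduces to a gate vector inserted between the two LoRA matrices and sparsified with a proximal (soft-threshold) update, so ranks can be pruned during adaptation. A minimal numpy sketch of that mechanism, with illustrative shapes and schedule rather than the released implementation:

```python
import numpy as np

def sora_forward(x, W0, A, B, gate):
    """LoRA forward pass with a gate on the rank dimension:
    h = x @ W0 + (x @ A) * gate @ B. Zeroed gate entries prune ranks."""
    return x @ W0 + (x @ A) * gate @ B

def soft_threshold(g, lam):
    """Proximal step for an L1 penalty on the gate: entries shrink toward
    zero, and small ones become exactly zero, reducing the effective rank."""
    return np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)

rng = np.random.default_rng(0)
d, r = 16, 8
W0 = rng.normal(size=(d, d))
A, B = rng.normal(size=(d, r)), rng.normal(size=(r, d))
gate = rng.normal(size=r)
for _ in range(5):                 # interleave with gradient steps in practice
    gate = soft_threshold(gate, 0.25)
h = sora_forward(rng.normal(size=(2, d)), W0, A, B, gate)
print(h.shape, "active ranks:", int(np.count_nonzero(gate)), "of", r)
```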

INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair

1 code implementation 16 Nov 2023 Hanbin Wang, Zhenghao Liu, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, Ge Yu

INTERVENOR prompts Large Language Models (LLMs) to play distinct roles during the code repair process, functioning as both a Code Learner and a Code Teacher.

Code Repair · Code Translation

MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation

1 code implementation 15 Nov 2023 Xiaozhi Wang, Hao Peng, Yong Guan, Kaisheng Zeng, Jianhui Chen, Lei Hou, Xu Han, Yankai Lin, Zhiyuan Liu, Ruobing Xie, Jie Zhou, Juanzi Li

Understanding events in texts is a core objective of natural language understanding, which requires detecting event occurrences, extracting event arguments, and analyzing inter-event relationships.

All · Event Argument Extraction +4

NExT-Chat: An LMM for Chat, Detection and Segmentation

1 code implementation 8 Nov 2023 Ao Zhang, Yuan Yao, Wei Ji, Zhiyuan Liu, Tat-Seng Chua

The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs).

Referring Expression · Referring Expression Segmentation +1

ProAgent: From Robotic Process Automation to Agentic Process Automation

1 code implementation 2 Nov 2023 Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin, Zhiyuan Liu, Maosong Sun

Empirical experiments detail the construction and execution procedure of its workflow, showcasing the feasibility of APA and unveiling the possibility of a new paradigm of automation driven by agents.

Decision Making

Enhancing Dense Retrievers' Robustness with Group-level Reweighting

2 code implementations 25 Oct 2023 Peixuan Han, Zhenghao Liu, Zhiyuan Liu, Chenyan Xiong

In this paper, we introduce WebDRO, an efficient approach for clustering the web graph data and optimizing group weights to enhance the robustness of dense retrieval models.

Clustering · Link Prediction +2
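
Group reweighting of this kind typically follows the exponentiated-gradient update from group DRO: groups with higher loss get more weight, focusing training on the least robust clusters. A generic numeric sketch of one such update, not WebDRO's full clustering-plus-training pipeline:

```python
import numpy as np

def group_dro_update(weights, group_losses, eta=0.1):
    """Exponentiated-gradient step on group weights: upweight high-loss
    groups, then renormalize to keep a distribution over groups."""
    w = weights * np.exp(eta * np.asarray(group_losses))
    return w / w.sum()

w = np.full(4, 0.25)              # e.g., four clusters of web documents
losses = [0.9, 0.4, 0.6, 1.3]     # current average loss per cluster
for _ in range(10):
    w = group_dro_update(w, losses)
print(w.round(3))                 # mass shifts toward the high-loss clusters
```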

MUSER: A Multi-View Similar Case Retrieval Dataset

1 code implementation 24 Oct 2023 Qingquan Li, Yiran Hu, Feng Yao, Chaojun Xiao, Zhiyuan Liu, Maosong Sun, Weixing Shen

Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge.

Fairness · Retrieval +3

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

1 code implementation 24 Oct 2023 Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs.

Computational Efficiency

Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

1 code implementation NeurIPS 2023 Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning.

Decoder · Representation Learning +1

MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Module Plugin

1 code implementation 21 Oct 2023 Tianshuo Zhou, Sen Mei, Xinze Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Ge Yu

To facilitate the multi-modal retrieval tasks, we build the ClueWeb22-MM dataset based on the ClueWeb22 dataset, which regards anchor texts as queries, and extracts the related text and image documents from anchor-linked web pages.

Language Modelling · Text Retrieval

Thoroughly Modeling Multi-domain Pre-trained Recommendation as Language

no code implementations 20 Oct 2023 Zekai Qu, Ruobing Xie, Chaojun Xiao, Yuan Yao, Zhiyuan Liu, Fengzong Lian, Zhanhui Kang, Jie Zhou

With pre-trained language models (PLMs) widely verified in various NLP tasks, pioneering efforts have explored combining the general textual information in PLMs with the personalized behavioral information in user historical behavior sequences to enhance sequential recommendation (SR).

Informativeness · Language Modeling +2

ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction

1 code implementation 20 Oct 2023 Yaorui Shi, An Zhang, Enzhi Zhang, Zhiyuan Liu, Xiang Wang

Predicting chemical reactions, a fundamental challenge in chemistry, involves forecasting the resulting products from a given reaction process.

Chemical Reaction Prediction · Prediction

Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models

no code implementations 19 Oct 2023 Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise.

Toolink: Linking Toolkit Creation and Using through Chain-of-Solving on Open-Source Model

1 code implementation 8 Oct 2023 Cheng Qian, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu

We first validate the efficacy of Toolink in harnessing the model's creativity and CoS ability on ChatGPT.

valid

UltraFeedback: Boosting Language Models with Scaled AI Feedback

4 code implementations 2 Oct 2023 Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, Maosong Sun

Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models, serving as a solid foundation for future feedback learning research.

Language Modelling

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

2 code implementations 1 Oct 2023 Tianyu Yu, Jinyi Hu, Yuan Yao, Haoye Zhang, Yue Zhao, Chongyi Wang, Shan Wang, Yinxv Pan, Jiao Xue, Dahai Li, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun

The capabilities of MLLMs depend on two crucial factors: the model architecture that facilitates feature alignment between visual modules and large language models, and the multimodal instruction-tuning datasets for human instruction following.

Instruction Following

ConPET: Continual Parameter-Efficient Tuning for Large Language Models

1 code implementation 26 Sep 2023 Chenyang Song, Xu Han, Zheni Zeng, Kuai Li, Chen Chen, Zhiyuan Liu, Maosong Sun, Tao Yang

First, Static ConPET can adapt former continual learning methods originally designed for relatively smaller models to LLMs through PET and a dynamic replay strategy, which largely reduces the tuning costs and alleviates the over-fitting and forgetting issue.

Continual Learning

QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation

no code implementations19 Sep 2023 Kunlun Zhu, Shihao Liang, Xu Han, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

Recent years have witnessed the success of question answering (QA), especially its potential to be a foundation paradigm for tackling diverse NLP tasks.

Data Augmentation Question Answering

Universal Photorealistic Style Transfer: A Lightweight and Adaptive Approach

no code implementations18 Sep 2023 Rong Liu, Enyu Zhao, Zhiyuan Liu, Andrew Feng, Scott John Easley

Photorealistic style transfer aims to apply stylization while preserving the realism and structure of input content.

Style Transfer Super-Resolution +1

Empowering Private Tutoring by Chaining Large Language Models

no code implementations15 Sep 2023 Yulin Chen, Ning Ding, Hai-Tao Zheng, Zhiyuan Liu, Maosong Sun, BoWen Zhou

Artificial intelligence has been applied in various aspects of online education to facilitate teaching and learning.

Text Matching Improves Sequential Recommendation by Reducing Popularity Biases

1 code implementation27 Aug 2023 Zhenghao Liu, Sen Mei, Chenyan Xiong, Xiaohua LI, Shi Yu, Zhiyuan Liu, Yu Gu, Ge Yu

TASTE alleviates the cold start problem by representing long-tail items using full-text modeling and bringing the benefits of pretrained language models to recommendation systems.

Sequential Recommendation Text Matching
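
The full-text item modeling that TASTE relies on can be sketched in a few lines: embed an item's complete textual description with a pretrained encoder and use the pooled vector as its representation, so long-tail items need no interaction history. This is a minimal sketch, not the paper's exact model; the BERT checkpoint is an illustrative stand-in.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative choice
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed_item(full_text: str) -> torch.Tensor:
    """Mean-pool the encoder's last hidden states over non-padding tokens."""
    batch = tokenizer(full_text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state   # (1, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)      # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

# A cold-start item is representable from its text alone -- no interactions needed.
vec = embed_item("Vintage 35mm film camera, manual focus, includes leather case")
```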

Rational Decision-Making Agent with Internalized Utility Judgment

no code implementations24 Aug 2023 Yining Ye, Xin Cong, Shizuo Tian, Yujia Qin, Chong Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun

Central to the development of rationality is the construction of an internalized utility judgment, capable of assigning numerical utilities to each decision.

Decision Making Language Modelling +1

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

2 code implementations23 Aug 2023 Jinyi Hu, Yuan YAO, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun

Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., the lack of large-scale, high-quality image-text data).

Image to text Language Modeling +3

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

1 code implementation21 Aug 2023 Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie zhou

Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks.

ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

3 code implementations14 Aug 2023 Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, Zhiyuan Liu

Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost.

Text Generation

LLM As DBA

1 code implementation10 Aug 2023 Xuanhe Zhou, Guoliang Li, Zhiyuan Liu

Database administrators (DBAs) play a crucial role in managing, maintaining and optimizing a database system to ensure data availability, performance, and reliability.

Exploring Format Consistency for Instruction Tuning

1 code implementation28 Jul 2023 Shihao Liang, Runchu Tian, Kunlun Zhu, Yujia Qin, Huadong Wang, Xin Cong, Zhiyuan Liu, Xiaojiang Liu, Maosong Sun

Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions.

Denoising Diversity

ChatDev: Communicative Agents for Software Development

1 code implementation16 Jul 2023 Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun

Numerous studies used deep learning to improve specific phases in a waterfall model, such as design, coding, and testing.

Decision Making

CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices

1 code implementation15 Jul 2023 Weilin Zhao, Yuxiang Huang, Xu Han, Zhiyuan Liu, Zhengyan Zhang, Kuai Li, Chen Chen, Tao Yang, Maosong Sun

To address this, two key methods are available: the first is model compression, which compresses LLMs into smaller sizes; the second is LoRA, which can transfer an LLM to other tasks with very few parameters, avoiding the storage of multiple model variants in multi-task scenarios by preserving only the LoRAs.

Model Compression
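
For context on the second method, a generic LoRA layer looks like the following: the base weight stays frozen while a low-rank update carries all trainable parameters. This is a standard textbook sketch, not the paper's CA-LoRA recovery modules for compressed LLMs.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (generic LoRA)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # frozen (possibly compressed) weights
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # only the low-rank factors train: 2 * 8 * 768 = 12,288 parameters
```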

Won't Get Fooled Again: Answering Questions with False Premises

1 code implementation5 Jul 2023 Shengding Hu, Yifan Luo, Huadong Wang, Xingyi Cheng, Zhiyuan Liu, Maosong Sun

In this paper, we find that the PLMs already possess the knowledge required to rebut such questions, and the key is how to activate the knowledge.

Question Answering

OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models

1 code implementation5 Jul 2023 Shengding Hu, Ning Ding, Weilin Zhao, Xingtai Lv, Zhen Zhang, Zhiyuan Liu, Maosong Sun

The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning.

Automatic Truss Design with Reinforcement Learning

1 code implementation27 Jun 2023 Weihua Du, Jinglun Zhao, Chao Yu, Xingcheng Yao, Zimeng Song, Siyang Wu, Ruifeng Luo, Zhiyuan Liu, Xianzhong Zhao, Yi Wu

Directly applying end-to-end reinforcement learning (RL) methods to truss layout design is also infeasible, since only a tiny portion of the entire layout space is valid under the physical constraints, leading to particularly sparse rewards for RL training.

Combinatorial Optimization Layout Design +4

Interactive Molecular Discovery with Natural Language

1 code implementation21 Jun 2023 Zheni Zeng, Bangchen Yin, Shipeng Wang, Jiarui Liu, Cheng Yang, Haishen Yao, Xingzhi Sun, Maosong Sun, Guotong Xie, Zhiyuan Liu

Natural language is expected to be a key medium for various human-machine interactions in the era of large language models.

Property Prediction

The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation

1 code implementation12 Jun 2023 Hao Peng, Xiaozhi Wang, Feng Yao, Kaisheng Zeng, Lei Hou, Juanzi Li, Zhiyuan Liu, Weixing Shen

In this paper, we check the reliability of EE evaluations and identify three major pitfalls: (1) The data preprocessing discrepancy makes the evaluation results on the same dataset not directly comparable, but the data preprocessing details are not widely noted and specified in papers.

Event Argument Extraction Event Detection +1

Exploring the Impact of Model Scaling on Parameter-Efficient Tuning

1 code implementation4 Jun 2023 Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun

Our investigations reveal that model scaling (1) mitigates the effects of the positions of tunable parameters on performance, and (2) enables tuning methods to achieve performance comparable to full-parameter fine-tuning by optimizing fewer tunable parameters.

Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data

2 code implementations31 May 2023 Xinze Li, Zhenghao Liu, Chenyan Xiong, Shi Yu, Yu Gu, Zhiyuan Liu, Ge Yu

SANTA proposes two pretraining methods to make language models structure-aware and learn effective representations for structured data: 1) Structured Data Alignment, which utilizes the natural alignment relations between structured data and unstructured data for structure-aware pretraining.

Code Search Language Modeling +2
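
The Structured Data Alignment idea can be illustrated with a generic in-batch contrastive objective that pulls each structured item toward its paired unstructured description; this is a sketch of the alignment principle, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def alignment_loss(struct_emb, text_emb, temperature=0.05):
    """In-batch contrastive loss: row i of struct_emb is paired with row i
    of text_emb; all other rows in the batch act as negatives."""
    struct_emb = F.normalize(struct_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = struct_emb @ text_emb.T / temperature   # (B, B) cosine similarities
    labels = torch.arange(logits.size(0))            # diagonal entries are positives
    return F.cross_entropy(logits, labels)

loss = alignment_loss(torch.randn(32, 768), torch.randn(32, 768))
```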

Exploring Lottery Prompts for Pre-trained Language Models

no code implementations31 May 2023 Yulin Chen, Ning Ding, Xiaobin Wang, Shengding Hu, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie

Consistently scaling pre-trained language models (PLMs) imposes substantial burdens on model adaptation, necessitating more efficient alternatives to conventional fine-tuning.

From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework

1 code implementation29 May 2023 Yangyi Chen, Hongcheng Gao, Ganqu Cui, Lifan Yuan, Dehan Kong, Hanlu Wu, Ning Shi, Bo Yuan, Longtao Huang, Hui Xue, Zhiyuan Liu, Maosong Sun, Heng Ji

In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework.

Adversarial Attack

Emergent Modularity in Pre-trained Transformers

1 code implementation28 May 2023 Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie zhou

In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes.

Mixture-of-Experts

Plug-and-Play Knowledge Injection for Pre-trained Language Models

1 code implementation28 May 2023 Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie zhou

Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models.

Plug-and-Play Document Modules for Pre-trained Models

1 code implementation28 May 2023 Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun

By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document one time to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders.

Question Answering
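
The encode-once benefit is essentially memoization of the expensive document forward pass. In the sketch below, encode_document and run_task are hypothetical toy stand-ins; the point is that the plugin is computed once per document and then reused across tasks and queries.

```python
from functools import lru_cache
import hashlib

def encode_document(doc_text: str) -> tuple:
    """Toy stand-in for the expensive PTM forward pass over a document."""
    digest = hashlib.sha256(doc_text.encode()).digest()
    return tuple(b / 255.0 for b in digest[:8])    # toy 8-dim "plugin"

@lru_cache(maxsize=10_000)
def document_plugin(doc_text: str) -> tuple:
    return encode_document(doc_text)               # computed once, reused everywhere

def run_task(task: str, plugin: tuple, query: str) -> str:
    """Toy stand-in for the cheap task-specific computation on top of the plugin."""
    return f"[{task}] score={sum(plugin):.3f} for query '{query}'"

doc = "Plug-and-play document modules encode a document one time."
print(run_task("qa", document_plugin(doc), "what is encoded once?"))
print(run_task("rank", document_plugin(doc), "reused across tasks"))  # cache hit
```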

Stochastic Bridges as Effective Regularizers for Parameter-Efficient Tuning

1 code implementation28 May 2023 Weize Chen, Xu Han, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie zhou

Since it is non-trivial to directly model the intermediate states and design a running cost function, we propose to use latent stochastic bridges to regularize the intermediate states and use the regularization as the running cost of PETs.
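
As a rough intuition for a bridge used as a running cost, the sketch below penalizes intermediate layer states for straying from the straight path between the first and last states. The paper's regularizer uses learned stochastic bridges in a latent space, so this deterministic version is only a crude analogue.

```python
import torch

def bridge_regularizer(hidden_states):
    """Penalize intermediate states for deviating from the linear bridge
    between the first and last hidden states (deterministic analogue only)."""
    h0, hT = hidden_states[0], hidden_states[-1]
    T = len(hidden_states) - 1
    cost = 0.0
    for t in range(1, T):
        target = (1 - t / T) * h0 + (t / T) * hT   # point on the straight path
        cost = cost + (hidden_states[t] - target).pow(2).mean()
    return cost / max(T - 1, 1)

states = [torch.randn(4, 768) for _ in range(13)]  # e.g. 12-layer model outputs
reg = bridge_regularizer(states)                   # added to the task loss
```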

Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

2 code implementations27 May 2023 Zichun Yu, Chenyan Xiong, Shi Yu, Zhiyuan Liu

Retrieval augmentation can aid language models (LMs) in knowledge-intensive tasks by supplying them with external information.

MMLU Retrieval +1

Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval

1 code implementation24 May 2023 Shi Yu, Chenghao Fan, Chenyan Xiong, David Jin, Zhiyuan Liu, Zhenghao Liu

Common document ranking pipelines in search systems are cascade systems that involve multiple ranking layers to integrate different information step-by-step.

Document Ranking Information Retrieval +3

CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models

2 code implementations23 May 2023 Cheng Qian, Chi Han, Yi R. Fung, Yujia Qin, Zhiyuan Liu, Heng Ji

Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability.

2k Math +1

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

1 code implementation23 May 2023 Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, BoWen Zhou

Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT.

Diversity

Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

no code implementations19 May 2023 Jinyi Hu, Xu Han, Xiaoyuan Yi, Yutong Chen, Wenhao Li, Zhiyuan Liu, Maosong Sun

IAP optimizes only a separate Chinese text encoder with all other parameters fixed to align Chinese semantics space to the English one in CLIP.

Cross-Lingual Transfer Image Generation
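
That alignment reduces to a loop in which only the Chinese text encoder receives gradients while frozen CLIP embeddings serve as targets. The sketch below assumes precomputed image embeddings and stubs the Chinese encoder as a single linear projection; both are illustrative simplifications, not the paper's setup.

```python
import torch
import torch.nn.functional as F

zh_text_encoder = torch.nn.Linear(768, 512)        # the ONLY trainable module
optimizer = torch.optim.AdamW(zh_text_encoder.parameters(), lr=1e-4)

def training_step(zh_features, clip_image_emb):
    """Pull the Chinese text embedding toward the frozen image pivot."""
    zh_emb = F.normalize(zh_text_encoder(zh_features), dim=-1)
    pivot = F.normalize(clip_image_emb, dim=-1)     # frozen; no grad flows here
    loss = 1 - (zh_emb * pivot).sum(-1).mean()      # cosine alignment loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = training_step(torch.randn(8, 768), torch.randn(8, 512))
```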

Recyclable Tuning for Continual Pre-training

1 code implementation15 May 2023 Yujia Qin, Cheng Qian, Xu Han, Yankai Lin, Huadong Wang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie zhou

In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent.

Revisiting Graph Contrastive Learning for Anomaly Detection

no code implementations4 May 2023 Zhiyuan Liu, Chunjie Cao, Fangjian Tao, Jingzhang Sun

In the paper, we delve into this misconception and propose MAG, a Multi-GNN and Augmented Graph contrastive framework that unifies the existing GCAD methods from a contrastive self-supervised perspective.

Anomaly Detection Attribute +1

VPGTrans: Transfer Visual Prompt Generator across LLMs

1 code implementation NeurIPS 2023 Ao Zhang, Hao Fei, Yuan YAO, Wei Ji, Li Li, Zhiyuan Liu, Tat-Seng Chua

While developing a new multimodal LLM (MLLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm.

Transfer Learning

Rethinking Dense Retrieval's Few-Shot Ability

1 code implementation12 Apr 2023 Si Sun, Yida Lu, Shi Yu, Xiangyang Li, Zhonghua Li, Zhao Cao, Zhiyuan Liu, Deming Ye, Jie Bao

Moreover, the dataset is partitioned into disjoint base and novel classes, allowing DR models to be continuously trained on ample data from base classes and a few samples in novel classes.

Retrieval

Language-Specific Representation of Emotion-Concept Knowledge Causally Supports Emotion Inference

1 code implementation19 Feb 2023 Ming Li, Yusheng Su, Hsiu-Yuan Huang, Jiali Cheng, Xin Hu, Xinmiao Zhang, Huadong Wang, Yujia Qin, Xiaozhi Wang, Kristen A. Lindquist, Zhiyuan Liu, Dan Zhang

Humans no doubt use language to communicate about their emotional experiences, but does language in turn help humans understand emotions, or is language just a vehicle of communication?

Attribute Language Modelling

READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises

1 code implementation14 Feb 2023 Chenglei Si, Zhengyan Zhang, Yingfa Chen, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun

In order to fill this important gap, we construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises.

Data Augmentation Fairness +2

Semi-supervised Large-scale Fiber Detection in Material Images with Synthetic Data

no code implementations10 Feb 2023 Lan Fu, Zhiyuan Liu, Jinlong Li, Jeff Simmons, Hongkai Yu, Song Wang

Accurate detection of large-scale, elliptical-shaped fibers, including their parameters of center, orientation, and major/minor axes, on 2D cross-sectioned image slices is very important for characterizing the underlying 3D cylinder structures in microscopic material images.

Domain Adaptation

Decoder Tuning: Efficient Language Understanding as Decoding

3 code implementations16 Dec 2022 Ganqu Cui, Wentao Li, Ning Ding, Longtao Huang, Zhiyuan Liu, Maosong Sun

With the ever-growing sizes of pre-trained models (PTMs), it has become an emerging practice to provide only the inference APIs for users, namely the model-as-a-service (MaaS) setting.

Decoder Natural Language Understanding

Mul-GAD: a semi-supervised graph anomaly detection framework via aggregating multi-view information

no code implementations11 Dec 2022 Zhiyuan Liu, Chunjie Cao, Jingzhang Sun

For a more comprehensive conclusion, we further investigate the effect of the objective function and the number of fused views on detection performance.

Graph Anomaly Detection

Visually Grounded Commonsense Knowledge Acquisition

1 code implementation22 Nov 2022 Yuan YAO, Tianyu Yu, Ao Zhang, Mengdi Li, Ruobing Xie, Cornelius Weber, Zhiyuan Liu, Hai-Tao Zheng, Stefan Wermter, Tat-Seng Chua, Maosong Sun

In this work, we present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem, where models learn to summarize commonsense relations from a bag of images about an entity pair without any human annotation on image instances.

Language Modelling

MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction

1 code implementation14 Nov 2022 Xiaozhi Wang, Yulin Chen, Ning Ding, Hao Peng, Zimu Wang, Yankai Lin, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, Jie zhou

It contains 103,193 event coreference chains, 1,216,217 temporal relations, 57,992 causal relations, and 15,841 subevent relations, which is larger than existing datasets of all the ERE tasks by at least an order of magnitude.

Event Relation Extraction Relation +1

Finding Skill Neurons in Pre-trained Transformer-based Language Models

1 code implementation14 Nov 2022 Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li

Furthermore, we demonstrate the skill neurons are most likely generated in pre-training rather than fine-tuning by showing that the skill neurons found with prompt tuning are also crucial for other fine-tuning methods freezing neuron weights, such as the adapter-based tuning and BitFit.

Network Pruning
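
The predictivity probe behind skill neurons can be approximated with a simple threshold test: a neuron is a candidate skill neuron if thresholding its activation at a baseline predicts the task labels well. The sketch below simplifies the paper's prompt-token formulation.

```python
import numpy as np

def neuron_predictivity(activations, labels):
    """Score one neuron: threshold its activation at its mean and check how
    well the thresholded value predicts the binary label."""
    threshold = activations.mean()
    pred = (activations > threshold).astype(int)
    acc = (pred == labels).mean()
    return max(acc, 1 - acc)   # firing for the *other* class is equally skillful

rng = np.random.default_rng(0)
acts = rng.normal(size=1000)                                 # one neuron over 1000 examples
labels = (acts + rng.normal(scale=0.5, size=1000) > 0).astype(int)  # correlated labels
print(neuron_predictivity(acts, labels))   # well above 0.5 -> candidate skill neuron
```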

FPT: Improving Prompt Tuning Efficiency via Progressive Training

1 code implementation13 Nov 2022 Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu

Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting PT using a small-scale partial PLM, and then progressively expands its depth and width until the full-model size.

Few-shot Classification with Hypersphere Modeling of Prototypes

no code implementations10 Nov 2022 Ning Ding, Yulin Chen, Ganqu Cui, Xiaobin Wang, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie

Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere.

Classification Few-Shot Learning +1
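
The metric-based classification step is compact once prototypes are hyperspheres: score each point by its distance to each sphere's surface, |‖x − c_k‖ − r_k|, and pick the smallest. A minimal sketch, with random centers and radii standing in for learned prototypes:

```python
import torch

def hypersphere_distance(x, centers, radii):
    """Distance from points x to each class hypersphere's *surface*:
    | ||x - c_k|| - r_k |, so points are scored by how far they sit
    from the shell, whether inside or outside the sphere."""
    d_center = torch.cdist(x, centers)            # (N, K) distances to centers
    return (d_center - radii.unsqueeze(0)).abs()  # (N, K) distances to surfaces

x = torch.randn(5, 64)                            # 5 query embeddings
centers = torch.randn(3, 64)                      # 3 class prototype centers
radii = torch.rand(3) + 0.5                       # per-class radii
pred = hypersphere_distance(x, centers, radii).argmin(dim=1)  # nearest surface wins
```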

Sparse Structure Search for Delta Tuning

1 code implementation NeurIPS 2022 Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

Generally, DT methods exquisitely design delta modules (DT modules) which could be applied to arbitrary fine-grained positions inside PTMs.

Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Negatives

1 code implementation31 Oct 2022 Si Sun, Chenyan Xiong, Yue Yu, Arnold Overwijk, Zhiyuan Liu, Jie Bao

In this paper, we investigate the instability in the standard dense retrieval training, which iterates between model training and hard negative selection using the being-trained model.

Retrieval
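
The iteration whose instability the paper studies alternates model training with hard-negative mining by the being-trained model. The toy sketch below shows that standard mining loop, not the paper's teleportation-negative remedy: retrieve top candidates with the current encoder, drop known positives, and keep the rest as hard negatives.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 16))          # toy "encoder": one row per corpus document,
                                       # updated between training rounds

def encode(doc_ids):
    return W[doc_ids]                  # toy embedding lookup

def mine_hard_negatives(query_vec, positive_ids, corpus_ids, k=5):
    """Retrieve with the current model, filter positives, keep top-k negatives."""
    scores = encode(corpus_ids) @ query_vec          # dot-product retrieval
    ranked = [corpus_ids[i] for i in np.argsort(-scores)]
    return [d for d in ranked if d not in positive_ids][:k]

query_vec = rng.normal(size=16)
negs = mine_hard_negatives(query_vec, positive_ids={7}, corpus_ids=list(range(32)))
print(negs)   # refreshed each round as W changes -- the source of the instability studied
```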

A Close Look into the Calibration of Pre-trained Language Models

2 code implementations31 Oct 2022 Yangyi Chen, Lifan Yuan, Ganqu Cui, Zhiyuan Liu, Heng Ji

We observe a consistent change in calibration performance across six factors.

Exploring Mode Connectivity for Pre-trained Language Models

1 code implementation25 Oct 2022 Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie zhou

(3) How does the PLM's task knowledge change along the path connecting two minima?

Different Tunes Played with Equal Skill: Exploring a Unified Optimization Subspace for Delta Tuning

1 code implementation24 Oct 2022 Jing Yi, Weize Chen, Yujia Qin, Yankai Lin, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun, Jie zhou

To fathom the mystery, we hypothesize that the adaptations of different DETs could all be reparameterized as low-dimensional optimizations in a unified optimization subspace, which could be found by jointly decomposing independent solutions of different DETs.
