Search Results for author: Zili Wang

Found 37 papers, 17 papers with code

Event Graph based Sentence Fusion

no code implementations EMNLP 2021 Ruifeng Yuan, Zili Wang, Wenjie Li

Sentence fusion is a conditional generation task that merges several related sentences into a single coherent one, which can be regarded as a summary sentence.

Abstractive Text Summarization Sentence +2

Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining

no code implementations 6 Mar 2025 Houyi Li, Wenzheng Zheng, Jingcheng Hu, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang

Through extensive empirical studies involving grid searches across diverse configurations, we discover universal scaling laws governing these hyperparameters: optimal learning rate follows a power-law relationship with both model parameters and data sizes, while optimal batch size scales primarily with data sizes.

Hyperparameter Optimization Language Modeling +3
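The power-law form described in the abstract can be illustrated with a small sketch. The constant and exponents below are illustrative placeholders, not the fitted values from the paper:

```python
# Hypothetical power-law scaling of the optimal learning rate:
#   lr_opt = C * N**alpha * D**beta
# where N is the parameter count and D is the number of training tokens.
# C, alpha, beta are placeholders, NOT the paper's fitted coefficients.

def optimal_lr(n_params: float, n_tokens: float,
               C: float = 1.0, alpha: float = -0.25, beta: float = -0.1) -> float:
    """Toy power-law estimate of the optimal learning rate."""
    return C * n_params**alpha * n_tokens**beta

# With alpha < 0, doubling model size lowers the suggested learning rate:
lr_small = optimal_lr(1e8, 1e10)
lr_large = optimal_lr(2e8, 1e10)
assert lr_large < lr_small
```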

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

no code implementations 20 Feb 2025 M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, King Zhu, Minghao Liu, Yiming Liang, Xiaolong Jin, Zhenlin Wei, Chujie Zheng, Kaixin Deng, Shawn Gavin, Shian Jia, Sichao Jiang, Yiyan Liao, Rui Li, Qinrui Li, Sirun Li, Yizhi Li, Yunwen Li, David Ma, Yuansheng Ni, Haoran Que, Qiyao Wang, Zhoufutu Wen, Siwei Wu, Tyshawn Hsing, Ming Xu, Zhenzhu Yang, Zekun Moore Wang, Junting Zhou, Yuelin Bai, Xingyuan Bu, Chenglin Cai, Liang Chen, Yifan Chen, Chengtuo Cheng, Tianhao Cheng, Keyi Ding, Siming Huang, Yun Huang, Yaoru Li, Yizhe Li, Zhaoqun Li, Tianhao Liang, Chengdong Lin, Hongquan Lin, Yinghao Ma, Tianyang Pang, Zhongyuan Peng, Zifan Peng, Qige Qi, Shi Qiu, Xingwei Qu, Shanghaoran Quan, Yizhou Tan, Zili Wang, Chenqing Wang, Hao Wang, Yiya Wang, YuBo Wang, Jiajun Xu, Kexin Yang, Ruibin Yuan, Yuanhao Yue, Tianyang Zhan, Chun Zhang, Jinyang Zhang, Xiyue Zhang, Xingjian Zhang, Yue Zhang, Yongchi Zhao, Xiangyu Zheng, Chenghua Zhong, Yang Gao, Zhoujun Li, Dayiheng Liu, Qian Liu, Tianyu Liu, Shiwen Ni, Junran Peng, Yujia Qin, Wenbo Su, Guoyin Wang, Shi Wang, Jian Yang, Min Yang, Meng Cao, Xiang Yue, Zhaoxiang Zhang, Wangchunshu Zhou, Jiaheng Liu, Qunshu Lin, Wenhao Huang, Ge Zhang

To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines.

Collaborative Filtering

Multi-matrix Factorization Attention

no code implementations 26 Dec 2024 Jingcheng Hu, Houyi Li, Yinmin Zhang, Zili Wang, Shuigeng Zhou, Xiangyu Zhang, Heung-Yeung Shum, Daxin Jiang

Existing variants for standard Multi-Head Attention (MHA), including SOTA methods like MLA, fail to maintain as strong performance under stringent Key-Value cache (KV cache) constraints.
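To see why the KV cache constraint mentioned above matters, a back-of-the-envelope sketch of standard MHA cache memory helps; the model dimensions below are hypothetical, not taken from the paper:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Memory for keys + values cached at every layer (fp16 by default).

    The leading 2 counts the key tensor and the value tensor separately.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# e.g. a hypothetical 32-layer model, 32 KV heads of dim 128, 4k context, batch 1:
full_mha = kv_cache_bytes(32, 32, 128, 4096, 1)   # 2 GiB of cache
# Sharing keys/values across query heads (8 KV heads) shrinks the cache 4x:
grouped = kv_cache_bytes(32, 8, 128, 4096, 1)
assert full_mha == 4 * grouped
```

This is why attention variants that compress or share the KV cache are compared under a fixed memory budget.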

Continuous Speculative Decoding for Autoregressive Image Generation

1 code implementation 18 Nov 2024 Zili Wang, Robert Zhang, Kun Ding, Qi Yang, Fei Li, Shiming Xiang

Experimental results show that our continuous speculative decoding achieves a remarkable $2.33\times$ speed-up on off-the-shelf models while maintaining the output distribution.

Denoising Image Generation
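For context, the standard discrete accept/reject rule that speculative decoding builds on (the paper extends it to continuous token distributions) can be sketched as follows; the toy distributions are illustrative only:

```python
import numpy as np

def speculative_accept(p: np.ndarray, q: np.ndarray, x: int,
                       rng: np.random.Generator) -> int:
    """One speculative-sampling step for discrete distributions.

    x is a token drawn from the draft distribution q; the returned token is
    distributed exactly according to the target distribution p.
    """
    if rng.random() < min(1.0, p[x] / q[x]):
        return x  # accept the draft token
    # On rejection, resample from the residual max(p - q, 0), renormalized.
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)

rng = np.random.default_rng(0)
p = np.array([0.7, 0.2, 0.1])   # target distribution
q = np.array([0.3, 0.4, 0.3])   # draft distribution
draws = [speculative_accept(p, q, rng.choice(3, p=q), rng) for _ in range(10000)]
# Empirically, the accepted tokens follow p, not q.
```

"Maintaining the output distribution" in the abstract refers to exactly this property: acceleration without changing what the model samples.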

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

no code implementations 7 Nov 2024 Siming Huang, Tianhao Cheng, J. K. Liu, Jiaran Hao, Liuyihan Song, Yang Xu, J. Yang, Jiaheng Liu, Chenchen Zhang, Linzheng Chai, Ruifeng Yuan, Zhaoxiang Zhang, Jie Fu, Qian Liu, Ge Zhang, Zili Wang, Yuan Qi, Yinghui Xu, Wei Chu

To address the gap, we introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.

Code Generation

Post-hoc Reward Calibration: A Case Study on Length Bias

1 code implementation 25 Sep 2024 Zeyu Huang, Zihan Qiu, Zili Wang, Edoardo M. Ponti, Ivan Titov

Central to this process is the reward model (RM), which translates human feedback into training signals for optimising LLM behaviour.
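One simple instantiation of post-hoc length-bias calibration is to fit and subtract the component of the reward explained linearly by response length; this is a sketch of the idea, not necessarily the paper's exact estimator:

```python
import numpy as np

def calibrate_rewards(rewards: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Remove the reward component explained linearly by response length.

    A least-squares fit r ~ a*len + b is subtracted, keeping the residual
    as the calibrated reward. (Illustrative; the paper's estimator may differ.)
    """
    a, b = np.polyfit(lengths, rewards, deg=1)
    return rewards - (a * lengths + b)

lengths = np.array([10.0, 20.0, 30.0, 40.0])
rewards = np.array([1.0, 2.0, 3.0, 4.0])       # reward grows with length only
calibrated = calibrate_rewards(rewards, lengths)
# After calibration, the purely length-driven trend is gone:
assert np.allclose(calibrated, 0.0)
```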

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

no code implementations 10 Sep 2024 Qi Yang, Binjie Mao, Zili Wang, Xing Nie, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

These challenges encompass maintaining the content consistency between the input video and the generated audio, as well as the alignment of temporal and loudness properties within the video.

Audio Synthesis Audio-Visual Synchronization

Layerwise Recurrent Router for Mixture-of-Experts

1 code implementation 13 Aug 2024 Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu

The scaling of large language models (LLMs) has revolutionized their capabilities in various tasks, yet this growth must be matched with efficient computational strategies.

Attribute Mixture-of-Experts

AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation

3 code implementations 3 Aug 2024 Zili Wang, Qi Yang, Linsu Shi, Jiazhong Yu, Qinghua Liang, Fei Li, Shiming Xiang

By characterizing attention maps of the network, we identify two key obstacles in AVS models: 1) attention dissipation, corresponding to the over-concentrated attention weights by Softmax within restricted frames, and 2) inefficient, burdensome transformer decoder, caused by narrow focus patterns in early stages.

Decoder

R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection

no code implementations 15 Jul 2024 Zheyuan Zhou, Le Wang, Naiyu Fang, Zili Wang, Lemiao Qiu, Shuyou Zhang

However, there are two major challenges to the practical application of the current approaches: 1) the embedded models suffer prohibitive computational and storage costs due to the memory bank structure; 2) the reconstructive models based on the MAE mechanism fail to detect anomalies in the unmasked regions.

3D Anomaly Detection

A Closer Look into Mixture-of-Experts in Large Language Models

1 code implementation 26 Jun 2024 Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu

Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance, especially for language tasks.

Computational Efficiency Diversity +1

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory

1 code implementation 18 Jun 2024 Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu

Therefore, these tokens can acquire the necessary knowledge from any expert during inference and become less sensitive to the choice.

Code Generation Mathematical Problem-Solving +4

AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning

no code implementations 11 Jun 2024 Jun Gao, Qian Qiao, Ziqiang Cao, Zili Wang, Wenjie Li

In-context learning (ICL) enables Large Language Models (LLMs) to exhibit emergent abilities on downstream tasks without updating billions of parameters.

In-Context Learning

Lyapunov Neural Network with Region of Attraction Search

no code implementations 15 Mar 2024 Zili Wang, Sean B. Andersson, Roberto Tron

Deep learning methods have been widely used in robotic applications, making learning-enabled control design for complex nonlinear systems a promising direction.

Deep Reinforcement Learning valid

Beyond Language Models: Byte Models are Digital World Simulators

no code implementations 29 Feb 2024 Shangda Wu, Xu Tan, Zili Wang, Rui Wang, Xiaobing Li, Maosong Sun

Traditional deep learning often overlooks bytes, the basic units of the digital world, where all forms of information and operations are encoded and manipulated in binary format.

Prediction

m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers

1 code implementation 26 Feb 2024 Ka Man Lo, Yiming Liang, Wenyu Du, Yuantao Fan, Zili Wang, Wenhao Huang, Lei Ma, Jie Fu

Additionally, the V-MoE-Base model trained with m2mKD achieves 3.5% higher accuracy than end-to-end training on ImageNet-1k.

Knowledge Distillation Mixture-of-Experts

HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts

1 code implementation 20 Feb 2024 Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu

The Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing.

Mixture-of-Experts Multi-Task Learning
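The token-to-experts routing described in the abstract can be sketched with a minimal top-k router; the dimensions and the softmax-over-selected-experts normalization below are a common MoE convention, not this paper's specific design:

```python
import numpy as np

def top_k_route(hidden: np.ndarray, router_w: np.ndarray, k: int = 2):
    """Route one token to its top-k experts with softmax-normalized weights.

    hidden: (d,) token representation; router_w: (n_experts, d) router matrix.
    Returns the chosen expert indices and their mixing weights.
    """
    logits = router_w @ hidden                     # (n_experts,)
    top = np.argsort(logits)[-k:][::-1]            # indices of the top-k logits
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                       # softmax over selected experts
    return top, weights

rng = np.random.default_rng(0)
experts, weights = top_k_route(rng.normal(size=16), rng.normal(size=(8, 16)), k=2)
assert len(experts) == 2 and np.isclose(weights.sum(), 1.0)
```

Each token's output is then the weight-mixed sum of the selected experts' outputs, so only k of the n experts run per token.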

Personalized Large Language Model Assistant with Evolving Conditional Memory

no code implementations 22 Dec 2023 Ruifeng Yuan, Shichao Sun, Yongqi Li, Zili Wang, Ziqiang Cao, Wenjie Li

With the rapid development of large language models, AI assistants like ChatGPT have become increasingly integrated into people's work and lives but remain limited in personalized services.

Language Modeling Language Modelling +2

GSDC Transformer: An Efficient and Effective Cue Fusion for Monocular Multi-Frame Depth Estimation

no code implementations 29 Sep 2023 Naiyu Fang, Lemiao Qiu, Shuyou Zhang, Zili Wang, Zheyuan Zhou, Kerui Hu

To address these issues, we propose the GSDC Transformer, an efficient and effective component for cue fusion in monocular multi-frame depth estimation.

Autonomous Driving Monocular Depth Estimation

RefGPT: Dialogue Generation of GPT, by GPT, and for GPT

1 code implementation 24 May 2023 Dongjie Yang, Ruifeng Yuan, Yuantao Fan, Yifei Yang, Zili Wang, Shusen Wang, Hai Zhao

Therefore, we propose a method called RefGPT to generate large volumes of truthful and customized dialogues without worrying about factual errors caused by model hallucination.

Dialogue Generation Hallucination

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

no code implementations 2 May 2023 Gašper Beguš, Thomas Lu, Zili Wang

We introduce spontaneous concatenation: a phenomenon where convolutional neural networks (CNNs) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated without ever accessing data with multiple words in the input.

Generative Adversarial Network

PG-VTON: A Novel Image-Based Virtual Try-On Method via Progressive Inference Paradigm

1 code implementation 18 Apr 2023 Naiyu Fang, Lemiao Qiu, Shuyou Zhang, Zili Wang, Kerui Hu

To address these limitations, we propose a novel virtual try-on method via progressive inference paradigm (PGVTON) that leverages a top-down inference pipeline and a general garment try-on strategy.

Virtual Try-on

A Cross-Scale Hierarchical Transformer with Correspondence-Augmented Attention for inferring Bird's-Eye-View Semantic Segmentation

no code implementations 7 Apr 2023 Naiyu Fang, Lemiao Qiu, Shuyou Zhang, Zili Wang, Kerui Hu, Kang Wang

To save the computation increase caused by this hierarchical framework, we exploit the cross-scale Transformer to learn feature relationships in a reversed-aligning way, and leverage the residual connection of BEV features to facilitate information transmission between scales.

Autonomous Driving Bird's-Eye View Semantic Segmentation +2

Few-shot Query-Focused Summarization with Prefix-Merging

no code implementations 29 Nov 2022 Ruifeng Yuan, Zili Wang, Ziqiang Cao, Wenjie Li

Drawing inspiration from prefix-tuning, we integrate the task knowledge from text summarization and question answering into a properly designed prefix and apply the merged prefix to query-focused summarization.

Few-Shot Learning Query-focused Summarization +2

Physical Logic Enhanced Network for Small-Sample Bi-Layer Metallic Tubes Bending Springback Prediction

no code implementations 20 Sep 2022 Chang Sun, Zili Wang, Shuyou Zhang, Le Wang, Jianrong Tan

In the second stage, under the physical logic, the PE-NET is assembled from ES-NET and SP-NET and then fine-tuned with the small-sample BMT dataset and a composite loss function.

Digital-twin-enhanced metal tube bending forming real-time prediction method based on Multi-source-input MTL

no code implementations 3 Jul 2022 Chang Sun, Zili Wang, Shuyou Zhang, Taotao Zhou, Jie Li, Jianrong Tan

To address this issue, a digital-twin-enhanced (DT-enhanced) metal tube bending forming real-time prediction method based on multi-source-input multi-task learning (MTL) is proposed.

Multi-Task Learning Prediction

Bearing-Based Formation Control with Optimal Motion Trajectory

no code implementations 22 Mar 2022 Zili Wang, Sean B. Andersson, Roberto Tron

We form and solve a nonlinear optimization problem with the sum of the agents' trajectory path lengths as the objective, subject to the original equilibria and global convergence conditions for formation control.

Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT

1 code implementation COLING 2020 Ruifeng Yuan, Zili Wang, Wenjie Li

We also introduce a hierarchical structure, which incorporates the multi-level of granularities of the textual information into the model.

Extractive Summarization Natural Language Understanding +1

Query-aware Tip Generation for Vertical Search

no code implementations 19 Oct 2020 Yang Yang, Junmei Hao, Canjia Li, Zili Wang, Jingang Wang, Fuzheng Zhang, Rao Fu, Peixu Hou, Gong Zhang, Zhongyuan Wang

Existing work on tip generation does not take query into consideration, which limits the impact of tips in search scenarios.

Decision Making Decoder

MSnet: A BERT-based Network for Gendered Pronoun Resolution

1 code implementation WS 2019 Zili Wang

In stage 1 of the gendered pronoun resolution task, a variant of this model, trained with the fine-tuning approach, reduced the multi-class logarithmic loss to 0.3033 in 5-fold cross-validation on the training set and 0.2795 on the test set.

Semantic Similarity Semantic Textual Similarity
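The multi-class logarithmic loss quoted above is the mean negative log probability assigned to the true class; a minimal sketch (the example labels and probabilities are made up, not from the paper):

```python
import math

def multiclass_log_loss(y_true, probs):
    """Mean negative log probability assigned to the true class."""
    eps = 1e-15  # clip to avoid log(0)
    return -sum(math.log(max(p[t], eps))
                for t, p in zip(y_true, probs)) / len(y_true)

# Two hypothetical examples over three classes
# (A / B / Neither, as in the gendered pronoun resolution task):
y_true = [0, 2]
probs = [[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]]
loss = multiclass_log_loss(y_true, probs)   # -(ln 0.8 + ln 0.6) / 2 ~ 0.367
```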
