no code implementations • 22 May 2025 • Jipeng Zhang, Haolin Yang, Kehao Miao, Ruiyuan Zhang, Renjie Pi, Jiahui Gao, Xiaofang Zhou
However, real-world applications require SQL generation across multiple dialects with varying syntax and specialized features, which remains a challenge for current models.
1 code implementation • 15 May 2025 • Zhaowei Wang, Wenhao Yu, Xiyu Ren, Jipeng Zhang, Yu Zhao, Rohit Saxena, Liang Cheng, Ginny Wong, Simon See, Pasquale Minervini, Yangqiu Song, Mark Steedman
The rapid extension of context windows in large vision-language models has given rise to long-context vision-language models (LCVLMs), which are capable of handling hundreds of images with interleaved text tokens in a single forward pass.
no code implementations • 17 Apr 2025 • Weijie Shi, Jipeng Zhang, Yaguang Wu, Jingzhi Fang, Ruiyuan Zhang, Jiajie Xu, Jia Zhu, Hao Chen, Yao Zhao, Sirui Han, Xiaofang Zhou
Large language models (LLMs) are commonly trained on multi-domain datasets, where domain sampling strategies significantly impact model performance due to varying domain importance across downstream tasks.
no code implementations • 17 Apr 2025 • Weijie Shi, Chengyi Ju, Chengzhong Liu, Jiaming Ji, Jipeng Zhang, Ruiyuan Zhang, Jia Zhu, Jiajie Xu, Yaodong Yang, Sirui Han, Yike Guo
Moreover, we demonstrate that NaVAB can be combined with alignment techniques to effectively reduce value concerns by aligning LLMs' values with the target country.
1 code implementation • 5 Mar 2025 • Ruida Wang, Rui Pan, Yuxin Li, Jipeng Zhang, Yizhen Jia, Shizhe Diao, Renjie Pi, Junjie Hu, Tong Zhang
To solve these issues, we propose MA-LoT, a Multi-Agent Lean-based Long Chain-of-Thought framework that is, to the best of our knowledge, the first multi-agent framework for Lean4 theorem proving that balances high-level NL reasoning and FL verification in Long CoT.
no code implementations • 5 Feb 2025 • Boyao Wang, Rui Pan, Shizhe Diao, Xingyuan Pan, Jipeng Zhang, Renjie Pi, Tong Zhang
Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices.
no code implementations • 13 Dec 2024 • Runtao Liu, Chen I Chieh, Jindong Gu, Jipeng Zhang, Renjie Pi, Qifeng Chen, Philip Torr, Ashkan Khakzar, Fabio Pizzati
Using a custom DPO strategy and this dataset, we train safety experts, in the form of low-rank adaptation (LoRA) matrices, able to guide the generation process away from specific safety-related concepts.
no code implementations • 3 Dec 2024 • Yuci Liang, Xinheng Lyu, Meidan Ding, WenTing Chen, Jipeng Zhang, Yuexiang Ren, Xiangjian He, Song Wu, Sen yang, Xiyue Wang, Xiaohan Xing, Linlin Shen
Recent advancements in computational pathology have produced patch-level Multi-modal Large Language Models (MLLMs), but these models are limited by their inability to analyze whole slide images (WSIs) comprehensively and their tendency to bypass crucial morphological features that pathologists rely on for diagnosis.
no code implementations • 8 Nov 2024 • Zijian Hu, Jipeng Zhang, Rui Pan, Zhaozhuo Xu, Shanshan Han, Han Jin, Alay Dilipbhai Shah, Dimitris Stripelis, Yuhang Yao, Salman Avestimehr, Chaoyang He, Tong Zhang
Aiming to improve pre-training efficiency, the Fox-1-1.6B model introduces a novel 3-stage data curriculum across all the training data with 2K-8K sequence length.
no code implementations • 7 Nov 2024 • Yide Ran, Zhaozhuo Xu, Yuhang Yao, Zijian Hu, Shanshan Han, Han Jin, Alay Dilipbhai Shah, Jipeng Zhang, Dimitris Stripelis, Tong Zhang, Salman Avestimehr, Chaoyang He
The rapid advancement of Large Language Models (LLMs) has led to their increased integration into mobile devices for personalized assistance, which enables LLMs to call external API functions to enhance their performance.
no code implementations • 24 Oct 2024 • Jipeng Zhang, Jianshu Zhang, Yuanzhe Li, Renjie Pi, Rui Pan, Runtao Liu, Ziqiang Zheng, Tong Zhang
The underlying cause of this issue is the gap between natural language and programming language (NL-PL Gap), which is especially pronounced in LRPLs due to limited aligned data.
1 code implementation • 9 Oct 2024 • Renjie Pi, Jianshu Zhang, Tianyang Han, Jipeng Zhang, Rui Pan, Tong Zhang
In this paper, we introduce Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework designed to enable MLLMs to identify target individuals within an image and engage in personalized and coherent dialogues.
1 code implementation • 22 Aug 2024 • Kashun Shum, Minrui Xu, Jianshu Zhang, Zixin Chen, Shizhe Diao, Hanze Dong, Jipeng Zhang, Muhammad Omer Raza
We then propose a new method named Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of the teacher's knowledge to obtain a reliable language model in a cost-efficient way.
no code implementations • 22 Aug 2024 • Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, Chaoyang He
With the rapid growth of Large Language Models (LLMs) across various domains, numerous new LLMs have emerged, each possessing domain-specific expertise.
1 code implementation • 21 Jul 2024 • Jipeng Zhang, Yaxuan Qin, Renjie Pi, Weizhong Zhang, Rui Pan, Tong Zhang
Achieving this goal poses non-trivial challenges: 1) data selection requires accurate data representations that reflect the training samples' quality, 2) the diverse nature of instruction datasets must be taken into account, and 3) the coreset selection algorithm must remain efficient for large models.
1 code implementation • 3 Jul 2024 • Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang
However, due to the scarcity of aligned NL and Formal Language (FL) theorem-proving data, most modern LLMs exhibit suboptimal performance. This scarcity results in a paucity of methodologies for training LLMs and techniques to fully utilize their capabilities in composing formal proofs.
no code implementations • 28 Jun 2024 • Rui Pan, Dylan Zhang, Hanning Zhang, Xingyuan Pan, Minrui Xu, Jipeng Zhang, Renjie Pi, Xiaoyu Wang, Tong Zhang
Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up.
1 code implementation • 11 Jun 2024 • Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang
Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval.
2 code implementations • 4 Jun 2024 • Jianqiao Lu, Yingjia Wan, Zhengying Liu, Yinya Huang, Jing Xiong, Chengwu Liu, Jianhao Shen, Hui Jin, Jipeng Zhang, Haiming Wang, Zhicheng Yang, Jing Tang, Zhijiang Guo
Autoformalization, the conversion of natural language mathematics into formal languages, offers significant potential for advancing mathematical reasoning.
1 code implementation • 26 Mar 2024 • Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
Attempting to complement this deficiency, we investigate the layerwise properties of LoRA on fine-tuning tasks and observe an unexpected but consistent skewness of weight norms across different layers.
no code implementations • 13 Mar 2024 • Renjie Pi, Tianyang Han, Wei Xiong, Jipeng Zhang, Runtao Liu, Rui Pan, Tong Zhang
To mitigate this issue, we propose Bootstrapped Preference Optimization (BPO), which conducts preference learning with datasets containing negative responses bootstrapped from the model itself.
Ranked #102 on Visual Question Answering on MM-Vet
no code implementations • 6 Feb 2024 • Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang
The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level.
1 code implementation • 6 Feb 2024 • Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang
In this paper, we identify a typical class of inputs that baffles MLLMs, which consist of images that are highly relevant but inconsistent with answers, causing MLLMs to suffer from visual illusion.
1 code implementation • 31 Jan 2024 • Ying Su, Jipeng Zhang, Yangqiu Song, Tong Zhang
To facilitate the evaluation of pruned subgraphs, we also propose a graph attention network (GAT) based module to reason with the subgraph data.
1 code implementation • 5 Jan 2024 • Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang
The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs.
no code implementations • 4 Jan 2024 • Ziqiang Zheng, YiWei Chen, Jipeng Zhang, Tuan-Anh Vu, Huimin Zeng, Yue Him Wong Tim, Sai-Kit Yeung
In this study, we carry out a preliminary and comprehensive case study of utilizing GPT-4V for marine analysis.
3 code implementations • 18 Dec 2023 • Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, YuFei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong
We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships.
1 code implementation • 14 Nov 2023 • Rui Pan, Shuo Xing, Shizhe Diao, Wenhe Sun, Xiang Liu, Kashun Shum, Renjie Pi, Jipeng Zhang, Tong Zhang
Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models.
no code implementations • CVPR 2024 • Renjie Pi, Lewei Yao, Jiahui Gao, Jipeng Zhang, Tong Zhang
In this paper, we present a novel end-to-end framework named PerceptionGPT, which efficiently and effectively equips the VLLMs with visual perception abilities by leveraging the representation power of LLMs' token embedding.
1 code implementation • 20 Oct 2023 • Ziqiang Zheng, Jipeng Zhang, Tuan-Anh Vu, Shizhe Diao, Yue Him Wong Tim, Sai-Kit Yeung
Large language models (LLMs), such as ChatGPT/GPT-4, have proven to be powerful tools in promoting the user experience as an AI assistant.
1 code implementation • 19 Oct 2023 • Yi Bin, Wenhao Shi, Bin Ji, Jipeng Zhang, Yujuan Ding, Yang Yang
Existing sentence ordering approaches generally employ encoder-decoder frameworks with the pointer net to recover the coherence by recurrently predicting each sentence step-by-step.
1 code implementation • 12 Sep 2023 • Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan YAO, Tong Zhang
Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different alignment-forgetting trade-offs, we propose Heterogeneous Model Averaging (HMA) to heterogeneously find various combination ratios of model layers.
1 code implementation • 12 Jul 2023 • Lang Zeng, Jipeng Zhang, Wei Chen, Ying Ding
In pursuit of constructing a dynamic prediction model for a progressive eye disorder, age-related macular degeneration (AMD), we propose a time-dependent Cox survival neural network (tdCoxSNN) to predict its progression using longitudinal fundus images.
1 code implementation • 21 Jun 2023 • Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang
As the number of available foundation models and specialized tasks keeps growing, the job of training scientific language models becomes highly nontrivial.
1 code implementation • 23 May 2023 • Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang
Overall, our proposed paradigm and DetGPT demonstrate the potential for more sophisticated and intuitive interactions between humans and machines.
no code implementations • 22 May 2023 • Xiaoyu Wang, Rui Pan, Renjie Pi, Jipeng Zhang
To address this issue, we propose a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency.
1 code implementation • 13 Apr 2023 • Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discards those that exhibit undesired behavior, and subsequently enhances the model by fine-tuning on these filtered samples.
1 code implementation • 12 Jan 2023 • Xinsong Zhang, Yan Zeng, Jipeng Zhang, Hang Li
X-FM has one language encoder, one vision encoder, and one fusion encoder, as well as a new training method.
Ranked #3 on Visual Reasoning on NLVR2 Test
1 code implementation • 1 Dec 2022 • Zhenwen Liang, Jipeng Zhang, Xiangliang Zhang
In this paper, we propose to build a novel MWP solver by leveraging analogical MWPs, which advance the solver's generalization ability across different kinds of MWPs.
1 code implementation • 1 Dec 2022 • Zhenwen Liang, Jipeng Zhang, Lei Wang, Yan Wang, Jie Shao, Xiangliang Zhang
In this paper, we design a new training framework for an MWP solver by introducing a solution buffer and a solution discriminator.
2 code implementations • 22 Nov 2022 • Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang, Wangchunshu Zhou
Vision language pre-training aims to learn alignments between vision and language from a large amount of data.
Ranked #1 on Cross-Modal Retrieval on Flickr30k (using extra training data)
1 code implementation • 17 Nov 2022 • JunJie Huang, Chenglong Wang, Jipeng Zhang, Cong Yan, Haotian Cui, Jeevana Priya Inala, Colin Clement, Nan Duan, Jianfeng Gao
Code generation models can benefit data scientists' productivity by automatically generating code from context and text descriptions.
1 code implementation • Findings (NAACL) 2022 • Zhenwen Liang, Jipeng Zhang, Lei Wang, Wei Qin, Yunshi Lan, Jie Shao, Xiangliang Zhang
Math word problem (MWP) solving faces a dilemma in number representation learning.
Ranked #5 on Math Word Problem Solving on MathQA
1 code implementation • ACL 2020 • Jipeng Zhang, Lei Wang, Roy Ka-Wei Lee, Yi Bin, Yan Wang, Jie Shao, Ee-Peng Lim
While recent tree-based neural models have demonstrated promising results in generating solution expressions for math word problems (MWPs), most of these models do not capture the relationships and order information among the quantities well.
Ranked #11 on Math Word Problem Solving on Math23K
1 code implementation • AAAI 2019 • Lei Wang, Dongxiang Zhang, Jipeng Zhang, Xing Xu, Lianli Gao, Bing Tian Dai, Heng Tao Shen
Then, we design a recursive neural network to encode the quantity with Bi-LSTM and self attention, and infer the unknown operator nodes in a bottom-up manner.
1 code implementation • ACL 2019 • Jierui Li, Lei Wang, Jipeng Zhang, Yan Wang, Bing Tian Dai, Dongxiang Zhang
Several deep learning models have been proposed for solving math word problems (MWPs) automatically.
Ranked #14 on Math Word Problem Solving on Math23K