1 code implementation • 2 Mar 2025 • Zhiqi Kang, Liyuan Wang, Xingxing Zhang, Karteek Alahari
It includes a forgetting-aware initial session adaption that employs pretraining data to initialize prompt parameters and improve generalizability, as well as a non-parametric logit mask of the output layers to mitigate catastrophic forgetting.
no code implementations • 23 Feb 2025 • Jiaxi Li, Xingxing Zhang, Xun Wang, Xiaolong Huang, Li Dong, Liang Wang, Si-Qing Chen, Wei Lu, Furu Wei
Large language models (LLMs) with extended context windows enable tasks requiring extensive information integration but are limited by the scarcity of high-quality, diverse datasets for long-context instruction tuning.
no code implementations • 19 Jan 2025 • Yiyao Yu, Yuxiang Zhang, Dongdong Zhang, Xiao Liang, Hengyuan Zhang, Xingxing Zhang, ZiYi Yang, Mahmoud Khademi, Hany Awadalla, Junjie Wang, Yujiu Yang, Furu Wei
Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet they often rely on single-paradigm reasoning that limits their effectiveness across diverse tasks.
no code implementations • 25 Dec 2024 • Liang Wang, Nan Yang, Xingxing Zhang, Xiaolong Huang, Furu Wei
We introduce a bootstrapping approach to train long-context language models by exploiting their short-context capabilities only.
no code implementations • 4 Dec 2024 • Yaoyao Chang, Lei Cui, Li Dong, Shaohan Huang, Yangyu Huang, Yupan Huang, Scarlett Li, Tengchao Lv, Shuming Ma, Qinzheng Sun, Wenhui Wang, Furu Wei, Ying Xin, Mao Yang, Qiufeng Yin, Xingxing Zhang
This study explores the untapped potential of Common Crawl as a comprehensive and flexible resource for pre-training LLMs, addressing both general-purpose language understanding and specialized domain knowledge.
no code implementations • 25 Nov 2024 • Fangkai Jiao, Geyang Guo, Xingxing Zhang, Nancy F. Chen, Shafiq Joty, Furu Wei
Specifically, using Mathstral-7B as our base model, we improve MATH results from 58. 3 to 68. 6, surpassing both NuminaMath-72B and GPT-4-Turbo-1106-preview.
no code implementations • 4 Nov 2024 • Hengkai Tan, Xuezhou Xu, Chengyang Ying, Xinyi Mao, Songming Liu, Xingxing Zhang, Hang Su, Jun Zhu
To overcome these challenges, we then focus on state-based policy generalization and present \textbf{ManiBox}, a novel bounding-box-guided manipulation method built on a simulation-based teacher-student framework.
no code implementations • 9 Oct 2024 • Qingxiu Dong, Li Dong, Xingxing Zhang, Zhifang Sui, Furu Wei
Through alignment with human preferences, Large Language Models (LLMs) have advanced significantly in generating honest, harmless, and helpful responses.
1 code implementation • 7 Jul 2024 • Liyuan Wang, Jingyi Xie, Xingxing Zhang, Hang Su, Jun Zhu
The deployment of pre-trained models (PTMs) has greatly advanced the field of continual learning (CL), enabling positive knowledge transfer and resilience to catastrophic forgetting.
no code implementations • 30 May 2024 • Hengkai Tan, Songming Liu, Kai Ma, Chengyang Ying, Xingxing Zhang, Hang Su, Jun Zhu
In this paper, we propose to investigate the task from a new perspective of the frequency domain.
1 code implementation • 23 May 2024 • Chengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu
Designing generalizable agents capable of adapting to diverse embodiments has achieved significant attention in Reinforcement Learning (RL), which is critical for deploying RL agents in various real-world applications.
1 code implementation • 22 May 2024 • Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao
This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation.
1 code implementation • 22 Apr 2024 • Kanglei Zhou, Junlin Li, Ruizhi Cai, Liyuan Wang, Xingxing Zhang, Xiaohui Liang
However, this common strategy yields suboptimal results due to the inherent struggle of these backbones to capture the subtle cues essential for AQA.
1 code implementation • 7 Mar 2024 • Kanglei Zhou, Liyuan Wang, Xingxing Zhang, Hubert P. H. Shum, Frederick W. B. Li, Jianguo Li, Xiaohui Liang
We propose Continual AQA (CAQA) to refine models using sparse new data.
1 code implementation • 5 Mar 2024 • Zhengyang Tang, Xingxing Zhang, Benyou Wan, Furu Wei
Inspired by the cognitive mechanism in human mathematical learning, it first extracts topics and knowledge points from seed math questions and then build a concept graph, which is subsequently used to generate new math questions.
no code implementations • 20 Feb 2024 • Haoran Li, Qingxiu Dong, Zhengyang Tang, Chaojun Wang, Xingxing Zhang, Haoyang Huang, Shaohan Huang, Xiaolong Huang, Zeqiang Huang, Dongdong Zhang, Yuxian Gu, Xin Cheng, Xun Wang, Si-Qing Chen, Li Dong, Wei Lu, Zhifang Sui, Benyou Wang, Wai Lam, Furu Wei
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs).
1 code implementation • 13 Dec 2023 • Ziqi Yuan, Liyuan Wang, Wenbo Ding, Xingxing Zhang, Jiachen Zhong, Jianyong Ai, Jianmin Li, Jun Zhu
A commonly-used strategy for supervised IOD is to encourage the current model (as a student) to mimic the behavior of the old model (as a teacher), but it generally fails in SSIOD because a dominant number of object instances from old and new classes are coexisting and unlabelled, with the teacher only recognizing a fraction of them.
no code implementations • 26 Oct 2023 • Shuai Zheng, Zhizhe Liu, Zhenfeng Zhu, Xingxing Zhang, JianXin Li, Yao Zhao
On this basis, BiKT not only allows us to acquire knowledge from both the GNN and its derived model but promotes each other by injecting the knowledge into the other.
1 code implementation • 21 Oct 2023 • Liyuan Wang, Jingyi Xie, Xingxing Zhang, Hang Su, Jun Zhu
In this work, we present a general framework for continual learning of sequentially arrived tasks with the use of pre-training, which has emerged as a promising direction for artificial intelligence systems to accommodate real-world dynamics.
1 code implementation • 20 Oct 2023 • Haoran Li, Yiran Liu, Xingxing Zhang, Wei Lu, Furu Wei
Furthermore, we apply probabilistic ranking and contextual ranking sequentially to the instruction-tuned LLM.
1 code implementation • NeurIPS 2023 • Yilin Lyu, Liyuan Wang, Xingxing Zhang, Zicheng Sun, Hang Su, Jun Zhu, Liping Jing
Continual learning entails learning a sequence of tasks and balancing their knowledge appropriately.
1 code implementation • NeurIPS 2023 • Liyuan Wang, Jingyi Xie, Xingxing Zhang, Mingyi Huang, Hang Su, Jun Zhu
Following these empirical and theoretical insights, we propose Hierarchical Decomposition (HiDe-)Prompt, an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics of both uninstructed and instructed representations, further with the coordination of a contrastive regularization strategy.
1 code implementation • 29 Aug 2023 • Liyuan Wang, Xingxing Zhang, Qian Li, Mingtian Zhang, Hang Su, Jun Zhu, Yi Zhong
Continual learning aims to empower artificial intelligence (AI) with strong adaptability to the real world.
3 code implementations • 5 Jul 2023 • Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei
Scaling sequence length has become a critical demand in the era of large language models.
1 code implementation • 31 Jan 2023 • Liyuan Wang, Xingxing Zhang, Hang Su, Jun Zhu
To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime.
no code implementations • 8 Dec 2022 • Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei
The input and output of most text generation tasks can be transformed to two sequences of tokens and they can be modeled using sequence-to-sequence learning modeling tools such as Transformers.
Ranked #3 on
Text Summarization
on SAMSum
no code implementations • 3 Nov 2022 • Yubo Zhang, Xingxing Zhang, Xun Wang, Si-Qing Chen, Furu Wei
In this paper, we propose Lotus (shorthand for Latent Prompt Tuning for Summarization), which is a single model that can be applied in both controlled and uncontrolled (without control signals) modes.
1 code implementation • 16 Aug 2022 • Jian Jin, Yuan Xue, Xingxing Zhang, Lili Meng, Yao Zhao, Weisi Lin
However, they have a major drawback that the generated JND is assessed in the real-world signal domain instead of in the perceptual domain in the human brain.
2 code implementations • 13 Jul 2022 • Liyuan Wang, Xingxing Zhang, Qian Li, Jun Zhu, Yi Zhong
Continual learning requires incremental compatibility with a sequence of tasks.
no code implementations • 9 Jun 2022 • Weikai Yang, Xi Ye, Xingxing Zhang, Lanxi Xiao, Jiazhi Xia, Zhongyuan Wang, Jun Zhu, Hanspeter Pfister, Shixia Liu
The base learners and labeled samples (shots) in an ensemble few-shot classifier greatly affect the model performance.
no code implementations • ACL 2022 • Ruipeng Jia, Xingxing Zhang, Yanan Cao, Shi Wang, Zheng Lin, Furu Wei
In zero-shot multilingual extractive text summarization, a model is typically trained on English summarization dataset and then applied on summarization datasets of other languages.
1 code implementation • ICLR 2022 • Liyuan Wang, Xingxing Zhang, Kuo Yang, Longhui Yu, Chongxuan Li, Lanqing Hong, Shifeng Zhang, Zhenguo Li, Yi Zhong, Jun Zhu
In this work, we propose memory replay with data compression (MRDC) to reduce the storage cost of old training samples and thus increase their amount that can be stored in the memory buffer.
2 code implementations • 29 Jan 2022 • Ming Zhong, Yang Liu, Suyu Ge, Yuning Mao, Yizhu Jiao, Xingxing Zhang, Yichong Xu, Chenguang Zhu, Michael Zeng, Jiawei Han
In this paper, we propose the first unsupervised multi-granularity summarization framework, GranuSum.
no code implementations • 7 Jan 2022 • Jian Jin, Xingxing Zhang, Lili Meng, Weisi Lin, Jie Liang, Huaxiang Zhang, Yao Zhao
Experimental results show that the VSD can be accurately estimated with the weights learnt by the nonlinear mapping function once its associated S-VSDs are available.
1 code implementation • 8 Sep 2021 • Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei
In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training.
no code implementations • CVPR 2021 • Zhiqiang Fu, Yao Zhao, Dongxia Chang, Xingxing Zhang, Yiming Wang
This paper presents a novel, simple yet robust self-representation method, i. e., Double Low-Rank Representation with Projection Distance penalty (DLRRPD) for clustering.
1 code implementation • ACL 2022 • Shengqiang Zhang, Xingxing Zhang, Hangbo Bao, Furu Wei
In this paper, we find simply manipulating attention temperatures in Transformers can make pseudo labels easier to learn for student models.
no code implementations • 26 Apr 2021 • Zhiqiang Fu, Yao Zhao, Dongxia Chang, Xingxing Zhang, Yiming Wang
In this paper, a novel unsupervised low-rank representation model, i. e., Auto-weighted Low-Rank Representation (ALRR), is proposed to construct a more favorable similarity graph (SG) for clustering.
no code implementations • 16 Feb 2021 • Jian Jin, Xingxing Zhang, Xin Fu, huan zhang, Weisi Lin, Jian Lou, Yao Zhao
Experimental results on image classification demonstrate that we successfully find the JND for deep machine vision.
no code implementations • COLING 2020 • Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, Ming Zhou
Fine-tuning with pre-trained language models (e. g. BERT) has achieved great success in many language understanding tasks in supervised settings (e. g. text classification).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Ming Zhou
We also find in experiments that our model is less dependent on sentence positions.
no code implementations • EMNLP 2020 • Mengyun Chen, Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou
We propose a novel language-independent approach to improve the efficiency for Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
no code implementations • 2 Oct 2020 • Zhizhe Liu, Xingxing Zhang, Zhenfeng Zhu, Shuai Zheng, Yao Zhao, Jian Cheng
There have been numerous methods proposed for human identification, such as face identification, person re-identification, and gait identification.
no code implementations • EMNLP 2020 • Yanyan Zou, Xingxing Zhang, Wei Lu, Furu Wei, Ming Zhou
The main idea is that, given an input text artificially constructed from a document, a model is pre-trained to reinstate the original document.
no code implementations • 10 Feb 2020 • Fuzhen Li, Zhenfeng Zhu, Xingxing Zhang, Jian Cheng, Yao Zhao
In zero-shot learning (ZSL), the samples to be classified are usually projected into side information templates such as attributes.
no code implementations • 12 Dec 2019 • Zhenfeng Zhu, Yingying Meng, Deqiang Kong, Xingxing Zhang, Yandong Guo, Yao Zhao
Due to the deteriorated conditions of \mbox{illumination} lack and uneven lighting, nighttime images have lower contrast and higher noise than their daytime counterparts of the same scene, which limits seriously the performances of conventional background modeling methods.
1 code implementation • CVPR 2020 • Shuai Zheng, Zhenfeng Zhu, Xingxing Zhang, Zhizhe Liu, Jian Cheng, Yao Zhao
Graph representation learning aims to encode all nodes of a graph into low-dimensional vectors that will serve as input of many compute vision tasks.
Generative Adversarial Network
Graph Representation Learning
no code implementations • 28 Nov 2019 • Yawei Zhao, Qian Zhao, Xingxing Zhang, En Zhu, Xinwang Liu, Jianping Yin
We provide a new theoretical analysis framework, which shows an interesting observation, that is, the relation between the switching cost and the dynamic regret is different for settings of OA and OCO.
1 code implementation • 17 Nov 2019 • Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu
More importantly, we can tell which modality (visual or semantic) has more contribution in answering the current question by visualizing the gate values.
Ranked #6 on
Visual Dialog
on VisDial v0.9 val
1 code implementation • 16 Nov 2019 • Wenbin Li, Lei Wang, Xingxing Zhang, Lei Qi, Jing Huo, Yang Gao, Jiebo Luo
(2) how to narrow the distribution gap between clean and adversarial examples under the few-shot setting?
1 code implementation • 24 Oct 2019 • Xingxing Zhang, Zhenfeng Zhu, Yao Zhao
Given a set of hand-crafted local features, acquiring a global representation via aggregation is a promising technique to boost computational efficiency and improve task performance.
no code implementations • 24 Oct 2019 • Xingxing Zhang, Shupeng Gui, Zhenfeng Zhu, Yao Zhao, Ji Liu
Specifically, HPL is able to obtain discriminability on both seen and unseen class domains by learning visual prototypes respectively under the transductive setting.
no code implementations • 24 Oct 2019 • Xingxing Zhang, Shupeng Gui, Zhenfeng Zhu, Yao Zhao, Ji Liu
In this paper, we take an initial attempt, and propose a generic formulation to provide a systematical solution (named ATZSL) for learning a robust ZSL model.
no code implementations • 22 Oct 2019 • Zhizhe Liu, Xingxing Zhang, Zhenfeng Zhu, Shuai Zheng, Yao Zhao, Jian Cheng
The key to ZSL is to transfer knowledge from the seen to the unseen classes via auxiliary class attribute vectors.
no code implementations • ACL 2019 • Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou
Sequence-to-sequence (seq2seq) models have achieved tremendous success in text generation tasks.
no code implementations • ACL 2019 • Xingxing Zhang, Furu Wei, Ming Zhou
Neural extractive summarization models usually employ a hierarchical encoder for document encoding and they are trained using sentence-level labels, which are created heuristically using rule-based methods.
Ranked #7 on
Extractive Text Summarization
on CNN / Daily Mail
no code implementations • EMNLP 2018 • Xingxing Zhang, Mirella Lapata, Furu Wei, Ming Zhou
Extractive summarization models require sentence-level labels, which are usually created heuristically (e. g., with rule-based methods) given that most summarization datasets only have document-summary pairs.
Ranked #11 on
Extractive Text Summarization
on CNN / Daily Mail
no code implementations • CONLL 2017 • Clara Vania, Xingxing Zhang, Adam Lopez
This paper presents our submissions for the CoNLL 2017 UD Shared Task.
1 code implementation • EMNLP 2017 • Xingxing Zhang, Mirella Lapata
Sentence simplification aims to make sentences easier to read and understand.
Ranked #3 on
Text Simplification
on ASSET
1 code implementation • EACL 2017 • Xingxing Zhang, Jianpeng Cheng, Mirella Lapata
Conventional graph-based dependency parsers guarantee a tree structure both during training and inference.
1 code implementation • NAACL 2016 • Xingxing Zhang, Liang Lu, Mirella Lapata
Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have been successfully applied to a variety of sequence modeling tasks.