no code implementations • EMNLP 2021 • Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu sun, Songfang Huang, Fei Huang
Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing.
3 code implementations • 15 Jul 2024 • An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, TianHao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhihao Fan
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.
2 code implementations • 28 Sep 2023 • Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.
Ranked #3 on Multi-Label Text Classification on CC3M-TagMask
1 code implementation • 19 Dec 2022 • Junyang Lin, Xuancheng Ren, Yichang Zhang, Gao Liu, Peng Wang, An Yang, Chang Zhou
This paper proposes a new method, OFA-OCR, to transfer multimodal pretrained models to text recognition.
1 code implementation • 8 Dec 2022 • Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou
As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data.
no code implementations • 28 Oct 2022 • Fenglin Liu, Xian Wu, Shen Ge, Xuancheng Ren, Wei Fan, Xu sun, Yuexian Zou
To enhance the correlation between vision and language in disentangled spaces, we introduce visual concepts to DiMBERT, which represent visual information in textual format.
no code implementations • 19 Oct 2022 • Fenglin Liu, Xuancheng Ren, Xian Wu, Wei Fan, Yuexian Zou, Xu sun
Especially for image captioning, the attention based models are expected to ground correct image regions with proper generated words.
1 code implementation • 11 Oct 2022 • Lei LI, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie zhou, Xu sun
We then design a Model Uncertainty-aware Knowledge Integration (MUKI) framework to recover the golden supervision for the student.
1 code implementation • 4 Jun 2022 • Shuhuai Ren, Lei LI, Xuancheng Ren, Guangxiang Zhao, Xu sun
However, evaluating the openness of CLIP-like models is challenging, as the models are open to arbitrary vocabulary in theory, but their accuracy varies in practice.
no code implementations • Findings (ACL) 2022 • Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu sun
However, as online chit-chat scenarios continually increase, directly fine-tuning these models for each new task not only overwhelms the capacity of the dialogue system on embedded devices but also causes knowledge forgetting in the pre-trained models and knowledge interference among diverse dialogue tasks.
no code implementations • 14 Dec 2021 • Lei LI, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie zhou, Xu sun
As many fine-tuned pre-trained language models (PLMs) with promising performance are generously released, investigating better ways to reuse these models is vital, as it can greatly reduce the retraining computational cost and the potential environmental side effects.
1 code implementation • 13 Oct 2021 • Guangxiang Zhao, Wenkai Yang, Xuancheng Ren, Lei LI, Yunfang Wu, Xu sun
The conventional wisdom behind learning deep classification models is to focus on badly classified examples and ignore well-classified examples that are far from the decision boundary.
1 code implementation • NeurIPS 2021 • Deli Chen, Yankai Lin, Guangxiang Zhao, Xuancheng Ren, Peng Li, Jie zhou, Xu sun
The class imbalance problem, as an important issue in learning node representations, has drawn increasing attention from the community.
no code implementations • 7 Sep 2021 • Zhiyuan Zhang, Ruixuan Luo, Xuancheng Ren, Qi Su, Liangyou Li, Xu sun
To enhance neural networks, we propose the adversarial parameter defense algorithm that minimizes the average risk of multiple adversarial parameter corruptions.
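A compact PyTorch sketch of minimizing an averaged corrupted risk, assuming random Gaussian noise as a simplified stand-in for adversarial parameter corruptions; the function name, sigma, and the number of corruptions are illustrative choices, not the paper's settings.

```python
import torch

def corrupted_risk_backward(model, loss_fn, x, y, n_corruptions=4, sigma=1e-3):
    """Accumulate gradients of the loss averaged over several randomly
    corrupted copies of the parameters. Gaussian noise stands in for
    adversarial corruptions in this simplified sketch."""
    for _ in range(n_corruptions):
        # Apply an additive corruption to every parameter.
        noises = []
        with torch.no_grad():
            for p in model.parameters():
                noise = sigma * torch.randn_like(p)
                p.add_(noise)
                noises.append(noise)
        # Gradients at the corrupted point accumulate into p.grad.
        (loss_fn(model(x), y) / n_corruptions).backward()
        # Restore the original parameters before the next corruption.
        with torch.no_grad():
            for p, noise in zip(model.parameters(), noises):
                p.sub_(noise)
```

An optimizer step after this call then descends the averaged corrupted risk rather than the clean loss alone.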
no code implementations • Findings (ACL) 2021 • Fenglin Liu, Xuancheng Ren, Xian Wu, Bang Yang, Shen Ge, Yuexian Zou, Xu sun
Video captioning combines video understanding and language generation.
no code implementations • NAACL 2021 • Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu sun, Bin He
Motivated by neuroscientific evidence and theoretical results, we demonstrate that side effects can be controlled by the number of changed parameters and, thus, we propose to conduct neural network surgery by modifying only a limited number of parameters.
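As a rough illustration of updating only a limited number of parameters, the sketch below keeps the entries with the largest gradient magnitudes within a fixed budget and freezes everything else; the magnitude-based selection rule and the budget value are assumptions, not the paper's exact surgery procedure.

```python
import torch

def surgery_step(params, lr=1e-3, budget=1000):
    """Update only the `budget` parameters with the largest gradient
    magnitudes and leave the rest untouched (illustrative sketch)."""
    grads = torch.cat([p.grad.reshape(-1) for p in params if p.grad is not None])
    k = min(budget, grads.numel())
    # Smallest magnitude that still falls inside the update budget.
    threshold = grads.abs().topk(k).values[-1]
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            mask = (p.grad.abs() >= threshold).to(p.dtype)
            p.add_(-lr * p.grad * mask)
```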
1 code implementation • NAACL 2021 • Kaiyuan Liao, Yi Zhang, Xuancheng Ren, Qi Su, Xu sun, Bin He
We first take into consideration all the linguistic information embedded in the past layers and then take a further step to engage the future information that is originally inaccessible for predictions.
no code implementations • 15 May 2021 • Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu sun, Yuexian Zou
In this work, we investigate how the scale factors into the effectiveness of the skip connection and reveal that a trivial adjustment of the scale will lead to spurious gradient explosion or vanishing as the models deepen, which can be addressed by normalization, in particular layer normalization, which induces consistent improvements over the plain skip connection.
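A minimal PyTorch sketch of a scaled skip connection followed by layer normalization; the linear sublayer and the scale value are placeholders for illustration, not the paper's exact configuration.

```python
import torch
from torch import nn

class ScaledSkipBlock(nn.Module):
    """Residual block with a scale on the skip path followed by layer
    normalization (illustrative sublayer and scale)."""

    def __init__(self, d_model, scale=1.0):
        super().__init__()
        self.sublayer = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)
        self.scale = scale

    def forward(self, x):
        # A scale far from 1 can amplify or shrink the signal as depth
        # grows; layer normalization re-centers the sum and keeps the
        # forward and backward passes well behaved.
        return self.norm(self.scale * x + self.sublayer(x))
```

Setting scale=1.0 and dropping the normalization recovers the plain skip connection.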
1 code implementation • NAACL 2021 • Wenkai Yang, Lei LI, Zhiyuan Zhang, Xuancheng Ren, Xu sun, Bin He
However, in this paper, we find that it is possible to hack the model in a data-free way by modifying one single word embedding vector, with almost no accuracy sacrificed on clean samples.
no code implementations • 22 Feb 2021 • Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu sun
The finding of general knowledge is further hindered by the unidirectional distillation, as the student should obey the teacher and may discard some knowledge that is truly general but refuted by the teacher.
no code implementations • 1 Jan 2021 • Guangxiang Zhao, Lei LI, Xuancheng Ren, Xu sun, Bin He
We find in practice that the high-likelihood area contains correct predictions for tail classes and it plays a vital role in learning imbalanced class distributions.
no code implementations • 14 Dec 2020 • Deli Chen, Yankai Lin, Lei LI, Xuancheng Ren, Peng Li, Jie zhou, Xu sun
Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC).
no code implementations • COLING 2020 • Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu sun, Yuexian Zou
In this work, we investigate how the scale factors into the effectiveness of the skip connection and reveal that a trivial adjustment of the scale will lead to spurious gradient explosion or vanishing as the models deepen, which can be addressed by normalization, in particular layer normalization, which induces consistent improvements over the plain skip connection.
no code implementations • NeurIPS 2020 • Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, Xu sun
Especially for image captioning, the attention based models are expected to ground correct image regions with proper generated words.
no code implementations • 13 Oct 2020 • Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu sun
Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing.
no code implementations • EMNLP 2020 • Shaoxiong Feng, Xuancheng Ren, Hongshen Chen, Bin Sun, Kan Li, Xu sun
Human dialogues are scenario-based and appropriate responses generally relate to the latent context knowledge entailed by the specific scenario.
no code implementations • 16 Sep 2020 • Shaoxiong Feng, Hongshen Chen, Xuancheng Ren, Zhuoye Ding, Kan Li, Xu sun
Collaborative learning has successfully applied knowledge transfer to guide a pool of small student networks towards robust local minima.
1 code implementation • 10 Jun 2020 • Xu Sun, Zhiyuan Zhang, Xuancheng Ren, Ruixuan Luo, Liangyou Li
We argue that the vulnerability of model parameters is of crucial value to the study of model robustness and generalization, but little research has been devoted to understanding this matter.
no code implementations • 16 May 2020 • Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu sun
While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information.
no code implementations • 28 Feb 2020 • Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Kai Lei, Xu sun
Recently, attention-based encoder-decoder models have been used extensively in image captioning.
2 code implementations • 25 Dec 2019 • Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu sun
Self-attention based Transformer has demonstrated the state-of-the-art performances in a number of natural language processing tasks.
2 code implementations • 27 Oct 2019 • Jianbang Ding, Xuancheng Ren, Ruixuan Luo, Xu sun
The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpected large learning rates and stabilize the training of deep neural networks.
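A minimal sketch of one such bounded update in PyTorch, assuming Adam-style moment estimates and omitting bias correction and weight decay; the function and argument names are illustrative rather than the released implementation.

```python
import torch

def bounded_adaptive_step(param, exp_avg, exp_avg_sq, exp_avg_lr, grad,
                          lr=1e-3, beta1=0.9, beta2=0.999, beta3=0.999, eps=1e-8):
    """One Adam-style update whose per-element learning rates are capped
    by their own exponential moving average (simplified sketch)."""
    # First and second moment estimates, as in Adam.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Per-element adaptive learning rates.
    step_size = lr / (exp_avg_sq.sqrt() + eps)
    # Smooth the learning rates with an EMA and use it as an upper bound,
    # which damps unexpectedly large steps.
    exp_avg_lr.mul_(beta3).add_(step_size, alpha=1 - beta3)
    bounded = torch.min(step_size, exp_avg_lr)
    with torch.no_grad():
        param.add_(-bounded * exp_avg)
```

A full optimizer would keep exp_avg, exp_avg_sq, and exp_avg_lr as per-parameter state inside its step() method.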
no code implementations • 25 Sep 2019 • Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Xu sun
Extensive experimental results on a series of natural language processing tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of Sparse Transformer in model performance.
4 code implementations • 27 Jun 2019 • Ruixuan Luo, Jingjing Xu, Yi Zhang, Zhiyuan Zhang, Xuancheng Ren, Xu sun
Through this method, we generate synthetic data using a large amount of unlabeled data in the target domain and then obtain a word segmentation model for the target domain.
1 code implementation • ACL 2019 • Chen Wu, Xuancheng Ren, Fuli Luo, Xu sun
Unsupervised text style transfer aims to alter text styles while preserving the content, without aligned data for supervision.
no code implementations • 24 May 2019 • Zhiyuan Zhang, Pengcheng Yang, Xuancheng Ren, Qi Su, Xu sun
Neural network learning is usually time-consuming since backpropagation needs to compute full gradients and backpropagate them across multiple layers.
1 code implementation • NeurIPS 2019 • Fenglin Liu, Yuanxin Liu, Xuancheng Ren, Xiaodong He, Xu sun
In vision-and-language grounding problems, fine-grained representations of the image are considered to be of paramount importance.
1 code implementation • EMNLP 2018 • Jingjing Xu, Xuancheng Ren, Junyang Lin, Xu sun
Existing text generation methods tend to produce repeated and "boring" expressions.
no code implementations • 11 Sep 2018 • Shu Liu, Jingjing Xu, Xuancheng Ren, Xu sun
To evaluate the effectiveness of the proposed model, we build a large-scale rationality evaluation dataset.
1 code implementation • EMNLP 2018 • Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Houfeng Wang, Xu sun
The encoder-decoder framework has shown recent success in image captioning.
1 code implementation • NAACL 2019 • Guangxiang Zhao, Jingjing Xu, Qi Zeng, Xuancheng Ren
This task requires the system to identify multiple styles of music based on its reviews on websites.
1 code implementation • EMNLP 2018 • Junyang Lin, Xu sun, Xuancheng Ren, Muyu Li, Qi Su
Most of the Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism.
Ranked #7 on Machine Translation on IWSLT2015 English-Vietnamese
1 code implementation • EMNLP 2018 • Jingjing Xu, Xuancheng Ren, Yi Zhang, Qi Zeng, Xiaoyan Cai, Xu sun
Compared to the state-of-the-art models, our skeleton-based model can generate significantly more coherent text according to human evaluation and automatic evaluation.
no code implementations • 16 Aug 2018 • Wei Li, Xuancheng Ren, Damai Dai, Yunfang Wu, Houfeng Wang, Xu sun
In the experiments, we take a real-world sememe knowledge base HowNet and the corresponding descriptions of the words in Baidu Wiki for training and evaluation.
1 code implementation • COLING 2018 • Junyang Lin, Xu sun, Xuancheng Ren, Shuming Ma, Jinsong Su, Qi Su
A great proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt Recurrent Neural Network (RNN) to generate translation word by word following a sequential order.
Ranked #9 on Machine Translation on IWSLT2015 English-Vietnamese
1 code implementation • ACL 2018 • Jingjing Xu, Xu sun, Qi Zeng, Xuancheng Ren, Xiaodong Zhang, Houfeng Wang, Wenjie Li
We evaluate our approach on two review datasets, Yelp and Amazon.
Ranked #6 on Unsupervised Text Style Transfer on Yelp
no code implementations • 10 May 2018 • Bingzhen Wei, Xuancheng Ren, Xu sun, Yi Zhang, Xiaoyan Cai, Qi Su
In particular, the proposed approach improves the semantic consistency by 4% in terms of human evaluation.
no code implementations • 3 May 2018 • Shuming Ma, Xu sun, Junyang Lin, Xuancheng Ren
Text summarization and sentiment classification both aim to capture the main ideas of the text but at different levels.
no code implementations • NAACL 2018 • Ji Wen, Xu sun, Xuancheng Ren, Qi Su
In this paper, we propose the task of relation classification for Chinese literature text.
1 code implementation • NAACL 2018 • Shuming Ma, Xu sun, Wei Li, Sujian Li, Wenjie Li, Xuancheng Ren
The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words.
3 code implementations • 5 Feb 2018 • Jingjing Xu, Xuancheng Ren, Junyang Lin, Xu sun
Existing text generation methods tend to produce repeated and "boring" expressions.
1 code implementation • LREC 2018 • Xuancheng Ren, Xu sun, Ji Wen, Bingzhen Wei, Weidong Zhan, Zhiyuan Zhang
Web 2.0 has brought with it numerous user-produced data revealing one's thoughts, experiences, and knowledge, which are a great source for many tasks, such as information extraction and knowledge base construction.
1 code implementation • 28 Nov 2017 • Xuancheng Ren, Xu sun
In the training of transition-based dependency parsers, an oracle is used to predict a transition sequence for a sentence and its gold tree.
no code implementations • 25 Nov 2017 • Xu Sun, Weiwei Sun, Shuming Ma, Xuancheng Ren, Yi Zhang, Wenjie Li, Houfeng Wang
The decoding of the complex structure model is regularized by the additionally trained simple structure model.
1 code implementation • COLING 2018 • Yi Zhang, Xu sun, Shuming Ma, Yang Yang, Xuancheng Ren
In our work, we first design a new model called "high order LSTM" to predict multiple tags for the current token which contains not only the current tag but also the previous several tags.
3 code implementations • 17 Nov 2017 • Xu Sun, Xuancheng Ren, Shuming Ma, Bingzhen Wei, Wei Li, Jingjing Xu, Houfeng Wang, Yi Zhang
Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which will reduce the computational cost both in the training and decoding, and potentially accelerate decoding in real-world applications.
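A rough sketch of this kind of simplification, assuming per-row update counts have already been accumulated from the sparsified gradients during training; zeroing rows instead of physically removing them, the row-wise granularity, the keep ratio, and the function name are all simplifying assumptions.

```python
import torch

def prune_seldom_updated_rows(weight, update_counts, keep_ratio=0.5):
    """Zero out the weight rows whose sparsified-gradient update counts
    are smallest, keeping the most frequently updated fraction."""
    n_keep = max(1, int(keep_ratio * weight.size(0)))
    keep_idx = update_counts.topk(n_keep).indices
    mask = torch.zeros(weight.size(0), dtype=torch.bool, device=weight.device)
    mask[keep_idx] = True
    with torch.no_grad():
        weight[~mask] = 0.0
    return mask
```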
1 code implementation • ICLR 2018 • Xu Sun, Bingzhen Wei, Xuancheng Ren, Shuming Ma
We propose a method, called Label Embedding Network, which can learn label representation (label embedding) during the training process of deep networks.
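A minimal sketch of an output layer in this spirit, where logits are similarities against a learned label-embedding matrix; this illustrates the idea of jointly trained label representations, not the paper's full training objective.

```python
import torch
from torch import nn

class LabelEmbeddingOutput(nn.Module):
    """Output layer that scores a representation against a learned
    label-embedding matrix, so label vectors are trained jointly with
    the rest of the network (minimal sketch)."""

    def __init__(self, d_model, n_labels):
        super().__init__()
        self.label_emb = nn.Embedding(n_labels, d_model)

    def forward(self, h):
        # Dot-product similarity with every label embedding serves as
        # the classification logits.
        return h @ self.label_emb.weight.t()
```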
no code implementations • 18 Sep 2017 • Bingzhen Wei, Xu sun, Xuancheng Ren, Jingjing Xu
As traditional neural networks consume a significant amount of computing resources during back propagation, Sun et al. (2017) propose a simple yet effective technique to alleviate this problem.
2 code implementations • ICML 2017 • Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang
In back propagation, only a small subset of the full gradient is computed to update the model parameters.
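A minimal PyTorch sketch of this kind of sparsified back propagation, keeping only the k largest-magnitude entries of the gradient flowing through a hidden representation; the class name and the choice of k are illustrative.

```python
import torch

class TopKGrad(torch.autograd.Function):
    """Identity in the forward pass; in the backward pass, keep only the
    k largest-magnitude entries of the incoming gradient per example and
    zero out the rest (illustrative sketch)."""

    @staticmethod
    def forward(ctx, x, k):
        ctx.k = k
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        _, idx = grad_out.abs().topk(ctx.k, dim=-1)
        mask = torch.zeros_like(grad_out).scatter_(-1, idx, 1.0)
        return grad_out * mask, None

# Usage: sparsify the gradient flowing back through a hidden representation.
h = torch.randn(4, 512, requires_grad=True)
TopKGrad.apply(h, 32).sum().backward()
```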
4 code implementations • 29 Mar 2015 • Xu Sun, Shuming Ma, Yi Zhang, Xuancheng Ren
We show that this method, which trains fast, has a theoretical guarantee of convergence, and is easy to implement, can support search-based optimization and obtain top accuracy.