Search Results for author: Jiangang Bai

Found 6 papers, 4 papers with code

Enhancing Self-Attention with Knowledge-Assisted Attention Maps

no code implementations • NAACL 2022 • Jiangang Bai, Yujing Wang, Hong Sun, Ruonan Wu, Tianmeng Yang, Pengfei Tang, Defu Cao, Mingliang Zhang1, Yunhai Tong, Yaming Yang, Jing Bai, Ruofei Zhang, Hao Sun, Wei Shen

Large-scale pre-trained language models have attracted extensive attentions in the research community and shown promising results on various tasks of natural language processing.

Multi-Task Learning Natural Language Understanding

Paper
Add Code

Convolution-enhanced Evolving Attention Networks

1 code implementation • 16 Dec 2022 • Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps.

Image Classification Machine Translation +3

Paper
Code

Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees

1 code implementation • EACL 2021 • Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, Yunhai Tong

Pre-trained language models like BERT achieve superior performances in various NLP tasks without explicit consideration of syntactic information.

Natural Language Understanding

Paper
Code

Evolving Attention with Residual Convolutions

2 code implementations • 20 Feb 2021 • Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers.

Image Classification Machine Translation +2

Paper
Code

Predictive Attention Transformer: Improving Transformer with Attention Map Prediction

no code implementations • 1 Jan 2021 • Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Yunhai Tong

Instead, we model their dependencies via a chain of prediction models that take previous attention maps as input to predict the attention maps of a new layer through convolutional neural networks.

de-en Machine Translation

Paper
Add Code

Improving BERT with Self-Supervised Attention

1 code implementation • 8 Apr 2020 • Yiren Chen, Xiaoyu Kou, Jiangang Bai, Yunhai Tong

One of the most popular paradigms of applying large pre-trained NLP models such as BERT is to fine-tune it on a smaller dataset.

Sentence

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.