Graph Convolutions Enrich the Self-Attention in Transformers!

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, and time-series modeling. However, one of the challenges with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose a graph-filter-based self-attention (GFSA) that learns a general yet effective graph filter, at a complexity slightly higher than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers in various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
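
The graph-filter reading of self-attention can be illustrated with a small sketch. The softmax attention matrix is row-stochastic, so it behaves like a normalized adjacency matrix over tokens, and applying it repeatedly pulls token representations together (oversmoothing). The polynomial filter below, its coefficients, and the helper names (`attention_matrix`, `polynomial_filter`, `token_spread`) are illustrative assumptions, not the GFSA filter as defined by the authors; the inputs are random toy values.

```python
import math
import random


def softmax_rows(scores):
    """Row-wise softmax; each output row is non-negative and sums to 1."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_matrix(q, k):
    """Scaled dot-product attention weights softmax(Q K^T / sqrt(d))."""
    d = len(q[0])
    kt = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, kt)]
    return softmax_rows(scores)


def polynomial_filter(a, coeffs):
    """Return sum_k coeffs[k] * A^k (a generic polynomial graph filter; assumption)."""
    n = len(a)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 = I
    result = [[0.0] * n for _ in range(n)]
    for c in coeffs:
        result = [[r + c * p for r, p in zip(rr, pr)] for rr, pr in zip(result, power)]
        power = matmul(power, a)
    return result


def token_spread(x):
    """Largest per-feature gap between any two token vectors."""
    return max(max(col) - min(col) for col in zip(*x))


random.seed(0)
n_tokens, dim = 4, 3
q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
k = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
x = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

a = attention_matrix(q, k)

# Plain self-attention applied repeatedly: token vectors drift together.
smoothed = x
for _ in range(50):
    smoothed = matmul(a, smoothed)

# Illustrative filter with an identity term and a negative A^2 term.
h = polynomial_filter(a, [0.5, 1.0, -0.5])
filtered = matmul(h, x)

print("spread of input tokens:     ", token_spread(x))
print("spread after 50 x attention:", token_spread(smoothed))
print("spread after one filter:    ", token_spread(filtered))
```

Plain self-attention corresponds to the coefficient list `[0.0, 1.0]`, which uses only the first-order term. Adding an identity term and higher-order terms with learnable, possibly negative coefficients is one way a filter can go beyond pure averaging, which is the general idea behind treating attention as a graph filter.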

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Clone Detection | CodeXGLUE - BigCloneBench | CodeT5-base | F1 | 94.31 | #5 |
| Clone Detection | CodeXGLUE - BigCloneBench | CodeT5-base + GFSA | F1 | 94.92 | #3 |
| Clone Detection | CodeXGLUE - BigCloneBench | CodeT5-small | F1 | 94.36 | #4 |
| Defect Detection | CodeXGLUE - Devign | CodeT5-small | Accuracy | 63.25 | #8 |
| Defect Detection | CodeXGLUE - Devign | PLBART + GFSA | Accuracy | 62.96 | #9 |
| Defect Detection | CodeXGLUE - Devign | PLBART | Accuracy | 62.63 | #11 |
| Defect Detection | CodeXGLUE - Devign | CodeT5-base | Accuracy | 63.51 | #7 |
| Defect Detection | CodeXGLUE - Devign | CodeBERT | Accuracy | 64.31 | #5 |
| Defect Detection | CodeXGLUE - Devign | RoBERTa + GFSA | Accuracy | 64.39 | #4 |
| Defect Detection | CodeXGLUE - Devign | RoBERTa | Accuracy | 62.88 | #10 |
| Defect Detection | CodeXGLUE - Devign | CodeBERT + GFSA | Accuracy | 64.49 | #3 |
| Defect Detection | CodeXGLUE - Devign | CodeT5-base + GFSA | Accuracy | 64.75 | #2 |
| Defect Detection | CodeXGLUE - Devign | CodeT5-small + GFSA | Accuracy | 63.69 | #6 |
| Image Classification | ImageNet | DeiT-S-12 + GFSA | Top 1 Accuracy | 81.1% | #655 |
| Image Classification | ImageNet | Swin-S + GFSA | Top 1 Accuracy | 83% | #473 |
| Image Classification | ImageNet | CaiT-S + GFSA | Top 1 Accuracy | 82.8% | #490 |
| Image Classification | ImageNet | DeiT-S-24 + GFSA | Top 1 Accuracy | 81.5% | #624 |
| Speech Recognition | LibriSpeech 100h test-clean | Branchformer + GFSA | Word Error Rate (WER) | 9.6 | #1 |
| Speech Recognition | LibriSpeech 100h test-other | Branchformer + GFSA | Word Error Rate (WER) | 22.25 | #1 |
| Speech Recognition | LibriSpeech test-clean | Branchformer + GFSA | Word Error Rate (WER) | 2.11 | #30 |
| Speech Recognition | LibriSpeech test-other | Branchformer + GFSA | Word Error Rate (WER) | 4.94 | #29 |
| Graph Regression | PCQM4M-LSC | Graphormer + GFSA | Validation MAE | 0.1193 | #3 |
| Graph Regression | PCQM4Mv2-LSC | Graphormer + GFSA | Validation MAE | 0.0860 | #11 |

Methods