1 code implementation • 4 Apr 2023 • Junzhu Mao, Yazhou Yao, Zeren Sun, Xingguo Huang, Fumin Shen, Heng-Tao Shen
Then we combine the similarity and first-order gradients of key tokens along the query dimension for token importance estimation and remove redundant key and value tokens to further reduce the inference complexity.
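A minimal sketch of this kind of key-token scoring follows. It rests on several assumptions the excerpt does not confirm. "Similarity" is taken to be softmax-normalized scaled dot-product attention. The two signals are combined as |similarity × gradient| and summed over the query dimension, which is a first-order, Taylor-style importance score. The kept tokens are simply the top-`keep` scores. The gradients are passed in as given values, since computing them would require the full model and loss. All function names (`attention_similarity`, `key_importance`, `prune_kv`) are hypothetical. This is an illustration of the general idea, not the authors' method.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def attention_similarity(queries: Matrix, keys: Matrix) -> Matrix:
    """Assumed similarity: scaled dot product, softmax-normalized over key tokens."""
    d = len(queries[0])
    sim = []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        sim.append([e / total for e in exps])
    return sim


def key_importance(sim: Matrix, grad: Matrix) -> List[float]:
    """Assumed combination: sum |sim * grad| over the query dimension (rows)."""
    n_queries, n_keys = len(sim), len(sim[0])
    return [sum(abs(sim[i][j] * grad[i][j]) for i in range(n_queries))
            for j in range(n_keys)]


def prune_kv(keys: Matrix, values: Matrix, scores: List[float],
             keep: int) -> Tuple[Matrix, Matrix, List[int]]:
    """Keep the `keep` highest-scoring key/value tokens, preserving their order."""
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:keep]
    top.sort()
    return [keys[j] for j in top], [values[j] for j in top], top


# Toy usage: two queries, three key/value tokens. The gradient values are
# hypothetical stand-ins for dLoss/dSimilarity obtained by backpropagation.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
V = [[1.0], [2.0], [3.0]]
G = [[0.5, 0.1, 0.0], [0.1, 0.5, 0.0]]

sim = attention_similarity(Q, K)
scores = key_importance(sim, G)
kept_k, kept_v, kept_idx = prune_kv(K, V, scores, keep=2)
```

In this toy input, the third key token receives zero gradient, so its score is zero and it is dropped together with its value token. This shrinks the attention computation for later queries from three key/value pairs to two.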