no code implementations • 25 Sep 2019 • Xindian Ma, Peng Zhang, Xiaoliu Mao, Yehua Zhang, Nan Duan, Yuexian Hou, Ming Zhou
Then, we show that the lower bound of such a separation rank can reveal the quantitative relation between the network structure (e.g., depth/width) and the modeling ability for the contextual dependency.
no code implementations • 25 Sep 2019 • Peng Zhang, Xiaoliu Mao, Xindian Ma, Benyou Wang, Jing Zhang, Jun Wang, Dawei Song
We prove that by a mapping (via the trace operator) on the high-dimensional matching matrix, a low-dimensional attention matrix can be derived.