1 code implementation • 13 Oct 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar
Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision.
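As a rough illustration of the mechanism this line refers to, here is a minimal NumPy sketch of scaled dot-product attention; the shapes and names are illustrative and not taken from the paper.

```python
import numpy as np

def dot_product_attention(Q, K, V):
    """Scaled dot-product attention: every token attends to every other
    token with weights derived from pairwise query-key dot products."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise dot products, shape (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # input-dependent mixing of values

# Toy usage: 4 tokens with 8-dimensional queries/keys/values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = dot_product_attention(Q, K, V)                # shape (4, 8)
```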
no code implementations • EMNLP 2021 • Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng
Our analysis shows that the gain actually comes from moving positional information from the input to the attention layer.
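To make the distinction concrete, the sketch below contrasts adding position embeddings at the input with adding a relative-position bias to the attention logits; the particular bias parameterization is a generic assumption, not the scheme proposed in the paper.

```python
import numpy as np

def attention_with_relative_bias(Q, K, V, rel_bias):
    """Attention where positional information enters at the attention layer
    as an additive bias on the query-key logits."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k) + rel_bias      # position enters here, not at the input
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 4, 8
rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

# Input-level alternative: absolute position embeddings added to token embeddings.
pos_emb = rng.normal(size=(n, d))

# Attention-level alternative: a bias table indexed by relative offset j - i.
offsets = np.arange(n)[None, :] - np.arange(n)[:, None]
bias_table = rng.normal(size=(2 * n - 1,))
rel_bias = bias_table[offsets + n - 1]              # shape (n, n)

out = attention_with_relative_bias(Q, K, V, rel_bias)
```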
no code implementations • NeurIPS 2020 • Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
We propose sufficient conditions under which a sparse attention model can provably universally approximate any sequence-to-sequence function.
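For intuition about what a sparse attention model looks like, the sketch below restricts attention to a sliding window; this is one common sparsity pattern and not necessarily the construction analyzed in the paper.

```python
import numpy as np

def sliding_window_mask(n, window=2):
    """Boolean mask where token i may only attend to tokens within `window`
    positions of i -- one example of a sparse attention pattern."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def sparse_attention(Q, K, V, mask):
    """Dot-product attention restricted to the positions allowed by `mask`."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = np.where(mask, scores, -np.inf)        # disallowed positions get zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d = 6, 8
rng = np.random.default_rng(2)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = sparse_attention(Q, K, V, sliding_window_mask(n))
```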
no code implementations • ICLR 2020 • Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar
We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus.
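A common embedding-based formulation of this problem (assumed here, with placeholder embeddings standing in for a trained encoder) scores each document by the inner product between its embedding and the query embedding and returns the top-scoring documents:

```python
import numpy as np

def retrieve_top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k documents whose embeddings have the
    highest inner product with the query embedding."""
    scores = doc_vecs @ query_vec                   # one score per document
    return np.argsort(-scores)[:k]

# Toy corpus: 1000 documents embedded into 64 dimensions by some encoder
# (the encoder itself is out of scope for this sketch).
rng = np.random.default_rng(3)
doc_vecs = rng.normal(size=(1000, 64))
query_vec = rng.normal(size=(64,))

top_docs = retrieve_top_k(query_vec, doc_vecs, k=3)
print(top_docs)                                     # indices of the retrieved documents
```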
no code implementations • EMNLP 2017 • Yin-Wen Chang, Michael Collins
The algorithm produces a translation by processing the source-language sentence in strictly left-to-right order, differing from commonly used approaches that build the target-language sentence in left-to-right order.
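To illustrate only the traversal order, here is a toy, deliberately monotone decoder that consumes source phrases strictly left to right from an invented phrase table; the paper's actual algorithm additionally handles reordering and scoring, so this is not a sketch of that method.

```python
# Toy source-side left-to-right decoding: source phrases are consumed strictly
# in source order, and each one emits its target phrase (no reordering, no model
# scores -- purely an illustration of the traversal order).
phrase_table = {
    ("das", "haus"): "the house",
    ("ist",): "is",
    ("klein",): "small",
}

def greedy_source_left_to_right(source_words):
    i, output = 0, []
    while i < len(source_words):
        # Take the longest source phrase starting at position i.
        for length in range(len(source_words) - i, 0, -1):
            phrase = tuple(source_words[i:i + length])
            if phrase in phrase_table:
                output.append(phrase_table[phrase])
                i += length
                break
        else:
            raise ValueError(f"no phrase covers position {i}")
    return " ".join(output)

print(greedy_source_left_to_right(["das", "haus", "ist", "klein"]))
# -> "the house is small"
```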
no code implementations • TACL 2017 • Yin-Wen Chang, Michael Collins
Decoding of phrase-based translation models in the general case is known to be NP-complete, by a reduction from the traveling salesman problem (Knight, 1999).