no code implementations • 4 Mar 2024 • Kunyu Shi, Qi Dong, Luis Goncalves, Zhuowen Tu, Stefano Soatto
Sequence-to-sequence vision-language models are showing promise, but their applicability is limited by their inference latency due to their autoregressive way of generating predictions.
no code implementations • 17 May 2022 • Thomas Delteil, Edouard Belval, Lei Chen, Luis Goncalves, Vijay Mahadevan
In these, text semantics and visual information supplement each other to provide a global understanding of the document.
no code implementations • 6 Jan 2022 • Pengkai Zhu, Zhaowei Cai, Yuanjun Xiong, Zhuowen Tu, Luis Goncalves, Vijay Mahadevan, Stefano Soatto
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model.