no code implementations • 28 May 2023 • Zhiwei Jia, Pradyumna Narayana, Arjun R. Akula, Garima Pruthi, Hao Su, Sugato Basu, Varun Jampani
Image ad understanding is a crucial task with wide real-world applications.
1 code implementation • NeurIPS 2023 • Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang
When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves comparable performance as human users in designing visual layouts for numerical and spatial correctness.
1 code implementation • 18 May 2023 • Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang
Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation.
no code implementations • CVPR 2023 • Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani
Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor.
1 code implementation • 9 Dec 2022 • Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang
In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.
no code implementations • 19 Oct 2022 • Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang
Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.
1 code implementation • NAACL 2022 • Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang
Results show that indoor navigation agents refer to both object and direction tokens when making decisions.
1 code implementation • 7 Jan 2021 • Hongjing Zhang, Tianyang Zhan, Sugato Basu, Ian Davidson
A fundamental strength of deep learning is its flexibility, and here we explore a deep learning framework for constrained clustering and in particular explore how it can extend the field of constrained clustering.
no code implementations • EMNLP 2020 • Wanrong Zhu, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang
A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings.
no code implementations • 23 Aug 2020 • Sahaana Suri, Raghuveer Chanda, Neslihan Bulut, Pradyumna Narayana, Yemao Zeng, Peter Bailis, Sugato Basu, Girija Narlikar, Christopher Re, Abishek Sethi
As applications in large organizations evolve, the machine learning (ML) models that power them must adapt the same predictive tasks to newly arising data modalities (e. g., a new video content launch in a social media application requires existing text or image models to extend to video).
1 code implementation • EACL 2021 • Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang
Outdoor vision-and-language navigation (VLN) is such a task where an agent follows natural language instructions and navigates a real-life urban environment.
Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)
8 code implementations • 14 Nov 2019 • Pradyumna Narayana, Aniket Pednekar, Abishek Krishnamoorthy, Kazoo Sone, Sugato Basu
The works in the domain of visual semantic embeddings address this problem by first constructing a semantic embedding space based on some external knowledge and projecting image embeddings onto this fixed semantic embedding space.
1 code implementation • 29 Jan 2019 • Hongjing Zhang, Sugato Basu, Ian Davidson
The area of constrained clustering has been extensively explored by researchers and used by practitioners.
no code implementations • 18 Oct 2018 • Muhammad Asiful Islam, Ramakrishnan Srikant, Sugato Basu
CTR of a result has two core components: (a) the probability of examination of a result by a user, and (b) the perceived relevance of the result given that it has been examined by the user.
no code implementations • 15 Dec 2016 • Ashish Bora, Sugato Basu, Joydeep Ghosh
Many time series are generated by a set of entities that interact with one another over time.