Search Results for author: Sugato Basu

Found 16 papers, 8 papers with code

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

no code implementations • 28 May 2023 • Zhiwei Jia, Pradyumna Narayana, Arjun R. Akula, Garima Pruthi, Hao Su, Sugato Basu, Varun Jampani

Image ad understanding is a crucial task with wide real-world applications.

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation • NeurIPS 2023 • Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves performance comparable to that of human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis, Text-to-Image Generation

Discriminative Diffusion Models as Few-shot Vision and Language Learners

1 code implementation • 18 May 2023 • Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation.

Image-text matching, Text Matching

MetaCLUE: Towards Comprehensive Visual Metaphors Research

no code implementations • CVPR 2023 • Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani

Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor.

Image Generation, Question Answering

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

1 code implementation • 9 Dec 2022 • Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.

Attribute, Image Generation

CPL: Counterfactual Prompt Learning for Vision and Language Models

no code implementations • 19 Oct 2022 • Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.

counterfactual, Visual Question Answering

Diagnosing Vision-and-Language Navigation: What Really Matters

1 code implementation • NAACL 2022 • Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang

Results show that indoor navigation agents refer to both object and direction tokens when making decisions.

Object, Vision and Language Navigation

A Framework for Deep Constrained Clustering

1 code implementation • 7 Jan 2021 • Hongjing Zhang, Tianyang Zhan, Sugato Basu, Ian Davidson

A fundamental strength of deep learning is its flexibility, and here we explore a deep learning framework for constrained clustering and in particular explore how it can extend the field of constrained clustering.

Constrained Clustering

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

no code implementations • EMNLP 2020 • Wanrong Zhu, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings.

Text Generation

Leveraging Organizational Resources to Adapt Models to New Data Modalities

no code implementations • 23 Aug 2020 • Sahaana Suri, Raghuveer Chanda, Neslihan Bulut, Pradyumna Narayana, Yemao Zeng, Peter Bailis, Sugato Basu, Girija Narlikar, Christopher Re, Abishek Sethi

As applications in large organizations evolve, the machine learning (ML) models that power them must adapt the same predictive tasks to newly arising data modalities (e.g., a new video content launch in a social media application requires existing text or image models to extend to video).

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

1 code implementation • EACL 2021 • Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

Outdoor vision-and-language navigation (VLN) is a task in which an agent follows natural language instructions and navigates a real-life urban environment.

Ranked #4 on Vision and Language Navigation on Touchdown Dataset (using extra training data)

Style Transfer, Text Style Transfer

HUSE: Hierarchical Universal Semantic Embeddings

8 code implementations • 14 Nov 2019 • Pradyumna Narayana, Aniket Pednekar, Abishek Krishnamoorthy, Kazoo Sone, Sugato Basu

The works in the domain of visual semantic embeddings address this problem by first constructing a semantic embedding space based on some external knowledge and projecting image embeddings onto this fixed semantic embedding space.

General Classification, Representation Learning

A Framework for Deep Constrained Clustering -- Algorithms and Advances

1 code implementation • 29 Jan 2019 • Hongjing Zhang, Sugato Basu, Ian Davidson

The area of constrained clustering has been extensively explored by researchers and used by practitioners.

Constrained Clustering

Interpretable Neural Architectures for Attributing an Ad's Performance to its Writing Style

no code implementations • WS 2018 • Reid Pryzant, Sugato Basu, Kazoo Sone

How much does "free shipping!

Interpretable Machine Learning

Micro-Browsing Models for Search Snippets

no code implementations • 18 Oct 2018 • Muhammad Asiful Islam, Ramakrishnan Srikant, Sugato Basu

CTR of a result has two core components: (a) the probability of examination of a result by a user, and (b) the perceived relevance of the result given that it has been examined by the user.

Graphical RNN Models

no code implementations • 15 Dec 2016 • Ashish Bora, Sugato Basu, Joydeep Ghosh

Many time series are generated by a set of entities that interact with one another over time.

Time Series, Time Series Analysis
