Search Results for author: Sayak Paul

Found 15 papers, 8 papers with code

FiVL: A Framework for Improved Vision-Language Alignment

no code implementations19 Dec 2024 Estelle Aflalo, Gabriela Ben Melech Stan, Tiep Le, Man Luo, Shachar Rosenman, Sayak Paul, Shao-Yen Tseng, Vasudev Lal

Large Vision Language Models (LVLMs) have achieved significant progress in integrating visual and textual inputs for multimodal reasoning.

Answer Generation Multimodal Reasoning +2

A Noise is Worth Diffusion Guidance

no code implementations5 Dec 2024 Donghoon Ahn, Jiwon Kang, SangHyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak Paul, SeonHwa Kim, Eunju Cha, Kyong Hwan Jin, Seungryong Kim

Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline.

Denoising Image Generation
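The snippet above mentions noise obtained via diffusion inversion; a minimal NumPy sketch of deterministic DDIM inversion (running the sampler's update in reverse to map a clean image back to an initial noise latent) might look like the following. The `eps_model` here is a stand-in for a trained noise-prediction network, and the schedule is illustrative, not the paper's.

```python
import numpy as np

def ddim_invert(x0, eps_model, alphas_cumprod):
    """Deterministic DDIM inversion: step the DDIM update in reverse,
    from a clean sample x0 toward an initial noise latent.
    eps_model(x, t) stands in for a trained noise predictor;
    alphas_cumprod is the (decreasing) cumulative-alpha schedule."""
    x = x0
    for t in range(len(alphas_cumprod) - 1):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t + 1]
        eps = eps_model(x, t)
        # predict the clean sample implied by the current noise estimate
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # re-noise it one step further along the schedule
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
    return x
```

With a perfect noise predictor, feeding the returned latent back through the forward DDIM sampler reconstructs the image, which is the property the abstract builds on.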

FastRM: An efficient and automatic explainability framework for multimodal generative models

no code implementations2 Dec 2024 Gabriela Ben-Melech Stan, Estelle Aflalo, Man Luo, Shachar Rosenman, Tiep Le, Sayak Paul, Shao-Yen Tseng, Vasudev Lal

While Large Vision Language Models (LVLMs) have become highly capable at reasoning over human prompts and visual inputs, they are still prone to producing responses that contain misinformation.

Misinformation

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

1 code implementation24 Aug 2024 Chansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang

The widespread adoption of cloud-based proprietary large language models (LLMs) has introduced significant challenges, including operational dependencies, privacy concerns, and the necessity of continuous internet connectivity.

Language Modeling Language Modelling

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

no code implementations10 Jun 2024 Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong

In this paper, we focus on the alignment of recent text-to-image diffusion models, such as Stable Diffusion XL (SDXL), and find that this "reference mismatch" is indeed a significant problem in aligning these models due to the unstructured nature of visual modalities: e.g., a preference for a particular stylistic aspect can easily induce such a discrepancy.

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

1 code implementation1 Apr 2024 Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang

One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.

Spatial Reasoning

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models

no code implementations27 Feb 2024 Shyam Marjit, Harshit Singh, Nityanand Mathur, Sayak Paul, Chia-Mu Yu, Pin-Yu Chen

In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have led to impressive results yet encounter limitations due to their intensive fine-tuning demands and substantial parameter requirements.

Image Generation parameter-efficient fine-tuning
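The title refers to a Kronecker-product adapter (KronA-style) for parameter-efficient fine-tuning. A generic NumPy sketch of the idea, not the exact DiffuseKronA parameterization, is to express a weight update as a Kronecker product of two small factors:

```python
import numpy as np

def kron_adapter_delta(A, B):
    """Kronecker-adapter weight update: dW = A kron B.
    A (p x q) and B (r x s) produce a (p*r x q*s) update while
    storing only p*q + r*s parameters instead of p*r*q*s."""
    return np.kron(A, B)
```

For a 64x64 update, two 8x8 factors cost 128 parameters instead of 4096, which is the parameter saving that motivates this family of methods.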

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

1 code implementation10 Jan 2024 Junsong Chen, Yue Wu, Simian Luo, Enze Xie, Sayak Paul, Ping Luo, Hang Zhao, Zhenguo Li

As a state-of-the-art, open-source image generation model, PIXART-δ offers a promising alternative to the Stable Diffusion family of models, contributing significantly to text-to-image synthesis.

Image Generation

Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level Loss

1 code implementation5 Jan 2024 Yatharth Gupta, Vishnu V. Jaddipal, Harish Prabhala, Sayak Paul, Patrick von Platen

In this work, we introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B and 0.74B parameter UNets, respectively, achieved through progressive removal using layer-level losses focusing on reducing the model size while preserving generative quality.

Knowledge Distillation
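The abstract above describes distillation driven by layer-level losses. A minimal, generic sketch of such a loss (matching corresponding teacher and student feature maps; the exact losses used for SSD-1B and Segmind-Vega may differ) could be:

```python
import numpy as np

def layer_level_distillation_loss(teacher_feats, student_feats, weights=None):
    """Weighted sum of per-layer MSE losses between corresponding
    teacher and student feature maps (a generic feature-matching
    sketch of layer-level knowledge distillation)."""
    if weights is None:
        weights = [1.0] * len(student_feats)
    loss = 0.0
    for t, s, w in zip(teacher_feats, student_feats, weights):
        loss += w * np.mean((t - s) ** 2)
    return loss
```

In practice the feature lists would come from hooks on matching UNet blocks of the teacher and the pruned student, and the per-layer weights let later blocks count more or less toward the total.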

Flood Segmentation on Sentinel-1 SAR Imagery with Semi-Supervised Learning

1 code implementation NeurIPS Workshop AI4Science 2021 Sayak Paul, Siddha Ganju

Floods wreak havoc throughout the world, causing billions of dollars in damages, and uprooting communities, ecosystems and economies.

Disaster Response Semantic Segmentation

Vision Transformers are Robust Learners

1 code implementation17 May 2021 Sayak Paul, Pin-Yu Chen

Transformers, composed of multiple self-attention layers, hold strong promises toward a generic learning primitive applicable to different data modalities, including the recent breakthroughs in computer vision achieving state-of-the-art (SOTA) standard accuracy.

Anomaly Detection Image Classification +1

G-SimCLR: Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling

1 code implementation28 Sep 2020 Souradip Chakraborty, Aritra Roy Gosthipaty, Sayak Paul

In this work, we propose that, with the normalized temperature-scaled cross-entropy (NT-Xent) loss function (as used in SimCLR), it is beneficial to not have images of the same category in the same batch.

Contrastive Learning Denoising +2
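The abstract names the NT-Xent loss from SimCLR; a minimal NumPy sketch of that loss is below. Note that the paper's actual contribution, keeping images of the same (pseudo-labelled) category out of a batch, lives in the data sampler, not in the loss itself.

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss,
    as defined in SimCLR. z holds 2N embeddings where rows i and
    i + N are the two augmented views of the same image."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    n = z.shape[0] // 2
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    # the positive for row i is its other view, row (i + n) mod 2n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Because every other row in the batch serves as a negative, two same-category images in one batch become a false negative, which is exactly the situation the proposed batching scheme avoids.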

