Search Results for author: Ajinkya Kale

Found 14 papers, 4 papers with code

Towards Enhanced Controllability of Diffusion Models

no code implementations • 28 Feb 2023 Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale

We rely on the inductive bias of the progressive denoising process of diffusion models to encode pose/layout information in the spatial structure mask and semantic/style information in the style code.

Denoising · Image Manipulation (+3)

PRedItOR: Text Guided Image Editing with Diffusion Prior

no code implementations • 15 Feb 2023 Hareesh Ravi, Sachin Kelkar, Midhun Harikumar, Ajinkya Kale

We combine this with structure-preserving edits on the image decoder, using existing approaches such as reverse DDIM, to perform text-guided image editing.

Decoder · Text-Guided Image Editing
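The reverse DDIM mentioned above deterministically maps an image back toward its noise latent so that edits can be applied before re-decoding. A minimal NumPy sketch of one such step under the standard DDIM eps-parameterization (all names here are illustrative, not from the paper's code):

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_next):
    # Predicted clean image under the usual eps-parameterization.
    x0_pred = (x_t - np.sqrt(1 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Deterministically move to the next noise level; choosing a
    # *smaller* alpha_bar_next adds noise, i.e. inverts ("reverse DDIM").
    return np.sqrt(alpha_bar_next) * x0_pred + np.sqrt(1 - alpha_bar_next) * eps_pred

rng = np.random.default_rng(1)
x_t = rng.normal(size=4)   # toy latent
eps = rng.normal(size=4)   # stand-in for the model's noise prediction
x_next = ddim_step(x_t, eps, alpha_bar_t=0.9, alpha_bar_next=0.8)    # invert
x_back = ddim_step(x_next, eps, alpha_bar_t=0.8, alpha_bar_next=0.9)  # denoise
assert np.allclose(x_back, x_t)  # the map is exactly invertible for fixed eps
```

For a fixed noise prediction the step is an exact bijection between noise levels, which is why DDIM inversion preserves structure during editing.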

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

1 code implementation • CVPR 2023 Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, Shiyu Chang

Based on this finding, we further propose a simple, lightweight image editing algorithm in which the mixing weights of the two text embeddings are optimized for style matching and content preservation.

Denoising · Disentanglement
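The mixing-weight idea above can be illustrated with a toy gradient descent: two prompt embeddings are blended by a scalar weight that is tuned against a stand-in target. The real algorithm optimizes per-embedding weights against a loss balancing style matching and content preservation; everything here is a simplified sketch with made-up values:

```python
import numpy as np

def mix(content_emb, style_emb, lam):
    # Convex combination of the two prompt embeddings; lam is the
    # mixing weight being optimized.
    return (1.0 - lam) * content_emb + lam * style_emb

# Toy 2-D "embeddings"; a real system would use the diffusion model's
# text-encoder outputs and a perceptual loss, not squared error.
content = np.array([1.0, 0.0])
style = np.array([0.0, 1.0])
target = np.array([0.7, 0.3])  # stand-in for the desired edited output

lam = 0.5
for _ in range(200):  # plain gradient descent on a squared-error loss
    grad = 2.0 * (mix(content, style, lam) - target) @ (style - content)
    lam -= 0.1 * grad

# For this toy target the optimal weight is 0.3.
assert abs(lam - 0.3) < 1e-3
```

Because only scalar weights are optimized rather than model parameters, this style of editing stays cheap at inference time.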

Fine-grained Image Captioning with CLIP Reward

1 code implementation • Findings (NAACL) 2022 Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal

Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on a huge collection of image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.

Caption Generation · Descriptive (+5)
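The reward described above is, at its core, a cosine similarity between an image and a candidate caption in CLIP's joint embedding space. A toy NumPy sketch, with random vectors standing in for real CLIP encoder outputs:

```python
import numpy as np

def clip_style_reward(image_emb: np.ndarray, caption_emb: np.ndarray) -> float:
    """Cosine similarity between an image embedding and a caption
    embedding, used directly as a scalar reward for the caption."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    caption_emb = caption_emb / np.linalg.norm(caption_emb)
    return float(image_emb @ caption_emb)

# Toy embeddings stand in for real CLIP image/text encoder outputs.
rng = np.random.default_rng(0)
image = rng.normal(size=512)
good_caption = image + 0.1 * rng.normal(size=512)  # close to the image
bad_caption = rng.normal(size=512)                 # unrelated

assert clip_style_reward(image, good_caption) > clip_style_reward(image, bad_caption)
```

A distinctive caption scores high against its own image and low against others, which is what makes the similarity usable as a fine-grained reward signal.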

StyleBabel: Artistic Style Tagging and Captioning

no code implementations • 10 Mar 2022 Dan Ruta, Andrew Gilbert, Pranav Aggarwal, Naveen Marri, Ajinkya Kale, Jo Briggs, Chris Speed, Hailin Jin, Baldo Faieta, Alex Filipkowski, Zhe Lin, John Collomosse

We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools.

Attribute · Representation Learning (+2)

Towards Zero-shot Cross-lingual Image Retrieval and Tagging

2 code implementations • 15 Sep 2021 Pranav Aggarwal, Ritiz Tambi, Ajinkya Kale

There has been a recent spike in interest in multi-modal Language and Vision problems.

Image Retrieval · Retrieval

Multimodal Contrastive Training for Visual Representation Learning

no code implementations • CVPR 2021 Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta

We first train our model on COCO and evaluate the learned visual representations on various downstream tasks including image classification, object detection, and instance segmentation.

Cross-Modal Retrieval · Image Classification (+6)

Towards Zero-shot Cross-lingual Image Retrieval

1 code implementation • 24 Nov 2020 Pranav Aggarwal, Ajinkya Kale

There has been a recent spike in interest in multi-modal Language and Vision problems.

Image Retrieval · Retrieval

Multi-Modal Retrieval using Graph Neural Networks

no code implementations • 4 Oct 2020 Aashish Kumar Misraa, Ajinkya Kale, Pranav Aggarwal, Ali Aminian

Most real-world applications of image retrieval, such as Adobe Stock, a marketplace for stock photography and illustrations, need a way for users to find images that are similar to a query image both visually (i.e., aesthetically) and conceptually (i.e., containing the same salient objects).

Image Retrieval · Re-Ranking (+1)

Towards Semantic Query Segmentation

no code implementations • 25 Jul 2017 Ajinkya Kale, Thrivikrama Taula, Sanjika Hewavitharana, Amit Srivastava

Query Segmentation is one of the critical components for understanding users' search intent in Information Retrieval tasks.

Information Retrieval · Retrieval (+1)

Visual Search at eBay

no code implementations • 10 Jun 2017 Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, Hadi Kiapour, Robinson Piramuthu

We harness the availability of a large image collection of eBay listings and state-of-the-art deep learning techniques to perform visual search at scale.
