Search Results for author: Ajinkya Kale

Found 17 papers, 5 papers with code

Prompt-Guided Mask Proposal for Two-Stage Open-Vocabulary Segmentation

no code implementations · 13 Dec 2024 · Yu-Jhe Li, Xinyang Zhang, Kun Wan, Lantao Yu, Ajinkya Kale, Xin Lu

To overcome this challenge, existing methods often use multi-modal models like CLIP, which combine image and text features in a shared embedding space to bridge the gap between limited and extensive vocabulary recognition. This results in a two-stage approach: in the first stage, a mask generator takes an input image and produces mask proposals; in the second stage, the target mask is picked based on the query.
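The second-stage selection described above can be sketched as a nearest-neighbor lookup in the shared embedding space. This is a minimal illustration, not the paper's code: the embeddings are toy vectors standing in for real CLIP features, and `pick_target_mask` is a hypothetical helper name.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_target_mask(proposal_embeddings, query_embedding):
    """Second stage of a two-stage pipeline: given an image embedding for
    each mask proposal and a text embedding for the query (both in the
    shared image-text space), return the index of the best-matching mask."""
    scores = [cosine_sim(e, query_embedding) for e in proposal_embeddings]
    return int(np.argmax(scores))

# Toy 4-d "shared space" embeddings for three mask proposals and one query.
proposals = [np.array([1.0, 0.0, 0.0, 0.0]),
             np.array([0.0, 1.0, 0.0, 0.0]),
             np.array([0.7, 0.7, 0.0, 0.0])]
query = np.array([0.0, 0.9, 0.1, 0.0])
print(pick_target_mask(proposals, query))  # proposal 1 aligns best with the query
```

In a real system, the proposal embeddings would come from encoding each masked image region with the CLIP image encoder and the query embedding from the CLIP text encoder.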

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

no code implementations · 26 Nov 2024 · Shijian Deng, Wentian Zhao, Yu-Jhe Li, Kun Wan, Daniel Miranda, Ajinkya Kale, Yapeng Tian

Self-improvement in multimodal large language models (MLLMs) is crucial for enhancing their reliability and robustness.

Tasks: Hallucination

Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See

1 code implementation · 8 Oct 2024 · Zeliang Zhang, Phu Pham, Wentian Zhao, Kun Wan, Yu-Jhe Li, Jianing Zhou, Daniel Miranda, Ajinkya Kale, Chenliang Xu

In this study, we investigate the redundancy in visual computation at both the parameter and computational pattern levels within LLaVA, a representative MLLM, and introduce a suite of streamlined strategies to enhance efficiency.
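One streamlined strategy for reducing visual-computation redundancy can be sketched as dropping low-importance visual tokens before they reach the language model. The scoring rule and function name below are hypothetical illustrations of the general idea, not the paper's actual method.

```python
import numpy as np

def prune_visual_tokens(tokens, importance, keep_ratio=0.5):
    """Keep only the highest-scoring fraction of visual tokens.
    `importance` is a per-token relevance score (hypothetical here);
    surviving tokens retain their original order."""
    k = max(1, int(round(len(tokens) * keep_ratio)))
    keep = np.argsort(importance)[::-1][:k]  # indices of the top-k tokens
    keep = np.sort(keep)                     # preserve token order
    return [tokens[i] for i in keep]

tokens = ["t0", "t1", "t2", "t3", "t4", "t5"]
importance = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
print(prune_visual_tokens(tokens, importance, keep_ratio=0.5))  # ['t0', 't2', 't4']
```

The efficiency gain comes from the language model attending over half as many visual tokens; how the importance scores are computed is the substantive design question.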

PRedItOR: Text Guided Image Editing with Diffusion Prior

no code implementations · 15 Feb 2023 · Hareesh Ravi, Sachin Kelkar, Midhun Harikumar, Ajinkya Kale

We combine this with structure preserving edits on the image decoder using existing approaches such as reverse DDIM to perform text guided image editing.

Tasks: Decoder, Text-Guided Image Editing

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

1 code implementation · CVPR 2023 · Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, Shiyu Chang

Based on this finding, we further propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.

Tasks: Denoising, Disentanglement
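The mixing of two text embeddings described in the abstract above can be sketched as a weighted linear combination, with the weight chosen by a simple search. Everything here is a toy stand-in: the score function, the grid search, and the function names are hypothetical, whereas the paper optimizes the weights against style-matching and content-preservation objectives in a diffusion model.

```python
import numpy as np

def mix_embeddings(e_content, e_style, lam):
    # Linear mix of the two text embeddings; lam trades content
    # preservation against style matching.
    return (1.0 - lam) * e_content + lam * e_style

def grid_search_lambda(e_content, e_style, score_fn,
                       lams=np.linspace(0.0, 1.0, 11)):
    """Toy stand-in for the optimization: pick the mixing weight that
    maximizes a caller-supplied score."""
    scores = [score_fn(mix_embeddings(e_content, e_style, l)) for l in lams]
    return float(lams[int(np.argmax(scores))])

e_content = np.array([1.0, 0.0])
e_style = np.array([0.0, 1.0])
# Hypothetical score: prefer mixes pointing along the balanced direction.
target = np.array([1.0, 1.0]) / np.sqrt(2.0)
score = lambda e: float(np.dot(e, target)) / (np.linalg.norm(e) + 1e-9)
print(grid_search_lambda(e_content, e_style, score))  # 0.5 balances both
```

The appeal of the approach is its lightness: only a scalar (or low-dimensional) mixing weight is optimized, not the model's parameters.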

Fine-grained Image Captioning with CLIP Reward

1 code implementation · Findings (NAACL) 2022 · Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal

Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on a huge corpus of image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.

Tasks: Caption Generation, Descriptive, +5
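The reward signal described above can be sketched as an image-caption cosine similarity in a CLIP-like shared space. The toy vectors and the `clip_reward` name below are illustrative assumptions; in the actual setup, the embeddings come from CLIP's image and text encoders and the reward drives reinforcement-style caption training.

```python
import numpy as np

def clip_reward(image_emb, caption_emb):
    """Reward for a candidate caption: cosine similarity between the image
    and caption embeddings in a shared multimodal space."""
    num = float(np.dot(image_emb, caption_emb))
    return num / (np.linalg.norm(image_emb) * np.linalg.norm(caption_emb))

# Toy embeddings: a distinctive caption should earn a higher reward
# than a generic one that matches many images.
image = np.array([0.2, 0.9, 0.4])
distinctive = np.array([0.25, 0.85, 0.45])
generic = np.array([0.9, 0.1, 0.1])
assert clip_reward(image, distinctive) > clip_reward(image, generic)
```

Because the reward is computed from embeddings rather than reference captions, it can favor captions that are specific to the input image instead of ones that merely overlap with reference n-grams.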

StyleBabel: Artistic Style Tagging and Captioning

no code implementations · 10 Mar 2022 · Dan Ruta, Andrew Gilbert, Pranav Aggarwal, Naveen Marri, Ajinkya Kale, Jo Briggs, Chris Speed, Hailin Jin, Baldo Faieta, Alex Filipkowski, Zhe Lin, John Collomosse

We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools.

Tasks: Attribute, Form, +3

Towards Zero-shot Cross-lingual Image Retrieval and Tagging

2 code implementations · 15 Sep 2021 · Pranav Aggarwal, Ritiz Tambi, Ajinkya Kale

There has been a recent spike in interest in multi-modal Language and Vision problems.

Tasks: Image Retrieval, Retrieval

Multimodal Contrastive Training for Visual Representation Learning

no code implementations · CVPR 2021 · Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta

We first train our model on COCO and evaluate the learned visual representations on various downstream tasks including image classification, object detection, and instance segmentation.

Tasks: Cross-Modal Retrieval, Image Classification, +7

Towards Zero-shot Cross-lingual Image Retrieval

1 code implementation · 24 Nov 2020 · Pranav Aggarwal, Ajinkya Kale

There has been a recent spike in interest in multi-modal Language and Vision problems.

Tasks: Image Retrieval, Retrieval

Multi-Modal Retrieval using Graph Neural Networks

no code implementations · 4 Oct 2020 · Aashish Kumar Misraa, Ajinkya Kale, Pranav Aggarwal, Ali Aminian

Most real-world applications of image retrieval, such as Adobe Stock (a marketplace for stock photography and illustrations), need a way for users to find images which are both visually (i.e., aesthetically) and conceptually (i.e., containing the same salient objects) similar to a query image.

Tasks: Image Retrieval, Re-Ranking, +1
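Ranking by a blend of visual and conceptual similarity, as motivated above, can be sketched with a simple weighted score. The similarity values, the blend weight, and the function names here are hypothetical; the paper itself derives the conceptual signal from graph neural networks rather than a hand-set blend.

```python
def hybrid_score(visual_sim, concept_sim, alpha=0.5):
    # Blend aesthetic (visual) and salient-object (conceptual) similarity.
    return alpha * visual_sim + (1.0 - alpha) * concept_sim

def rank(candidates, alpha=0.5):
    """Rank candidate images by the blended score, highest first.
    candidates: list of (name, visual_sim, concept_sim) tuples."""
    return sorted(candidates, key=lambda c: -hybrid_score(c[1], c[2], alpha))

results = rank([("a", 0.9, 0.2), ("b", 0.6, 0.7), ("c", 0.3, 0.9)])
print([name for name, *_ in results])  # ['b', 'c', 'a']
```

With `alpha=0.5`, candidate "b" wins because it balances both signals; pushing `alpha` toward 1.0 would favor the purely visual match "a".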

Towards Semantic Query Segmentation

no code implementations · 25 Jul 2017 · Ajinkya Kale, Thrivikrama Taula, Sanjika Hewavitharana, Amit Srivastava

Query Segmentation is one of the critical components for understanding users' search intent in Information Retrieval tasks.

Tasks: Information Retrieval, Retrieval, +1

Visual Search at eBay

no code implementations · 10 Jun 2017 · Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, Hadi Kiapour, Robinson Piramuthu

We harness the availability of large image collection of eBay listings and state-of-the-art deep learning techniques to perform visual search at scale.
