Search Results for author: Vijay Kumar BG

Found 5 papers, 4 papers with code

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

no code implementations • 6 Apr 2024 • Zaid Khan, Vijay Kumar BG, Samuel Schulter, Yun Fu, Manmohan Chandraker

We propose a method where we exploit existing annotations for a vision-language task to improvise a coarse reward signal for that task, treat the LLM as a policy, and apply reinforced self-training to improve the visual program synthesis ability of the LLM for that task.

Object Detection +4

Single-Stream Multi-Level Alignment for Vision-Language Pretraining

1 code implementation • 27 Mar 2022 • Zaid Khan, Vijay Kumar BG, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu

Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but ignores fine-grained alignment due to a dual-stream architecture that aligns image and text representations only on a global level.

Question Answering, Referring Expression +4

Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention

1 code implementation • 23 Nov 2020 • Varnith Chordia, Vijay Kumar BG

Accurate and efficient product classification is significant for E-commerce applications, as it enables various downstream tasks such as recommendation, retrieval, and pricing.

Classification, General Classification +3

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

2 code implementations • 16 Mar 2016 • Ravi Garg, Vijay Kumar BG, Gustavo Carneiro, Ian Reid

In this work we propose an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground truth depths.

Depth Estimation
