Search Results for author: Erhan Bas

Found 6 papers, 1 paper with code

Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding

no code implementations • 9 Jan 2024 • Yatong Bai, Utsav Garg, Apaar Shanker, Haoming Zhang, Samyak Parajuli, Erhan Bas, Isidora Filipovic, Amelia N. Chu, Eugenia D Fomitcheva, Elliot Branson, Aerin Kim, Somayeh Sojoudi, Kyunghyun Cho

Vision and vision-language applications of neural networks, such as image classification and captioning, rely on large-scale annotated datasets that require non-trivial data-collecting processes.

Image Captioning, Image Classification, +3

On the Performance of Multimodal Language Models

no code implementations • 4 Oct 2023 • Utsav Garg, Erhan Bas

Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks.

Benchmarking, Binary Classification, +4

Detecting and Preventing Hallucinations in Large Vision Language Models

1 code implementation • 11 Aug 2023 • Anisha Gunjal, Jihan Yin, Erhan Bas

We find that even the current state-of-the-art LVLMs (InstructBLIP) still contain a staggering 30 percent of the hallucinatory text in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships.

16k, Hallucination, +2

Masked Vision and Language Modeling for Multi-modal Representation Learning

no code implementations • 3 Aug 2022 • Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto

Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help from another modality.

cross-modal alignment, Language Modelling, +2

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

no code implementations • 12 Apr 2022 • Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto

In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image.

End-to-End Piece-Wise Unwarping of Document Images

no code implementations • ICCV 2021 • Sagnik Das, Kunwar Yashraj Singh, Jon Wu, Erhan Bas, Vijay Mahadevan, Rahul Bhotika, Dimitris Samaras

Document unwarping attempts to undo the physical deformation of the paper and recover a 'flatbed' scanned document-image for downstream tasks such as OCR.

MS-SSIM, Optical Character Recognition (OCR), +1
