no code implementations • 4 Oct 2024 • Sungnyun Kim, Haofu Liao, Srikar Appalaraju, Peng Tang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan, Stefano Soatto
Visual document understanding (VDU) is a challenging task that involves understanding documents across various modalities (text and image) and layouts (forms, tables, etc.).
no code implementations • 27 Jun 2024 • Varun Nagaraj Rao, Siddharth Choudhary, Aditya Deshpande, Ravi Kumar Satzoda, Srikar Appalaraju
This paper introduces RAVEN, a multitask retrieval augmented VLM framework that enhances base VLMs through efficient, task specific fine-tuning.
no code implementations • ICCV 2023 • Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
We present a new formulation for structured information extraction (SIE) from visually rich documents.
Ranked #4 on Entity Linking on FUNSD
1 code implementation • CVPR 2023 • Jiang Liu, Hui Ding, Zhaowei Cai, Yuting Zhang, Ravi Kumar Satzoda, Vijay Mahadevan, R. Manmatha
In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.
Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)
no code implementations • 21 Sep 2017 • Akshay Rangesh, Kevan Yuen, Ravi Kumar Satzoda, Rakesh Nattoji Rajaram, Pujitha Gunaratne, Mohan M. Trivedi
Recent progress in autonomous and semi-autonomous driving has been made possible in part through an assortment of sensors that provide the intelligent agent with an enhanced perception of its surroundings.