Search Results for author: Vlad I. Morariu

Found 29 papers, 6 papers with code

DocSynthv2: A Practical Autoregressive Modeling for Document Generation

no code implementations 12 Jun 2024 Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós

While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge.

TutoAI: A Cross-domain Framework for AI-assisted Mixed-media Tutorial Creation on Physical Tasks

no code implementations 12 Mar 2024 Yuexi Chen, Vlad I. Morariu, Anh Truong, Zhicheng Liu

Mixed-media tutorials, which integrate videos, images, text, and diagrams to teach procedural skills, offer more browsable alternatives than timeline-based videos.

LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents

no code implementations IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023 Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Ani Nenkova, Dinesh Manocha, Vlad I. Morariu

Experiments show that our approach outperforms competitive baselines by 10-15% on three diverse datasets of forms and mobile app screen layouts for the tasks of spatial region classification, higher-order group identification, layout hierarchy extraction, reading order detection, and word grouping.

Reading Order Detection

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding

no code implementations 27 Nov 2022 Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu

In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features.

SelfDoc: Self-Supervised Document Representation Learning

no code implementations CVPR 2021 Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

For downstream usage, we propose a novel modality-adaptive attention mechanism for multimodal feature fusion by adaptively emphasizing language and vision signals.

Representation Learning

RPCL: A Framework for Improving Cross-Domain Detection with Auxiliary Tasks

no code implementations 18 Apr 2021 Kai Li, Curtis Wigington, Chris Tensmeyer, Vlad I. Morariu, Handong Zhao, Varun Manjunatha, Nikolaos Barmpalios, Yun Fu

Contrasted with prior work, this paper provides a complementary solution to align domains by learning the same auxiliary tasks in both domains simultaneously.

Cross-Domain Document Object Detection: Benchmark Suite and Method

1 code implementation CVPR 2020 Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu

We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model training and evaluation.

object-detection Object Detection

Learning Rich Features for Image Manipulation Detection

2 code implementations CVPR 2018 Peng Zhou, Xintong Han, Vlad I. Morariu, Larry S. Davis

Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned.

Image Manipulation Image Manipulation Detection +3

Fused Deep Neural Networks for Efficient Pedestrian Detection

no code implementations 2 May 2018 Xianzhi Du, Mostafa El-Khamy, Vlad I. Morariu, Jungwon Lee, Larry Davis

The classification system further classifies the generated candidates based on opinions of multiple deep verification networks and a fusion network which utilizes a novel soft-rejection fusion method to adjust the confidence in the detection results.

Ensemble Learning General Classification +2

Layout-induced Video Representation for Recognizing Agent-in-Place Actions

no code implementations ICCV 2019 Ruichi Yu, Hongcheng Wang, Ang Li, Jingxiao Zheng, Vlad I. Morariu, Larry S. Davis

We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance.

NISP: Pruning Networks using Neuron Importance Score Propagation

no code implementations CVPR 2018 Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I. Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S. Davis

In contrast, we argue that it is essential to prune neurons in the entire neuron network jointly based on a unified goal: minimizing the reconstruction error of important responses in the "final response layer" (FRL), which is the second-to-last layer before classification, for a pruned network to retain its predictive power.

Network Pruning

Dynamic Zoom-in Network for Fast Object Detection in Large Images

no code implementations CVPR 2018 Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis

We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images.

object-detection Real-Time Object Detection

C-WSL: Count-guided Weakly Supervised Localization

no code implementations ECCV 2018 Mingfei Gao, Ang Li, Ruichi Yu, Vlad I. Morariu, Larry S. Davis

We introduce count-guided weakly supervised localization (C-WSL), an approach that uses per-class object count as a new form of supervision to improve weakly supervised localization (WSL).


Generalized Deep Image to Image Regression

1 code implementation CVPR 2017 Venkataraman Santhanam, Vlad I. Morariu, Larry S. Davis

We present a Deep Convolutional Neural Network architecture which serves as a generic image-to-image regressor that can be trained end-to-end without any further machinery.

Colorization Denoising +1

Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition

1 code implementation CVPR 2018 Yaming Wang, Vlad I. Morariu, Larry S. Davis

Compared to earlier multistage frameworks using CNN features, recent end-to-end deep approaches for fine-grained recognition essentially enhance the mid-level learning capability of CNNs.

Representation Learning

Generating Holistic 3D Scene Abstractions for Text-based Image Retrieval

no code implementations CVPR 2017 Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I. Morariu, Larry S. Davis

Since interactions between objects can be reduced to a limited set of atomic spatial relations in 3D, we study the possibility of inferring 3D structure from a text description rather than an image, applying physical relation models to synthesize holistic 3D abstract object layouts satisfying the spatial constraints present in a textual description.

Image Retrieval Object +3

The Role of Context Selection in Object Detection

no code implementations 9 Sep 2016 Ruichi Yu, Xi Chen, Vlad I. Morariu, Larry S. Davis

We investigate the reasons why context in object detection has limited utility by isolating and evaluating the predictive power of different context cues under ideal conditions in which context is provided by an oracle.

Object object-detection +1

Modeling Context Between Objects for Referring Expression Understanding

1 code implementation 1 Aug 2016 Varun K. Nagaraja, Vlad I. Morariu, Larry S. Davis

Our approach uses an LSTM to learn the probability of a referring expression, with input features from a region and a context region.

Multiple Instance Learning Object +1

Mining Discriminative Triplets of Patches for Fine-Grained Classification

no code implementations CVPR 2016 Yaming Wang, Jonghyun Choi, Vlad I. Morariu, Larry S. Davis

Fine-grained classification involves distinguishing between similar sub-categories based on subtle differences in highly localized regions; therefore, accurate localization of discriminative regions remains a major challenge.

Classification General Classification

VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products

no code implementations 10 Dec 2015 Xintong Han, Bharat Singh, Vlad I. Morariu, Larry S. Davis

VRFP is a real-time video retrieval framework based on short text input queries, which obtains weakly labeled training images from the web after the query is known.

Re-Ranking Retrieval +2

Searching for Objects using Structure in Indoor Scenes

no code implementations 24 Nov 2015 Varun K. Nagaraja, Vlad I. Morariu, Larry S. Davis

However, we can use structure in the scene to search for objects without processing the entire image.

Imitation Learning Object

Selecting Relevant Web Trained Concepts for Automated Event Retrieval

no code implementations ICCV 2015 Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu, Larry S. Davis

Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos.

Domain Adaptation Retrieval

Automatic online tuning for fast Gaussian summation

no code implementations NeurIPS 2008 Vlad I. Morariu, Balaji V. Srinivasan, Vikas C. Raykar, Ramani Duraiswami, Larry S. Davis

To solve the second problem, we present an online tuning approach that results in a black box method that automatically chooses the evaluation method and its parameters to yield the best performance for the input data, desired accuracy, and bandwidth.

BIG-bench Machine Learning
