Search Results for author: Mingfei Gao

Found 24 papers, 9 papers with code

DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents

1 code implementation COLING 2022 Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, Ran Xu, Caiming Xiong

Unlike previous methods that only address a fixed set of field items, our method predicts the target value for an arbitrary query based on the understanding of the layout and semantics of a form.

document understanding Language Modelling +1

4M: Massively Multimodal Masked Modeling

no code implementations NeurIPS 2023 David Mizrahi, Roman Bachmann, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir

Current machine learning models for vision are often highly specialized and limited to a single modality and task.

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

no code implementations CVPR 2023 Vibashan VS, Ning Yu, Chen Xing, Can Qin, Mingfei Gao, Juan Carlos Niebles, Vishal M. Patel, Ran Xu

In summary, an OV method learns task-specific information using strong supervision from base annotations and novel category information using weak supervision from image-caption pairs.

Image Captioning Instance Segmentation +2

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

1 code implementation 3 Aug 2022 Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, Ran Xu, Joseph F. JaJa, Larry S. Davis

To address this deficiency, we develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.

Answer Generation Question-Answer-Generation +3

Value Retrieval with Arbitrary Queries for Form-like Documents

1 code implementation 15 Dec 2021 Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, Ran Xu, Caiming Xiong

Unlike previous methods that only address a fixed set of field items, our method predicts the target value for an arbitrary query based on the understanding of the layout and semantics of a form.

document understanding Language Modelling +1

Burn After Reading: Online Adaptation for Cross-domain Streaming Data

no code implementations 8 Dec 2021 Luyu Yang, Mingfei Gao, Zeyuan Chen, Ran Xu, Abhinav Shrivastava, Chetan Ramaiah

In the context of online privacy, many methods propose complex privacy and security preserving measures to protect sensitive data.

Unsupervised Domain Adaptation

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

1 code implementation 18 Nov 2021 Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs.

Object object-detection +1

Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks

1 code implementation 8 Oct 2021 Le Xue, Mingfei Gao, Zeyuan Chen, Caiming Xiong, Ran Xu

We propose a novel framework to evaluate the robustness of transformer-based form field extraction methods via form attacks.

Optical Character Recognition (OCR)

Deep Co-Training with Task Decomposition for Semi-Supervised Domain Adaptation

1 code implementation ICCV 2021 Luyu Yang, Yan Wang, Mingfei Gao, Abhinav Shrivastava, Kilian Q. Weinberger, Wei-Lun Chao, Ser-Nam Lim

To integrate the strengths of the two classifiers, we apply the well-established co-training framework, in which the two classifiers exchange their high-confidence predictions to iteratively "teach each other" so that both classifiers can excel in the target domain.

Semi-supervised Domain Adaptation Unsupervised Domain Adaptation

InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling

no code implementations ECCV 2020 Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis

Results show that our framework achieves the state-of-the-art performance with 31 FPS and improves our baseline significantly by 9.0% mAP on the nuScenes test set.

3D Object Detection Autonomous Driving +2

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos

no code implementations CVPR 2021 Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong

Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications.

Action Recognition Online Action Detection

WSLLN: Weakly Supervised Natural Language Localization Networks

no code implementations IJCNLP 2019 Mingfei Gao, Larry Davis, Richard Socher, Caiming Xiong

We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.

Sentence

Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost

no code implementations ECCV 2020 Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister

Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high-value data that can best improve model performance.

Active Learning Image Classification +1

Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Budget

no code implementations 25 Sep 2019 Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister

Active learning (AL) aims to integrate data labeling and model training in a unified way, and to minimize the labeling budget by prioritizing the selection of high-value data that can best improve model performance.

Active Learning Representation Learning

WSLLN: Weakly Supervised Natural Language Localization Networks

no code implementations 31 Aug 2019 Mingfei Gao, Larry S. Davis, Richard Socher, Caiming Xiong

We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.

Sentence

Goal-oriented Object Importance Estimation in On-road Driving Videos

no code implementations 8 May 2019 Mingfei Gao, Ashish Tawari, Sujitha Martin

We propose a novel framework that incorporates both visual model and goal representation to conduct OIE.

Object

StartNet: Online Detection of Action Start in Untrimmed Videos

no code implementations ICCV 2019 Mingfei Gao, Mingze Xu, Larry S. Davis, Richard Socher, Caiming Xiong

We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos.

Action Classification Policy Gradient Methods

Temporal Recurrent Networks for Online Action Detection

2 code implementations ICCV 2019 Mingze Xu, Mingfei Gao, Yi-Ting Chen, Larry S. Davis, David J. Crandall

Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed.

Online Action Detection

NISP: Pruning Networks using Neuron Importance Score Propagation

no code implementations CVPR 2018 Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I. Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S. Davis

In contrast, we argue that it is essential to prune neurons in the entire neural network jointly based on a unified goal: minimizing the reconstruction error of important responses in the "final response layer" (FRL), which is the second-to-last layer before classification, for a pruned network to retain its predictive power.

Network Pruning

C-WSL: Count-guided Weakly Supervised Localization

no code implementations ECCV 2018 Mingfei Gao, Ang Li, Ruichi Yu, Vlad I. Morariu, Larry S. Davis

We introduce count-guided weakly supervised localization (C-WSL), an approach that uses per-class object count as a new form of supervision to improve weakly supervised localization (WSL).

Object

Dynamic Zoom-in Network for Fast Object Detection in Large Images

no code implementations CVPR 2018 Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis

We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images.

object-detection Real-Time Object Detection
