1 code implementation • COLING 2022 • Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, Ran Xu, Caiming Xiong
Unlike previous methods that only address a fixed set of field items, our method predicts the target value for an arbitrary query based on the understanding of the layout and semantics of a form.
1 code implementation • 10 Dec 2022 • Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese
Then, ULIP learns a 3D representation space aligned with the common image-text space, using a small number of automatically synthesized triplets.
Ranked #2 on 3D Point Cloud Classification on ModelNet40 (using extra training data)
1 code implementation • 3 Aug 2022 • Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, Ran Xu, Joseph F. JaJa, Larry S. Davis
To address this deficiency, we develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.
Ranked #3 on Visual Question Answering on TextVQA test-standard
1 code implementation • 15 Dec 2021 • Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, Ran Xu, Caiming Xiong
Unlike previous methods that only address a fixed set of field items, our method predicts the target value for an arbitrary query based on the understanding of the layout and semantics of a form.
no code implementations • 8 Dec 2021 • Luyu Yang, Mingfei Gao, Zeyuan Chen, Ran Xu, Abhinav Shrivastava, Chetan Ramaiah
In the context of online privacy, many methods propose complex privacy and security preserving measures to protect sensitive data.
1 code implementation • 18 Nov 2021 • Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong
To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs.
1 code implementation • 8 Oct 2021 • Le Xue, Mingfei Gao, Zeyuan Chen, Caiming Xiong, Ran Xu
We propose a novel framework to evaluate the robustness of transformer-based form field extraction methods via form attacks.
2 code implementations • SpaNLP (ACL) 2022 • Mingfei Gao, Zeyuan Chen, Nikhil Naik, Kazuma Hashimoto, Caiming Xiong, Ran Xu
We propose a novel framework to conduct field extraction from forms with unlabeled data.
1 code implementation • ICCV 2021 • Luyu Yang, Yan Wang, Mingfei Gao, Abhinav Shrivastava, Kilian Q. Weinberger, Wei-Lun Chao, Ser-Nam Lim
To integrate the strengths of the two classifiers, we apply the well-established co-training framework, in which the two classifiers exchange their highly confident predictions to iteratively "teach each other" so that both classifiers can excel in the target domain.
no code implementations • ECCV 2020 • Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis
Results show that our framework achieves state-of-the-art performance at 31 FPS and improves on our baseline significantly, by 9.0% mAP on the nuScenes test set.
Ranked #319 on 3D Object Detection on nuScenes
no code implementations • CVPR 2021 • Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong
Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications.
Ranked #4 on Online Action Detection on THUMOS'14
no code implementations • IJCNLP 2019 • Mingfei Gao, Larry Davis, Richard Socher, Caiming Xiong
We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.
no code implementations • ECCV 2020 • Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister
Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high-value data that can best improve model performance.
no code implementations • 25 Sep 2019 • Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister
Active learning (AL) aims to integrate data labeling and model training in a unified way, and to minimize the labeling budget by prioritizing the selection of high-value data that can best improve model performance.
no code implementations • 31 Aug 2019 • Mingfei Gao, Larry S. Davis, Richard Socher, Caiming Xiong
We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.
no code implementations • 12 May 2019 • Boyuan Ma, Xiaoyan Wei, Chuni Liu, Xiaojuan Ban, Haiyou Huang, Hao Wang, Weihua Xue, Stephen Wu, Mingfei Gao, Qing Shen, Adnan Omer Abuassba, Haokai Shen, Yanjing Su
Recent progress in material data mining has been driven by high-capacity models trained on large datasets.
no code implementations • 8 May 2019 • Mingfei Gao, Ashish Tawari, Sujitha Martin
We propose a novel framework that incorporates both visual model and goal representation to conduct OIE.
no code implementations • ICCV 2019 • Mingfei Gao, Mingze Xu, Larry S. Davis, Richard Socher, Caiming Xiong
We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos.
2 code implementations • ICCV 2019 • Mingze Xu, Mingfei Gao, Yi-Ting Chen, Larry S. Davis, David J. Crandall
Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed.
Ranked #6 on Online Action Detection on TVSeries
no code implementations • CVPR 2018 • Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I. Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S. Davis
In contrast, we argue that it is essential to prune neurons in the entire neuron network jointly based on a unified goal: minimizing the reconstruction error of important responses in the "final response layer" (FRL), which is the second-to-last layer before classification, for a pruned network to retain its predictive power.
no code implementations • ECCV 2018 • Mingfei Gao, Ang Li, Ruichi Yu, Vlad I. Morariu, Larry S. Davis
We introduce count-guided weakly supervised localization (C-WSL), an approach that uses per-class object count as a new form of supervision to improve weakly supervised localization (WSL).
no code implementations • CVPR 2018 • Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images.