Search Results for author: Wenwen Yu

Found 16 papers, 9 papers with code

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

1 code implementation 22 Feb 2025 Wenwen Yu, Zhibo Yang, Jianqiang Wan, Sibo Song, Jun Tang, Wenqing Cheng, Yuliang Liu, Xiang Bai

In this paper, we introduce OmniParser V2, a universal model that brings typical VsTP tasks, including text spotting, key information extraction, table recognition, and layout analysis, into a unified framework.

document understanding Key Information Extraction +4

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation 28 Mar 2024 Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

Decoder document understanding +4

P2Seg: Pointly-supervised Segmentation via Mutual Distillation

no code implementations 18 Jan 2024 Zipeng Wang, Xuehui Yu, Xumeng Han, Wenwen Yu, Zhixun Huang, Jianbin Jiao, Zhenjun Han

Nevertheless, weakly supervised semantic segmentation methods are proficient in utilizing intra-class feature consistency to capture the boundary contours of the same semantic regions.

Box-supervised Instance Segmentation Segmentation +2

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation CVPR 2024 Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

Decoder document understanding +4

P2RBox: Point Prompt Oriented Object Detection with SAM

no code implementations 22 Nov 2023 Guangming Cao, Xuehui Yu, Wenwen Yu, Xumeng Han, Xue Yang, Guorong Li, Jianbin Jiao, Zhenjun Han

In this study, we introduce P2RBox, which employs point prompts to generate rotated box (RBox) annotations for oriented object detection.

Object object-detection +2

Turning a CLIP Model into a Scene Text Spotter

1 code implementation 21 Aug 2023 Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai

Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively.

object-detection Object Detection +4

OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models

1 code implementation 13 May 2023 Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, XuCheng Yin, Cheng-Lin Liu, Lianwen Jin, Xiang Bai

In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), and Handwritten Mathematical Expression Recognition (HMER).

Key Information Extraction Nutrition +4

ICDAR 2023 Competition on Reading the Seal Title

no code implementations 24 Apr 2023 Wenwen Yu, MingYu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, Dimosthenis Karatzas, Xiang Bai

To promote research in this area, we organized the ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2).

Optical Character Recognition (OCR) Task 2 +1

Turning a CLIP Model into a Scene Text Detector

1 code implementation CVPR 2023 Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai

Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection.

Domain Adaptation Scene Text Detection +1

ADASYN-Random Forest Based Intrusion Detection Model

no code implementations 10 May 2021 Zhewei Chen, Wenwen Yu, Linyue Zhou

Through comparative intrusion detection experiments on the CICIDS 2017 dataset, it is found that ADASYN with Random Forest performs better.

Intrusion Detection model

Unsupervised Domain Adaptation Network with Category-Centric Prototype Aligner for Biomedical Image Segmentation

no code implementations 3 Mar 2021 Ping Gong, Wenwen Yu, Qiuwen Sun, Ruohan Zhao, Junfeng Hu

Specifically, our approach consists of two key modules, a conditional domain discriminator (CDD) and a category-centric prototype aligner (CCPA).

Image Segmentation object-detection +4

PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

2 code implementations 16 Apr 2020 Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, Rong Xiao

Computer vision with state-of-the-art deep learning models has recently achieved huge success in the field of Optical Character Recognition (OCR), including text detection and recognition tasks.

Graph Learning Key Information Extraction +3

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

7 code implementations 7 Oct 2019 Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai

Attention-based scene text recognizers have gained huge success, leveraging a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture.

Decoder Scene Text Recognition
