Search Results for author: Biao Yang

Found 11 papers, 6 papers with code

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

1 code implementation • 7 Mar 2024 • Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai

We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.

Document Understanding, Key Information Extraction (+4 more)

1,350 stars

Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition

no code implementations • 24 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai

Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.

Scene Text Recognition, Semantic Similarity (+1 more)

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

1 code implementation • 21 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai

By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.

Scene Text Recognition

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

1 code implementation • 11 Nov 2023 • Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai

Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.

Image Captioning, Question Answering (+2 more)

1,350 stars

Looking and Listening: Audio Guided Text Recognition

1 code implementation • 6 Jun 2023 • Wenwen Yu, MingYu Liu, Biao Yang, Enming Zhang, Deqiang Jiang, Xing Sun, Yuliang Liu, Xiang Bai

Text recognition in the wild is a long-standing problem in computer vision.

Scene Text Recognition

On the Hidden Mystery of OCR in Large Multimodal Models

1 code implementation • 13 May 2023 • Yuliang Liu, Zhang Li, Biao Yang, Chunyuan Li, XuCheng Yin, Cheng-Lin Liu, Lianwen Jin, Xiang Bai

In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), and Handwritten Mathematical Expression Recognition (HMER).

Key Information Extraction, Nutrition (+4 more)

274 stars

Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data

no code implementations • 10 Feb 2023 • Zhijian Li, Biao Yang, Penghang Yin, Yingyong Qi, Jack Xin

In this paper, we propose a feature affinity (FA) assisted knowledge distillation (KD) method to improve quantization-aware training of deep neural networks (DNN).

Knowledge Distillation, Quantization

Searching Intrinsic Dimensions of Vision Transformers

no code implementations • 16 Apr 2022 • Fanghui Xue, Biao Yang, Yingyong Qi, Jack Xin

It has been shown by many researchers that transformers perform as well as convolutional neural networks in many computer vision tasks.

Image Classification, Object Detection (+1 more)

NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

5 code implementations • 5 May 2020 • Andreas Lugmayr, Martin Danelljan, Radu Timofte, Namhyuk Ahn, Dongwoon Bai, Jie Cai, Yun Cao, Junyang Chen, Kaihua Cheng, SeYoung Chun, Wei Deng, Mostafa El-Khamy, Chiu Man Ho, Xiaozhong Ji, Amin Kheradmand, Gwantae Kim, Hanseok Ko, Kanghyu Lee, Jungwon Lee, Hao Li, Ziluan Liu, Zhi-Song Liu, Shuai Liu, Yunhua Lu, Zibo Meng, Pablo Navarrete Michelini, Christian Micheloni, Kalpesh Prajapati, Haoyu Ren, Yong Hyeok Seo, Wan-Chi Siu, Kyung-Ah Sohn, Ying Tai, Rao Muhammad Umer, Shuangquan Wang, Huibing Wang, Timothy Haoning Wu, Hao-Ning Wu, Biao Yang, Fuzhi Yang, Jaejun Yoo, Tongtong Zhao, Yuanbo Zhou, Haijie Zhuo, Ziyao Zong, Xueyi Zou

This paper reviews the NTIRE 2020 challenge on real world super-resolution.

Benchmarking, Image Super-Resolution

1,048 stars

TPPO: A Novel Trajectory Predictor with Pseudo Oracle

no code implementations • 4 Feb 2020 • Biao Yang, Caizhen He, Pin Wang, Ching-Yao Chan, Xiaofeng Liu, Yang Chen

A latent variable predictor is proposed to estimate latent variable distributions from observed and ground-truth trajectories.

Autonomous Driving, Human-Object Interaction Detection

A Novel Graph based Trajectory Predictor with Pseudo Oracle

no code implementations • 2 Feb 2020 • Biao Yang, Guocheng Yan, Pin Wang, Ching-Yao Chan, Xiang Song, Yang Chen

Recent studies focus on modeling pedestrians' motion patterns with recurrent neural networks, capturing social interactions with pooling-based or graph-based methods, and handling future uncertainties by using random Gaussian noise as the latent variable.

Graph Attention, Pedestrian Trajectory Prediction (+2 more)
