no code implementations • 2 Feb 2020 • Biao Yang, Guocheng Yan, Pin Wang, Ching-Yao Chan, Xiang Song, Yang Chen
Recent studies focus on modeling pedestrians' motion patterns with recurrent neural networks, capturing social interactions with pooling-based or graph-based methods, and handling future uncertainties by using random Gaussian noise as the latent variable.
no code implementations • 4 Feb 2020 • Biao Yang, Caizhen He, Pin Wang, Ching-Yao Chan, Xiaofeng Liu, Yang Chen
A latent variable predictor is proposed to estimate latent variable distributions from observed and ground-truth trajectories.
5 code implementations • 5 May 2020 • Andreas Lugmayr, Martin Danelljan, Radu Timofte, Namhyuk Ahn, Dongwoon Bai, Jie Cai, Yun Cao, Junyang Chen, Kaihua Cheng, SeYoung Chun, Wei Deng, Mostafa El-Khamy, Chiu Man Ho, Xiaozhong Ji, Amin Kheradmand, Gwantae Kim, Hanseok Ko, Kanghyu Lee, Jungwon Lee, Hao Li, Ziluan Liu, Zhi-Song Liu, Shuai Liu, Yunhua Lu, Zibo Meng, Pablo Navarrete Michelini, Christian Micheloni, Kalpesh Prajapati, Haoyu Ren, Yong Hyeok Seo, Wan-Chi Siu, Kyung-Ah Sohn, Ying Tai, Rao Muhammad Umer, Shuangquan Wang, Huibing Wang, Timothy Haoning Wu, Hao-Ning Wu, Biao Yang, Fuzhi Yang, Jaejun Yoo, Tongtong Zhao, Yuanbo Zhou, Haijie Zhuo, Ziyao Zong, Xueyi Zou
This paper reviews the NTIRE 2020 challenge on real world super-resolution.
no code implementations • 16 Apr 2022 • Fanghui Xue, Biao Yang, Yingyong Qi, Jack Xin
It has been shown by many researchers that transformers perform as well as convolutional neural networks in many computer vision tasks.
no code implementations • 10 Feb 2023 • Zhijian Li, Biao Yang, Penghang Yin, Yingyong Qi, Jack Xin
In this paper, we propose a feature affinity (FA) assisted knowledge distillation (KD) method to improve quantization-aware training of deep neural networks (DNN).
1 code implementation • 13 May 2023 • Yuliang Liu, Zhang Li, Biao Yang, Chunyuan Li, XuCheng Yin, Cheng-Lin Liu, Lianwen Jin, Xiang Bai
In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), and Handwritten Mathematical Expression Recognition (HMER).
1 code implementation • 6 Jun 2023 • Wenwen Yu, MingYu Liu, Biao Yang, Enming Zhang, Deqiang Jiang, Xing Sun, Yuliang Liu, Xiang Bai
Text recognition in the wild is a long-standing problem in computer vision.
1 code implementation • 11 Nov 2023 • Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai
Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.
1 code implementation • 21 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai
By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.
no code implementations • 24 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
1 code implementation • 7 Mar 2024 • Yuliang Liu, Biao Yang, Qiang Liu, Zhang Li, Zhiyin Ma, Shuo Zhang, Xiang Bai
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.