1 code implementation • 19 Mar 2025 • Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar Jr., Xiangyang Ji, Xu-Cheng Yin
We also introduce Kubric-NK, a new benchmark for evaluating optical flow methods with input resolutions ranging from 1K to 8K.
Ranked #1 on Optical Flow Estimation on Sintel-final
1 code implementation • 16 Jul 2024 • Shi-Xue Zhang, Hongfa Wang, Xiaobin Zhu, Weibo Gu, Tianjin Zhang, Chun Yang, Wei Liu, Xu-Cheng Yin
In this paper, we propose a novel Spatio-Temporal Graph Transformer module (dubbed STGT) to uniformly learn spatial and temporal contexts for video-language alignment pre-training.
1 code implementation • IEEE International Conference on Robotics and Automation (ICRA) 2024 • Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar-Jr., Xiangyang Ji, Xu-Cheng Yin
Extracting motion information from videos with optical flow estimation is vital in multiple practical robot applications.
Ranked #7 on Optical Flow Estimation on KITTI 2015
1 code implementation • 1 May 2024 • Zhiyu Fang, Shuai-Long Lei, Xiaobin Zhu, Chun Yang, Shi-Xue Zhang, Xu-Cheng Yin, Jingyan Qin
We then craft a mixed-context reasoning module based on a multi-layer perceptron (MLP) to learn unified representations of inter-quadruples for ECE while performing temporal knowledge reasoning.
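As a hypothetical illustration only (not the authors' code), an MLP-based reasoning step over temporal-KG quadruples might look like the sketch below; the embedding size, hidden width, and the simple concatenation of (subject, relation, object, timestamp) embeddings are all assumptions.

```python
# Illustrative sketch: an MLP reasoning module over quadruple embeddings.
# Layer sizes and the concatenation scheme are assumptions, not the paper's design.
import torch
import torch.nn as nn

class MixedContextMLP(nn.Module):
    def __init__(self, dim: int = 128, hidden: int = 256):
        super().__init__()
        # Four embeddings (subject, relation, object, timestamp) are concatenated.
        self.net = nn.Sequential(
            nn.Linear(4 * dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )

    def forward(self, s, r, o, t):
        # s, r, o, t: (batch, dim) embeddings of one quadruple each
        ctx = torch.cat([s, r, o, t], dim=-1)
        return self.net(ctx)  # unified quadruple representation

if __name__ == "__main__":
    emb = lambda: torch.randn(8, 128)
    print(MixedContextMLP()(emb(), emb(), emb(), emb()).shape)  # torch.Size([8, 128])
```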
1 code implementation • 1 May 2024 • Zhiyu Fang, Jingyan Qin, Xiaobin Zhu, Chun Yang, Xu-Cheng Yin
Distinguished from traditional knowledge graphs (KGs), temporal knowledge graphs (TKGs) must explore and reason over temporally evolving facts adequately.
no code implementations • 8 Jan 2024 • Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Hongyang Zhou, Hongfa Wang, Xu-Cheng Yin
Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM).
no code implementations • CVPR 2024 • Min Liang, Jia-Wei Ma, Xiaobin Zhu, Jingyan Qin, Xu-Cheng Yin
Existing scene text detectors generally focus on accurately detecting single-level (i.e., word-level, line-level, or paragraph-level) text entities without exploring the relationships among different levels of text entities.
1 code implementation • ICCV 2023 • Hongyang Zhou, Xiaobin Zhu, Jianqing Zhu, Zheng Han, Shi-Xue Zhang, Jingyan Qin, Xu-Cheng Yin
Instead of assuming degradations are spatially invariant across the whole image, we learn correction filters that adjust degradations to known degradations in a spatially variant way via a novel linearly-assembled pixel degradation-adaptive regression module (DARM).
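For intuition, spatially variant, linearly-assembled filtering can be sketched as predicting per-pixel mixing coefficients over a small bank of base kernels and applying the assembled kernel at every pixel. This is a sketch under assumptions, not the paper's DARM: the bank size, kernel size, and coefficient head below are made up for the example.

```python
# Sketch: per-pixel linear assembly of a learnable bank of base kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearlyAssembledFilter(nn.Module):
    def __init__(self, channels=3, n_bases=8, ksize=5):
        super().__init__()
        self.ksize = ksize
        # Shared bank of base kernels (assumed size).
        self.bases = nn.Parameter(torch.randn(n_bases, ksize * ksize) * 0.01)
        # Lightweight head predicting per-pixel mixing coefficients.
        self.coef_head = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, n_bases, 3, padding=1), nn.Softmax(dim=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        coefs = self.coef_head(x)                                     # (b, n_bases, h, w)
        # Per-pixel kernel = coefficient-weighted sum of base kernels.
        kernels = torch.einsum('bnhw,nk->bkhw', coefs, self.bases)    # (b, k*k, h, w)
        patches = F.unfold(x, self.ksize, padding=self.ksize // 2)    # (b, c*k*k, h*w)
        patches = patches.view(b, c, self.ksize * self.ksize, h, w)
        return (patches * kernels.unsqueeze(1)).sum(dim=2)            # filtered image

print(LinearlyAssembledFilter()(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 3, 32, 32])
```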
1 code implementation • 26 Aug 2022 • Shi-Xue Zhang, Xiaobin Zhu, Lei Chen, Jie-Bo Hou, Xu-Cheng Yin
To be concrete, we adopt a Sigmoid Alpha Function (SAF) to transform the distances between boundaries and their inside pixels into a probability map.
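For intuition only, a sigmoid-style distance-to-probability mapping could look like the toy function below; the exact form and parameters of the paper's SAF are not given here, so `alpha` and the normalization are assumptions.

```python
# Toy sketch of mapping pixel-to-boundary distances onto probabilities.
import numpy as np

def sigmoid_alpha(dist, max_dist, alpha=3.0):
    """dist: pixel-to-boundary distance; max_dist: largest distance inside the region."""
    d = np.clip(dist / (max_dist + 1e-6), 0.0, 1.0)   # normalize to [0, 1]
    # Sigmoid-shaped curve: lower near the boundary, rising toward the text center.
    return 1.0 / (1.0 + np.exp(-alpha * d))

# Probabilities increase smoothly from the boundary toward the center.
print(sigmoid_alpha(np.array([0.0, 2.0, 5.0, 10.0]), max_dist=10.0))
```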
2 code implementations • 11 May 2022 • Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Xu-Cheng Yin
In our method, we explicitly model the text boundary via an innovative iterative boundary transformer in a coarse-to-fine manner.
no code implementations • 7 May 2022 • Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Xu-Cheng Yin
Then, we propose a graph-based fusion network built on a Graph Convolutional Network (GCN) that learns to reason over and fuse the detection boxes into final instance boxes.
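A hedged sketch of the general pattern (not the authors' network): treat candidate boxes as graph nodes, connect boxes whose IoU exceeds a threshold, and run a simple GCN-style layer over their features. The IoU threshold, feature size, and mean-aggregation rule are assumptions.

```python
# Sketch: one GCN-style propagation step over a graph of detection boxes.
import torch
import torch.nn as nn

def box_iou(boxes):
    # boxes: (n, 4) as (x1, y1, x2, y2)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    lt = torch.max(boxes[:, None, :2], boxes[None, :, :2])
    rb = torch.min(boxes[:, None, 2:], boxes[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=-1)
    return inter / (area[:, None] + area[None, :] - inter + 1e-6)

class BoxFusionGCNLayer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, feats, boxes, iou_thresh=0.3):
        # Connect overlapping boxes; self-IoU is 1, so self-loops come for free.
        adj = (box_iou(boxes) > iou_thresh).float()
        deg = adj.sum(dim=1, keepdim=True)
        # Mean-aggregate neighbor features, then transform (one GCN-style step).
        return torch.relu(self.lin(adj @ feats / deg))

feats = torch.randn(4, 64)
boxes = torch.tensor([[0, 0, 10, 10], [2, 2, 12, 12],
                      [30, 30, 40, 40], [31, 29, 41, 39]], dtype=torch.float)
print(BoxFusionGCNLayer()(feats, boxes).shape)  # torch.Size([4, 64])
```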
1 code implementation • 12 Mar 2022 • Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chun Yang, Xu-Cheng Yin
In this paper, we propose an innovative Kernel Proposal Network (dubbed KPN) for arbitrary shape text detection.
no code implementations • 10 Mar 2022 • Chang Liu, Chun Yang, Hai-Bo Qin, Xiaobin Zhu, Cheng-Lin Liu, Xu-Cheng Yin
Scene text recognition is a popular research topic and is extensively used in industry.
no code implementations • 24 Dec 2021 • Zhiyu Fang, Xiaobin Zhu, Chun Yang, Zheng Han, Jingyan Qin, Xu-Cheng Yin
Learning a common latent embedding by aligning the latent spaces of cross-modal autoencoders is an effective strategy for Generalized Zero-Shot Classification (GZSC).
1 code implementation • ICCV 2021 • Shi-Xue Zhang, Xiaobin Zhu, Chun Yang, Hongfa Wang, Xu-Cheng Yin
In this work, we propose a novel adaptive boundary proposal network for arbitrary shape text detection, which can learn to directly produce accurate boundaries for arbitrary shape text without any post-processing.
no code implementations • 12 Jul 2021 • Fei Shen, Yi Xie, Jianqing Zhu, Xiaobin Zhu, Huanqiang Zeng
In the macro view, a stack of GiT blocks builds the vehicle re-identification model, in which graphs extract discriminative local features within patches and transformers extract robust global features among patches (a rough illustrative sketch of this block pattern follows below).
Ranked #2 on Vehicle Re-Identification on VehicleID Small (mAP metric)
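Loosely paraphrasing that macro structure as a sketch (not the GiT paper's code; the dimensions, pooling, and attention setup are all assumptions): a block can aggregate features within each patch with a simple graph-style step and then relate patches globally with multi-head self-attention.

```python
# Toy "graph within patches, transformer among patches" block.
import torch
import torch.nn as nn

class GraphTransformerBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.local = nn.Linear(dim, dim)  # message transform for the within-patch step
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, n_patches, pixels_per_patch, dim)
        # Within-patch step: mean-aggregate, transform, add the message back.
        local_msg = torch.relu(self.local(x.mean(dim=2, keepdim=True)))
        x = x + local_msg
        # Among-patch step: one token per patch, global self-attention.
        tokens = x.mean(dim=2)
        global_feat, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + global_feat)

print(GraphTransformerBlock()(torch.randn(2, 16, 49, 64)).shape)  # torch.Size([2, 16, 64])
```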
1 code implementation • 16 Dec 2020 • Yaqi Liu, Chao Xia, Xiaobin Zhu, Shengwei Xu
The first stage is a backbone self deep matching network, and the second stage is called Proposal SuperGlue.
1 code implementation • 29 May 2020 • Fei Shen, Jianqing Zhu, Xiaobin Zhu, Yi Xie, Jingchang Huang
Secondly, a novel pyramidal graph network (PGN) is designed to comprehensively explore the spatial significance of feature maps at multiple scales.
Ranked #3 on Vehicle Re-Identification on VehicleID Small (mAP metric)
2 code implementations • CVPR 2020 • Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chang Liu, Chun Yang, Hongfa Wang, Xu-Cheng Yin
In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection.
no code implementations • 27 Nov 2019 • Xiao-Yu Zhang, Changsheng Li, Haichao Shi, Xiaobin Zhu, Peng Li, Jing Dong
The point process is a solid framework to model sequential data, such as videos, by exploring the underlying relevance.
no code implementations • 20 Feb 2019 • Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan
Action recognition in videos has attracted a lot of attention in the past decade.
no code implementations • 8 Sep 2018 • Yaqi Liu, Xianfeng Zhao, Xiaobin Zhu, Yun Cao
Constrained image splicing detection and localization (CISDL) is a newly proposed challenging task for image forensics, which investigates two input suspected images and identifies whether one image has suspected regions pasted from the other.