no code implementations • 2 Jan 2025 • Tian-Hao Zhang, Jiawei Zhang, Jun Wang, Xinyuan Qian, Xu-Cheng Yin
Humans can perceive speakers' characteristics (e.g., identity, gender, personality, and emotion) from their appearance, and these characteristics are generally aligned with their voice style.
no code implementations • 1 Jan 2025 • Wei Zhang, Tian-Hao Zhang, Chao Luo, Hui Zhou, Chao Yang, Xinyuan Qian, Xu-Cheng Yin
Recently, end-to-end automatic speech recognition has become the mainstream approach in both industry and academia.
no code implementations • 24 Sep 2024 • Yuqi Ma, Mengyin Liu, Chao Zhu, Xu-Cheng Yin
However, OVD models are pretrained on large-scale image-text pairs with rich attribute words, whose latent feature space can represent the global text feature as a linear composition of fine-grained attribute tokens without highlighting them.
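As a rough illustration of that linear-composition view, the numpy sketch below approximates a global text embedding as a weighted sum of attribute-token embeddings via least squares; the embeddings, dimensions, and solver are assumptions for illustration, not the paper's OVD pipeline.

```python
import numpy as np

# Illustrative only: approximate a global text embedding as a linear
# combination of fine-grained attribute-token embeddings.
rng = np.random.default_rng(0)
attr_tokens = rng.normal(size=(5, 512))   # 5 attribute-token embeddings, dim 512
global_text = rng.normal(size=(512,))     # global text embedding

# Solve for mixing weights w such that attr_tokens.T @ w ~= global_text.
w, *_ = np.linalg.lstsq(attr_tokens.T, global_text, rcond=None)
recon = attr_tokens.T @ w
print("reconstruction error:", np.linalg.norm(recon - global_text))
```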
1 code implementation • 5 Aug 2024 • Long Huang, Zhiwei Dong, Song-Lu Chen, Ruiyao Zhang, Shutong Ti, Feng Chen, Xu-Cheng Yin
The task-inharmony problem commonly occurs in modern object detectors, leading to inconsistent quality between the classification and regression tasks.
1 code implementation • 16 Jul 2024 • Shi-Xue Zhang, Hongfa Wang, Xiaobin Zhu, Weibo Gu, Tianjin Zhang, Chun Yang, Wei Liu, Xu-Cheng Yin
In this paper, we propose a novel Spatio-Temporal Graph Transformer module to uniformly learn spatial and temporal contexts for video-language alignment pre-training (dubbed STGT).
1 code implementation • 1 May 2024 • Zhiyu Fang, Shuai-Long Lei, Xiaobin Zhu, Chun Yang, Shi-Xue Zhang, Xu-Cheng Yin, Jingyan Qin
We then craft a mixed-context reasoning module based on the multi-layer perceptron (MLP) to learn the unified representations of inter-quadruples for ECE while accomplishing temporal knowledge reasoning.
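A minimal PyTorch sketch of an MLP that fuses a (subject, relation, object, timestamp) quadruple into one unified representation is shown below; the layer sizes, activation, and class name are illustrative assumptions rather than the paper's exact mixed-context reasoning module.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: embeddings of a (subject, relation, object, timestamp)
# quadruple are concatenated and projected to a unified representation.
class MixedContextMLP(nn.Module):
    def __init__(self, dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, subj, rel, obj, time):
        # Each input: (batch, dim); output: unified quadruple representation.
        return self.net(torch.cat([subj, rel, obj, time], dim=-1))

x = [torch.randn(8, 128) for _ in range(4)]
print(MixedContextMLP()(*x).shape)  # torch.Size([8, 128])
```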
1 code implementation • 1 May 2024 • Zhiyu Fang, Jingyan Qin, Xiaobin Zhu, Chun Yang, Xu-Cheng Yin
Distinguished from traditional knowledge graphs (KGs), temporal knowledge graphs (TKGs) must explore and reason over temporally evolving facts adequately.
1 code implementation • IEEE International Conference on Robotics and Automation (ICRA) 2024 • Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar-Jr., Xiangyang Ji, Xu-Cheng Yin
Extracting motion information from videos with optical flow estimation is vital in multiple practical robot applications.
Ranked #6 on Optical Flow Estimation on KITTI 2015
no code implementations • 8 Jan 2024 • Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Hongyang Zhou, Hongfa Wang, Xu-Cheng Yin
Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM).
no code implementations • CVPR 2024 • Min Liang, Jia-Wei Ma, Xiaobin Zhu, Jingyan Qin, Xu-Cheng Yin
Existing scene text detectors generally focus on accurately detecting single-level (i.e., word-level, line-level, or paragraph-level) text entities without exploring the relationships among different levels of text entities.
no code implementations • 24 May 2023 • Zhi-Hao Lai, Tian-Hao Zhang, Qi Liu, Xinyuan Qian, Li-Fang Wei, Song-Lu Chen, Feng Chen, Xu-Cheng Yin
To address these issues, this paper proposes InterFormer for interactive local and global features fusion to learn a better representation for ASR.
Automatic Speech Recognition (ASR) +1
no code implementations • 23 May 2023 • Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin
The experimental results show that ASCD significantly improves the performance by leveraging both the acoustic and semantic information cooperatively.
no code implementations • 21 May 2023 • Mengyin Liu, Chao Zhu, Shiqi Ren, Xu-Cheng Yin
1) Firstly, Semantic-aware Iterative Segmentation (SIS) is proposed to extract unsupervised representations of multi-view images, which are converted into 2D pedestrian masks as pseudo labels, via our proposed iterative PCA and zero-shot semantic classes from vision-language models.
1 code implementation • CVPR 2023 • Mengyin Liu, Jie Jiang, Chao Zhu, Xu-Cheng Yin
Firstly, we propose a self-supervised Vision-Language Semantic (VLS) segmentation method, which learns both fully-supervised pedestrian detection and contextual segmentation via self-generated explicit labels of semantic classes by vision-language models.
Ranked #5 on Pedestrian Detection on Caltech
1 code implementation • ICCV 2023 • Hongyang Zhou, Xiaobin Zhu, Jianqing Zhu, Zheng Han, Shi-Xue Zhang, Jingyan Qin, Xu-Cheng Yin
Instead of assuming degradations are spatially invariant across the whole image, we learn correction filters that adjust degradations toward known degradations in a spatially variant way via a novel linearly-assembled pixel degradation-adaptive regression module (DARM).
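A hedged sketch of the spatially variant filtering idea follows: a per-pixel filter is linearly assembled from a small filter basis using predicted per-pixel coefficients and applied to each pixel's neighborhood. The shapes, basis size, and coefficient source are assumptions for illustration, not DARM's actual design.

```python
import torch
import torch.nn.functional as F

# Sketch: assemble a per-pixel filter from a filter basis and per-pixel
# coefficients, then apply it to the local neighborhood of each pixel.
def assemble_and_filter(x, coeff, basis):
    # x: (B, C, H, W), coeff: (B, K, H, W), basis: (K, k, k)
    B, C, H, W = x.shape
    K, k, _ = basis.shape
    patches = F.unfold(x, k, padding=k // 2)            # (B, C*k*k, H*W)
    patches = patches.view(B, C, k * k, H * W)
    filters = torch.einsum('bkn,kf->bfn',
                           coeff.view(B, K, H * W),
                           basis.view(K, k * k))        # (B, k*k, H*W)
    out = (patches * filters.unsqueeze(1)).sum(dim=2)   # (B, C, H*W)
    return out.view(B, C, H, W)

x = torch.randn(1, 3, 32, 32)
coeff = torch.softmax(torch.randn(1, 8, 32, 32), dim=1)  # per-pixel mixing weights
basis = torch.randn(8, 3, 3)
print(assemble_and_filter(x, coeff, basis).shape)  # torch.Size([1, 3, 32, 32])
```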
1 code implementation • 26 Aug 2022 • Shi-Xue Zhang, Xiaobin Zhu, Lei Chen, Jie-Bo Hou, Xu-Cheng Yin
To be concrete, we adopt a Sigmoid Alpha Function (SAF) to transfer the distances between boundaries and their inside pixels to a probability map.
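The paper defines the exact form of the Sigmoid Alpha Function; the snippet below only illustrates the general idea of mapping a pixel's normalized distance to the boundary into a probability with a sigmoid-shaped curve, with the constant and normalization chosen as assumptions.

```python
import numpy as np

# Illustration only: map the distance from an inside pixel to the text
# boundary into a probability. `alpha` and the normalization are assumptions,
# not the paper's exact SAF.
def distance_to_probability(dist, max_dist, alpha=3.0):
    d = np.clip(dist / max_dist, 0.0, 1.0)        # normalized distance to boundary
    return 1.0 / (1.0 + np.exp(-alpha * d))       # larger distance -> higher probability

print(distance_to_probability(np.array([0.0, 5.0, 10.0]), max_dist=10.0))
```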
no code implementations • 15 Jul 2022 • Mengyin Liu, Chao Zhu, Hongyu Gao, Weibo Gu, Hongfa Wang, Wei Liu, Xu-Cheng Yin
2) Secondly, a text-guided information range minimization method is proposed to adaptively encode descriptive parts of each modality into an identical space with a powerful pretrained linguistic model.
2 code implementations • 11 May 2022 • Shi-Xue Zhang, Chun Yang, Xiaobin Zhu, Xu-Cheng Yin
In our method, we explicitly model the text boundary via an innovative iterative boundary transformer in a coarse-to-fine manner.
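As a rough sketch of coarse-to-fine boundary refinement, the PyTorch snippet below iteratively encodes a set of boundary control points with a small transformer and predicts per-point offsets; image-feature sampling is omitted and all sizes, names, and the number of iterations are illustrative assumptions, not the paper's boundary transformer.

```python
import torch
import torch.nn as nn

# Sketch: refine coarse boundary points over several iterations.
class IterativeBoundaryRefiner(nn.Module):
    def __init__(self, d_model=64, iters=3):
        super().__init__()
        self.iters = iters
        self.embed = nn.Linear(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.offset = nn.Linear(d_model, 2)

    def forward(self, points):                # points: (B, N, 2) coarse boundary
        for _ in range(self.iters):
            h = self.encoder(self.embed(points))
            points = points + self.offset(h)  # refine boundary step by step
        return points

coarse = torch.rand(2, 20, 2)
print(IterativeBoundaryRefiner()(coarse).shape)  # torch.Size([2, 20, 2])
```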
no code implementations • 7 May 2022 • Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Xu-Cheng Yin
Then, we propose a graph-based fusion network via Graph Convolutional Network (GCN) to learn to reason and fuse the detection boxes for generating final instance boxes.
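A simplified sketch of graph-based box fusion is given below: detection boxes become graph nodes, pairwise IoU provides a soft adjacency matrix, and one graph-convolution step mixes node features. The feature sizes and row normalization are assumptions for illustration, not the paper's GCN.

```python
import torch
import torch.nn as nn
from torchvision.ops import box_iou

# Sketch: one graph-convolution step over a box graph weighted by IoU.
class BoxFusionGCN(nn.Module):
    def __init__(self, in_dim=16, out_dim=16):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim)

    def forward(self, feats, boxes):
        adj = box_iou(boxes, boxes)                    # (N, N) soft adjacency
        adj = adj / adj.sum(dim=1, keepdim=True)       # row-normalize (self-IoU = 1)
        return torch.relu(self.weight(adj @ feats))    # one propagation step

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
feats = torch.randn(3, 16)
print(BoxFusionGCN()(feats, boxes).shape)              # torch.Size([3, 16])
```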
1 code implementation • CVPR 2022 • Chang Liu, Chun Yang, Xu-Cheng Yin
Contextual information can be decomposed into temporal information and linguistic information.
1 code implementation • 12 Mar 2022 • Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chun Yang, Xu-Cheng Yin
In this paper, we propose an innovative Kernel Proposal Network (dubbed KPN) for arbitrary shape text detection.
no code implementations • 10 Mar 2022 • Chang Liu, Chun Yang, Hai-Bo Qin, Xiaobin Zhu, Cheng-Lin Liu, Xu-Cheng Yin
Scene text recognition is a popular topic and extensively used in the industry.
no code implementations • 24 Dec 2021 • Zhiyu Fang, Xiaobin Zhu, Chun Yang, Zheng Han, Jingyan Qin, Xu-Cheng Yin
Learning a common latent embedding by aligning the latent spaces of cross-modal autoencoders is an effective strategy for Generalized Zero-Shot Classification (GZSC).
no code implementations • 14 Sep 2021 • Chuan-Fei Zhang, Yan Liu, Tian-Hao Zhang, Song-Lu Chen, Feng Chen, Xu-Cheng Yin
To tackle the above problems, we propose a new non-autoregressive transformer with a unified bidirectional decoder (NAT-UBD), which can simultaneously utilize left-to-right and right-to-left contexts.
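A toy illustration of combining left-to-right and right-to-left contexts is sketched below: a strictly-causal mask and its mirror together let each position attend to every other position except itself. This is only a schematic of bidirectional context, not NAT-UBD's actual decoder or its leakage-prevention scheme.

```python
import torch

# Toy masks for a sequence of length L.
L = 5
l2r = torch.tril(torch.ones(L, L), diagonal=-1)   # past tokens only
r2l = torch.triu(torch.ones(L, L), diagonal=1)    # future tokens only
both = l2r + r2l                                  # all positions except self
print(both)
```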
Automatic Speech Recognition (ASR) +2
1 code implementation • ICCV 2021 • Shi-Xue Zhang, Xiaobin Zhu, Chun Yang, Hongfa Wang, Xu-Cheng Yin
In this work, we propose a novel adaptive boundary proposal network for arbitrary shape text detection, which can learn to directly produce accurate boundary for arbitrary shape text without any post-processing.
1 code implementation • 27 Oct 2020 • Song-Lu Chen, Shu Tian, Jia-Wei Ma, Qi Liu, Chun Yang, Feng Chen, Xu-Cheng Yin
Second, we propose to predict the quadrilateral bounding box in the local region by regressing the four corners of the license plate to robustly detect oblique license plates.
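A minimal sketch of quadrilateral regression follows: from a local-region feature, a head predicts the offsets of the four plate corners relative to the region center, scaled by the region size. Feature extraction and the loss are omitted; dimensions and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch: regress four corner positions of a license plate from a region feature.
class CornerRegressionHead(nn.Module):
    def __init__(self, in_dim=256):
        super().__init__()
        self.fc = nn.Linear(in_dim, 8)            # (dx, dy) for 4 corners

    def forward(self, feat, center, size):
        # feat: (B, in_dim), center: (B, 2), size: (B, 2)
        offsets = self.fc(feat).view(-1, 4, 2)
        return center.unsqueeze(1) + offsets * size.unsqueeze(1)  # (B, 4, 2)

feat, center, size = torch.randn(2, 256), torch.rand(2, 2) * 100, torch.rand(2, 2) * 50
print(CornerRegressionHead()(feat, center, size).shape)  # torch.Size([2, 4, 2])
```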
1 code implementation • 24 Oct 2020 • Zan-Xia Jin, Heran Wu, Chun Yang, Fang Zhou, Jingyan Qin, Lei Xiao, Xu-Cheng Yin
Text-based visual question answering (VQA) requires reading and understanding the text in an image to correctly answer a given question.
Optical Character Recognition (OCR) +3
no code implementations • 21 Oct 2020 • Ye He, Chao Zhu, Xu-Cheng Yin
These two branches are trained in a mutual-supervised way with full body annotations and visible body annotations, respectively.
2 code implementations • CVPR 2020 • Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chang Liu, Chun Yang, Hongfa Wang, Xu-Cheng Yin
In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection.
no code implementations • 3 Apr 2019 • Xinjie Li, Chun Yang, Songlu Chen, Chao Zhu, Xu-Cheng Yin
Specifically, we design a generalized cross-entropy loss for training the proposed framework, which fully exploits the semantic priors by considering the relevance between adjacent levels and enlarging the distance between samples of different coarse classes.
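As a hedged illustration of a hierarchy-aware cross-entropy, the sketch below spreads a small amount of probability mass over fine classes that share the true class's coarse parent, so semantically related classes are penalized less than classes from other coarse groups; the smoothing value and function name are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

# Sketch: soft-target cross-entropy that respects a fine-to-coarse class map.
def hierarchical_ce(logits, target, coarse_of, smooth=0.1):
    # logits: (B, C), target: (B,), coarse_of: (C,) fine-to-coarse class map
    B, C = logits.shape
    same_coarse = coarse_of.unsqueeze(0) == coarse_of[target].unsqueeze(1)  # (B, C)
    soft = smooth * same_coarse.float() / same_coarse.sum(dim=1, keepdim=True)
    soft[torch.arange(B), target] += 1.0 - smooth
    return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

coarse_of = torch.tensor([0, 0, 1, 1])        # 4 fine classes, 2 coarse classes
logits, target = torch.randn(3, 4), torch.tensor([0, 2, 3])
print(hierarchical_ce(logits, target, coarse_of))
```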
no code implementations • 10 Oct 2017 • Chun Yang, Xu-Cheng Yin, Zejun Li, Jianwei Wu, Chunchao Guo, Hongfa Wang, Lei Xiao
Recognizing text in the wild is a really challenging task because of complex backgrounds, various illuminations and diverse distortions, even with deep neural networks (convolutional neural networks and recurrent neural networks).
no code implementations • WS 2017 • Zan-Xia Jin, Bo-Wen Zhang, Fan Fang, Le-Le Zhang, Xu-Cheng Yin
This paper describes the participation of the USTB_PRIR team in the 2017 BioASQ Task 5B on question answering, including the document retrieval, snippet retrieval, and concept retrieval tasks.
no code implementations • 4 Jun 2014 • Xu-Cheng Yin, Chun Yang, Hong-Wei Hao
In this paper, we argue that diversity, not direct diversity on samples but adaptive diversity with data, is highly correlated with ensemble accuracy, and we propose a novel technique for classifier ensembles, learning to diversify, which learns to adaptively combine classifiers by considering both accuracy and diversity.
no code implementations • 11 Jan 2013 • Xu-Cheng Yin, Xuwang Yin, Kai-Zhu Huang, Hong-Wei Hao
Text detection in natural scene images is an important prerequisite for many content-based image analysis tasks.