Our contributions are three-fold: 1) CoText simultaneously addresses three tasks (i.e., text detection, tracking, and recognition) in a real-time, end-to-end trainable framework.
A novel Shuffled Style Assembly Network (SSAN) is proposed to extract and reassemble different content and style features for a stylized feature space.
Semantic representation is of great benefit to the video text tracking (VTT) task, which requires simultaneously classifying, detecting, and tracking text in videos.
Most existing video text spotting benchmarks focus on evaluating a single language and scenario with limited data.
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems.
Face forgery detection is attracting ever-increasing interest in computer vision, since facial manipulation technologies raise serious concerns.
In this paper, we propose a data augmentation method using generative adversarial networks (GANs).