Although human parsing has made great progress, it still faces a challenge, i. e., how to extract the whole foreground from similar or cluttered scenes effectively.
To overcome the defects caused by the erasing operation, we perform feature reconstruction to recover the information destroyed by occlusion and details lost in cleaning procedure.
Based on the designed TDM phased-MIMO radar, we develop a system to automatically localize multiple human subjects and estimate their vital signs.
Despite considerable similarities between multiple object tracking (MOT) and single object tracking (SOT) tasks, modern MOT methods have not benefited from the development of SOT ones to achieve satisfactory performance.
These idealized assumptions, however, makes the existing audio adversarial attacks mostly impossible to be launched in a timely fashion in practice (e. g., playing unnoticeable adversarial perturbations along with user's streaming input).
As the popularity of voice user interface (VUI) exploded in recent years, speaker recognition system has emerged as an important medium of identifying a speaker in many security-required applications and services.
In this paper, we build a speech privacy attack that exploits speech reverberations generated from a smartphone's inbuilt loudspeaker captured via a zero-permission motion sensor (accelerometer).
Cryptography and Security
After observing that the features used in most online discriminatively trained trackers are not optimal, in this paper, we propose a novel and effective architecture to learn optimal feature embeddings for online discriminative tracking.
In this paper, we propose a novel Recurrent Calibration Network (RCN) for irregular scene text recognition.
Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving.
Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices, which does not need any interaction and can realize real-time matting with 15 fps.
In the first stage, a convolutional encoder-decoder sub-network is employed to reconstruct the background images and encode rich prior knowledge of background scenes.