Further, the fake faces by our method can pass face forgery detection and face recognition, which exposes the security problems of face forgery detectors.
In particular, CG2A develops a Gradient Agreement Solver to adaptively balance the varying gradient magnitudes, and introduces a Soft Gradient Surgery strategy to alleviate the gradient conflicts.
Then, we introduce a low-noise representation to alleviate the domain shifts and build a structured relationship of multiple data examples to exploit data knowledge.
1 code implementation • • Dingkang Yang, Shuai Huang, Zhi Xu, Zhenpeng Li, Shunli Wang, Mingcheng Li, Yuzheng Wang, Yang Liu, Kun Yang, Zhaoyu Chen, Yan Wang, Jing Liu, Peixuan Zhang, Peng Zhai, Lihua Zhang
Driver distraction has become a significant cause of severe traffic accidents over the past decade.
We propose and study a new computer vision task named open-vocabulary video instance segmentation (OpenVIS), which aims to simultaneously segment, detect, and track arbitrary objects in a video according to corresponding text descriptions.
To overcome the lack of labeled training data, a fast and effective generation pipeline in Blender3D is designed to simulate middle ear shapes and extract in-vivo noisy and partial point clouds.
Moreover, the existing Adversarial Training (AT) can further improve the robustness of these large models.
However, a long-overlooked issue is that a context bias in existing datasets leads to a significantly unbalanced distribution of emotional states among different context scenarios.
First, STDE introduces target videos as patch textures and only adds patches on keyframes that are adaptively selected by temporal difference.
Data-free knowledge distillation is a challenging model lightweight task for scenarios in which the original dataset is not available.
To this end, we propose a novel structured ARD method called Contrastive Relationship DeNoise Distillation (CRDND).
To tackle these issues, we propose Global Momentum Initialization (GI) to suppress gradient elimination and help search for the global optimum.
The videos in our LVOS last 1. 59 minutes on average, which is 20 times longer than videos in existing VOS datasets.
Though deep neural networks (DNNs) have demonstrated excellent performance in computer vision, they are susceptible and vulnerable to carefully crafted adversarial examples which can mislead DNNs to incorrect outputs.
To obtain consistent prediction probabilities from the dual path, we further propose a dual path regularization loss, aiming to minimize the divergence between the distributions of two-path embeddings.
Ranked #6 on Dynamic Facial Expression Recognition on DFEW
To move towards a practical certifiable patch defense, we introduce Vision Transformer (ViT) into the framework of Derandomized Smoothing (DS).
Recently, adversarial attacks have been applied in visual object tracking to deceive deep trackers by injecting imperceptible perturbations into video frames.
Then, we design a two-level perturbation fusion strategy to alleviate the conflict between the adversarial watermarks generated by different facial images and models.
Firstly, we propose a patch selection and refining scheme to find the pixels which have the greatest importance for attack and remove the inconsequential perturbations gradually.