Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems.
This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow.
To the best of our knowledge, we are the first to propose DM-based naturalistic adversarial patch generation for object detectors.
To the best of our knowledge, our proposed method is the first to accomplish diverse and temporally consistent synthetic-to-real video translation using conditional image diffusion models.
With the development of artificial intelligence, more and more financial practitioners apply deep reinforcement learning to financial trading strategies. However, it is difficult to extract accurate features from single-scale time series because of their considerable noise, high non-stationarity, and non-linearity, which makes it hard to obtain high returns. In this paper, we extract a multi-scale feature matrix over multiple time scales of financial time series, following the classic financial theory known as Chan Theory, and propose a multi-scale stroke deep deterministic policy gradient reinforcement learning model (MSSDDPG) to search for the optimal trading strategy. We carry out experiments on the Dow Jones and S&P 500 indices of U.S. stocks and on China's CSI 300 and SSE Composite, and evaluate the performance of our approach against the turtle trading strategy, a Deep Q-learning (DQN) reinforcement learning strategy, and a deep deterministic policy gradient (DDPG) reinforcement learning strategy. The results show that our approach achieves the best performance on China's CSI 300 and SSE Composite, and outstanding results on the Dow Jones and S&P 500.
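The multi-scale feature extraction described above can be sketched as follows. This is a minimal illustration, not the paper's method: the scale values, the fixed return window, and the use of plain log returns (rather than Chan-Theory strokes) are all assumptions made for the example.

```python
import numpy as np

def multi_scale_feature_matrix(prices, scales=(1, 5, 20), window=10):
    """Build a feature matrix from a 1-D price series at several time scales.

    For each scale s, the series is resampled by keeping every s-th price,
    log returns are computed, and the most recent `window` returns form one
    row of the matrix. Scales and windowing are illustrative assumptions.
    """
    rows = []
    for s in scales:
        coarse = prices[::s]            # resample at scale s
        rets = np.diff(np.log(coarse))  # log returns at that scale
        rows.append(rets[-window:])     # keep the most recent `window` values
    return np.stack(rows)               # shape: (len(scales), window)

# Toy usage on a synthetic random-walk price series
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))
state = multi_scale_feature_matrix(prices)
print(state.shape)  # (3, 10)
```

A state matrix of this shape could then serve as the observation fed to a DDPG-style actor-critic agent; the trading-specific reward design is beyond this sketch.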
In practice, the gallery set of a visual search system may grow incrementally as new items are added to the database.
Face super-resolution is a challenging and highly ill-posed problem, since a low-resolution (LR) face image may correspond to multiple high-resolution (HR) ones, and the hallucination process can cause a dramatic identity change in the final super-resolved results.
In addition, we discover errors not only in the identity labels of tracklets but also in the evaluation protocol for the MARS test data.
Meanwhile, instead of normalizing the total loss with the number of objects, the proposed approach decomposes the total loss into class-wise losses and normalizes each class loss using the number of objects for the class.
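The class-wise normalization just described can be sketched in a few lines. The per-object losses and labels below are illustrative inputs; the paper's detector-specific loss terms are abstracted away.

```python
import numpy as np

def class_normalized_loss(per_object_losses, labels, num_classes):
    """Decompose the total loss into class-wise sums and normalize each
    class's sum by that class's object count, instead of dividing the
    total loss once by the total number of objects."""
    total = 0.0
    for c in range(num_classes):
        mask = labels == c
        n_c = mask.sum()
        if n_c > 0:
            total += per_object_losses[mask].sum() / n_c  # per-class mean
    return total

losses = np.array([1.0, 2.0, 3.0, 4.0])
labels = np.array([0, 0, 1, 1])
# Class 0 mean = 1.5, class 1 mean = 3.5 -> 5.0 (vs. global mean 2.5)
print(class_normalized_loss(losses, labels, 2))  # 5.0
```

Normalizing per class in this way keeps rare classes from being drowned out by frequent ones, which is the motivation for the decomposition.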
Most prior works on physical adversarial attacks focus on attack performance but seldom enforce any restrictions on the appearance of the generated adversarial patches.
Unlike other modalities, the constellation of joints and their motion yields models with succinct human motion information for activity recognition.
In recent years, the research community has approached the problem of vehicle re-identification (re-id) with attention-based models, specifically focusing on regions of a vehicle containing discriminative information.
In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER).
In this paper, we propose the Uncertainty-Gated Graph (UGG), which conducts graph-based identity propagation between tracklets, which are represented by nodes in a graph.
Deep convolutional neural networks (DCNNs) also create generalizable face representations, but with cascades of simulated neurons.
In this work, we consider challenging scenarios for unconstrained video-based face recognition from multiple-shot videos and surveillance videos with low-quality frames.
We provide evaluation results of the proposed face detector on challenging unconstrained face detection datasets.
In this paper, we comprehensively study two covariate related problems for unconstrained face verification: first, how covariates affect the performance of deep neural networks on the large-scale unconstrained face verification problem; second, how to utilize covariates to improve verification performance.
In this paper, we consider the problem of grouping a collection of unconstrained face images in which the number of subjects is not known.
We show that integrating this simple step in the training pipeline significantly improves the performance of face verification and recognition systems.
In this paper, we propose an unsupervised face clustering algorithm called "Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local structure of deep representations.
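As a generic illustration of hierarchical clustering over deep features, the sketch below runs naive average-linkage agglomeration on L2-normalized feature vectors. It is not the PAHC algorithm: the paper's proximity-aware linkage is replaced here by plain cosine similarity, and the stopping threshold is an assumption.

```python
import numpy as np

def agglomerative_cluster(features, threshold):
    """Average-linkage agglomerative clustering on L2-normalized feature
    vectors, merging until no cluster pair exceeds `threshold` cosine
    similarity. A generic sketch, not the proximity-aware PAHC linkage."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average cosine similarity between the two clusters
                sim = np.mean(feats[clusters[a]] @ feats[clusters[b]].T)
                if sim > best:
                    best, pair = sim, (a, b)
        if best < threshold:
            break                      # no pair is similar enough to merge
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Two well-separated toy "identities" in a 2-D feature space
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(sorted(map(sorted, agglomerative_cluster(X, threshold=0.5))))  # [[0, 1], [2, 3]]
```

The number of clusters falls out of the threshold rather than being fixed in advance, which matches the unsupervised setting where the number of subjects is unknown.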
Thus, in this work, we propose a deep heterogeneous feature fusion network that exploits the complementary information in features generated by different deep convolutional neural networks (DCNNs) for template-based face recognition. Here, a template refers to a set of still face images or video frames from different sources, which introduces more blur, pose, illumination, and other variations than traditional face datasets.
The results show that the DCNN features contain surprisingly accurate information about the yaw and pitch of a face, and about whether the face came from a still image or a video frame.
Over the last five years, methods based on Deep Convolutional Neural Networks (DCNNs) have shown impressive performance improvements for object detection and recognition problems.
In this paper, we present a brief history of developments in computer vision and artificial neural networks over the last forty years for the problem of image-based recognition.
In this paper, we present an algorithm for unconstrained face verification based on deep convolutional features and evaluate it on the newly released IARPA Janus Benchmark A (IJB-A) dataset.