For 3D medical image segmentation (e.g., of CT and MRI scans), the difficulty of segmenting each slice in a clinical case varies greatly.
Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation (MNMT) system.
However, in practice, many of the generated test cases fail to preserve the original semantic meaning or are unnatural (e.g., they contain grammatical errors), which leads to a high false alarm rate.
With the wide application of deep neural network models to various computer vision tasks, a growing number of works study model vulnerability to adversarial examples.
In this paper, we take a substantial step toward better understanding state-of-the-art (SOTA) sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT).
Different from TransBTS, the proposed TransBTSV2 is not limited to brain tumor segmentation (BTS) but focuses on general medical image segmentation, providing a stronger and more efficient 3D baseline for volumetric segmentation of medical images.
We propose a Mandarin keyword spotting (KWS) system with several novel and effective improvements, including a big backbone (B) model, a keyword biasing (B) mechanism, and the introduction of syllable modeling units (S).
This study developed an optimal variable speed limit (VSL) control strategy for fog conditions, with full consideration of the factors that affect traffic safety risk.
Inspired by the recent progress of large-scale pre-trained language models on machine translation in limited scenarios, we first demonstrate that a single language model (LM4MT) can achieve performance comparable to strong encoder-decoder NMT models on standard machine translation benchmarks, using the same training data and a similar number of model parameters.
Deep neural networks, particularly face recognition models, have been shown to be vulnerable to both digital and physical adversarial examples.
Previous substitute training approaches focus on stealing the knowledge of the target model based on real training data or synthetic data, without exploring what kind of data can further improve the transferability between the substitute and target models.
To capture local 3D context information, the encoder first utilizes a 3D CNN to extract volumetric spatial feature maps.
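As a rough illustration (a minimal PyTorch sketch under our own assumptions about channel widths and normalization, not the authors' implementation), such an encoder can be built from strided 3D convolutions that progressively downsample the volume while widening the channel dimension:

```python
# Minimal 3D-CNN encoder sketch: maps a volume (B, C, D, H, W) to
# downsampled volumetric spatial feature maps. All design choices here
# (InstanceNorm, channel widths, two downsampling stages) are illustrative.
import torch
import torch.nn as nn

class Encoder3D(nn.Module):
    def __init__(self, in_channels: int = 1, base_channels: int = 16):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv3d(in_channels, base_channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(base_channels),
            nn.ReLU(inplace=True),
        )
        # strided convolutions progressively halve D, H, W
        self.down1 = nn.Sequential(
            nn.Conv3d(base_channels, base_channels * 2, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm3d(base_channels * 2),
            nn.ReLU(inplace=True),
        )
        self.down2 = nn.Sequential(
            nn.Conv3d(base_channels * 2, base_channels * 4, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm3d(base_channels * 4),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, depth, height, width)
        return self.down2(self.down1(self.stem(x)))

features = Encoder3D()(torch.randn(1, 1, 32, 128, 128))  # -> (1, 64, 8, 32, 32)
```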
This paper proposes an offline control algorithm, called Recurrent Model Predictive Control (RMPC), to solve large-scale nonlinear finite-horizon optimal control problems.
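For context, a generic nonlinear finite-horizon optimal control problem of the kind RMPC targets can be written as follows (the notation is ours, not taken from the paper):

\[
\min_{u_0,\ldots,u_{N-1}} \; \sum_{k=0}^{N-1} l(x_k, u_k) + l_N(x_N)
\quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k), \quad x_0 \ \text{given},
\]

where $x_k$ is the state, $u_k$ the control input, $f$ the system dynamics, $l$ the stage cost, $l_N$ the terminal cost, and $N$ the horizon length.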
The Transformer has become the state-of-the-art translation model, yet how each of its intermediate components contributes to model performance is not well studied, which poses significant challenges for designing optimal architectures.
An indicator-guided learning mechanism is further proposed to ease the training of the proposed model.
After more than four months of study, we found that confirmed COVID-19 cases present consistent ocular pathological signs, and we propose a new screening method: analyzing eye-region images captured by common CCD and CMOS cameras can reliably provide rapid COVID-19 risk screening with very high accuracy.
Learning a makeup-invariant face verification model is challenging due to (1) insufficient makeup/non-makeup face training pairs, (2) the lack of diverse makeup faces, and (3) the significant appearance changes caused by cosmetics.
Specifically, we consider that under clothing changes, soft biometrics such as body shape are more reliable.
Theoretical analysis shows that the SNR is a function of the center frequency of the passband, the modulation index, the chromatic dispersion, and the shape of the IBOS.
Considering that uneven data distributions and sample scarcity are common in real-world scenarios, we further evaluate several few-shot expression learning tasks using our F2ED, namely recognizing facial expressions given only a few training instances.
Instead of learning the cross features directly, DeepEnFM adopts a Transformer encoder as the backbone to align the feature embeddings with clues from other fields.
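A minimal sketch of this idea (our own assumed PyTorch code with illustrative names and hyperparameters, not the DeepEnFM implementation): each categorical field is embedded, and a Transformer encoder lets every field embedding attend to the other fields before prediction.

```python
# Field-level Transformer sketch for CTR-style prediction: one embedding
# table per field, self-attention across fields, then a linear head.
import torch
import torch.nn as nn

class FieldTransformer(nn.Module):
    def __init__(self, field_sizes, embed_dim: int = 32, num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        # one embedding table per categorical field
        self.embeddings = nn.ModuleList(nn.Embedding(n, embed_dim) for n in field_sizes)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim * len(field_sizes), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_fields) integer feature ids
        fields = torch.stack([emb(x[:, i]) for i, emb in enumerate(self.embeddings)], dim=1)
        aligned = self.encoder(fields)        # (batch, num_fields, embed_dim)
        return self.head(aligned.flatten(1))  # prediction logit

logits = FieldTransformer([100, 50, 20])(torch.randint(0, 20, (8, 3)))
```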
First, we create a new facial expression dataset of more than 200k images with 119 persons, 4 poses and 54 expressions.
Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations.