Compared with the commonly used NuQE baseline, BAL-QE achieves performance improvements of 47% (En-Ru) and 75% (En-De).
This paper presents our submissions to the IWSLT 2022 Isometric Spoken Language Translation task.
The paper presents HW-TSC’s pipeline and results for the IWSLT 2022 Offline Speech-to-Speech Translation task.
The cascade system is composed of a chunking-based streaming ASR model and the SimulMT model used in the T2T track.
For the machine translation part, we pre-trained three translation models on the WMT21 dataset and fine-tuned them on in-domain corpora.
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 Large-Scale Multilingual Translation Task.
This paper presents the submission of Huawei Translation Service Center (HW-TSC) to WMT 2021 Triangular MT Shared Task.
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to the WMT 2021 News Translation Shared Task.
Mask-predict CMLM (Ghazvininejad et al., 2019) has achieved stunning performance among non-autoregressive NMT models, but we find that the mechanism of predicting all target words based only on the hidden states of [MASK] is neither effective nor efficient in the initial refinement iterations, resulting in ungrammatical repetitions and slow convergence.
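The mask-predict refinement loop described above can be sketched as follows. This is a toy illustration of the control flow only: `toy_predict` is a hypothetical stand-in for a real CMLM, and the linear re-masking schedule follows the common mask-predict recipe rather than any particular implementation.

```python
MASK = "<mask>"

def toy_predict(tokens):
    """Stand-in for a CMLM: returns (token, confidence) per position.
    A real model would condition on the source and the unmasked tokens."""
    vocab = ["the", "cat", "sat", "on", "mat"]
    out = []
    for i, t in enumerate(tokens):
        if t == MASK:
            out.append((vocab[i % len(vocab)], 0.6))  # fresh guesses are unsure
        else:
            out.append((t, 0.99))                     # committed tokens stay
    return out

def mask_predict(length, iterations=3):
    tokens = [MASK] * length                  # start fully masked
    for it in range(iterations):
        preds = toy_predict(tokens)
        tokens = [tok for tok, _ in preds]
        confs = [c for _, c in preds]
        # linearly decay how many tokens get re-masked each iteration
        n_mask = int(length * (1 - (it + 1) / iterations))
        if n_mask == 0:
            break
        worst = sorted(range(length), key=lambda i: confs[i])[:n_mask]
        for i in worst:
            tokens[i] = MASK                  # re-mask lowest-confidence slots
    return tokens

print(mask_predict(5))
```

The point the abstract makes is visible here: in the first iteration every prediction depends only on [MASK] states, so early outputs are unreliable and must be repaired over several passes.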
Length prediction is a special task in a series of NAT models where target length has to be determined before generation.
This paper presents our work in the WMT 2020 Word and Sentence-Level Post-Editing Quality Estimation (QE) Shared Task.
This paper describes Huawei’s submissions to the WMT20 biomedical translation shared task.
The paper presents the submission by HW-TSC to the WMT 2020 Automatic Post-Editing Shared Task.
We also conduct experiments with similar-language augmentation, which lead to positive results, although they are not used in our submission.
This paper describes our work in the WAT 2020 Indic Multilingual Translation Task.
We propose a unified multilingual model for humor detection which can be trained under a transfer learning framework.
Overfitting of large-scale pretrained networks on the limited labelled training data available for multimodal translation (MMT) is a critical issue in MMT.
Natural Language Inference (NLI) datasets contain examples with highly ambiguous labels due to their subjectivity.
This paper presents our work in the WMT 2021 Quality Estimation (QE) Shared Task.
This paper describes the submission of Huawei Translation Service Center (HW-TSC) to WMT21 biomedical translation task in two language pairs: Chinese↔English and German↔English (Our registered team name is HuaweiTSC).
This paper presents the submission of Huawei Translation Services Center (HW-TSC) to WMT 2021 Efficiency Shared Task.
Few-shot relation learning refers to inferring facts for relations with a limited number of observed triples.
To this end, we propose a plug-in algorithm for this line of work, i.e., Aligned Constrained Training (ACT), which alleviates this problem by familiarizing the model with the source-side context of the constraints.
We propose pose-guided multiplane image (MPI) synthesis which can render an animatable character in real scenes with photorealistic quality.
Since these ID labels, automatically derived from tracklets, inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.
This is enabled by a unified architecture, Omni-DETR, based on recent progress in student-teacher frameworks and end-to-end transformer-based object detection.
Through further analysis of the ASR outputs, we find that in some cases the sentiment words, the key sentiment elements in the textual modality, are recognized as other words, which makes the sentiment of the text change and hurts the performance of multimodal sentiment models directly.
Extensive experiments on well-known white- and black-box attacks show that MFDV-SNN achieves a significant improvement over existing methods, which indicates that it is a simple but effective method to improve model robustness.
In this paper, we aim to close the gap by preserving the original objective of AR and NAR under a unified framework.
no code implementations • 22 Dec 2021 • Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Yuxia Wang, Zongyao Li, Zhengzhe Yu, Zhanglin Wu, Yimeng Chen, Chang Su, Min Zhang, Lizhi Lei, Shimin Tao, Hao Yang
An effective training strategy to improve the performance of AT models is Self-Distillation Mixup (SDM) Training, which pre-trains a model on raw data, generates distilled data by the pre-trained model itself and finally re-trains a model on the combination of raw data and distilled data.
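The three-stage SDM recipe above can be sketched as follows. The `train` and `distill` functions are deliberately trivial stand-ins (our own assumption, not the paper's code): a real system would run SGD on an AT model and decode distilled targets with beam search.

```python
def train(model, data):
    """Stand-in for one training run on a dataset of (src, tgt) pairs."""
    return {"weights": model.get("weights", 0) + len(data), "seen": list(data)}

def distill(model, sources):
    """Stand-in for self-distillation: the pre-trained model decodes its own
    training sources to produce distilled targets."""
    return [(src, f"hyp_{src}") for src in sources]

def sdm_training(raw_pairs):
    model = {}
    # Stage 1: pre-train on the raw parallel data
    model = train(model, raw_pairs)
    # Stage 2: let the pre-trained model label its own sources
    distilled_pairs = distill(model, [src for src, _ in raw_pairs])
    # Stage 3: re-train on the mixture of raw and distilled data
    model = train(model, raw_pairs + distilled_pairs)
    return model, distilled_pairs

model, distilled = sdm_training([("s1", "t1"), ("s2", "t2")])
print(len(distilled))  # one distilled pair per raw source
```

The key design choice is that the teacher in stage 2 is the model itself, so no external system is needed to produce the distilled data.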
no code implementations • 22 Dec 2021 • Zhengzhe Yu, Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhanglin Wu, Yuxia Wang, Yimeng Chen, Chang Su, Min Zhang, Lizhi Lei, Shimin Tao, Hao Yang
Deep encoders have been proven effective in improving neural machine translation (NMT) systems, but translation quality reaches an upper bound once the number of encoder layers exceeds 18.
In this paper, we study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.
This paper presents the Delayed Propagation Transformer (DePT), a new transformer-based model that specializes in the global modeling of CPS while taking into account the immutable constraints from the physical world.
In this work, we propose an adversarial-training-based modification to current state-of-the-art link prediction methods to solve this problem.
On the other hand, AAM is an attention module that produces an anisotropic attention mask focusing on the region around a point and its local edges connected by adjacent points; it responds more strongly in the tangent direction than in the normal direction, which means relaxed constraints along the tangent.
Adversarial training (AT) is one of the most reliable methods for defending against adversarial attacks in machine learning.
Modeling inter-dependencies between time series is the key to achieving high performance in anomaly detection for multivariate time-series data.
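One minimal way to see why inter-series dependencies matter is to model them as a correlation matrix learned on normal data and score a new window by how far its correlation structure drifts. This is a toy illustration of the idea, not the method of any particular paper; all names are ours.

```python
import numpy as np

def correlation_anomaly_score(window, baseline_corr):
    """Score a multivariate window by the mean absolute drift of its
    correlation matrix from a baseline learned on normal data."""
    corr = np.corrcoef(window)
    return float(np.abs(corr - baseline_corr).mean())

rng = np.random.default_rng(3)
s = rng.normal(size=200)
normal = np.vstack([s, s + 0.05 * rng.normal(size=200)])   # tightly coupled pair
baseline = np.corrcoef(normal)
broken = np.vstack([s, rng.normal(size=200)])              # coupling destroyed
print(correlation_anomaly_score(normal, baseline) <
      correlation_anomaly_score(broken, baseline))
```

Each series in `broken` looks individually normal; only the broken dependency between them reveals the anomaly, which is exactly what per-series detectors miss.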
This paper describes our participation in the IWSLT 2021 offline speech translation task.
Everyday tasks are characterized by their varieties and variations, and frequently are not clearly specified to service agents.
To overcome these limitations, we propose a patch-wise spatial-temporal quality enhancement network which first extracts spatial and temporal features, then recalibrates and fuses them.
Learning a good transfer function to map the word vectors from two languages into a shared cross-lingual word vector space plays a crucial role in cross-lingual NLP.
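A standard instance of such a transfer function is the orthogonal Procrustes mapping fitted on a seed translation dictionary. The sketch below (our own illustration; variable names are ours) recovers a known rotation between two toy embedding spaces with the closed-form SVD solution.

```python
import numpy as np

def procrustes_map(X, Y):
    """Learn an orthogonal W minimizing ||XW - Y||_F, mapping source word
    vectors X onto target vectors Y (rows aligned by a seed dictionary)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                # "source-language" vectors
R_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))
Y = X @ R_true                               # target space: rotated source space
W = procrustes_map(X, Y)
print(np.allclose(X @ W, Y, atol=1e-6))      # the rotation is recovered
```

Constraining W to be orthogonal is the usual design choice here: it preserves distances and angles in the shared space, which empirically helps bilingual lexicon induction.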
Graph-based fraud detection approaches have attracted a great deal of attention recently due to the abundant relational information in graph-structured data, which may be beneficial for detecting fraudsters.
Question Answering (QA) models over Knowledge Bases (KBs) are capable of providing more precise answers by utilizing relation information among entities.
During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds.
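A minimal version of such a per-sample loss record and threshold rule can be sketched as follows. The mean-plus-k-sigma rule is our own illustrative assumption, not necessarily DLT's exact formula.

```python
import statistics

def dynamic_loss_threshold(loss_history, k=1.0):
    """Illustrative dynamic threshold: keep samples whose recorded loss
    stays within mean + k * stdev of the current losses."""
    mu = statistics.mean(loss_history)
    sigma = statistics.pstdev(loss_history)
    return mu + k * sigma

losses = {"a": 0.3, "b": 0.4, "c": 0.35, "d": 3.0}  # "d" looks like a noisy label
threshold = dynamic_loss_threshold(list(losses.values()))
kept = [s for s, l in losses.items() if l <= threshold]
print(kept)
```

Because the threshold is recomputed from the current epoch's losses, it adapts as the model improves, instead of relying on a fixed cutoff chosen ahead of time.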
In this paper, we propose a novel Style-based Point Generator with Adversarial Rendering (SpareNet) for point cloud completion.
In recent decades, extreme classification has become an essential topic in deep learning.
We present the first graph-based adversarial detection method that constructs a Latent Neighborhood Graph (LNG) around an input example to determine if the input example is adversarial.
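The graph-construction step of this idea can be sketched as follows: embed the input, find its k nearest reference points in latent space, and connect them. This is only the graph-building half (a real detector would then run a GNN classifier over it), and all names here are our own.

```python
import numpy as np

def latent_neighborhood_graph(x, reference_embeddings, k=3):
    """Build a star-shaped neighborhood graph over {input} ∪ {k nearest
    reference points}; returns neighbor indices and the adjacency matrix."""
    dists = np.linalg.norm(reference_embeddings - x, axis=1)
    neighbors = np.argsort(dists)[:k]
    n = k + 1                                 # node 0 is the input itself
    adj = np.zeros((n, n))
    adj[0, 1:] = adj[1:, 0] = 1.0             # connect the input to each neighbor
    return neighbors, adj

rng = np.random.default_rng(1)
refs = rng.normal(size=(50, 4))               # latent embeddings of clean data
x = refs[7] + 0.01                            # an input near reference #7
neighbors, adj = latent_neighborhood_graph(x, refs)
print(neighbors[0])                           # nearest reference index
```

The intuition behind the detector is that an adversarial input sits unusually relative to its latent neighborhood, so the graph around it carries the signal a pointwise classifier misses.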
Various Position Embeddings (PEs) have been proposed in Transformer-based architectures (e.g., BERT) to model word order.
Adversarial examples are input examples that are specifically crafted to deceive machine learning classifiers.
In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.
The structural information of Knowledge Bases (KBs) has proven effective for Question Answering (QA).
We use the continuous-variable Gaussian quantum information formalism to show that quantum illumination is better for object detection compared with coherent states of the same mean photon number, even for simple direct photodetection.
Multi-contrast magnetic resonance (MR) image registration is useful in the clinic to achieve fast and accurate imaging-based disease diagnosis and treatment planning.
It is among the first efforts in applying edge computing for real-time traffic video analytics and is expected to benefit multiple sub-fields in smart transportation research and applications.
The paper presents details of our system in the IWSLT Video Speech Translation evaluation.
Existing algorithms usually extract visual features using off-the-shelf Convolutional Neural Networks (CNNs) and late-fuse the visual and non-visual features for the final CTR prediction.
We study the problem of making item recommendations to ephemeral groups, which comprise users with limited or no historical activities together.
The rapid proliferation of new users and items on the social web has aggravated the gray-sheep user/long-tail item challenge in recommender systems.
There are a few approaches that consider an entire outfit, but these approaches have limitations such as requiring rich semantic information, category labels, and fixed order of items.
While feasible, digital attacks have limited applicability in attacking deployed systems, including face recognition systems, where an adversary typically has access to the input and not the transmission channel.
It transmits the high-level semantic features to the low-level layers and flows temporal information stage by stage to progressively model global spatial-temporal features for action recognition; (3) The FGCN model provides early predictions.
Our findings challenge common fine-tuning practices and encourage deep learning practitioners to rethink the hyperparameters used for fine-tuning.
For the difficult cases, where the domain gaps and especially category differences are large, we explore three different exemplar sampling methods and show that the proposed adaptive sampling method effectively selects diverse and informative samples from entire datasets, further preventing forgetting.
For this reason, face X-ray provides an effective way for detecting forgery generated by most existing face manipulation algorithms.
We propose a novel attributes encoder for extracting multi-level target face attributes, and a new generator with carefully designed Adaptive Attentional Denormalization (AAD) layers to adaptively integrate the identity and the attributes for face synthesis.
Supervised machine learning tasks in networks, such as node classification and link prediction, require us to perform feature engineering, which is widely agreed to be the key to success in applied machine learning.
However, these methods require fully annotated object bounding boxes for training, which are incredibly hard to scale up due to the high annotation cost.
Then, an attention mechanism is proposed to model the relations between the image regions and blocks and to generate a valuable position feature, which is further utilized to enhance the region expression and to model a more reliable relationship between the visual image and the textual sentence.
Recently, approaches based on deep learning and methods for contextual information extraction have been applied to many image segmentation tasks.
To address these challenges, this paper proposes a Cross-Level fusion and Context Inference Network (CLCI-Net) for chronic stroke lesion segmentation from T1-weighted MR images.
Recent advances in deep learning have increased the demand for neural models in real-world applications.
To solve these two challenges, in this paper we combine a sliding-window detection algorithm with Convolutional Neural Networks and propose a real-time VoIP steganalysis method based on multi-channel convolutional sliding windows.
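The windowing step of such a multi-channel sliding-window scheme can be sketched as follows. This is a generic illustration of overlapping-window extraction, assuming each window would then be scored by a small per-channel CNN; parameter names are ours.

```python
def sliding_windows(stream, width, step):
    """Slice one channel of a frame stream into overlapping windows."""
    return [stream[i:i + width] for i in range(0, len(stream) - width + 1, step)]

def multi_channel_windows(channels, width, step):
    """One list of windows per channel; a per-channel model scores each window."""
    return [sliding_windows(ch, width, step) for ch in channels]

frames = list(range(10))                       # toy stand-in for codec parameters
wins = multi_channel_windows([frames, frames[::-1]], width=4, step=2)
print(len(wins), len(wins[0]))                 # 2 channels, 4 windows each
```

Overlapping windows (step smaller than width) are what make the scheme real-time friendly: each incoming frame only triggers scoring of the newest window rather than reprocessing the whole stream.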
Learning latent representations of nodes in graphs is an important and ubiquitous task with widespread applications such as link prediction, node classification, and graph visualization.
With the huge amount of information generated every day on the web, fact checking is an important and challenging task that can help people identify the authenticity of claims, with supporting evidence selected from knowledge sources such as Wikipedia.
Object detection without bounding box annotations, i.e., weakly supervised detection, still lags far behind.
As the proposed PI loss is convex and SGD-compatible, and the framework itself is a fully convolutional network, MIML-FCN+ can be easily integrated with state-of-the-art deep learning networks.
Experimental results demonstrate the effectiveness of the proposed semantic descriptor and the usefulness of incorporating the structured semantic correlations.
Bird strikes present a huge risk for aircraft, especially since traditional airport bird surveillance is mainly dependent on inefficient human observation.
With strong labels, our framework is able to achieve state-of-the-art results in both datasets.
A large body of experimental evidence shows that the Support Vector Machine (SVM) algorithm has clear advantages in text classification, handwriting recognition, image classification, bioinformatics, and other fields.
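To make the SVM idea concrete, the sketch below trains a minimal linear SVM on a toy two-class problem with a Pegasos-style subgradient scheme (hinge loss plus L2 regularization). This is a didactic sketch of our own, not a tuned solver; in practice a library implementation would be used.

```python
import numpy as np

def pegasos_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal Pegasos-style trainer for a linear SVM; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                  # standard Pegasos step size
            if y[i] * (w @ X[i]) < 1:              # margin violated: hinge step
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                                  # only the L2 shrinkage
                w = (1 - eta * lam) * w
    return w

# Two linearly separable blobs standing in for two text classes
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w = pegasos_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
print(acc)
```

The margin maximization implicit in the hinge-plus-L2 objective is what gives SVMs their robustness on the tasks listed above, especially in high-dimensional, sparse settings like text.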