Moment retrieval in videos is a challenging task that aims to retrieve the most relevant video moment in an untrimmed video given a sentence description.
Finally, we verified the reliability of the model and achieved automatic measurement of VV and ICV.
Temporal language grounding in videos aims to localize the temporal span relevant to the given query sentence.
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
This paper presents a novel method to generate answers for non-extraction machine reading comprehension (MRC) tasks whose answers cannot be simply extracted as one span from the given passages.
In this work, we focus on Dialogue-related Natural Language Processing (DrNLP) tasks and design a Dialogue-Adaptive Pre-training Objective (DAPO) based on some important qualities for assessing dialogues which are usually ignored by general LM pre-training objectives.
Current stripe-based feature learning approaches deliver impressive accuracy but do not strike a proper trade-off between diversity, locality, and robustness, and they easily suffer from part-level semantic inconsistency caused by the conflict between rigid partitioning and misalignment.
Accurate temporal action proposals play an important role in detecting actions from untrimmed videos.
A collection of approaches based on graph convolutional networks has proven successful in skeleton-based action recognition by exploring neighborhood information and dense dependencies between intra-frame joints.
Ranked #19 on Skeleton Based Action Recognition on NTU RGB+D
The latest work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially in various machine reading comprehension and natural language inference tasks.
Ranked #4 on Natural Language Inference on SNLI
Multi-choice reading comprehension is a challenging task that requires selecting an answer from a set of candidate options given a passage and a question.
In this technical report, we describe our solution to temporal action proposal (task 1) in ActivityNet Challenge 2019.
Multi-choice reading comprehension is a challenging task that requires a complex reasoning procedure.
Ranked #3 on Question Answering on RACE
Semantic role labeling (SRL) aims to discover the predicate-argument structure of a sentence.
Ranked #6 on Semantic Role Labeling on CoNLL 2005
Recently, semantic segmentation and general object detection frameworks have been widely adopted for scene text detection tasks.
Automatic speech recognition (ASR) tasks are now handled by end-to-end deep learning models, which require less preparation of raw data and make it easier to transfer between languages.
Deep Feedforward Sequential Memory Network (DFSMN) has shown superior performance on speech recognition tasks.
Neural machine translation models integrating results of loanword identification experiments achieve the best results on OOV translation (with 0.5-0.9 BLEU improvements).
Instead of learning on semantic regions, we uniformly partition the images into several stripes, and vary the number of parts in different local branches to obtain local feature representations with multiple granularities.
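The uniform stripe partition described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes horizontal stripes, a single-channel feature map stored as nested lists, mean pooling within each stripe, and branch sizes of 1, 2, and 3 parts. The names `stripe_features` and `multi_granularity_features` are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumption: one channel, H rows x W columns, kept as nested lists for simplicity.
FeatureMap = List[List[float]]


def stripe_features(fmap: FeatureMap, num_parts: int) -> List[float]:
    """Split the rows of fmap into num_parts equal horizontal stripes
    and mean-pool each stripe into a single value."""
    height = len(fmap)
    if num_parts <= 0 or height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    rows_per_stripe = height // num_parts
    pooled = []
    for p in range(num_parts):
        stripe = fmap[p * rows_per_stripe:(p + 1) * rows_per_stripe]
        values = [v for row in stripe for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def multi_granularity_features(
    fmap: FeatureMap, parts_per_branch: Sequence[int] = (1, 2, 3)
) -> Dict[int, List[float]]:
    """One local branch per entry in parts_per_branch (assumed sizes);
    each branch uses a different number of stripes."""
    return {k: stripe_features(fmap, k) for k in parts_per_branch}


# Toy 6x2 feature map in which every cell of row i holds the value i.
toy = [[float(i), float(i)] for i in range(6)]
features = multi_granularity_features(toy)
```

Here `features[1]` holds one value pooled over the whole map, while `features[2]` and `features[3]` hold progressively finer local descriptors, one per stripe.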
Ranked #3 on Person Re-Identification on SYSU-30k (using extra training data)
We propose a straightforward method that simultaneously reconstructs the 3D facial structure and provides dense alignment.
Ranked #1 on 3D Face Reconstruction on Florence
To alleviate data sparsity in spoken Uyghur machine translation, we proposed a log-linear-based morphological segmentation approach.
At the global stage, given an image with a rough face detection result, the full face region is first re-initialized by a supervised spatial transformer network to a canonical shape state and then trained to regress a coarse landmark estimation.
Existing discourse research focuses only on monolingual settings, and the inconsistency between languages limits the power of discourse theory in multilingual applications such as machine translation.