Video captioning is a challenging task: it must capture different visual parts of a video and describe them in sentences, which requires both visual and linguistic coherence.
The existing MOTS studies face two critical challenges: 1) the published datasets inadequately capture the real-world complexity needed to train networks for diverse driving settings; 2) annotation tools and their working pipelines are under-studied in the literature, limiting the quality of MOTS learning examples.
We propose a novel portfolio trading system, which contains a feature preprocessing module and a trading module.
Financial trading aims to build profitable strategies to make wise investment decisions in the financial market.
Math word problem (MWP) solving is a challenging and critical task in natural language processing.
They utilize simple, fixed schemes, such as neighborhood information aggregation or mathematical operations on vectors, to fuse the embeddings of different user behaviors into a unified embedding that represents a user's behavioral patterns and is used in downstream recommendation tasks.
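As a minimal illustration of one such fixed fusion scheme, the sketch below aggregates several per-behavior embeddings into a single user vector with softmax attention against a query vector. All names, dimensions, and the choice of attention are assumptions for illustration, not the model described in any of these papers.

```python
import numpy as np

def fuse_behaviors(behavior_embs, query):
    """Fuse per-behavior embeddings (rows) into one user embedding
    using softmax attention scores against a query vector."""
    scores = behavior_embs @ query                    # (n_behaviors,)
    scores -= scores.max()                            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax, sums to 1
    return weights @ behavior_embs                    # (dim,)

rng = np.random.default_rng(0)
embs = rng.normal(size=(3, 8))   # e.g. click / cart / purchase embeddings
user = fuse_behaviors(embs, rng.normal(size=8))
```

When all behavior embeddings are identical, the attention weights are uniform and the fused vector reproduces that shared embedding, which is a quick sanity check on the scheme.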
This paper describes the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021.
Deep neural networks have demonstrated remarkable performance in many data-driven and prediction-oriented applications, and sometimes even perform better than humans.
In this paper, we study the problem of localizing a generic set of keypoints across multiple quadruped (four-legged) animal species from images.
Named entity recognition (NER) is usually developed and tested on text from well-written sources.
This paper presents a three-tier modality alignment approach to learning text-image joint embedding, coined as JEMA, for cross-modal retrieval of cooking recipes and food images.
We present a Multi-modal Semantics enhanced Joint Embedding approach (MSJE) for learning a common feature space between the two modalities (text and image), with the ultimate goal of providing high-performance cross-modal retrieval services.
This paper introduces a two-phase deep feature calibration framework for efficient learning of semantics enhanced text-image cross-modal joint embedding, which clearly separates the deep feature calibration in data preprocessing from training the joint embedding model.
In addition to the Language Identification (LID) tasks, multilingual Automatic Speech Recognition (ASR) tasks are introduced to the OLR 2021 Challenge for the first time.
The fifth Oriental Language Recognition (OLR) Challenge focuses on language recognition in a variety of complex environments to promote its development.
We also present a two-step global semantic ICP that obtains the 3D pose (x, y, yaw) used to align the point clouds and improve matching performance.
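The alignment step inside each ICP iteration has a closed-form solution for a 2D rigid pose: given matched point pairs, the 2D Kabsch algorithm recovers (x, y, yaw) by SVD of the cross-covariance matrix. The sketch below shows only that generic least-squares step, not the paper's semantic matching; function names are illustrative.

```python
import numpy as np

def align_2d(src, dst):
    """Closed-form (x, y, yaw) aligning matched 2D points src -> dst:
    the least-squares step run inside each ICP iteration (2D Kabsch)."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)        # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                   # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return t[0], t[1], yaw
```

A full ICP would alternate this step with nearest-neighbor (here, semantics-aware) correspondence search until the pose converges.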
Ranked #1 on Visual Place Recognition on KITTI
For the SV system, we proposed a multi-task learning network in which the phonetic branch is trained with the character labels of the utterance and the speaker branch is trained with the speaker labels.
This paper proposes a multi-task learning network with phoneme-aware and channel-wise attentive learning strategies for text-dependent Speaker Verification (SV).
LiDAR-based SLAM systems are generally more accurate and stable than alternatives, yet their loop closure detection remains an open issue.
In this paper, we propose Deep Multi-behavior Graph Networks (DMBGN) to shed light on this field through voucher redemption rate prediction.
In this work, we propose a novel joint learning framework of modeling coreference resolution and query rewriting for complex, multi-turn dialogue understanding.
Transportation systems have revolutionized the form of society.
Deep Neural Networks (DNNs) have shown great success in completing complex tasks.
Radial flow can be directly extracted from the azimuthal distribution of mean transverse rapidity.
Nuclear Theory • High Energy Physics - Phenomenology • Nuclear Experiment
no code implementations • 17 Dec 2020 • Andrei Afanasev, Jaseer Ahmed, Igor Akushevich, Jan C. Bernauer, Peter G. Blunden, Andrea Bressan, Duane Byer, Ethan Cline, Markus Diefenthaler, Jan M. Friedrich, Haiyan Gao, Alexandr Ilyichev, Ulrich D. Jentschura, Vladimir Khachatryan, Lin Li, Wally Melnitchouk, Richard Milner, Fred Myhrer, Chao Peng, Jianwei Qiu, Udit Raha, Axel Schmidt, Vanamali C. Shastry, Hubert Spiesberger, Stan Srednyak, Steffen Strauch, Pulak Talukdar, Weizhi Xiong
Current precision scattering experiments, and even more so many experiments planned for the Electron-Ion Collider, will be limited by systematics.
Our second method refines word representations by aligning the original and refined embedding spaces based on the local tangent space, instead of performing the weighted locally linear combination twice.
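For context, the "weighted locally linear combination" this sentence contrasts against can be sketched as an LLE-style refinement: each vector is re-expressed as an affine combination of its k nearest neighbors using least-squares reconstruction weights. This is a generic sketch under assumed details (k, regularization), not the paper's exact procedure.

```python
import numpy as np

def lle_refine(X, k=3, reg=1e-3):
    """Refine each row of X as a weighted locally linear combination
    of its k nearest neighbors (LLE-style reconstruction weights)."""
    out = np.empty_like(X)
    for i in range(X.shape[0]):
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]          # skip the point itself
        Z = X[nbrs] - X[i]                     # neighbors centered on X[i]
        G = Z @ Z.T + reg * np.eye(k)          # regularized Gram matrix
        w = np.linalg.solve(G, np.ones(k))
        w /= w.sum()                           # affine weights, sum to 1
        out[i] = w @ X[nbrs]
    return out
```

For three collinear points, the middle point is reconstructed exactly as the average of its two neighbors, which is the behavior the reconstruction weights are designed to capture.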
1 code implementation • • Jianpeng Cheng, Devang Agrawal, Hector Martinez Alonso, Shruti Bhargava, Joris Driesen, Federico Flego, Shaona Ghosh, Dain Kaplan, Dimitri Kartsaklis, Lin Li, Dhivya Piraviperumal, Jason D Williams, Hong Yu, Diarmuid O Seaghdha, Anders Johannsen
We consider a new perspective on dialog state tracking (DST), the task of estimating a user's goal through the course of a dialog.
More recently, a direct time-frequency method, called the signal separation operation (SSO), was introduced for multicomponent signal separation.
We use the chirplet transform (CT) to represent a multicomponent signal in the three-dimensional space of time, frequency and chirp rate and introduce a CT-based signal separation scheme (CT3S) to retrieve modes.
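A single chirplet-transform coefficient is the inner product of the signal with a Gaussian-windowed atom parameterized by time, frequency, and chirp rate; scanning the chirp-rate axis at a fixed time and frequency peaks at the signal's true rate. The sketch below is a toy numerical illustration of that idea with assumed parameters (window width, grid), not the CT3S scheme itself.

```python
import numpy as np

fs, dur = 256, 1.0
t = np.arange(int(fs * dur)) / fs
f0, c0 = 20.0, 30.0                      # start frequency (Hz), chirp rate (Hz/s)
s = np.exp(2j * np.pi * (f0 * t + 0.5 * c0 * t**2))   # linear chirp

def ct_coeff(sig, t0, f, c, sigma=0.1):
    """One chirplet-transform coefficient: inner product of the signal
    with a Gaussian-windowed atom of frequency f and chirp rate c at t0."""
    tau = t - t0
    g = np.exp(-0.5 * (tau / sigma) ** 2)
    atom = g * np.exp(2j * np.pi * (f * tau + 0.5 * c * tau**2))
    return np.vdot(atom, sig) / fs       # conj(atom) . sig

# Scan chirp rates at t0 = 0.5 s, where the instantaneous frequency is
# f0 + c0 * t0 = 35 Hz; the coefficient magnitude peaks at the true rate c0.
rates = np.arange(0.0, 61.0, 10.0)
mags = [abs(ct_coeff(s, 0.5, 35.0, c)) for c in rates]
best = rates[int(np.argmax(mags))]
```

At the matched rate the atom's quadratic phase cancels the signal's exactly, so the windowed sum attains its maximum magnitude; any mismatched rate leaves a residual chirp that reduces it.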
The WSST with a time-varying parameter, called the adaptive WSST, was introduced very recently in the paper "Adaptive synchrosqueezing transform with a time-varying parameter for non-stationary signal separation".
As a result, a meta-learner cannot be trained well in a high-dimensional parameter space to generalize to new tasks.
Based on Kaldi and PyTorch, recipes for i-vector and x-vector systems are also provided as baselines for the three tasks.
no code implementations • • Deepak Muralidharan, Joel Ruben Antony Moniz, Sida Gao, Xiao Yang, Justine Kao, Stephen Pulman, Atish Kothari, Ray Shen, Yinying Pan, Vivek Kaul, Mubarak Seyed Ibrahim, Gang Xiang, Nan Dun, Yidan Zhou, Andy O, Yuan Zhang, Pooja Chitkara, Xuan Wang, Alkesh Patel, Kushal Tayal, Roger Zheng, Peter Grasch, Jason D. Williams, Lin Li
Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries.
1 code implementation • • Lin Li, Lixin Qin, Zeguo Xu, Youbing Yin, Xin Wang, Bin Kong, Junjie Bai, Yi Lu, Zhenghan Fang, Qi Song, Kunlin Cao, Daliang Liu, Guisheng Wang, Qizhong Xu, Xisheng Fang, Shiqin Zhang, Juan Xia, Jun Xia
Materials and Methods In this retrospective and multi-center study, a deep learning model, COVID-19 detection neural network (COVNet), was developed to extract visual features from volumetric chest CT exams for the detection of COVID-19.
To address this issue, we propose a sentiment analysis and key entity detection approach based on BERT, applied to online financial text mining and public opinion analysis in social media.
This paper seeks to guarantee the effectiveness of waterline detection for inland maritime applications with a general digital camera sensor.
Our conversational agent UKP-ATHENA assists NLP researchers in finding and exploring scientific literature, identifying relevant authors, planning or post-processing conference visits, and preparing paper submissions using a unified interface based on natural language inputs and responses.
no code implementations • 18 Sep 2019 • Deepak Muralidharan, Justine Kao, Xiao Yang, Lin Li, Lavanya Viswanathan, Mubarak Seyed Ibrahim, Kevin Luikens, Stephen Pulman, Ashish Garg, Atish Kothari, Jason Williams
Personal assistant AI systems such as Siri, Cortana, and Alexa have become widely used as a means to accomplish tasks through natural language commands.
To tackle preference aggregation for group recommendation, we propose SIAGR (short for "Social Influence-based Attentive Group Recommendation"), a novel attentive aggregation representation learning method based on sociological theory, which uses attention mechanisms and the popular BERT model for group profile modeling.
There are rich formats of information in the network, such as ratings, text, and images, which represent different aspects of user preferences.
Based on the Bi-LSTM model, we study a classification model with a word-level attention mechanism.
With the rapid development of knowledge bases (KBs), question answering (QA) based on KBs has become a hot research issue.
With the rapid development of knowledge bases, question answering over knowledge bases has become a hot research issue.
To solve these problems, we propose a real-time panoramic video stitching framework consisting of three algorithms: the LORB image feature extraction algorithm, a feature point matching algorithm based on LSH, and a GPU-parallel video stitching algorithm based on CUDA. Experimental results show that our approach improves performance in the feature extraction and matching stages of image stitching, running 11 times faster than the traditional ORB algorithm and 639 times faster than the traditional SIFT algorithm.
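LSH-based matching of binary feature descriptors (such as the 256-bit strings ORB produces) can be sketched with bit-sampling hashing: each table hashes a descriptor by a random subset of its bits, and a query is compared by Hamming distance only against candidates sharing a bucket. Table count, bit count, and all names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import defaultdict

def build_lsh(descs, n_tables=4, n_bits=16, seed=0):
    """Index binary descriptors (rows of 0/1) with bit-sampling LSH:
    each table hashes a descriptor by a random subset of its bits."""
    r = np.random.default_rng(seed)
    tables = []
    for _ in range(n_tables):
        idx = r.choice(descs.shape[1], n_bits, replace=False)
        buckets = defaultdict(list)
        for i, d in enumerate(descs):
            buckets[d[idx].tobytes()].append(i)
        tables.append((idx, buckets))
    return tables

def query_lsh(tables, descs, q):
    """Return the candidate with the smallest Hamming distance to q
    among descriptors sharing a bucket with q in any table."""
    cand = set()
    for idx, buckets in tables:
        cand.update(buckets.get(q[idx].tobytes(), []))
    if not cand:
        return None
    return min(cand, key=lambda i: int(np.count_nonzero(descs[i] != q)))

rng = np.random.default_rng(42)
descs = rng.integers(0, 2, size=(500, 256), dtype=np.uint8)  # fake ORB bits
tables = build_lsh(descs)
match = query_lsh(tables, descs, descs[123])  # an exact query finds itself
```

Compared with brute-force matching, only descriptors colliding in at least one table are scored, which is what makes the matching stage amenable to the reported speedups.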
In this paper, we propose an end-to-end neural network framework for image-to-video person re-identification by leveraging cross-modal embeddings learned from extra information. Concretely, cross-modal embeddings from image captioning and video captioning models are reused to help project the learned features into a coordinated space, where similarity can be computed directly.
We introduce the Recurrent Relational Network to learn the spatial features in a single skeleton, followed by a multi-layer LSTM to learn the temporal features in the skeleton sequences.
Ranked #57 on Skeleton Based Action Recognition on NTU RGB+D
We propose a probabilistic modeling framework for learning the dynamic patterns in the collective behaviors of social agents and developing profiles for different behavioral groups, using data collected from multiple information sources.
In this letter, an effective image saliency detection method is proposed by constructing novel spaces to model the background and redefining the distance of salient patches from the background.
Nowadays, people are motivated to share their experiences and feelings on social media, so we propose to sense SWB from the vast amount of user-generated data on social media.