A data augmentation module is utilized in contrastive learning to transform the given data example into two views, which is considered essential and irreplaceable.
To address the above challenging problems, we propose a novel Cross-city Federated Transfer Learning framework (CcFTL) to cope with the data insufficiency and privacy problems.
Federated learning (FL) is a privacy-preserving machine learning paradigm in which the server periodically aggregates local model parameters from clients without assembling their private data.
Second, we employ a dynamic graph relationship learning module to learn dynamic spatial relationships between metro stations without a predefined graph adjacency matrix.
In this paper, we propose a novel Hierarchical Spatio-Temporal Graph Neural Network (HiSTGNN) to model cross-regional spatio-temporal correlations among meteorological variables in multiple stations.
Accurate forecasting of citywide traffic flow has been playing critical role in a variety of spatial-temporal mining applications, such as intelligent traffic control and public risk assessment.
People often refer to a place of interest (POI) by an alias.
1 code implementation • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10, 000 hours of high quality labeled audio suitable for supervised training, and 40, 000 hours of total audio suitable for semi-supervised and unsupervised training.
Ranked #1 on Speech Recognition on GigaSpeech
This paper introduces a new open-source speech corpus named "speechocean762" designed for pronunciation assessment use, consisting of 5000 English utterances from 250 non-native speakers, where half of the speakers are children.
Ranked #3 on Phone-level pronunciation scoring on speechocean762
An appropriate weight selection algorithm that combines the information quantity of training accuracy and training frequency to measure the weights is proposed.
This framework consists of three parts: 1) a local feature extraction module to learn representations for each region; 2) a global context module to extract global contextual priors and upsample them to generate the global features; and 3) a region-specific predictor based on tensor decomposition to provide customized predictions for each region, which is very parameter-efficient compared to previous methods.
It is commonly observed that the data are scattered everywhere and difficult to be centralized.
Urban spatial-temporal flows prediction is of great importance to traffic management, land use, public safety, etc.
Predicting urban traffic is of great importance to intelligent transportation systems and public safety, yet is very challenging because of two aspects: 1) complex spatio-temporal correlations of urban traffic, including spatial correlations between locations along with temporal correlations among timestamps; 2) diversity of such spatiotemporal correlations, which vary from location to location and depend on the surrounding geographical information, e. g., points of interests and road networks.
This paper focuses on two related subtasks of aspect-based sentiment analysis, namely aspect term extraction and aspect sentiment classification, which we call aspect term-polarity co-extraction.
In this paper, we tackle these challenges and propose a privacy-preserving machine learning model, called Federated Forest, which is a lossless learning model of the traditional random forest method, i. e., achieving the same level of accuracy as the non-privacy-preserving approach.
In this letter, we address the problem of controlling energy storage systems (ESSs) for arbitrage in real-time electricity markets under price uncertainty.
In this paper, we formulate crowd flow forecasting in irregular regions as a spatio-temporal graph (STG) prediction problem in which each node represents a region with time-varying flows.
In this paper, we aim to infer the real-time and fine-grained crowd flows throughout a city based on coarse-grained observations.
Ranked #1 on Fine-Grained Urban Flow Inference on TaxiBJ-P4
In this paper, we propose a general framework (HyperST-Net) based on hypernetworks for deep ST models.
In this paper, we propose an attention-based end-to-end neural approach for small-footprint keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality KWS system.
Speaker adaptation aims to estimate a speaker specific acoustic model from a speaker independent one to minimize the mismatch between the training and testing conditions arisen from speaker variabilities.
First, we study the effectiveness of different dereverberation networks (the generator in GAN) and find that LSTM leads a significant improvement as compared with feed-forward DNN and CNN in our dataset.
Previous attempts have shown that applying attention-based encoder-decoder to Mandarin speech recognition was quite difficult due to the logographic orthography of Mandarin, the large vocabulary and the conditional dependency of the attention model.
We propose a deep-learning-based approach, called ST-ResNet, to collectively forecast two types of crowd flows (i. e. inflow and outflow) in each and every region of a city.
The rapid growth of emerging information technologies and application patterns in modern society, e. g., Internet, Internet of Things, Cloud Computing and Tri-network Convergence, has caused the advent of the era of big data.
The aggregation is further combined with external factors, such as weather and day of the week, to predict the final traffic of crowds in each and every region.
In this paper, we propose a spatio-temporal multi-view-based learning (ST-MVL) method to collectively fill missing readings in a collection of geosensory time series data, considering 1) the temporal correlation between readings at different timestamps in the same series and 2) the spatial correlation between different time series.