We show in this work that memory intensive computations can result in severe performance problems due to off-chip memory access and CPU-GPU context switch overheads in a wide range of deep learning models.
Existing learning based methods for text labeling task usually require a large amount of labeled examples to train a specific model for each type of document.
The proposed grasping detection network specially provides a prediction uncertainty estimation mechanism by leveraging on Feature Pyramid Network (FPN), and the mean-teacher semi-supervised learning utilizes such uncertainty information to emphasizing the consistency loss only for those unlabelled data with high confidence, which we referred it as the confidence-driven mean teacher.
To be specific, 1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it using the real data and labels, which makes the model perform better on the real world; 2) crowd understanding via domain adaptation: translate the synthetic data to photo-realistic images, then train the model on translated data and labels.
Then we instantiate this search strategy by optimizing both a dedicated graph neural network (GNN) and the adjacency tensor associated with the defined feature graph.
The last decade has witnessed growth in the computational requirements for training deep neural networks.
Specifically, we build topic models upon graphs using anonymous walks and Graph Anchor LDA, an LDA variant that selects significant structural patterns first, so as to alleviate the complexity and generate structural topics efficiently.
Recently, lots of deep networks are proposed to improve the quality of predicted super-resolution (SR) images, due to its widespread use in several image-based fields.
Motivated by the necessity and benefits of task-oriented BERT compression, we propose a novel compression method, AdaBERT, that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks.
In the last decade, crowd counting and localization attract much attention of researchers due to its wide-spread applications, including crowd monitoring, public safety, space design, etc.
Sponsored search optimizes revenue and relevance, which is estimated by Revenue Per Mille (RPM).
One critical issue for efficiently operating practical AI clouds, is to characterize the computing and data transfer demands of these workloads, and more importantly, the training performance given the underlying software framework and hardware configurations.
DL2 is a DL-driven scheduler for DL clusters, targeting global training job expedition by dynamically resizing resources allocated to jobs.
This technical report attempts to provide efficient and solid kits addressed on the field of crowd counting, which is denoted as Crowd Counting Code Framework (C$^3$F).
Network embedding is a method to learn low-dimensional representation vectors for nodes in complex networks.
Secondly, we propose two schemes that exploit the synthetic data to boost the performance of crowd counting in the wild: 1) pretrain a crowd counter on the synthetic data, then finetune it using the real data, which significantly prompts the model's performance on real data; 2) propose a crowd counting method via domain adaptation, which can free humans from heavy data annotations.
An increasing number of machine learning tasks require dealing with large graph datasets, which capture rich and complex relationship among potentially billions of elements.
Distributed, Parallel, and Cluster Computing
In the real traffic of a large-scale e-commerce sponsored search, the proposed approach significantly outperforms the baseline.
In this method, the network is viewed as a computational graph, in which the vertices denote the computation nodes and edges represent the information flow.
In recent years, there is a surge on machine learning applications in industry.
Distributed, Parallel, and Cluster Computing Mathematical Software
Building multi-turn information-seeking conversation systems is an important and challenging research topic.
To solve this problem, we propose a novel end-to-end scene text detector IncepText from an instance-aware segmentation perspective.
Given a search request, ad retrieval module rewrites the query into bidding keywords, and uses these keywords as keys to select Top N ads through inverted indexes.
A new methodology of nonsparse learning with latent variables (NSL) is proposed to simultaneously recover the significant observable predictors and latent factors as well as their effects.
This paper describes the USTC_NELSLIP systems submitted to the Trilingual Entity Detection and Linking (EDL) track in 2016 TAC Knowledge Base Population (KBP) contests.