Extensive evaluation over real-world traffic data sets, including normal, encrypted and malicious labels, show that, CGNN improves the prediction accuracy by 23\% to 29\% for application classification, by 2\% to 37\% for malicious traffic classification, and reaches the same accuracy level for encrypted traffic classification.
The energy term of the prior model couples a continuous latent vector and a symbolic one-hot vector, so that discrete category can be inferred from the observed example based on the continuous latent vector.
Transfer learning with large pretrained transformer-based language models like BERT has become a dominating approach for most NLP tasks.
In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.
This paper studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises, using reinforcement learning techniques.
It is initialized from the prior distribution of the latent variable and then runs a small number (e. g., 20) of Langevin dynamics steps guided by its posterior distribution.
Sampling from or optimizing the learned LB-EBM yields a belief vector which is used to make a path plan, which then in turn helps to predict a long-range trajectory.
Such spatial and attention features are nested deeply, therefore, the proposed framework works in a mixed top-down and bottom-up manner.
Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload.
First, we construct and release a new dense video captioning dataset, Video Timeline Tags (ViTT), featuring a variety of instructional videos together with time-stamped annotations.
Ranked #1 on Dense Video Captioning on YouCook2 (ROUGE-L metric, using extra training data)
In this work, we propose a novel selective contrastive learning framework for unsupervised feature learning.
In this paper, we further improve spatio-temporal point cloud feature learning with a flexible module called ASAP considering both attention and structure information across frames, which we find as two important factors for successful segmentation in dynamic point clouds.
Open-domain dialogue generation has gained increasing attention in Natural Language Processing.
Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well.
We show that the model has a particularly simple form in the space of the latent variables of the flow-based model, and MCMC sampling of the EBM in the latent space, which is a simple special case of neural transport MCMC, mixes well and traverses modes in the data space.
As deep learning brings excellent performances to object detection algorithms, Tracking by Detection (TBD) has become the mainstream tracking framework.
This paper proposes a joint training method to learn both the variational auto-encoder (VAE) and the latent energy-based model (EBM).
Image enhancement from degradation of rainy artifacts plays a critical role in outdoor visual computing systems.
Understanding sequential information is a fundamental task for artificial intelligence.
This paper studies the robustness of policy iteration in the context of continuous-time infinite-horizon linear quadratic regulation (LQR) problem.
Systems and Control Numerical Analysis Systems and Control Numerical Analysis Optimization and Control
no code implementations • 5 May 2020 • Dario Fuoli, Zhiwu Huang, Martin Danelljan, Radu Timofte, Hua Wang, Longcun Jin, Dewei Su, Jing Liu, Jaehoon Lee, Michal Kudelski, Lukasz Bala, Dmitry Hrybov, Marcin Mozejko, Muchen Li, Si-Yao Li, Bo Pang, Cewu Lu, Chao Li, Dongliang He, Fu Li, Shilei Wen
For track 2, some existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.
Pretraining from unlabelled web videos has quickly become the de-facto means of achieving high performance on many video understanding tasks.
Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables.
Instructional videos get high-traffic on video sharing platforms, and prior work suggests that providing time-stamped, subtask annotations (e. g., "heat the oil in the pan") improves user experiences.
4 code implementations • • Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter S. Lasecki, Dragomir Radev
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems.
Object detection plays an important role in current solutions to vision and language tasks like image captioning and visual question answering.
Ranked #4 on Visual Question Answering on VizWiz 2018
Although the proper use of idioms can enhance the elegance of writing, the active use of various expressions is a challenge because remembering idioms is difficult.
5 code implementations • • Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, Dragomir Radev
The best model obtains an exact match accuracy of 20. 2% over all questions and less than10% over all interaction sequences, indicating that the cross-domain setting and the con-textual phenomena of the dataset present significant challenges for future research.
There are mainly two novel designs in our deep RNN framework: one is a new RNN module called Context Bridge Module (CBM) which splits the information flowing along the sequence (temporal direction) and along depth (spatial representation direction), making it easier to train when building deep by balancing these two directions; the other is the Overlap Coherence Training Scheme that reduces the training complexity for long visual sequential tasks on account of the limitation of computing resources.
We introduce the first benchmark for a new problem --- recognizing human action adverbs (HAA): "Adverbs Describing Human Actions" (ADHA).
We consider the problem of classifying documents not by topic, but by overall sentiment, e. g., determining whether a review is positive or negative.