Drone-captured footage can be applied to dynamic traffic monitoring, object detection and tracking, and other vision tasks.
Neural processes (NPs) constitute a family of variational approximate models for stochastic processes with promising properties in computational efficiency and uncertainty quantification.
Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or digital twins (DT) using patient medical images.
To be specific, 1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it using the real data and labels, which makes the model perform better in the real world; 2) crowd understanding via domain adaptation: translate the synthetic data to photo-realistic images, then train the model on the translated data and labels.
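The first scheme above, pre-training on synthetic data and then fine-tuning on real data, can be illustrated with a deliberately tiny example. The sketch below is not the authors' crowd analysis model: it swaps in a one-variable linear regressor trained by plain gradient descent, and the data sets, learning rate, and helper names (`mse`, `train`) are assumptions made purely for illustration.

```python
def mse(w, b, data):
    """Mean squared error of the linear model y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, lr, epochs):
    """Full-batch gradient descent on the MSE, starting from (w, b)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical "synthetic" domain: plentiful labels, but a biased target (y = 2x).
synthetic = [(x / 10, 2.0 * x / 10) for x in range(-10, 11)]
# Hypothetical "real" domain: only three labels, shifted target (y = 2x + 1).
real = [(-0.5, 0.0), (0.0, 1.0), (0.5, 2.0)]

# Step 1: pre-train on the synthetic data from a zero initialization.
w_pre, b_pre = train(0.0, 0.0, synthetic, lr=0.1, epochs=500)
# Step 2: fine-tune on the real data, starting from the pre-trained weights.
w_ft, b_ft = train(w_pre, b_pre, real, lr=0.1, epochs=500)

print("real-data MSE before fine-tuning:", mse(w_pre, b_pre, real))
print("real-data MSE after fine-tuning: ", mse(w_ft, b_ft, real))
```

In this toy setting the pre-trained slope already matches the real domain, so fine-tuning only has to correct the offset. That loosely mirrors the intuition that synthetic pre-training supplies a useful initialization which a small amount of real data then adjusts.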
Based on the observations, we propose a scheme to fuse global and local motion patterns (MPs) and key visual information (KVI) for semantic event recognition in basketball videos.
Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy.
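As a concrete illustration of what "predicting the performance of a policy" means, the following minimal sketch runs textbook iterative policy evaluation on a tabular Markov chain. It is not the algorithm studied in the paper; the function name `evaluate_policy`, the two-state chain, and the discount factor are assumptions chosen for illustration.

```python
def evaluate_policy(transitions, rewards, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Iterative policy evaluation for a fixed policy.

    transitions[s][t] is the probability of moving from state s to state t
    under the policy; rewards[s] is the expected immediate reward in state s.
    Repeatedly applies the Bellman expectation backup V <- R + gamma * P V.
    """
    n = len(rewards)
    values = [0.0] * n
    for _ in range(max_iters):
        new_values = [
            rewards[s] + gamma * sum(transitions[s][t] * values[t] for t in range(n))
            for s in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values


# Hypothetical two-state chain: state 0 always moves to state 1;
# state 1 is absorbing and yields a reward of 1 on every step.
P = [[0.0, 1.0], [0.0, 1.0]]
R = [0.0, 1.0]
V = evaluate_policy(P, R, gamma=0.5)
print(V)  # approximately [1.0, 2.0]
```

The fixed point can be checked by hand: V(1) = 1 + 0.5 V(1) gives V(1) = 2, and V(0) = 0 + 0.5 V(1) = 1.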
Video frame interpolation achieves temporal super-resolution by generating smooth transitions between frames.
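To make "temporal super-resolution" concrete, the sketch below doubles a frame rate by inserting a linearly blended frame between each pair of neighbours. This is only a naive baseline, assumed here for illustration, and not the learned interpolation method of the paper; frames are represented as nested lists of grayscale values, and `blend_frames` and `upsample` are hypothetical helper names.

```python
def blend_frames(frame_a, frame_b, t=0.5):
    """Linearly blend two equally sized grayscale frames at time t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    same_shape = len(frame_a) == len(frame_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    )
    if not same_shape:
        raise ValueError("frames must have the same shape")
    return [
        [(1.0 - t) * a + t * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def upsample(frames, factor=2):
    """Insert factor - 1 blended frames between each consecutive pair."""
    if not frames:
        return []
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            out.append(blend_frames(a, b, k / factor))
    out.append(frames[-1])
    return out


# Two hypothetical 1x2 frames; the inserted middle frame is their average.
frames = [[[0.0, 10.0]], [[10.0, 20.0]]]
result = upsample(frames, factor=2)
print(result)  # [[[0.0, 10.0]], [[5.0, 15.0]], [[10.0, 20.0]]]
```

Plain blending produces ghosting when objects move, which is the usual motivation for motion-aware interpolation models.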
Ranked #5 on Video Frame Interpolation on Vimeo90K
The reference algorithm, Automatminer, is a highly-extensible, fully-automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning.
Materials Science Computational Physics
Specifically, for a given neuron of a source model, NLT exploits a small amount of labeled target data to learn domain shift parameters.
Through our analysis, we expect to make reasonable inferences and predictions about the future development of crowd counting, and also to provide feasible solutions for object counting problems in other fields.
no code implementations • 9 Mar 2020 • Xianpei Han, Zhichun Wang, Jiangtao Zhang, Qinghua Wen, Wenqi Li, Buzhou Tang, Qi. Wang, Zhifan Feng, Yang Zhang, Yajuan Lu, Haitao Wang, Wenliang Chen, Hao Shao, Yubo Chen, Kang Liu, Jun Zhao, Taifeng Wang, Kezun Zhang, Meng Wang, Yinlin Jiang, Guilin Qi, Lei Zou, Sen Hu, Minhao Zhang, Yinnian Lin
A knowledge graph models world knowledge as concepts, entities, and the relationships between them, and has been widely used in many real-world tasks.
Recently, many deep networks have been proposed to improve the quality of predicted super-resolution (SR) images, owing to the widespread use of SR in several image-based fields.
Based on semantic consistency, that is, the similar distribution of deep-layer features in synthetic and real-world crowd areas, we first introduce a semantic extractor to effectively distinguish crowd from background using high-level semantic information.
By testing seven machine learning models for formation energy on stability predictions using the Materials Project database of DFT calculations for 85,014 unique chemical compositions, we show that while formation energies can indeed be predicted well, all compositional models perform poorly on predicting the stability of compounds, making them considerably less useful than DFT for the discovery and design of new solids.
In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG).
Deep learning-based hyperspectral image super-resolution (SR) methods have achieved great success recently.
In the last decade, crowd counting and localization have attracted much attention from researchers due to their widespread applications, including crowd monitoring, public safety, and space design.
Video moment retrieval aims to find the moment that is most relevant to a given natural language query.
To train the meta-model without knowledge of the attack strategy, we introduce a technique called jumbo learning that samples a set of Trojaned models following a general distribution.
For general supervised deep learning classification algorithms, the pixel-by-pixel algorithm achieves precise yet inefficient classification with a small number of labeled pixels, whereas the pixel mapping algorithm achieves efficient yet edge-rough classification with more prior labels required.
The latter attempts to extract more discriminative features among different channels, which helps the model attend to the head region, the core of crowd scenes.
Crowd counting from a single image is a challenging task due to high appearance similarity, perspective changes and severe congestion.
In this paper, a novel locality and structure regularized low rank representation (LSLRR) model is proposed for HSI classification.
To better handle the high-dimensionality problem and exploit abundance information, this paper presents a General End-to-end Two-dimensional CNN (GETNET) framework for hyperspectral image change detection (HSI-CD).
Our contributions are threefold: (1) A priori s-CNNs model that learns prior location information at the superpixel level is proposed to describe various objects discriminatively; (2) A hierarchical data augmentation method is presented to alleviate dataset bias in the priori s-CNNs training stage, which significantly improves the labeling of foreground objects; (3) A soft restricted MRF energy function is defined to improve the priori s-CNNs model's labeling performance while reducing over-smoothing.
Road detection from the perspective of moving vehicles is a challenging issue in autonomous driving.
Our contributions are as follows: 1) We propose a multi-resolution feature fusion network architecture that exploits densely connected deconvolution layers with skip connections and can learn more effective features for small objects; 2) We frame traffic sign detection as a spatial sequence classification and regression task, and propose a vertical spatial sequence attention (VSSA) module to gain more context information for better detection performance.
In this paper, we present iDVO (inertia-embedded deep visual odometry), a self-supervised learning based monocular visual odometry (VO) for road vehicles.
Human actions captured in video sequences contain two crucial factors for action recognition, i.e., visual appearance and motion dynamics.
Band selection, by choosing a set of representative bands in a hyperspectral image (HSI), is an effective method to reduce redundant information without compromising the original contents.
The classification objective ensures that each modal network predicts the true action category, while the competing objective encourages each modal network to outperform the other one.
3) Results of motion orientation and magnitude are adaptively weighted and fused by a Bayesian model, which makes the proposed method more robust and able to handle more kinds of abnormal events.
Action prediction aims to determine what action is occurring in a video as early as possible, which is crucial to many online applications, such as predicting a traffic accident before it happens and detecting malicious actions in a monitoring system.
In this paper, we propose a weakly supervised adversarial domain adaptation method, consisting of three deep neural networks, to improve segmentation performance when transferring from synthetic data to real scenes.
different encoding schemes indicate that using a machine model to accelerate optimization evaluation and reduce experimental cost is feasible to some extent, which could dramatically promote the upgrading of encoding schemes and in turn help blind people improve their visual perception ability.
When metallic glasses are subjected to mechanical loads, the plastic response of atoms is heterogeneous.
Secondly, we propose two schemes that exploit the synthetic data to boost the performance of crowd counting in the wild: 1) pretrain a crowd counter on the synthetic data, then finetune it using the real data, which significantly improves the model's performance on real data; 2) propose a crowd counting method via domain adaptation, which can free humans from heavy data annotations.
This work introduces a multi-target optimization framework with Bayesian modeling of the target events, called Deep Bayesian Multi-Target Learning (DBMTL).
Recent breakthroughs in Go and other strategic games have demonstrated the great potential of reinforcement learning for intelligent scheduling in uncertain environments, but some bottlenecks are also encountered when this paradigm is generalized to universal complex tasks.
Based on the proposed model, we also construct a PatientEG dataset with 191,294 events, 3,429 distinct entities, and 545,993 temporal relations using EMRs from Shanghai Shuguang Hospital.
Two CNN-based classification models were then used as feature extractors to obtain the discriminative features of the entire CXR images and the cropped lung region images.
Previous transfer learning methods based on deep networks assume that knowledge should be transferred between the same hidden layers of the source and target domains.
In this paper, we propose a Residual Dilated Convolutional Neural Network with Conditional Random Field (RD-CNN-CRF) to solve it.
Coronary artery disease (CAD) is one of the leading causes of cardiovascular disease deaths.
In this paper, we present an attention-based Bi-GRU-CapsNet model to detect hypernymy relationships between compound entities.
Clinical Named Entity Recognition (CNER) aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, which is a fundamental and crucial task for clinical and translational research.
Motivated by an important insight from neuroscience, we propose a new framework for understanding the success of the recently proposed "maxout" networks.