In this work, we propose to decouple the explicit modelling of spatial relations from local aggregation.
Ranked #2 on Semantic Segmentation on S3DIS Area5
In addition, the edges at the tooth and between the tooth and root canal are weak, which makes them very difficult to segment.
Image deraining is a challenging task that involves restoring degraded images affected by rain streaks.
We devised a cross-layer simulation framework to evaluate the effectiveness of STT-MRAM as a scratchpad replacing SRAM in a systolic-array-based DNN accelerator.
In this survey, we review the state-of-the-art calibration methods and provide an understanding of their principles for performing model calibration.
Federated learning (FL) has found numerous applications in healthcare, finance, and IoT scenarios.
no code implementations • 21 Jun 2023 • Zimeng Li, Sa Xiao, Cheng Wang, Haidong Li, Xiuchao Zhao, Caohui Duan, Qian Zhou, Qiuchen Rao, Yuan Fang, Junshuai Xie, Lei Shi, Fumin Guo, Chaohui Ye, Xin Zhou
Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications.
By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack.
Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)
Interactions between power and gas systems, which are both large and complex, have gradually intensified over the last decades, predominantly due to the spread of large fleets of natural gas-fired power units (GPUs) and the technological development of power-to-gas (P2G) facilities.
In the realm of urban transportation, metro systems serve as crucial and sustainable means of public transit.
Second, we propose a modeling evaluation method based on HPMB for object-level modeling to overcome this limitation.
The core of this dataset is a blending optimization process, which corrects for the pose as it drifts and is affected by the magnetic conditions.
Fortunately, we have identified two observations that help us achieve the best of both worlds: 1) query-based methods demonstrate superiority over dense proposal-based methods in open-world instance segmentation, and 2) learning localization cues is sufficient for open-world instance segmentation.
Our experiments show that (a) learning sample-wise gamma in a continuous space can effectively perform calibration; (b) SECE smoothly optimises gamma-net towards better robustness to binning schemes; (c) the combination of gamma-net and SECE achieves the best calibration performance across various calibration metrics and retains very competitive predictive performance compared to multiple recently proposed methods on three datasets.
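For context on the binning-based calibration metrics mentioned in the entry above, the following is a minimal sketch of the standard expected calibration error (ECE) with equal-width bins. It is not the SECE objective or the gamma-net from that paper; the function name, the half-open bin convention, and the default of 10 bins are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE with equal-width confidence bins (illustrative sketch).

    confidences: predicted top-class probabilities in [0, 1]
    correct:     1 if the corresponding prediction was right, else 0
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0.0] * n_bins
    for conf, hit in zip(confidences, correct):
        # Assumed convention: bin b covers [b/n_bins, (b+1)/n_bins);
        # a confidence of exactly 1.0 is placed in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        hit_sums[b] += hit

    ece = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sums[b] / counts[b]
        accuracy = hit_sums[b] / counts[b]
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (counts[b] / n) * abs(accuracy - avg_conf)
    return ece
```

Because the result depends on `n_bins` and on where the bin edges fall, the same predictions can score differently under different binning schemes, which is the sensitivity the entry refers to.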
We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild.
Performing ATP bioluminescence causes cell lysis of organoids, so it is impossible to continually observe long-term changes in organoid viability after medication.
It delivers the class Relevance to the activated neurons in the intermediate layers in a back-propagation manner, and associates the activation of neurons with the input points to visualize the hidden semantics of each layer.
Adversarial examples are beneficial to improve the robustness of the 3D neural network model and enhance the stability of the AR system.
We present CEMA: Causal Explanations in Multi-Agent systems; a general framework to create causal explanations for an agent's decisions in sequential multi-agent systems.
It aims to infer knowledge for (the things at) unobserved locations using the data from (the things at) observed locations during a given time period of interest.
In this work, we propose a novel LiDAR localization framework, SGLoc, which decouples pose estimation into point cloud correspondence regression and pose estimation via this correspondence.
We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications.
Specific emitter identification (SEI) is a potential physical layer authentication technology, which is one of the most critical complements of upper layer authentication.
Second, constrained by the far-distance in data distribution of the sampled clients, we further minimize the variance of the numbers of times that the clients are sampled, to mitigate long-term bias.
However, it is difficult to apply such networks to 3D object detection in autonomous driving due to its large computation cost and slow reasoning speed.
Reinforcement learning (RL) operating on attack graphs and leveraging cyber terrain principles is used to develop the reward and state associated with the determination of surveillance detection routes (SDR).
Task-oriented dialogue systems in industry settings need to have high conversational capability, be easily adaptable to changing situations and conform to business constraints.
To address these issues, we introduce hyper surface fitting to implicitly learn hyper surfaces, which are represented by multi-layer perceptron (MLP) layers that take point features as input and output surface patterns in a high dimensional feature space.
Ranked #2 on Surface Normals Estimation on PCPNet
Herein, we develop a new model called KG-MTT-BERT (Knowledge Graph Enhanced Multi-Type Text BERT) by extending the BERT model for long and multi-type text with the integration of the medical knowledge graph.
3 code implementations • 2 Aug 2022 • Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, Filippos Christianos, Mhairi Dunion, Elliot Fosong, Samuel Garcin, Shangmin Guo, Balint Gyevnar, Trevor McInroe, Georgios Papoudakis, Arrasy Rahman, Lukas Schäfer, Massimiliano Tamborski, Giuseppe Vecchio, Cheng Wang, Stefano V. Albrecht
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning.
Goal recognition (GR) involves inferring the goals of other vehicles, such as a certain junction exit, which can enable more accurate prediction of their future behaviour.
Relevant recommendation is a special recommendation scenario which provides relevant items when users express interest in one target item (e.g., click, like and purchase).
In the second part, we review what is known about the different new non-volatile memory materials and devices suited for compute in-memory, and discuss the outlook and challenges.
Quantitative and qualitative experiments show that our method outperforms the techniques based only on RGB images.
We propose Human-centered 4D Scene Capture (HSC4D) to accurately and efficiently create a dynamic digital world, containing large-scale indoor-outdoor scenes, diverse human motions, and rich interactions between humans and environments.
Despite the usefulness of this service, we consider that recommending courses to users directly may neglect their varying degrees of expertise.
Human mobility data contains rich but abundant information, which yields comprehensive region embeddings for cross-domain tasks.
The granular-ball rough set can simultaneously represent the Pawlak rough set and the neighborhood rough set, thereby realizing a unified representation of the two.
To implement this framework, we design both coarse-grained and fine-grained procedures for modeling user preference, where the former focuses on more general, coarse-grained semantic fusion and the latter focuses on more specific, fine-grained semantic fusion.
Ranked #1 on Recommendation Systems on ReDial
In this work, we propose a novel way to enable transformers to have the capability of uncertainty estimation and, meanwhile, retain the original predictive performance.
Datasets containing both categorical and continuous variables are frequently encountered in many areas, and with the rapid development of modern measurement technologies, the dimensions of these variables can be very high.
However, this approach failed to explicitly reflect the correlations between different nodes at different time steps, thus limiting the learning capability of graph neural networks.
By integrating the spatial features from each cardiac frame of the gated MPS and the temporal features from the sequential cardiac frames of the gated MPS, we developed a Spatial-Temporal V-Net (ST-VNet) for automatic extraction of RV endocardial and epicardial contours.
Representation learning on temporal interaction graphs (TIG) is to model complex networks with the dynamic evolution of interactions arising in a broad spectrum of problems.
To address these challenges, we propose a novel neuron model that has cosine activation with a time varying component for sequential processing.
Large solar power stations are usually located in remote areas and connected to the main grid via a long transmission line.
Mitotic figure count is an important marker of tumor proliferation and has been shown to be associated with patients' prognosis.
The reconstructed volumetric images convincingly demonstrate the merits of the SMART system using the AI-empowered interior tomography approach, enabling cardiac micro-CT with the unprecedented temporal resolution of 30 ms, which is an order of magnitude higher than the state of the art.
Due to the lack of insight in industrial application, existing methods on open datasets neglect the camera pose information, which inevitably results in the detector being susceptible to camera extrinsic parameters.
Ranked #6 on Monocular 3D Object Detection on KITTI Cars Moderate (using extra training data)
Real-world machine learning systems are achieving remarkable performance in terms of coarse-grained metrics like overall accuracy and F1 score.
As there is a lack of 3D point cloud datasets for fine-grained building facades, we construct the first large-scale building facade point cloud benchmark dataset for semantic segmentation.
We have been witnessing the usefulness of conversational AI systems such as Siri and Alexa, directly impacting our daily lives.
Being one of the most popular generative frameworks, variational autoencoders (VAEs) are known to suffer from a phenomenon termed posterior collapse, i.e., the latent variational distributions collapse to the prior, especially when a strong decoder network is used.
no code implementations • 9 Mar 2021 • Xian Sun, Peijin Wang, Zhiyuan Yan, Feng Xu, Ruiping Wang, Wenhui Diao, Jin Chen, Jihao Li, Yingchao Feng, Tao Xu, Martin Weinmann, Stefan Hinz, Cheng Wang, Kun Fu
In this paper, we propose a novel benchmark dataset with more than 1 million instances and more than 15,000 images for Fine-grAined object recognItion in high-Resolution remote sensing imagery, named FAIR1M.
We find that the dynamic speckle patterns can be utilized to decouple the singly and multiply backscattered components.
Optics Applied Physics
Lithium niobate on insulator (LNOI) is an emerging photonic platform with great promises for future optical communications, nonlinear optics and microwave photonics.
Optics Applied Physics
In the second order numerical scheme, the BDF temporal stencil is applied, and an alternate convex-concave decomposition is derived, so that the concave part corresponds to a quadratic energy.
Numerical Analysis Numerical Analysis 35A15 (Primary) 12-XX, 12-08 (Secondary) F.2.2; G.2
We successfully perform three-dimensional tracking of small asymmetrical particles in a turbulent fluid flow; the particles are neutrally buoyant and bottom-heavy, i.e., they have a non-homogeneous mass distribution along their symmetry axis.
Uncertainty quantification is crucial for building reliable and trustable machine learning systems.
Moreover, a new refiner module is also presented to preserve the vehicle details from inputs and refine the complete outputs with fine-grained information.
In the re-ID stage, we predict identity labels of detected bounding boxes, and use these examples to construct a more practical mixed train set for the DRA model.
The error correcting performance of multi-level-cell (MLC) NAND flash memory is closely related to the block length of error correcting codes (ECCs) and log-likelihood-ratios (LLRs) of the read-voltage thresholds.
Deep learning is now the most powerful tool for data processing in computer vision, becoming the most preferred technique for tasks such as classification, segmentation, and detection.
Point2Node can dynamically explore correlation among all graph nodes from different levels, and adaptively aggregate the learned features.
Between the encoder and the decoder, a transform attention layer is applied to convert the encoded traffic features to generate the sequence representations of future time steps as the input of the decoder.
Ranked #2 on Image Dehazing on KITTI
In this paper, we present a simple but effective supervoxel segmentation method for point clouds, which formalizes supervoxel segmentation as a subset selection problem.
Motivated by this gap, we proposed a variant of the Barzilai-Borwein (BB) method, referred to as the Random Barzilai-Borwein (RBB) method, to calculate step size for SARAH in the mini-batch setting, thereby leading to a new SARAH method: MB-SARAH-RBB.
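As background for the entry above, the following is a minimal sketch of the classic deterministic Barzilai-Borwein step size, computed as (s·s)/(s·y) from the change in iterates s and the change in gradients y. It is not the Random Barzilai-Borwein (RBB) variant or the MB-SARAH-RBB method from that paper; the toy quadratic objective, the function names, the fallback step, and the initial step are all assumptions made for illustration.

```python
def bb_step_size(x_prev, x_curr, g_prev, g_curr, fallback=1e-3):
    """Classic BB step: (s . s) / (s . y), with s = x_k - x_{k-1}, y = g_k - g_{k-1}."""
    s = [a - b for a, b in zip(x_curr, x_prev)]
    y = [a - b for a, b in zip(g_curr, g_prev)]
    sy = sum(si * yi for si, yi in zip(s, y))
    if sy <= 0:
        # No positive curvature information; use a small fixed step (assumption).
        return fallback
    ss = sum(si * si for si in s)
    return ss / sy


def grad(x):
    # Gradient of the hypothetical objective f(x) = 0.5 * (x0**2 + 10 * x1**2).
    return [1.0 * x[0], 10.0 * x[1]]


def minimize(x0, iters=100, first_step=0.05):
    """Full-gradient descent whose step size is chosen by the BB rule."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    # One plain gradient step is needed before the BB rule has two points to compare.
    x_curr = [xi - first_step * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(iters):
        g_curr = grad(x_curr)
        if sum(g * g for g in g_curr) < 1e-24:
            break  # gradient is numerically zero
        step = bb_step_size(x_prev, x_curr, g_prev, g_curr)
        x_next = [xi - step * gi for xi, gi in zip(x_curr, g_curr)]
        x_prev, g_prev, x_curr = x_curr, g_curr, x_next
    return x_curr
```

The sketch uses exact full gradients to keep the step-size rule itself visible; a stochastic method such as SARAH would instead evaluate it on mini-batch gradient estimates.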
Second, to accurately extract trees from all point clouds, we propose a 3D deep learning network, PointNLM, to semantically segment tree crowns.
This paper proposes a new end-to-end trainable matching network based on receptive field, RF-Net, to compute sparse correspondence between images.
Man-made environments typically comprise planar structures that exhibit numerous geometric relationships, such as parallelism, coplanarity, and orthogonality.
We present a novel deep convolutional network pipeline, LO-Net, for real-time lidar odometry estimation.
Although various transfer learning methods have shown promising performance in this context, our proposed novel method RecSys-DAN focuses on alleviating the cross-domain and within-domain data sparsity and data imbalance and learns transferable latent representations for users, items and their interactions.
Tabular data extraction from reports and other published data in PDF format is of interest for various data consolidation purposes such as analysing and aggregating financial reports of a company.
The database contains 274 CT angiography (CTA) scans from 274 unique TBAD patients and is split into a training set (254 cases, including 210 preoperative and 44 postoperative scans) and a test set (20 cases). Based on STENT, we develop a series of methods including automated TBAD segmentation and automated measurement of TBAD parameters that facilitate personalized and precise management of the disease.
It is widely recognized that deeper networks or networks with more feature maps have better performance.
The estimation of high dimensional precision matrices has been a central topic in statistical learning.
To forecast the traffic flow across a wide area and overcome the mentioned challenges, we design and propose a promising forecasting model called Layerwise Recurrent Autoencoder (LRA), in which a three-layer stacked autoencoder (SAE) architecture is used to obtain temporal traffic correlations and a recurrent neural network (RNN) model is used for prediction.
This paper deals with geometric multi-model fitting from noisy, unstructured point set data (e.g., laser scanned point clouds).
Then, the image together with the retrieved shape model is fed into the proposed network to generate the fine-grained 3D point cloud.
We propose a novel deep network called Mancs that solves the person re-identification problem from the following aspects: fully utilizing the attention mechanism for the person misalignment problem and properly sampling for the ranking loss to obtain more stable person representation.
We study the problem of unsupervised domain adaptive re-identification (re-ID) which is an active topic in computer vision but lacks a theoretical foundation.
Ranked #15 on Unsupervised Domain Adaptation on Market to Duke
Learning autonomous-driving policies is one of the most challenging but promising tasks for computer vision.
Then, based on the VEM, we proposed the concept of the Visual Recognizability Field (VRF) to reflect the visual recognizability distribution in 3D space and established a Visual Recognizability Evaluation Model (VREM) to measure a traffic sign's visual recognizability for a given viewpoint.
However, most geometric model fitting methods are unable to fit an arbitrary geometric model (e.g., a surface with holes) to incomplete data, because the similarity metrics used in these methods are unable to measure the rigid partial similarity between arbitrary models.
This work presents an end-to-end trainable deep bidirectional LSTM (Long Short-Term Memory) model for image captioning.