Recovering realistic textures from a heavily down-sampled low-resolution (LR) image with complicated patterns is a challenging problem in image super-resolution.
Semi-supervised domain adaptation (SSDA) aims to apply knowledge learned from a fully labeled source domain to a scarcely labeled target domain.
Experimental results on 4,773 dental models show that our DArch can accurately segment each tooth of a dental model, and that its performance is superior to that of state-of-the-art methods.
3D single object tracking (3D SOT) in LiDAR point clouds plays a crucial role in autonomous driving.
Thus, a more faithful caption can be generated using only point clouds during inference.
Semantic segmentation of point clouds usually relies on dense annotation, which is exhausting and costly; weakly supervised schemes with only sparse points annotated have therefore attracted wide attention.
We present PVSeRF, a learning framework that reconstructs neural radiance fields from single-view RGB images, for novel view synthesis.
This letter studies a vertical federated edge learning (FEEL) system for collaborative object/human motion recognition by exploiting distributed integrated sensing and communication (ISAC).
To tackle this problem, the first VQA-3D dataset, namely CLEVR3D, is proposed, which contains 60K questions in 1,129 real-world scenes.
In this paper, we design a federated two-stage learning framework that augments prototypical federated learning with a cut layer on devices and uses sign-based stochastic gradient descent with the majority vote method on model updates.
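The sign-based SGD with majority vote mentioned above can be sketched in a few lines. The function names and the toy per-device gradients below are illustrative assumptions, not the paper's implementation:

```python
def sign(x):
    """Return the sign of x as -1, 0, or +1."""
    return (x > 0) - (x < 0)

def majority_vote(sign_vectors):
    """Aggregate per-device sign vectors coordinate-wise by majority."""
    return [sign(sum(col)) for col in zip(*sign_vectors)]

def signsgd_step(params, device_grads, lr=0.01):
    """One server update: each device votes with the signs of its gradient,
    and the server applies the majority sign scaled by the learning rate."""
    votes = majority_vote([[sign(g) for g in grads] for grads in device_grads])
    return [p - lr * v for p, v in zip(params, votes)]
```

For example, with three devices reporting gradients `[1.0, -2.0]`, `[0.5, -0.1]`, and `[-0.2, -3.0]`, the majority signs are `[+1, -1]`, so each parameter moves one learning-rate step against the majority direction. Transmitting only signs reduces uplink cost to one bit per coordinate, which is the communication-efficiency argument behind this aggregation rule.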
Recent advances in unsupervised domain adaptation have achieved remarkable performance on semantic segmentation tasks.
First, a MU-MC-VLC system model is established, and then a sum-rate maximization problem under dimming level and illumination uniformity constraints is formulated.
A novel two-phase sensing framework is proposed to localize passive targets that cannot transmit/receive reference signals to/from the base stations (BSs): in Phase I, the ranges of the targets are estimated from the OFDM signals they reflect to the BSs; in Phase II, the location of each target is estimated from its ranges to different BSs.
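The Phase-II step, recovering a location from ranges to several BSs, is essentially multilateration. Below is a minimal 2-D sketch with three BSs and noiseless ranges, using the standard textbook linearization of the circle equations; it illustrates the geometry only and is not the estimator proposed in the paper:

```python
def trilaterate(bs, ranges):
    """Locate a 2-D target from its ranges to three base stations.

    Subtracting the first circle equation from the other two removes the
    quadratic terms in (x, y), leaving a 2x2 linear system solved by
    Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = bs
    r1, r2, r3 = ranges
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1  # nonzero when the BSs are not collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
```

With noisy range estimates and more than three BSs, the same linear system would be solved in the least-squares sense instead.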
Medical imaging technologies, including computed tomography (CT) and chest X-ray (CXR), are widely employed to facilitate the diagnosis of COVID-19.
Current 3D single object tracking approaches track the target based on a feature comparison between the target template and the search area.
Thus, we propose a novel framework based on a teacher-student architecture for accurate colorectal polyp classification (CPC) directly using white-light (WL) colonoscopy images during examination.
In practice, our SPOL model first generates the CAMs through a novel element-wise multiplication of shallow and deep feature maps, which filters out background noise and robustly produces sharper boundaries.
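The intuition behind that element-wise fusion can be shown with a toy example; the list-based maps below are purely illustrative stand-ins for real activation tensors, not the SPOL implementation:

```python
def fuse_features(shallow, deep):
    """Element-wise product of a shallow (edge-rich) and a deep (semantic)
    activation map. A position survives only if BOTH maps respond there, so
    background positions near zero in either map are suppressed."""
    return [[s * d for s, d in zip(srow, drow)]
            for srow, drow in zip(shallow, deep)]
```

For instance, a shallow map `[[0.9, 0.1], [0.8, 0.0]]` fused with a deep map `[[1.0, 0.7], [0.0, 0.2]]` keeps the top-left activation (strong in both) while zeroing the bottom row, where one of the two maps is silent.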
To address the above issues, we propose the Shallow Attention Network (SANet) for polyp segmentation.
Building on the analytical result, an optimized probabilistic scheduling policy is derived in closed form by solving the approximate communication time minimization problem.
Although video forecasting has been widely explored in recent years, most existing work still restricts models to a single prediction space and completely neglects ways of leveraging multiple prediction spaces.
Sampling, grouping, and aggregation are three important components in the multi-scale analysis of point clouds.
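As a concrete illustration of the sampling component, farthest point sampling (FPS) is a common choice in multi-scale point cloud pipelines; the sketch below is a generic example under that assumption, not code from the paper:

```python
import math

def farthest_point_sampling(points, k):
    """Greedy farthest-point sampling: starting from an arbitrary seed,
    repeatedly pick the point farthest from the set chosen so far, so the
    k samples cover the cloud evenly across scales."""
    chosen = [0]  # arbitrary seed point
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(idx)
        # Each point's distance is to its NEAREST chosen sample.
        dist = [min(d, math.dist(p, points[idx]))
                for d, p in zip(dist, points)]
    return chosen
```

Grouping then typically gathers the neighbors of each sampled point (e.g., within a radius), and aggregation reduces each group to one feature vector, for instance by a coordinate-wise maximum.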
In this paper, we raise the problem of HCC segmentation in DSA videos, and build our own DSA dataset.
Such a training objective is sub-optimal when the target sequence is not perfect, e.g., when the target sequence is corrupted with noise, or when only weak sequence supervision is available.
Training in heterogeneous and potentially massive networks introduces bias into the system, which originates from the non-IID data and the low participation rate encountered in practice.
Existing conversational recommendation (CR) systems usually suffer from insufficient item information when conducted over short dialogue histories and unfamiliar items.
In this paper, we address a fundamental problem in PCSR: how to downsample a dense point cloud at arbitrary scales while preserving the local topology of the discarded points in a case-agnostic manner (i.e., without additional storage for point relationships)?
In this paper, the problem of minimizing the weighted sum of age of information (AoI) and total energy consumption of Internet of Things (IoT) devices is studied.
Empowered by big data and machine learning, next-generation data-driven communication systems will be intelligent, with the characteristics of expressiveness, scalability, interpretability, and, especially, uncertainty modeling, enabling them to confidently accommodate diversified latent demands and personalized services in the foreseeable future.
Then, the problem of user selection and bandwidth allocation is studied for FL implemented over a hybrid VLC/RF system aiming to optimize the FL performance.
Compared with the visual grounding on 2D images, the natural-language-guided 3D object localization on point clouds is more challenging.
The problem is formulated as an optimization problem whose goal is to maximize the reliability of the VR network by selecting the appropriate VAPs to be turned on and controlling the user association with SBSs.
The cellular Internet of Things (IoT) is considered a de facto paradigm for improving communication and computation systems.
To this end, the fundamentals of this framework are first introduced.
The key point of language-guided person search is to construct the cross-modal association between visual and textual input.
In practice, an initial semantic segmentation (SS) of a single-sweep point cloud can be produced by any suitable network and then fed into the semantic scene completion (SSC) module as input.
Analytical results show that the proposed VD-RL algorithm is guaranteed to converge to a locally optimal solution of the non-convex optimization problem.
Existing works usually estimate the missing shape by decoding a latent feature encoded from the input points.
Since the data size differs across computational tasks, the BSs must adjust their resource (subcarrier and transmit power) and task allocation schemes as the requested tasks vary in order to serve the users effectively.
In this paper, the problem of delay minimization for federated learning (FL) over wireless communication networks is investigated.
Graph convolutional networks (GCNs) have recently been utilized to extract the underlying structures of datasets with some labeled data and high-dimensional features.
We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only minimal distortion.
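A one-dimensional toy version of such quantized aggregation is subtractive dithered quantization, where encoder and decoder share a pseudo-random dither; this scalar sketch is only a stand-in for the universal vector quantizers discussed above, and all names in it are illustrative:

```python
import random

def dithered_quantize(x, step, dither):
    """Subtractive dithered scalar quantizer: add a shared dither before
    rounding to the lattice, subtract it after decoding. The residual error
    is bounded by step/2 and statistically independent of the input."""
    return step * round((x + dither) / step) - dither

def aggregate(models, step=0.1, seed=0):
    """Average quantized model vectors. Sharing the RNG seed keeps the
    devices' encoders and the server's decoder synchronized on the dither."""
    rng = random.Random(seed)
    n = len(models[0])
    dithers = [rng.uniform(-step / 2, step / 2) for _ in range(n)]
    quantized = [[dithered_quantize(w, step, d) for w, d in zip(m, dithers)]
                 for m in models]
    return [sum(col) / len(models) for col in zip(*quantized)]
```

Because each quantized weight deviates from its true value by at most `step/2`, the aggregated model deviates from the exact average by at most `step/2` per coordinate, which is the sense in which compression induces only bounded distortion.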
However, due to resource constraints and privacy challenges, edge IoT devices may not be able to transmit their collected data to a central controller for training machine learning models.
Meanwhile, the probability that the DBS serves over 50% of user requests increases by about 27% compared to the baseline policy gradient algorithm.
In this network, multiple RISs are spatially distributed to serve wireless users, and the energy efficiency of the network is maximized by dynamically controlling the on-off status of each RIS as well as optimizing the RISs' reflection coefficient matrices.
High-fidelity clothing reconstruction is the key to achieving photorealism in a wide range of applications including human digitization, virtual try-on, etc.
Although occlusion widely exists in nature and remains a fundamental challenge for pose estimation, existing heatmap-based approaches suffer serious degradation under occlusion.
This problem is posed as an optimization problem whose goal is to minimize the energy and time consumption for task computing and transmission by adjusting the user association, service sequence, and task allocation scheme.
Extensive experiments verify the robustness and superiority of our approach on point cloud processing tasks across synthetic, indoor, and outdoor data, with or without noise.
The marriage of wireless big data and machine learning techniques is revolutionizing wireless systems through a data-driven philosophy.
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or modifying the original data points.
We introduce FPConv, a novel surface-style convolution operator designed for 3D point cloud analysis.
Artificial intelligence (AI) assisted unmanned aerial vehicle (UAV) aided next-generation networking is proposed for dynamic environments.
Due to the limited number of resource blocks (RBs) in a wireless network, only a subset of users can be selected to transmit their local FL model parameters to the BS at each learning step.
Along with increasingly popular virtual reality applications, the three-dimensional (3D) point cloud has become a fundamental data structure to characterize 3D objects and surroundings.
This joint learning, wireless resource allocation, and user selection problem is formulated as an optimization problem whose goal is to minimize an FL loss function that captures the performance of the FL algorithm.
We first characterize the optimal rank-one attack strategy that maximizes the subspace distance between the subspace learned from the original data matrix and that learned from the modified data matrix.
The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT.
Hyper-parameter optimization remains the core issue of Gaussian process (GP) methods for machine learning today.
Third, this work proposes an offline-evaluation-based safeguard mechanism to ensure that the online system always operates with the optimal, well-trained MLB policy, which not only stabilizes online performance but also enables exploration beyond current policies to make full use of machine learning in a safe way.
Given a single depth image, our method first goes through the 3D volume branch to obtain a volumetric scene reconstruction, which serves as a guide for the next view-inpainting step that attempts to make up the missing information; the third step projects the volume under the same view as the input and concatenates the two to complete the current-view depth, and finally all depths are integrated into the point cloud.
First, to the best of our knowledge, this paper is the first to empower GP regression with the alternating direction method of multipliers (ADMM) for parallel hyper-parameter optimization in the training phase, where this scalable training framework balances local estimation in baseband units (BBUs) and information consensus among BBUs in a principled way for large-scale execution.
Due to the high resolution of the images and the small size of the lesion regions, applying existing methods such as U-Nets to segmentation of fundus photographs is very challenging.
Channel interpolation is an essential technique for providing high-accuracy estimation of channel state information (CSI) in wireless system design, where the frequency-space structural correlations of multi-antenna channels are typically hidden in matrix or tensor form.
Timely and accurate knowledge of channel state information (CSI) is necessary to support scheduling operations at both physical and network layers.
Similarity search is a fundamental problem in computing science with various applications and has attracted significant research attention, especially in large-scale search with high dimensions.
To construct the mapping between 2D sketches and a vertex-wise scaling field, a novel deep learning architecture is developed.
Submodular maximization problems belong to the family of combinatorial optimization problems and enjoy wide applications.
For the case when the underlying interaction graph is known to be acyclic, it is shown that a simple algorithm based on a maximum-weight spanning tree with respect to the plug-in estimates of the influences not only has strong theoretical performance guarantees, but can also outperform generic feature selection algorithms for recovering the interaction graph from i.i.d. samples.
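The spanning-tree step itself is standard: run a maximum-weight variant of Kruskal's algorithm over the estimated influence weights. The sketch below is a generic implementation of that step under the stated assumption (edge weights are the plug-in influence estimates); it is not the paper's full recovery procedure:

```python
def max_weight_spanning_tree(n, edges):
    """Kruskal-style maximum-weight spanning tree on nodes 0..n-1.

    edges: list of (weight, u, v) tuples, where weight is the estimated
    influence between u and v. Edges are taken in decreasing weight order
    and added greedily whenever they join two components, tracked with a
    union-find structure (with path halving)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

For acyclic ground truth, the guarantee cited above says that once the plug-in estimates rank the true edges above the spurious ones, this greedy tree exactly recovers the interaction graph.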
How to learn such a BN structure is a long-standing issue that is not fully understood even in the statistical learning community.