Recovering realistic textures from a largely down-sampled low resolution (LR) image with complicated patterns is a challenging problem in image super-resolution.
Such a training objective is sub-optimal when the target sequence is imperfect, e.g., when the target sequence is corrupted with noise, or when only weak sequence supervision is available.
Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules.
Semantic communication is widely touted as a key technology for propelling the sixth-generation (6G) wireless networks.
In this article, we investigate applications of GAI in the physical layer and analyze its support for integrated sensing and communications (ISAC) systems.
Specifically, we use a small network similar to NeRF while preserving the rendering speed with a single network forwarding per pixel as in NeLF.
High-accuracy, resource-intensive deep neural networks (DNNs) have been widely adopted by live video analytics (VA), where camera videos are streamed over the network to resource-rich edge/cloud servers for DNN inference.
On the one hand, each query is generated based on 2D lane-aware features and adopts a hybrid embedding to enhance lane information.
Ranked #1 on 3D Lane Detection on OpenLane
In this paper, we introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA), which i) requires only a pre-trained source model, ii) allows the source and target domain to have different label sets, i.e., they share a common label set and hold their own private label set, and iii) requires only a few labeled samples in each class of the target domain.
In particular, we consider two scenarios with best-effort and error-constrained computation tasks, with the objectives of minimizing the average computation mean squared error (MSE) and the computation outage probability over the multiple subcarriers, respectively.
High-quality data is essential for conversational recommendation systems and serves as the cornerstone of the network architecture development and training strategy design.
YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations.
Unmanned Aerial Vehicle (UAV)-mounted edge devices are particularly advantageous for FEEL due to their flexibility and mobility in efficient data collection.
In this paper, we propose A2EHV, an Automated Alignment Evaluation with a Heterogeneous Value system that (1) is automated to minimize individual human biases, and (2) allows assessments against various target values to foster heterogeneous agents.
Despite their simplicity, stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs).
Thus, we propose a new task, SCoDA, for the domain adaptation of real scan shape completion from synthetic data.
Traditional methods for data compression are typically based on symbol-level statistics, with the information source modeled as a long sequence of i.i.d.
Owing to its motion-centric nature, our method generalizes well with limited training labels and provides good differentiability for end-to-end cycle training.
Xianggang Yu, Mutian Xu, Yidan Zhang, Haolin Liu, Chongjie Ye, Yushuang Wu, Zizheng Yan, Chenming Zhu, Zhangyang Xiong, Tianyou Liang, GuanYing Chen, Shuguang Cui, Xiaoguang Han
The birth of ImageNet drives a remarkable trend of "learning from large-scale data" in computer vision.
Based on this hybrid representation, we propose a fast optimization NeRF variant, called GP-NeRF, that achieves better rendering results while maintaining a compact model size.
We firmly believe an intermediate representation is essential, but we argue that the orientation map produced by the dominant filtering-based methods is sensitive to uncertain noise and far from a competent representation.
Specifically, we bridge the latent space of Get3DHuman with that of StyleGAN-Human via a specially-designed prior network, where the input latent code is mapped to the shape and texture feature volumes spanned by the pixel-aligned 3D reconstructor.
To tackle these issues, we propose an adaptive context selection based encoder-decoder framework composed of a Local Context Attention (LCA) module, a Global Context Module (GCM), and an Adaptive Selection Module (ASM).
In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions.
In practice, the proposed BEV@DC model takes full advantage of LiDARs with rich geometric details during training, while at inference it performs enhanced depth completion that takes only images (RGB and depth) as input.
Most existing approaches to video prediction build their models on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input to predict the next frame recursively.
In practice, box annotations are applied to alleviate the over-fitting issue of previous polyp segmentation models, which generate fine-grained polyp area through the iterative boosted segmentation model.
Moreover, to improve the 2D classifier in the target domain, we perform domain-invariant geometric adaptation from source to target and unify the 2D semantic and 3D geometric segmentation results in two domains.
Then, the noised previous state is used as the input to learn to predict the current state, improving the model's ability to update and correct slot values.
The framework introduces two paradigms for the optimization of meta-parameters: a centralized paradigm that simplifies the process by sharing data from all historical environments, and a distributed paradigm that maintains data privacy by training meta-parameters for each specific environment separately.
The unlabeled data of the DST task is incorporated into the self-training iterations, where the pseudo labels are predicted by a DST model trained on limited labeled data in advance.
Specifically, this paper introduces a simple but effective point cloud cross-modality training (PointCMT) strategy, which utilizes view-images, i.e., rendered or projected 2D images of the 3D object, to boost point cloud analysis.
Ranked #10 on 3D Point Cloud Classification on ModelNet40
In 3D medical image segmentation, small-target segmentation is crucial for diagnosis but still faces challenges.
Compared to model-free RL, this model-based RL approach leverages the derived mathematical characterization of the FL training process to discover an effective device selection and quantization scheme without imposing additional device communication overhead.
Estimating accurate lane lines in 3D space remains challenging due to their sparse and slim nature.
Ranked #5 on 3D Lane Detection on OpenLane
Hence, the BS must select an appropriate resource block for each user as well as determine and transmit part of the semantic information to the users.
This dynamically personalized FL technique incentivizes clients to participate in personalizing local models while allowing the adoption of the global model when it performs better.
To accelerate the training process, we propose a truncated vertical federated learning (T-VFL) algorithm, where the training latency is greatly reduced by integrating the standard VFL algorithm with a channel-aware user scheduling policy.
This paper proposes a new efficient approach for composable text operations in the compact latent space of text.
Ranked #2 on Unsupervised Text Style Transfer on Yelp
We present a new framework to reconstruct holistic 3D indoor scenes including both room background and indoor objects from single-view images.
Recent progress in 3D scene understanding has explored visual grounding (3DVG) to localize a target object through a language description.
This paper studies a new multi-device edge artificial-intelligent (AI) system, which jointly exploits the AI model split inference and integrated sensing and communication (ISAC) to enable low-latency intelligent services at the network edge.
This article discusses the possibility of exploiting the future sixth-generation (6G) cellular network to realize ISAC.
In this paper, we propose a service delay efficient FL (SDEFL) scheme over mobile devices.
Semi-supervised domain adaptation (SSDA) aims to apply knowledge learned from a fully labeled source domain to a scarcely labeled target domain.
Ranked #2 on Semi-supervised Domain Adaptation on VisDA2017
Experimental results on 4,773 dental models show that our DArch can accurately segment each tooth of a dental model, and its performance is superior to the state-of-the-art methods.
3D single object tracking (3D SOT) in LiDAR point clouds plays a crucial role in autonomous driving.
Ranked #1 on Object Tracking on KITTI
Thus, a more faithful caption can be generated using only point clouds during inference.
Semantic segmentation of point clouds usually relies on dense annotation, which is exhausting and costly, so weakly supervised schemes with only sparse points annotated have attracted wide attention.
We present PVSeRF, a learning framework that reconstructs neural radiance fields from single-view RGB images, for novel view synthesis.
This letter studies a vertical federated edge learning (FEEL) system for collaborative object/human motion recognition by exploiting distributed integrated sensing and communication (ISAC).
Inspired by the fact that X-ray has a strong penetrating power to see through the bag and overlapping objects, we propose to perform waste inspection efficiently using X-ray images without the need to open the bag.
To tackle this problem, we propose CLEVR3D, a large-scale VQA-3D dataset consisting of 171K questions from 8,771 3D scenes.
In this paper, we design a federated two-stage learning framework that augments prototypical federated learning with a cut layer on devices and uses sign-based stochastic gradient descent with the majority vote method on model updates.
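The sign-based update with majority vote mentioned above can be sketched roughly as follows. This is a minimal illustration only, not the paper's implementation: it ignores the cut layer and the two-stage structure, and the names `sign`, `majority_vote_step`, `worker_grads`, and `lr` are hypothetical. Each device sends only the sign of each gradient coordinate, the server sums the signs, and every parameter moves one fixed step against the majority sign.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def majority_vote_step(params, worker_grads, lr):
    """One sign-based SGD step with majority vote (illustrative sketch).

    params       : list of floats, the current model parameters
    worker_grads : list of per-device gradient lists, one float per parameter
    lr           : step size applied to the voted sign
    """
    new_params = []
    for i, p in enumerate(params):
        # Each device contributes only the sign of its i-th gradient coordinate.
        vote = sum(sign(g[i]) for g in worker_grads)
        # Move against the majority sign; a tied vote leaves the parameter unchanged.
        new_params.append(p - lr * sign(vote))
    return new_params


# Hypothetical toy data: three devices, three parameters.
params = [1.0, -2.0, 0.5]
worker_grads = [
    [0.3, -1.2, 0.0],
    [0.1, 0.4, -0.2],
    [-0.5, -0.9, 0.2],
]
updated = majority_vote_step(params, worker_grads, lr=0.1)
```

In this toy run the first coordinate is voted positive, the second negative, and the third is tied, so only the first two parameters move. Because only signs are exchanged, each device uploads one bit per coordinate (plus a possible zero), which is the usual motivation for this family of methods.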
Recent advances in unsupervised domain adaptation have achieved remarkable performance on semantic segmentation tasks.
First, a MU-MC-VLC system model is established, and then a sum-rate maximization problem under dimming level and illumination uniformity constraints is formulated.
A novel two-phase sensing framework is proposed to localize the passive targets that cannot transmit/receive reference signals to/from the base stations (BSs), where the ranges of the targets are estimated based on their reflected OFDM signals to the BSs in Phase I, and the location of each target is estimated based on its ranges to different BSs in Phase II.
Medical imaging technologies, such as computed tomography (CT) and chest X-ray (CXR), are widely employed to facilitate the diagnosis of COVID-19.
Current 3D single object tracking approaches track the target based on a feature comparison between the target template and the search area.
Ranked #2 on Object Tracking on KITTI
Thus, we propose a novel framework based on a teacher-student architecture for accurate colorectal polyp classification (CPC) that directly uses white-light (WL) colonoscopy images in the examination.
In practice, our SPOL model first generates the CAMs through a novel element-wise multiplication of shallow and deep feature maps, which filters the background noise and generates sharper boundaries robustly.
To address the above issues, we propose the Shallow Attention Network (SANet) for polyp segmentation.
Ranked #9 on Video Polyp Segmentation on SUN-SEG-Easy (Unseen)
Building on the analytical result, an optimized probabilistic scheduling policy is derived in closed form by solving the approximate communication time minimization problem.
Although video forecasting has been widely explored in recent years, most existing work still limits its models to a single prediction space and completely neglects ways to leverage multiple prediction spaces.
In this paper, we raise the problem of HCC segmentation in DSA videos, and build our own DSA dataset.
Sampling, grouping, and aggregation are three important components in the multi-scale analysis of point clouds.
Training in heterogeneous and potentially massive networks introduces bias into the system, which originates from non-IID data and the low participation rates seen in practice.
Existing conversational recommendation (CR) systems usually suffer from insufficient item information when conducted on short dialogue history and unfamiliar items.
In this paper, we address a fundamental problem in PCSR: How to downsample the dense point cloud with arbitrary scales while preserving the local topology of discarded points in a case-agnostic manner (i.e., without additional storage for point relationship)?
In this paper, the problem of minimizing the weighted sum of age of information (AoI) and total energy consumption of Internet of Things (IoT) devices is studied.
Empowered by big data and machine learning, next-generation data-driven communication systems will be intelligent with the characteristics of expressiveness, scalability, interpretability, and especially uncertainty modeling, which can confidently involve diversified latent demands and personalized services in the foreseeable future.
Then, the problem of user selection and bandwidth allocation is studied for FL implemented over a hybrid VLC/RF system aiming to optimize the FL performance.
Compared with the visual grounding on 2D images, the natural-language-guided 3D object localization on point clouds is more challenging.
Cellular Internet of Things (IoT) is considered the de facto paradigm for improving communication and computation systems.
The problem is formulated as an optimization problem whose goal is to maximize the reliability of the VR network by selecting the appropriate VAPs to be turned on and controlling the user association with SBSs.
To this end, the fundamentals of this framework are first introduced.
The key point of language-guided person search is to construct the cross-modal association between visual and textual input.
In practice, an initial semantic segmentation (SS) of a single sweep point cloud can be achieved by any appealing network and then flows into the semantic scene completion (SSC) module as the input.
Ranked #3 on 3D Semantic Scene Completion on SemanticKITTI
Analytical results show that the proposed VD-RL algorithm is guaranteed to converge to a locally optimal solution of the non-convex optimization problem.
Existing works usually estimate the missing shape by decoding a latent feature encoded from the input points.
Since the data size of each computational task is different, as the requested computational task varies, the BSs must adjust their resource (subcarrier and transmit power) and task allocation schemes to effectively serve the users.
In this paper, the problem of delay minimization for federated learning (FL) over wireless communication networks is investigated.
Graph convolutional networks (GCN) have been recently utilized to extract the underlying structures of datasets with some labeled data and high-dimensional features.
We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only a minimum distortion.
However, due to resource constraints and privacy challenges, edge IoT devices may not be able to transmit their collected data to a central controller for training machine learning models.
Meanwhile, the probability that the DBS serves over 50% of user requests increases by about 27%, compared to the baseline policy gradient algorithm.
In this network, multiple RISs are spatially distributed to serve wireless users and the energy efficiency of the network is maximized by dynamically controlling the on-off status of each RIS as well as optimizing the reflection coefficient matrix of the RISs.
High-fidelity clothing reconstruction is the key to achieving photorealism in a wide range of applications including human digitization, virtual try-on, etc.
Although occlusion widely exists in nature and remains a fundamental challenge for pose estimation, existing heatmap-based approaches suffer serious degradation on occlusions.
This problem is posed as an optimization problem whose goal is to minimize the energy and time consumption for task computing and transmission by adjusting the user association, service sequence, and task allocation scheme.
Extensive experiments verify the robustness and superiority of our approach in point cloud processing tasks across synthetic, indoor, and outdoor data, with or without noise.
Ranked #25 on Semantic Segmentation on S3DIS
The marriage of wireless big data and machine learning techniques revolutionizes the wireless system by the data-driven philosophy.
In this paper, we investigate how to manipulate the coefficients obtained via linear regression by adding carefully designed poisoning data points to the dataset or modifying the original data points.
We introduce FPConv, a novel surface-style convolution operator designed for 3D point cloud analysis.
Artificial intelligence (AI) assisted unmanned aerial vehicle (UAV) aided next-generation networking is proposed for dynamic environments.
Due to the limited number of resource blocks (RBs) in a wireless network, only a subset of users can be selected to transmit their local FL model parameters to the BS at each learning step.
Hypergraph spectral analysis has emerged as an effective tool for processing complex data structures in data analysis.
Along with increasingly popular virtual reality applications, the three-dimensional (3D) point cloud has become a fundamental data structure to characterize 3D objects and surroundings.
This joint learning, wireless resource allocation, and user selection problem is formulated as an optimization problem whose goal is to minimize an FL loss function that captures the performance of the FL algorithm.
We first characterize the optimal rank-one attack strategy that maximizes the subspace distance between the subspace learned from the original data matrix and that learned from the modified data matrix.
The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT.
Hyper-parameter optimization remains the core issue of Gaussian process (GP) for machine learning.
Third, this work proposes an offline-evaluation based safeguard mechanism to ensure that the online system can always operate with the optimal and well-trained MLB policy, which not only stabilizes the online performance but also enables the exploration beyond current policies to make full use of machine learning in a safe way.
Given a single depth image, our method first goes through the 3D volume branch to obtain a volumetric scene reconstruction as a guide to the next view inpainting step, which attempts to make up the missing information; the third step involves projecting the volume under the same view of the input, concatenating them to complete the current view depth, and integrating all depth into the point cloud.
First, to the best of our knowledge, this paper is the first to empower GP regression with the alternating direction method of multipliers (ADMM) for parallel hyper-parameter optimization in the training phase, where such a scalable training framework well balances the local estimation in baseband units (BBUs) and information consensus among BBUs in a principled way for large-scale executions.
Because fundus images are high-resolution and lesion regions are small, applying existing methods, such as U-Nets, to perform segmentation on fundus photography is very challenging.
Channel interpolation is an essential technique for providing high-accuracy estimation of the channel state information (CSI) for wireless systems design, where the frequency-space structural correlations of the multi-antenna channel are typically hidden in matrix or tensor forms.
Timely and accurate knowledge of channel state information (CSI) is necessary to support scheduling operations at both physical and network layers.
Similarity search is a fundamental problem in computing science with various applications and has attracted significant research attention, especially in large-scale search with high dimensions.
To construct the mapping between 2D sketches and a vertex-wise scaling field, a novel deep learning architecture is developed.
Submodular maximization problems belong to the family of combinatorial optimization problems and enjoy wide applications.
For the case when the underlying interaction graph is known to be acyclic, it is shown that a simple algorithm that is based on a maximum-weight spanning tree with respect to the plug-in estimates of the influences not only has strong theoretical performance guarantees, but can also outperform generic feature selection algorithms for recovering the interaction graph from i.i.d.
How to learn such a BN structure is a long-standing issue, not fully understood even in the statistical learning community.
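The maximum-weight spanning tree step mentioned above can be sketched with Kruskal's algorithm. This is a minimal illustration under stated assumptions, not the paper's estimator: the edge weights stand in for plug-in influence estimates that are assumed to have been computed elsewhere, and the function name `max_weight_spanning_tree` and the toy weights are hypothetical.

```python
def max_weight_spanning_tree(n, weighted_edges):
    """Kruskal's algorithm, taking edges in descending weight order.

    n              : number of nodes, labelled 0 .. n-1
    weighted_edges : iterable of (weight, u, v) tuples
    Returns the chosen edges as a list of (u, v, weight) tuples.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(weighted_edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            # The edge joins two separate components, so it cannot close a cycle.
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical pairwise influence estimates on four nodes.
edges = [(0.9, 0, 1), (0.2, 0, 2), (0.7, 1, 2), (0.4, 2, 3), (0.1, 1, 3)]
tree = max_weight_spanning_tree(4, edges)
```

On this toy input the tree keeps the three heaviest edges that do not form a cycle, namely 0-1, 1-2, and 2-3. The heavier the estimated influence between two nodes, the earlier the pair is considered, which is why an acyclic ground-truth graph can be recovered when the estimates are accurate enough.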