To tackle this issue, we conduct an in-depth analysis of the performance degradation observed in existing parallel context models, focusing on two aspects: the quantity and the quality of the information used for context prediction and decoding.
However, little research has been done on the quality assessment of textured meshes, which hinders the development of quality-oriented applications, such as mesh compression and enhancement.
In this paper, we focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding scheme.
A two-step patch cropping algorithm and a patch texture mapping module refine the size of 1-hop geodesic patches and link the mesh geometry with its color information, yielding 1-hop textured geodesic patches.
Dynamic colored meshes (DCM) are widely used in various applications; however, these meshes may undergo different processes, such as compression or transmission, which can distort them and degrade their quality.
Static meshes with texture maps are widely used in modern industrial and manufacturing sectors and have attracted considerable attention in the mesh compression community due to their huge data volume.
However, most previous studies concentrate mainly on enhancing token-level semantic information to alleviate the representation entanglement problem, rather than composing and using the syntactic and semantic representations of sequences appropriately, as humans do.
Second, to reduce the significant domain discrepancy, we establish an intermediate domain, the description domain, based on insights from subjective experiments: we consider the domain relevance among samples in the perception domain and learn a structured latent space.
Learned Image Compression (LIC) has recently become a trending technique for image transmission due to its strong performance.
There is mounting evidence that one of the reasons hindering CG is that the representation of the encoder's uppermost layer is entangled, i.e., the syntactic and semantic representations of sequences are entangled.
To reduce the negative impact of panoramic distortion, we incorporate a panel geometry embedding network that encodes both the local and global geometric features of a panel.
On the other hand, the model size and inference complexity of DGCNN are 42 and 1203 times those of Green-PointHop, respectively.
An efficient 3D scene flow estimation method called PointFlowHop is proposed in this work.
Many point cloud classification methods are developed under the assumption that all point clouds in the dataset are well aligned with the canonical axes so that the 3D Cartesian point coordinates can be employed to learn features.
A low-complexity point cloud compression method, called the Green Point Cloud Geometry Codec (GPCGC), is proposed to encode the 3D spatial coordinates of static point clouds efficiently.
We urgently need to shift the paradigm of data analysis from classical Euclidean methods to both Euclidean and non-Euclidean ones, and to develop innovative methods for describing, estimating, and inferring the non-Euclidean geometries of modern real-world datasets.
To extract effective features for PCQA, we propose a new graph convolution kernel, i.e., GPAConv, which attentively captures perturbations of structure and texture.
Unlike previous works, our framework is data-efficient, requiring only a small amount of matting ground truth to learn to estimate high-quality object mattes.
Considering the importance of saliency detection in quality assessment, we propose an effective full-reference PCQA metric, point cloud quality assessment using 3D saliency maps (PQSM), which makes the first attempt to utilize saliency information to facilitate quality prediction.
In addition, the KRNets are optimized in a meta-learning manner to ensure that knowledge transfer and student learning both contribute to improving the reconstructed quality of the student.
In this work, we modify the MFRNet network architecture to enable multi-frame processing, and the new network, multi-frame MFRNet, has been integrated into the EBDA framework using two Versatile Video Coding (VVC) host codecs: VTM 16.2 and the Fraunhofer Versatile Video Encoder (VVenC 1.4.0).
Recently, high-quality video conferencing with fewer transmission bits has become a popular and challenging problem.
Previous deep video compression approaches use only a single-scale motion compensation strategy and rarely adopt the mode prediction technique from traditional standards such as H.264/H.265 for either motion or residual compression.
Our model is trained to predict human images in arbitrary poses, which encourages it to extract disentangled and expressive neural textures representing the appearance of different semantic entities.
An unsupervised point cloud object retrieval and pose estimation method, called PCRP, is proposed in this work.
In this work, we propose the first end-to-end optimized framework for compressing automotive stereo videos (i.e., stereo videos from autonomous driving applications) from both left and right views.
GreenPCO is an unsupervised learning method that predicts object motion by matching features of consecutive point cloud scans.
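Below is a minimal sketch of the general idea of estimating motion by matching features across consecutive scans; it is not the GreenPCO implementation. The per-point features are assumed to be given, and the rigid-motion fit uses the standard Kabsch algorithm.

```python
# Sketch: motion estimation by feature matching between two scans.
# feat_t / feat_t1 are hypothetical per-point feature arrays.
import numpy as np

def estimate_motion(pts_t, feat_t, pts_t1, feat_t1):
    """Match each point in scan t to its nearest neighbor in feature
    space in scan t+1, then fit a rigid transform (Kabsch algorithm)."""
    # Pairwise feature distances (N_t x N_t1); fine for small examples.
    d = np.linalg.norm(feat_t[:, None, :] - feat_t1[None, :, :], axis=-1)
    idx = d.argmin(axis=1)                  # nearest feature match
    src, dst = pts_t, pts_t1[idx]

    # Kabsch: rotation/translation minimizing squared match error.
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = mu_d - R @ mu_s
    return R, t                             # dst ~= R @ src + t
```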
This work addresses two major issues of end-to-end learned image compression (LIC) based on deep neural networks: variable-rate learning where separate networks are required to generate compressed images with varying qualities, and the train-test mismatch between differentiable approximate quantization and true hard quantization.
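For context, the train-test quantization mismatch commonly arises because rounding has zero gradient almost everywhere, so training substitutes a differentiable proxy. A minimal sketch of the standard additive-uniform-noise proxy (Ballé et al.) versus hard rounding at test time, shown here only to illustrate the mismatch the paper targets, not its proposed fix:

```python
import torch

def quantize(latent: torch.Tensor, training: bool) -> torch.Tensor:
    if training:
        # Differentiable proxy: add U(-0.5, 0.5) noise to mimic rounding.
        return latent + torch.empty_like(latent).uniform_(-0.5, 0.5)
    # Test time: true, non-differentiable hard rounding.
    return torch.round(latent)
```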
It is named GSIP (Green Segmentation of Indoor Point clouds) and its performance is evaluated on a representative large-scale benchmark -- the Stanford 3D Indoor Segmentation (S3DIS) dataset.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Compressing Deep Neural Network (DNN) models to alleviate the storage and computation requirements is essential for practical applications, especially for resource-limited devices.
Specifically, we will first provide an overview of the MPEG VCM group, including use cases, requirements, processing pipelines, and the plan for potential VCM standards, followed by the evaluation framework, including machine-vision tasks, datasets, evaluation metrics, and anchor generation.
The key idea is to replace the image to be compressed with a substitutional one that outperforms the original one in a desired way.
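One plausible way to realize this idea, sketched below under assumptions not taken from the paper, is to gradient-descend on the input pixels through a differentiable codec proxy so the substitute compresses better while staying faithful to the original. The `codec` callable (returning a reconstruction and a rate estimate) is a hypothetical stand-in.

```python
import torch

def find_substitute(x_orig, codec, lam=0.01, steps=100, lr=1e-3):
    # Start from the original image and optimize the pixels directly.
    x_sub = x_orig.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_sub], lr=lr)
    for _ in range(steps):
        x_hat, rate = codec(x_sub)          # differentiable proxy
        # Distortion is measured against the ORIGINAL, not x_sub.
        loss = rate + lam * torch.mean((x_hat - x_orig) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_sub.detach()
```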
Learning-based visual data compression and analysis have attracted great interest from both academia and industry recently.
Inspired by the recent PointHop classification method, an unsupervised 3D point cloud registration method, called R-PointHop, is proposed in this work.
As the successor of H.265/HEVC, the new versatile video coding standard (H.266/VVC) can provide up to 50% bitrate saving with the same subjective quality, at the cost of increased decoding complexity.
In the second stage, we derive another warping model to refine warping results in less important regions by eliminating serious distortions in shape, disparity and 3D structure.
This issue makes the generator lack the incentive from the discriminator to learn high-frequency content of data, resulting in a significant spectrum discrepancy between generated images and real images.
On the one hand, we propose to discriminate the ground-truth waveform from the synthetic one in the frequency domain rather than only in the time domain, offering stronger consistency guarantees.
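A minimal sketch of frequency-domain discrimination: both waveforms are mapped to STFT magnitudes before being scored by a discriminator. The discriminator here is a trivial placeholder, not the paper's network, and the STFT settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

def spectrogram(wav: torch.Tensor, n_fft: int = 1024) -> torch.Tensor:
    # Complex STFT -> magnitude; shape (batch, freq_bins, frames).
    spec = torch.stft(wav, n_fft=n_fft, hop_length=n_fft // 4,
                      window=torch.hann_window(n_fft), return_complex=True)
    return spec.abs()

disc = nn.Sequential(nn.Flatten(), nn.LazyLinear(1))  # placeholder critic

def freq_domain_logits(real_wav, fake_wav):
    # Real/synthetic waveforms are compared in the frequency domain.
    return disc(spectrogram(real_wav)), disc(spectrogram(fake_wav))
```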
The UFF method exploits statistical correlations of points in a point cloud set to learn shape and point features in a one-pass feedforward manner through a cascaded encoder-decoder architecture.
An unsupervised point cloud registration method, called salient points analysis (SPA), is proposed in this work.
Denoisers trained with synthetic data often fail to cope with the diversity of unknown noises, giving way to methods that can adapt to existing noise without knowing its ground truth.
This paper investigates how to leverage a DurIAN-based average model to enable a new speaker to have both accurate pronunciation and fluent cross-lingual speaking with very limited monolingual data.
The PointHop method was recently proposed by Zhang et al. for 3D point cloud classification with unsupervised feature extraction.
Generic object detection algorithms have demonstrated excellent performance in recent years.
However, video quality exhibits different characteristics from static image quality due to the existence of temporal masking effects.
Recent advances in image-to-image translation focus on learning the one-to-many mapping from two aspects: multi-modal translation and multi-domain translation.
Image inpainting techniques have recently shown significant improvements through the use of deep neural networks.
In the attribute building stage, we address the problem of unordered point cloud data using a space partitioning procedure and develop a robust descriptor that characterizes the relationship between a point and its one-hop neighbors in a PointHop unit.
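A minimal sketch of the octant-pooling idea behind such a PointHop-style descriptor: a point's neighbors are partitioned into the eight octants of its local frame and their attributes are averaged per octant. Neighbor selection and the attribute choice are simplified assumptions for illustration.

```python
import numpy as np

def octant_descriptor(center, neighbors, attrs):
    """center: (3,), neighbors: (K, 3), attrs: (K, C) -> (8*C,)."""
    rel = neighbors - center                     # local coordinates
    # Octant index 0..7 from the signs of (x, y, z).
    octant = ((rel[:, 0] > 0).astype(int)
              + 2 * (rel[:, 1] > 0).astype(int)
              + 4 * (rel[:, 2] > 0).astype(int))
    out = np.zeros((8, attrs.shape[1]))
    for o in range(8):
        mask = octant == o
        if mask.any():                           # empty octants stay zero
            out[o] = attrs[mask].mean(axis=0)
    return out.ravel()                           # concatenated descriptor
```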
The block-based coding structure in the hybrid video coding framework inevitably introduces compression artifacts such as blocking, ringing, etc.
Point clouds are a fundamental 3D representation widely used in real-world applications such as autonomous driving.
Remarkably, we obtain the frame-level AUC score of 82.12% on UCF-Crime.
First, due to the noisy input signal of the model, there is still a gap in quality between generated and natural waveforms.
Despite tremendous progress achieved in temporal action detection, state-of-the-art methods still suffer from the sharp performance deterioration when localizing the starting and ending temporal action boundaries.
Recent advances in image-to-image translation have seen a rise in approaches generating diverse images through a single network.