Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of Cardiovascular Diseases (CVDs), the foremost cause of death globally.
This connection ensures a seamless backpropagation of gradients from the network's output back to the input coordinates, thereby enhancing regularization.
Unlike direct observation-to-action mapping, Karma recurrently maintains a multi-dimensional time series of observations, returns, and actions as input and employs causal sequence modeling via a decision transformer to determine the next action.
DNeRV achieves competitive results against the state-of-the-art neural compression approaches and outperforms existing implicit methods on downstream inpainting and interpolation for $960 \times 1920$ videos.
With this aim, we extensively exploit cross-scale, cross-group, and cross-color correlations of point cloud attributes to ensure accurate probability estimation and thus high coding efficiency.
This paper addresses the problem of lossy image compression, a fundamental problem in image processing and information theory that is involved in many real-world applications.
This work extends the Multiscale Sparse Representation (MSR) framework developed for static Point Cloud Geometry Compression (PCGC) to support the dynamic PCGC through the use of multiscale inter conditional coding.
Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP).
Implicit neural representation (INR) characterizes the attributes of a signal as a function of the corresponding coordinates and has emerged as a powerful tool for solving inverse problems.
Quantizing a floating-point neural network to its fixed-point representation is crucial for Learned Image Compression (LIC) because it improves decoding consistency for interoperability and reduces space-time complexity for implementation.
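As a rough illustration of what a fixed-point representation involves, the sketch below rounds floating-point values to signed integers with a fixed number of fractional bits and then maps them back. It is a generic example, not the method of the paper above; the function names and the choice of 8 fractional bits in a 16-bit word are assumptions made for illustration.

```python
def quantize_fixed_point(values, frac_bits=8, total_bits=16):
    """Round floats to signed fixed-point integers (hypothetical helper).

    Each value is scaled by 2**frac_bits, rounded to the nearest integer,
    and clamped to the range of a signed total_bits-wide word.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize_fixed_point(ints, frac_bits=8):
    """Map fixed-point integers back to floats (hypothetical helper)."""
    scale = 1 << frac_bits
    return [q / scale for q in ints]


# Illustrative values only; 200.0 is out of range and gets clamped.
weights = [0.5, -1.25, 0.1, 200.0]
q = quantize_fixed_point(weights)
restored = dequantize_fixed_point(q)
print(q)
print(restored)
```

Because the quantized values are plain integers, arithmetic on them does not depend on a platform's floating-point behavior, which is one common motivation for integer inference in codecs where encoder and decoder must agree exactly.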
A learning-based adaptive loop filter is developed for the Geometry-based Point Cloud Compression (G-PCC) standard to reduce attribute compression artifacts.
Recent research has shown a strong theoretical connection between variational autoencoders (VAEs) and the rate-distortion theory.
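One standard way to see this connection, given here as a textbook identity rather than as the formulation of the paper above, is to split the negative evidence lower bound of a VAE into a distortion term and a rate term. The symbols $q(z \mid x)$ (encoder), $p(x \mid z)$ (decoder), and $p(z)$ (prior) are the usual VAE notation and are assumptions of this sketch.

```latex
-\mathrm{ELBO}(x)
  = \underbrace{\mathbb{E}_{q(z \mid x)}\bigl[-\log p(x \mid z)\bigr]}_{\text{distortion}}
  + \underbrace{D_{\mathrm{KL}}\bigl(q(z \mid x) \,\|\, p(z)\bigr)}_{\text{rate}}
```

Weighting the two terms differently yields a Lagrangian of the form $R + \lambda D$, so varying the weight traces out different rate-distortion trade-offs.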
Although the convolutional representation of multiscale sparse tensors has proven highly efficient at accurately modeling occupancy probability when compressing the geometry of dense object point clouds, its capacity for representing sparse LiDAR point cloud geometry (PCG) has been largely limited.
We utilize a disparity network to transfer spatiotemporal information across views even in large disparity scenes, based on which, we propose disparity-guided flow-based warping for LSR-HFR view and complementary warping for HSR-LFR view.
Predicting the dynamic behaviors of particles in suspension subject to hydrodynamic interaction (HI) and external drive can be critical for many applications.
To this end, an Integrated Convolution and Self-Attention (ICSA) unit is first proposed to form a content-adaptive transform that dynamically characterizes and embeds the neighborhood information of any input.
Image signal processing (ISP) is crucial for camera imaging, and neural network (NN) solutions are extensively deployed for daytime scenes.
The event camera is a bio-inspired vision sensor with high dynamic range, high response speed, and low power consumption, and it has recently attracted extensive attention for its use in a wide range of vision tasks.
End-to-end learned lossy image coders (LICs), as opposed to hand-crafted image codecs, have shown increasing superiority in terms of the rate-distortion performance.
A Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements, and combine them adaptively to achieve more accurate inter prediction.
Adaptive Bit Rate (ABR) decision-making plays a crucial role in ensuring satisfactory Quality of Experience (QoE) in video streaming applications, where past network statistics are mainly leveraged to predict future network bandwidth.
1 code implementation • 21 Apr 2021 • Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng Li, Thomas Tanay, Fenglong Song, Wentao Chao, Qiang Guo, Yan Liu, Jiang Li, Xiaochao Qu, Dewang Hou, Jiayu Yang, Lyn Jiang, Di You, Zhenyu Zhang, Chong Mou, Iaroslav Koshelev, Pavel Ostyakov, Andrey Somov, Jia Hao, Xueyi Zou, Shijie Zhao, Xiaopeng Sun, Yiting Liao, Yuanzhi Zhang, Qing Wang, Gen Zhan, Mengxi Guo, Junlin Li, Ming Lu, Zhan Ma, Pablo Navarrete Michelini, Hai Wang, Yiyun Chen, Jingyu Guo, Liliang Zhang, Wenming Yang, Sijung Kim, Syehoon Oh, Yucong Wang, Minjie Cai, Wei Hao, Kangdi Shi, Liangyan Li, Jun Chen, Wei Gao, Wang Liu, XiaoYu Zhang, Linjie Zhou, Sixin Lin, Ru Wang
This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results.
In this paper, we propose a new distortion quantification method for point clouds, the multiscale potential energy discrepancy (MPED).
This paper proposes a decoder-side Cross Resolution Synthesis (CRS) module to pursue better compression efficiency beyond the latest Versatile Video Coding (VVC): we encode intra frames at the original high resolution (HR), compress inter frames at a lower resolution (LR), and then super-resolve the decoded LR inter frames with the help of preceding HR intra frames and neighboring LR inter frames.
Over the past two decades, traditional block-based video coding has made remarkable progress and spawned a series of well-known standards such as MPEG-4, H.264/AVC and H.265/HEVC.
We propose GraphSIM, an objective metric to accurately predict the subjective quality of point clouds with superimposed geometry and color impairments.
Object-Based Image Coding (OBIC), which was extensively studied about two decades ago, promised broad applications in both ultra-low bitrate communication and high-level semantic content understanding, but it has rarely been used owing to the inefficient compact representation of arbitrarily shaped objects.
Traditional video compression technologies have been developed over decades in pursuit of higher coding efficiency.
This paper proposes a novel Non-Local Attention optimization and Improved Context modeling-based image compression (NLAIC) algorithm, which is built on top of the deep neural network (DNN)-based variational auto-encoder (VAE) structure.
This paper presents a dual camera system for high spatiotemporal resolution (HSTR) video acquisition, where one camera shoots a video with high spatial resolution and low frame rate (HSR-LFR) and another one captures a low spatial resolution and high frame rate (LSR-HFR) video.
This paper presents a novel end-to-end Learned Point Cloud Geometry Compression (a.k.a. Learned-PCGC) framework to efficiently compress point cloud geometry (PCG) using deep neural network (DNN)-based variational autoencoders (VAEs).
Networked video applications, e.g., video conferencing, often suffer from poor visual quality due to unexpected network fluctuation and limited bandwidth.
This paper proposes a novel Non-Local Attention Optimized Deep Image Compression (NLAIC) framework, which is built on top of the popular variational auto-encoder (VAE) structure.
We propose a MultiScale AutoEncoder (MSAE)-based extreme image compression framework to offer visually pleasing reconstruction at a very low bitrate.
In addition, a field study on perceptual quality is conducted via a dedicated subjective assessment to compare the efficiency of our proposed methods with that of conventional image compression methods.
We present a lossy image compression method based on deep convolutional neural networks (CNNs), which outperforms the existing BPG, WebP, JPEG2000 and JPEG as measured via multi-scale structural similarity (MS-SSIM), at the same bit rate.