Conventional video compression approaches use a predictive coding architecture, encoding motion information together with the residual between the prediction and the target frame.
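As an illustration of this predictive coding loop, the following is a minimal sketch (not any particular codec): the encoder transmits a quantized residual against a motion-compensated prediction, and the decoder adds it back; the function names and the quantization step `step` are hypothetical.

```python
import numpy as np

def encode_frame(target, prediction, step=8):
    """Quantize the residual between the target frame and its prediction (illustrative only)."""
    residual = target.astype(np.int16) - prediction.astype(np.int16)
    return np.round(residual / step).astype(np.int16)

def decode_frame(prediction, q_residual, step=8):
    """Reconstruct the frame from the motion-compensated prediction plus the dequantized residual."""
    recon = prediction.astype(np.int16) + q_residual * step
    return np.clip(recon, 0, 255).astype(np.uint8)

# Toy example: a flat reference block predicts a slightly brighter target block.
prediction = np.full((16, 16), 100, dtype=np.uint8)
target = np.full((16, 16), 112, dtype=np.uint8)
q = encode_frame(target, prediction)
recon = decode_frame(prediction, q)
print("max reconstruction error:", int(np.abs(recon.astype(int) - target.astype(int)).max()))
```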
In video compression, most existing deep learning approaches concentrate on the visual quality of a single frame, ignoring the useful priors and temporal information available in adjacent frames.
We present DeepCache, a principled cache design for deep learning inference in continuous mobile vision.
In this paper, we propose a quality enhancement network for versatile video coding (VVC) compressed videos that jointly exploits spatial details and temporal structure (SDTS).
Standard video codecs rely on optical flow to guide inter-frame prediction: pixels from reference frames are moved via motion vectors to predict target video frames.
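A minimal sketch of this motion-compensated prediction step is shown below, assuming one integer motion vector per block; the per-block vectors here are hypothetical inputs, whereas a real codec would obtain them from motion estimation and signal them in the bitstream.

```python
import numpy as np

def motion_compensate(reference, motion_vectors, block=8):
    """Predict a frame by copying each block from the reference, shifted by its motion vector."""
    h, w = reference.shape
    prediction = np.zeros_like(reference)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors[by // block, bx // block]
            # Clamp the source block so it stays inside the reference frame.
            sy = int(np.clip(by + dy, 0, h - block))
            sx = int(np.clip(bx + dx, 0, w - block))
            prediction[by:by + block, bx:bx + block] = reference[sy:sy + block, sx:sx + block]
    return prediction

# With all-zero motion vectors, the prediction is an exact copy of the reference.
reference = np.random.randint(0, 256, (32, 32), dtype=np.uint8)
mvs = np.zeros((4, 4, 2), dtype=int)
print(np.array_equal(motion_compensate(reference, mvs), reference))
```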
We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet-18 using our approach reduces the memory energy footprint by around 40% compared to a quantized network, with negligible impact on accuracy.