Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets.
Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult.
We show how bidirectional transformers trained for masked token prediction can be applied to neural image compression to achieve state-of-the-art results.
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models.
The resulting video compression transformer outperforms previous methods on standard video compression data sets.
Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods.
We extensively study how to combine Generative Adversarial Networks and learned compression to obtain a state-of-the-art generative lossy compression system.
A popular approach to learning encoders for lossy compression is to use additive uniform noise during training as a differentiable approximation to test-time quantization.
Despite considerable progress on end-to-end optimized deep networks for image compression, video coding remains a challenging task.
We propose an interactive, scribble-based annotation framework which operates on the whole image to produce segmentations for all regions.
We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and JPEG 2000.
Ranked #3 on Image Compression on ImageNet32
no code implementations • 3 Oct 2018 • Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X. Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang, Jiewen Ran, Chen Xing, Xingguang Zhou, Pengfei Zhu, Mingrui Geng, Yawei Li, Eirikur Agustsson, Shuhang Gu, Luc van Gool, Etienne de Stoutz, Nikolay Kobyshev, Kehui Nie, Yan Zhao, Gen Li, Tong Tong, Qinquan Gao, Liu Hanwen, Pablo Navarrete Michelini, Zhu Dan, Hu Fengshuo, Zheng Hui, Xiumei Wang, Lirui Deng, Rang Meng, Jinghui Qin, Yukai Shi, Wushao Wen, Liang Lin, Ruicheng Feng, Shixiang Wu, Chao Dong, Yu Qiao, Subeesh Vasu, Nimisha Thekke Madam, Praveen Kandula, A. N. Rajagopalan, Jie Liu, Cheolkon Jung
This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones.
We present a learned image compression system based on GANs, operating at extremely low bitrates.
Motivated by recent work on deep neural network (DNN)-based image compression methods showing potential improvements in image quality, savings in storage, and bandwidth reduction, we propose to perform image understanding tasks such as classification and segmentation directly on the compressed representations produced by these compression methods.
During training, the auto-encoder makes use of the context model to estimate the entropy of its representation, and the context model is concurrently updated to learn the dependencies between the symbols in the latent representation.
This year alone has seen unprecedented leaps in the area of learning-based image translation, namely CycleGAN, by Zhu et al.
Ranked #5 on Facial Expression Translation on CelebA
We propose the use of synthetic labels obtained through clustering to disentangle and stabilize GAN training.
Generative models such as Variational Auto Encoders (VAEs) and Generative Adversarial Networks (GANs) are typically trained for a fixed prior distribution in the latent space, such as uniform or Gaussian.
We propose the Anchored Regression Network (ARN), a nonlinear regression network which can be seamlessly integrated into various networks or can be used stand-alone when the features have already been fixed.
Our new WebVision database and relevant studies in this work would benefit the advance of learning state-of-the-art visual models with minimum supervision based on web data.
The 2017 WebVision challenge consists of two tracks, the image classification task on WebVision test set, and the transfer learning task on PASCAL VOC 2012 dataset.
We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy.
k^2-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate the convergence with a new low time complexity divisive initialization.
Subspace clustering refers to the problem of clustering high-dimensional data points into a union of low-dimensional linear subspaces, where the number of subspaces, their dimensions and orientations are all unknown.