Knowledge distillation (KD) emerges as a challenging yet promising technique for compressing deep learning models, characterized by the transmission of extensive learning representations from proficient and computationally intensive teacher models to compact student models.
Ranked #21 on Image Super-Resolution on Urban100 - 4x upscaling
Multimedia compression allows us to watch videos, view pictures and hear sounds within a limited bandwidth, which has helped the internet flourish.
In recommender systems, leveraging Graph Neural Networks (GNNs) to model the bipartite relation between users and items is a promising approach.
In such models, epigenetic factors are usually proposed to act on the chromatin regions directly involved in the expression of relevant genes.
Remote photoplethysmography (rPPG) is an attractive camera-based health monitoring method that can measure the heart rhythm from facial videos.
To further improve the performance of intra coding in Versatile Video Coding (VVC), this paper proposes an intelligent intra mode derivation method, termed Deep Learning based Intra Mode Derivation (DLIMD).
Blood pressure (BP) monitoring is vital in daily healthcare, especially for cardiovascular diseases.
In this paper, we propose a distortion-aware loop filtering model to improve the performance of intra coding for 360$^\circ$ videos projected in the equirectangular projection (ERP) format.
However, the shape parameters of traditional 3DMMs follow a multivariate Gaussian distribution while the identity embeddings follow a hyperspherical distribution, and this conflict makes it challenging for face reconstruction models to preserve faithfulness and shape consistency simultaneously.
Micro-expressions are spontaneous, unconscious facial movements that show people's true inner emotions and have great potential in related fields of psychological testing.
The impact of distorted geometry and texture attributes is further discussed in this paper.
The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D models of the moving objects, and the trajectories of the camera and the moving objects.
We then use deep feature learning to predict samples of the SUR curve and apply the method of least squares to fit the parametric model to the predicted samples.
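The least-squares fitting step can be illustrated with a minimal sketch. The excerpt does not name the parametric model, so the code below assumes a hypothetical decreasing logistic curve, SUR(x) = 1 / (1 + exp((x - mu) / s)), and fits it by ordinary least squares after a logit transform that makes the model linear in x. The function names `fit_logistic_sur` and `sur_model` and the parameters `mu` and `s` are illustrative, not taken from the paper.

```python
import math


def sur_model(x, mu, s):
    """Hypothetical parametric curve (an assumption, not the paper's model)."""
    return 1.0 / (1.0 + math.exp((x - mu) / s))


def fit_logistic_sur(xs, ys, eps=1e-6):
    """Fit (mu, s) by ordinary least squares in logit space.

    Under the assumed model, log(1 / y - 1) = (x - mu) / s, which is a
    straight line in x with slope 1 / s and intercept -mu / s.
    """
    zs = []
    for y in ys:
        # Clip predicted samples so the logit stays finite.
        y = min(max(y, eps), 1.0 - eps)
        zs.append(math.log(1.0 / y - 1.0))

    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))

    slope = sxz / sxx                    # estimate of 1 / s
    intercept = mean_z - slope * mean_x  # estimate of -mu / s
    return -intercept / slope, 1.0 / slope


# Usage: stand-in "predicted samples" drawn from a known curve.
xs = [20.0 + i for i in range(21)]
ys = [sur_model(x, 30.0, 4.0) for x in xs]
mu_hat, s_hat = fit_logistic_sur(xs, ys)
```

Fitting in logit space keeps the sketch in closed form; a fit in the original domain would need an iterative nonlinear least-squares solver instead.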
In this letter, a stereo-based multi-motion visual odometry method is proposed to acquire the poses of the robot and other moving objects.
Furthermore, by manipulating the mapping vectors, an autoencoder is able to generalize SCMA; thus, a dense code multiple access (DCMA) scheme is proposed.
To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition.
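As a rough illustration of attention-based aggregation, the sketch below scores each region feature against a query vector, normalizes the scores with a softmax, and returns the weighted sum. It is a generic dot-product attention pooling under stated assumptions, not the module used in the paper; `attention_pool`, `features`, and `query` are hypothetical names, and in a real model the query would be learned.

```python
import math


def attention_pool(features, query):
    """Aggregate region features with softmax attention weights.

    features: list of equal-length feature vectors, one per region.
    query:    vector of the same length (learned in practice, fixed here).
    Returns (weights, pooled), where weights sum to 1.
    """
    # Dot-product relevance score for each region.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]

    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the region features.
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


# Usage: two regions; the query favors the first feature dimension.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
```

Regions whose features align with the query receive larger weights, so they dominate the pooled representation passed on to the recognizer.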
Ranked #38 on Action Recognition on UCF101
In this paper, we propose a novel deep network, called rotatable region-based residual network (R$^3$-Net), to detect multi-oriented vehicles in aerial images and videos.
This paper addresses the challenging problem of estimating the general visual attention of people in images.
Owing to its promising classification performance, the sparse representation based classification (SRC) algorithm has attracted great attention in the past few years.