We show that after applying exposure correction with the proposed model, the portrait matting quality increases significantly.
With the model achieving 94% accuracy on 23 food classes, the developed mobile application has the potential to serve the visually impaired in automatic food recognition via images.
We tested the proposed method both on standard benchmark datasets, Replay-Mobile and OULU-NPU, and on a real-world dataset.
In this paper, we present the Real World Occluded Faces (ROF) dataset, which contains faces with both upper-face occlusion, due to sunglasses, and lower-face occlusion, due to masks.
We first generate a coarse segmentation map from the input image and then predict the alpha matte by utilizing the image and segmentation map.
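The two-stage pipeline described above can be sketched as follows. This is a minimal illustration with placeholder functions standing in for the paper's segmentation and matting networks; the thresholding and blending rules are assumptions for demonstration only.

```python
import numpy as np

def coarse_segmentation(image, threshold=0.5):
    """Stage 1: stand-in for the segmentation network.
    Thresholds grayscale intensity into a binary foreground map."""
    gray = image.mean(axis=-1)
    return (gray > threshold).astype(np.float32)

def predict_alpha(image, seg_map):
    """Stage 2: stand-in for the matting network.
    A real model would run a CNN on the image concatenated with the
    coarse map; here we just soften the map with image intensity."""
    net_input = np.concatenate([image, seg_map[..., None]], axis=-1)  # H x W x 4
    return np.clip(seg_map * 0.8 + image.mean(axis=-1) * 0.2, 0.0, 1.0)

image = np.random.rand(4, 4, 3).astype(np.float32)
seg = coarse_segmentation(image)
alpha = predict_alpha(image, seg)  # alpha has shape (4, 4), values in [0, 1]
```

The key design point is that the second stage consumes both the raw image and the coarse map, so the matte can be refined beyond the binary segmentation.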
To train and evaluate the developed system, we collected and annotated images that represent face mask usage and face-hand interaction in the real world.
Indeed, unlike commonly used approaches that treat a neural network as a single computational block, i.e., use only the output of the last layer, MOCCA explicitly leverages the multi-layer structure of deep architectures.
Ranked #27 on Anomaly Detection on MVTec AD
First, we train a network to transform real LR images to the space of bicubically downsampled images in a supervised manner, by using both real LR/HR pairs and synthetic pairs.
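One half of the training data mentioned above, the synthetic pairs, can be generated by downsampling HR images. The sketch below uses average pooling as a stand-in for the bicubic kernel; the function name and pooling choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def synthetic_pair(hr, scale=2):
    """Build a synthetic LR/HR training pair by downsampling the HR image.
    (Average pooling stands in for bicubic downsampling here.)"""
    h, w, c = hr.shape
    lr = hr.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))
    return lr, hr

hr = np.random.rand(8, 8, 3)
lr, _ = synthetic_pair(hr)  # lr has shape (4, 4, 3)
```

Such pairs supervise the network alongside real LR/HR pairs, pulling real LR images toward the bicubically downsampled domain.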
We have achieved very promising results, especially on the FERET dataset, generating visually appealing face images from ear image inputs.
Research on offline signature verification has explored a large variety of methods on multiple signature datasets, which are collected under controlled conditions.
By benefiting from perceptual losses, recent studies have significantly improved the performance of the super-resolution task, in which a high-resolution image is resolved from its low-resolution counterpart.
Despite significant progress toward super resolving more realistic images by deeper convolutional neural networks (CNNs), reconstructing fine and natural textures still remains a challenging problem.
Experimental results indicated that profile face images are a rich source of information for age and gender classification.
By leveraging this information, we have utilized deep face models trained on MS-Celeb-1M and fine-tuned on the VGGFace2 dataset, achieving state-of-the-art accuracies on the SCFace and ICB-RW benchmarks, even without using any training data from these benchmarks.
We present a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms.
Moreover, we also conduct experiments on a near-infrared dataset containing facial expression videos of drivers to assess the performance using in-the-wild data for driver emotion recognition.
To overcome these shortcomings, we propose an attribute-guided face image generation method using a single model, which is capable of synthesizing multiple photo-realistic face images conditioned on the attributes of interest.
no code implementations • 11 Mar 2019 • Žiga Emeršič, Aruna Kumar S. V., B. S. Harish, Weronika Gutfeter, Jalil Nourmohammadi Khiarak, Andrzej Pacut, Earnest Hansley, Mauricio Pamplona Segundo, Sudeep Sarkar, Hyeonjung Park, Gi Pyo Nam, Ig-Jae Kim, Sagar G. Sangodkar, Ümit Kaçar, Murvet Kirci, Li Yuan, Jishou Yuan, Haonan Zhao, Fei Lu, Junying Mao, Xiaoshuang Zhang, Dogucan Yaman, Fevziye Irem Eyiokur, Kadir Bulut Özler, Hazim Kemal Ekenel, Debbrota Paul Chowdhury, Sambit Bakshi, Pankaj K. Sa, Banshidhar Majhi, Peter Peer, Vitomir Štruc
The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze the technology from various viewpoints, such as generalization to unseen data characteristics; sensitivity to rotations, occlusions, and image resolution; and performance bias on sub-groups of subjects selected based on demographic criteria, i.e., gender and ethnicity.
We introduce a new scene graph generation method called image-level attentional context modeling (ILAC).
Although there have been a few previous works on gender classification using ear images, to the best of our knowledge, this study is the first work on age classification from ear images.
In this paper, we present an end-to-end network, called Cycle-Dehaze, for the single image dehazing problem, which does not require pairs of hazy and corresponding ground-truth images for training.
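Training without paired data typically relies on a cycle-consistency objective: a hazy image mapped to the clean domain and back should reconstruct itself. The sketch below shows that loss with toy placeholder "generators" (the linear maps `G` and `F` are assumptions standing in for the actual CNN generators).

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 cycle loss: x -> G(x) -> F(G(x)) should reconstruct x.
    G maps hazy -> clean, F maps clean -> hazy."""
    return np.abs(F(G(x)) - x).mean()

# Placeholder generators: simple invertible maps standing in for CNNs.
G = lambda x: x * 0.9 + 0.05    # hypothetical dehazing generator
F = lambda y: (y - 0.05) / 0.9  # hypothetical re-hazing generator

hazy = np.random.rand(8, 8, 3)
loss = cycle_consistency_loss(hazy, G, F)
```

Because `F` exactly inverts `G` in this toy setup, the loss is near zero; in real training the loss is minimized jointly with adversarial terms to learn the mappings from unpaired hazy and clean images.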
Ranked #4 on Image Dehazing on O-Haze
We have first shown the importance of domain adaptation, when deep convolutional neural network models are used for ear recognition.
Littering quantification is an important step for improving cleanliness of cities.
In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition.
Automatic visual speech recognition is an interesting problem in pattern recognition especially when audio data is noisy or not readily available.
In purely image-based pedestrian detection approaches, state-of-the-art results have been achieved with convolutional neural networks (CNNs), and surprisingly few detection frameworks have been built upon multi-cue approaches.
However, studies systematically exploring the strengths and weaknesses of existing deep models for face recognition are still relatively scarce in the literature.
Our results show that the recognition performance on deidentified images is close to chance, suggesting that the deidentification process based on GNNs is highly effective.
The domain-specific VGG-Face CNN model has been found to be more useful and provided better performance for both age and gender classification tasks compared with the generic AlexNet-like model, which shows that transferring from a closer domain is more useful.
This is particularly important, since in real-world face recognition applications, images may contain various kinds of degradations due to motion blur, noise, compression artifacts, color distortions, and occlusion.
Deep learning based approaches have been dominating the face recognition field due to the significant performance improvement they have provided on the challenging wild datasets.
To account for multiple labels per image, instead of using the average age of the annotated face image as the class label, we have grouped the face images that fall within a specified age range.