Incorporating human perception into the training of convolutional neural networks (CNNs) has boosted the generalization capabilities of such models in open-set recognition tasks. An active research question is where (in the model architecture) and how to efficiently incorporate always-limited human perceptual data into model training strategies. In this paper, we introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization), which addresses this question through two distinct training rounds for CNNs tasked with open-set anomaly detection. First, we train an autoencoder to predict human saliency maps given input images, without class labels. The autoencoder is thus tasked with discovering domain-specific salient features that mimic human perception. Second, we remove the decoder, add a classification layer on top of the encoder, and fine-tune this new model conventionally. We show that MENTOR's benefits are twofold: (a) a significant accuracy boost in anomaly detection tasks (demonstrated in this paper for the detection of unknown iris presentation attacks, synthetically generated faces, and anomalies in chest X-ray images), compared both to models using conventional transfer learning (e.g., sourcing weights from ImageNet-pretrained models) and to models trained with the state-of-the-art approach of incorporating human perception guidance into loss functions; and (b) increased training efficiency, requiring fewer epochs to converge than state-of-the-art training methods.
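
To make the two-round procedure concrete, below is a minimal PyTorch sketch. The backbone architecture, reconstruction loss, optimizer settings, class count, and the stand-in data loaders (saliency_loader, labeled_loader) are all illustrative assumptions, not the configuration used in the paper; the sketch only fixes the structure: round 1 regresses human saliency maps without class labels, round 2 discards the decoder and fine-tunes the encoder plus a new classification head.

```python
# Minimal sketch of MENTOR's two training rounds (assumed details: tiny CNN
# backbone, MSE saliency loss, Adam optimizer, dummy data loaders).
import torch
import torch.nn as nn

class SaliencyAutoencoder(nn.Module):
    """Round 1 model: predicts a human saliency map from an image (no class labels)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(  # placeholder CNN encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # reconstructs a 1-channel saliency map
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Dummy stand-ins for (image, human saliency map) and (image, label) batches.
saliency_loader = [(torch.randn(4, 3, 64, 64), torch.rand(4, 1, 64, 64))]
labeled_loader = [(torch.randn(4, 3, 64, 64), torch.randint(0, 2, (4,)))]

# --- Round 1: pretrain by regressing human saliency maps ---
ae = SaliencyAutoencoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-4)
saliency_loss = nn.MSELoss()  # assumed reconstruction loss
for images, human_maps in saliency_loader:
    opt.zero_grad()
    loss = saliency_loss(ae(images), human_maps)
    loss.backward()
    opt.step()

# --- Round 2: drop the decoder, add a classification head, fine-tune conventionally ---
classifier = nn.Sequential(
    ae.encoder,                       # weights pretrained in round 1
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 2),                 # e.g., bona fide vs. anomaly (assumed 2 classes)
)
opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)
cls_loss = nn.CrossEntropyLoss()
for images, labels in labeled_loader:
    opt.zero_grad()
    loss = cls_loss(classifier(images), labels)
    loss.backward()
    opt.step()
```

Note the design choice this sketch reflects: the perceptual signal enters only through pretraining of the encoder weights, so round 2 is plain supervised fine-tuning, in contrast to approaches that inject human perception guidance into the loss function itself.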