Attribution methods offer a way to interpret opaque neural networks visually by identifying and visualizing the input regions or pixels that dominate a network's output.
We validated our method on domain adaptation of hand segmentation from real and simulation images.
In this report, we describe the technical details of our submission to the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition.
People spend an enormous amount of time and effort looking for lost objects.
"Making black-box models explainable" is a vital problem that accompanies the development of deep learning networks.
Gradient-weighted class activation mapping (Grad-CAM) was used to conceptualize the diagnostic basis of the CAD system.
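As a rough illustration of what Grad-CAM computes, the following NumPy sketch assumes the last convolutional feature maps and the gradients of one class score with respect to them are already available (the function name and array layout here are illustrative, not the CAD system's actual code):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Minimal Grad-CAM sketch.

    feature_maps, gradients: (C, H, W) arrays for one class score,
    taken from the last convolutional layer (hypothetical inputs).
    """
    # Channel importance weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))              # shape (C,)
    # Weighted combination of feature maps over the channel axis.
    cam = np.tensordot(weights, feature_maps, axes=1)  # shape (H, W)
    # ReLU keeps only features with a positive influence on the class.
    cam = np.maximum(cam, 0.0)
    # Normalize to [0, 1] for heatmap visualization (guard all-zero map).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

The resulting low-resolution map is typically upsampled to the input image size and overlaid as a heatmap to show which regions drove the diagnosis.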
Recent advances in computer vision have made it possible to automatically assess from videos the manipulation skills of humans performing a task, which enables many important applications in domains such as health rehabilitation and manufacturing.
In this work, we address two coupled tasks of gaze prediction and action recognition in egocentric videos by exploring their mutual context.
In the proposed model, we explore various semantic relationships between actions, grasp types and object attributes, and show how the context can be used to boost the recognition of each component.
We present a new computational model for gaze prediction in egocentric videos by exploring patterns in temporal shift of gaze fixations (attention transition) that are dependent on egocentric manipulation tasks.
Spectral analysis of natural scenes can provide much more detailed information about the scene than an ordinary RGB camera can capture.
This unified framework benefits all four tasks (stereo, optical flow, visual odometry, and motion segmentation), leading to higher overall accuracy and efficiency.
To solve this problem, we describe a local region in an image via a hierarchical Gaussian distribution whose parameters include both means and covariances.
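One common way to fold both the mean and the covariance of pixel features into a single descriptor is to embed the Gaussian into a symmetric positive definite matrix. The sketch below uses one standard embedding as an assumption; it is not necessarily the authors' exact construction:

```python
import numpy as np

def gaussian_embedding(features):
    """Embed a Gaussian N(mu, Sigma) of d-dimensional pixel features
    into a (d+1) x (d+1) symmetric matrix, so the mean AND covariance
    both enter the region descriptor.

    features: (N, d) array of pixel feature vectors from one region.
    The block form [[Sigma + mu mu^T, mu], [mu^T, 1]] is one common
    embedding (an illustrative choice, not the paper's exact one).
    """
    mu = features.mean(axis=0)                 # mean vector, (d,)
    sigma = np.cov(features, rowvar=False)     # covariance, (d, d)
    top = np.hstack([sigma + np.outer(mu, mu), mu[:, None]])
    bottom = np.hstack([mu[None, :], np.ones((1, 1))])
    return np.vstack([top, bottom])            # (d+1, d+1), symmetric
```

A hierarchical version applies the same idea twice: Gaussians over pixel features within patches, then a Gaussian over the patch-level embeddings.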
We propose a privacy-preserving framework for learning visual classifiers by leveraging distributed private image data.
We envision a future time when wearable cameras are worn by the masses and recording first-person point-of-view videos of everyday life.
In this paper, we propose an effective method for coded hyperspectral image restoration, which exploits extensive structure sparsity in the hyperspectral image.
We aim to understand the dynamics of social interactions between two people by recognizing their actions and reactions using a head-mounted camera.
In both steps, unlike the hierarchical covariance descriptor, the proposed descriptor can model both the mean and the covariance information of pixel features properly.
The local expansion moves extend traditional expansion moves in two ways: localization and spatial propagation.
This paper introduces a novel method to separate fluorescent and reflective components in the spectral domain.
Hyperspectral imaging is beneficial in a diverse range of applications, from diagnostic medicine to agriculture to surveillance, to name a few.
This paper addresses the illumination and reflectance spectra separation (IRSS) problem of a hyperspectral image captured under general spectral illumination.
This sort of symmetry can be observed in a 1D BRDF slice from a subset of surface normals with the same azimuth angle, and we use it to devise an efficient modeling and solution method to constrain and recover the elevation angles of surface normals accurately.
We then show that given the spectral reflectance and fluorescent chromaticity, the fluorescence absorption and emission spectra can also be estimated.
Unlike existing appearance-based methods that assume person-specific training data, we use a large amount of cross-subject training data to train a 3D gaze estimator.