Moreover, existing methods for estimating proprioceptive information from camera-based tactile sensors, such as the total forces and torques applied to the finger, are ineffective when the contact geometry is complex.
FingerSLAM is constructed from two constituent pose estimators: a tactile pose estimator with multi-pass refinement that captures movement from detailed local textures, and a single-pass vision-based pose estimator that predicts pose from a global view of the object.
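As an illustration of how two such pose streams might be combined, the sketch below blends a tactile and a visual pose estimate by confidence-weighted interpolation. This is a simplification for exposition only; the function and weight names (`fuse_poses`, `w_tactile`) are assumptions, not FingerSLAM's actual fusion step.

```python
# Minimal sketch (not the paper's fusion method): blend a tactile and a
# visual (translation, quaternion) pose estimate with a fixed confidence
# weight. Quaternions are in scipy's (x, y, z, w) convention.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def fuse_poses(t_tac, q_tac, t_vis, q_vis, w_tactile=0.7):
    """Weighted blend of two pose estimates.

    w_tactile weights the tactile estimate; the visual estimate gets
    1 - w_tactile. Translations mix linearly, rotations via slerp.
    """
    t_fused = w_tactile * np.asarray(t_tac) + (1 - w_tactile) * np.asarray(t_vis)
    rots = Rotation.from_quat([q_vis, q_tac])   # visual at s=0, tactile at s=1
    q_fused = Slerp([0.0, 1.0], rots)(w_tactile).as_quat()
    return t_fused, q_fused

# Example: two nearly agreeing estimates with a ~1 degree yaw disagreement.
t, q = fuse_poses([0.10, 0.00, 0.05], [0, 0, 0, 1],
                  [0.11, 0.01, 0.05], [0, 0, 0.0087, 0.9999])
```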
Accurate perception of 3D object shape is important for robots to interact with the physical world.
This model, a deep multimodal convolutional network, predicts the outcome of a candidate grasp adjustment and then executes a grasp by iteratively selecting the most promising actions.
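The selection loop this describes can be sketched as follows, with `score_fn` standing in for the learned outcome predictor and `sample_adjustments` for the candidate generator; both names, and the greedy refinement scheme, are illustrative assumptions rather than the paper's exact procedure.

```python
# Illustrative sketch: score candidate grasp adjustments with a learned
# outcome predictor and repeatedly commit to the most promising one.
import numpy as np

rng = np.random.default_rng(0)

def sample_adjustments(n, scale=0.01):
    """Candidate end-effector adjustments: (dx, dy, dz, d_yaw)."""
    return rng.normal(0.0, scale, size=(n, 4))

def score_fn(image_feat, tactile_feat, actions):
    """Stand-in for the network's predicted grasp-success score;
    a dummy quadratic is used here so the sketch is runnable."""
    return -np.sum((actions - 0.005) ** 2, axis=1)

def select_grasp(image_feat, tactile_feat, n_candidates=64, n_iters=3):
    action = np.zeros(4)
    for _ in range(n_iters):
        # Perturb the current best action and keep the top-scoring candidate.
        candidates = action + sample_adjustments(n_candidates)
        scores = score_fn(image_feat, tactile_feat, candidates)
        action = candidates[np.argmax(scores)]
    return action

best = select_grasp(image_feat=None, tactile_feat=None)
```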
In this work, we investigate whether touch sensing aids in predicting grasp outcomes within a multimodal sensing framework that combines vision and touch.
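A minimal late-fusion architecture of the kind such a framework might use is sketched below, assuming separate convolutional encoders for the RGB and tactile images whose features are concatenated and classified. The layer sizes and the class name `GraspOutcomeNet` are invented for illustration, not the paper's network.

```python
# Hedged sketch of late-fusion grasp-outcome prediction: one encoder per
# modality, features concatenated, a small head produces a success logit.
import torch
import torch.nn as nn

class GraspOutcomeNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim), nn.ReLU(),
            )
        self.vision_enc = encoder()   # RGB image of the scene
        self.touch_enc = encoder()    # tactile image (e.g. from a GelSight)
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),         # logit for grasp success
        )

    def forward(self, rgb, tactile):
        z = torch.cat([self.vision_enc(rgb), self.touch_enc(tactile)], dim=1)
        return self.head(z)

net = GraspOutcomeNet()
logit = net(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
```

Ablating one branch of such a network (e.g. feeding zeros to the tactile encoder) is one simple way to probe whether touch improves prediction.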
We propose a self-supervised framework that learns to group visual entities based on their rate of co-occurrence in space and time.
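One way to make the co-occurrence idea concrete, under assumed simplifications (hard per-frame visibility sets and a fixed affinity threshold, neither taken from the paper), is to count how often pairs of entities appear together and group them via thresholded connected components:

```python
# Toy sketch of grouping by spatio-temporal co-occurrence: entities that
# appear together in many frames get a high affinity and end up grouped.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Each frame lists the entity ids visible in it.
frames = [{0, 1}, {0, 1}, {0, 1, 2}, {2, 3}, {2, 3}, {3}]
n = 4

counts = np.zeros((n, n))
appearances = np.zeros(n)
for visible in frames:
    for i in visible:
        appearances[i] += 1
        for j in visible:
            if i != j:
                counts[i, j] += 1

# Co-occurrence rate: fraction of i's appearances that are shared with j.
rate = counts / np.maximum(appearances[:, None], 1)
adjacency = csr_matrix(rate > 0.5)               # affinity threshold
n_groups, labels = connected_components(adjacency, directed=False)
print(labels)  # [0 0 1 1]: entities {0,1} and {2,3} co-occur frequently
```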
We also discuss the existence of shape and material metamers, or combinations of distinct shape or material parameters that generate the same edge profile.
Our system works by generalizing across object classes: states and transformations learned on one set of objects are used to interpret the image collection for an entirely new object class.
In this paper, we study the problem of reproducing the world lighting from a single image of an object whose surface is covered with random specular microfacets.