Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems.
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence.
The elementary operation of cropping underpins nearly every computer vision system, with uses ranging from data augmentation and translation invariance to computational photography and representation learning.
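As a concrete illustration, cropping reduces to array slicing on an image tensor. The sketch below is an assumption-laden minimal example (the `crop` helper and the synthetic image are hypothetical, not from any system described here), showing the operation that data augmentation pipelines build on:

```python
import numpy as np

# Hypothetical helper: cropping as plain array slicing on an H x W x C image.
def crop(image: np.ndarray, top: int, left: int, height: int, width: int) -> np.ndarray:
    """Return the height x width window of `image` whose top-left corner is (top, left)."""
    return image[top:top + height, left:left + width]

# Example: extract a 32x32 patch from a synthetic 64x64 RGB image.
image = np.zeros((64, 64, 3), dtype=np.uint8)
patch = crop(image, top=16, left=16, height=32, width=32)
print(patch.shape)  # (32, 32, 3)
```

For random-crop augmentation, the same helper would simply be called with randomly sampled `top` and `left` offsets.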
This way, the hallucinated details are blended with the style of the original image, in an attempt to further boost the quality of the result and, potentially, to support arbitrary output resolutions.