Our method leverages a data-driven prior in the form of a single-image depth prediction network trained on large-scale datasets, the output of which serves as an input to our model.
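For concreteness, here is a minimal sketch of obtaining such a depth prior, assuming the publicly released MiDaS model via torch.hub; the specific network, file name, and preprocessing are illustrative stand-ins, not necessarily what any particular method here uses:

```python
import cv2
import torch

# Load a pretrained single-image depth network via torch.hub; MiDaS is used
# here only as a stand-in for whichever network a given method adopts.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img))                        # (1, H', W') relative inverse depth
    depth_prior = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False).squeeze()  # back to input resolution
# depth_prior can then be fed to the downstream model as an additional input.
```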
Depth maps are used in a wide range of applications, from 3D rendering to 2D image effects such as bokeh.
Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant.
Ranked #1 on Video Frame Interpolation on X4K1000FPS
Specifically, splatting can be used to warp the input images to an arbitrary temporal location based on an optical flow estimate.
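A minimal sketch of such time-aware forward warping, assuming summation splatting with bilinear weights (function and variable names are illustrative):

```python
import torch

def splat_sum(values, flow, t):
    """Summation splatting: push each pixel of `values` forward along
    t * flow with bilinear weights. values: (N, C, H, W); flow: (N, 2, H, W)
    forward optical flow in pixels; t in [0, 1] is the target time."""
    N, C, H, W = values.shape
    gy, gx = torch.meshgrid(
        torch.arange(H, device=values.device),
        torch.arange(W, device=values.device), indexing="ij")
    x = gx[None].float() + t * flow[:, 0]        # sub-pixel target columns
    y = gy[None].float() + t * flow[:, 1]        # sub-pixel target rows
    x0, y0 = x.floor().long(), y.floor().long()
    out = values.new_zeros(N, C, H * W)
    for dx in (0, 1):                            # distribute over the 4 neighbors
        for dy in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            w = (1 - (x - xi).abs()) * (1 - (y - yi).abs())       # bilinear weight
            w = w * ((xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)) # zero out-of-bounds
            idx = (yi.clamp(0, H - 1) * W + xi.clamp(0, W - 1)).view(N, 1, -1)
            out.scatter_add_(2, idx.expand(N, C, -1),
                             values.reshape(N, C, -1) * w.view(N, 1, -1))
    return out.view(N, C, H, W)

# Average splatting: normalize by splatting a constant-one image with the same flow.
# ones = torch.ones_like(frame[:, :1])
# frame_t = splat_sum(frame, flow, t) / splat_sum(ones, flow, t).clamp(min=1e-6)
```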
Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape, due to an unknown depth shift induced by the shift-invariant reconstruction losses used in mixed-data depth prediction training, as well as a possibly unknown camera focal length.
Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (absolute relative error metric, using extra training data)
We present a method to perform novel view and time synthesis of dynamic scenes, requiring only a monocular video with known camera poses as input.
Video frame interpolation, the synthesis of novel views in time, is an increasingly popular research direction, with many new papers further advancing the state of the art.
Traditional reflection removal algorithms either use a single image as input, which suffers from intrinsic ambiguities, or use multiple images from a moving camera, which is inconvenient for users.
In contrast, forward warping has received less attention, partly due to additional challenges such as resolving, in a differentiable way, the conflict that arises when multiple pixels map to the same target location.
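One way such conflicts can be resolved differentiably is softmax-style weighting: splat importance-weighted colors together with the weights themselves, then divide. A sketch building on splat_sum from the earlier example (the importance map Z is an assumption, e.g. derived from inverse depth):

```python
import torch

def splat_softmax(frame, flow, t, Z):
    """Softmax-style conflict resolution on top of `splat_sum` above.
    Z: (N, 1, H, W) per-pixel importance map; its exact definition
    (inverse depth, brightness constancy, ...) is method-specific."""
    w = Z.exp()                              # translation-invariant weights
    num = splat_sum(frame * w, flow, t)      # splat importance-weighted colors
    den = splat_sum(w, flow, t)              # splat the weights themselves
    return num / den.clamp(min=1e-6)         # colliding pixels blend softly
```

Because the bilinear splatting weights appear in both numerator and denominator, each target pixel ends up as a softmax-weighted combination of all source pixels that land on it, which keeps the operation differentiable.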
Ranked #2 on Video Frame Interpolation on Middlebury
Based on this depth estimate, our framework then maps the input image to a point cloud and synthesizes the resulting video frames by rendering the point cloud from the corresponding camera positions.
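A minimal sketch of the unprojection step, assuming a pinhole camera with known intrinsics; the renderer used by such a framework is a separate, typically more elaborate component:

```python
import torch

def unproject(image, depth, fx, fy, cx, cy):
    """Lift every pixel to a 3D point under a pinhole camera model.
    image: (3, H, W); depth: (H, W); fx, fy, cx, cy: camera intrinsics.
    Returns camera-space points (H*W, 3) and their colors (H*W, 3)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    z = depth.flatten()
    x = (u.flatten() - cx) / fx * z              # pinhole back-projection
    y = (v.flatten() - cy) / fy * z
    points = torch.stack([x, y, z], dim=1)
    colors = image.reshape(3, -1).t()
    return points, colors

# Rendering: transform `points` by each target camera pose and rasterize them
# (e.g. with a point-based renderer) to synthesize the output frames.
```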
Ranked #1 on Depth Estimation on NYU-Depth V2
Finally, unlike common approaches that blend the pre-warped frames, our method feeds them and their context maps to a video frame synthesis neural network to produce the interpolated frame in a context-aware fashion.
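A hedged sketch of how such an input could be assembled; the context extractor shown here (the first convolution block of a pretrained ResNet-18, with its stride set to 1 so the features stay pixel-aligned) is one plausible choice, not necessarily the exact one used:

```python
import torch
import torchvision

# Context extractor: an assumption for illustration, not the confirmed choice.
resnet = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
resnet.conv1.stride = (1, 1)   # keep the 64-channel features at full resolution

def extract_context(frame):
    with torch.no_grad():
        return resnet.relu(resnet.bn1(resnet.conv1(frame)))

def assemble_synthesis_input(frame0, frame1, warp0, warp1):
    """warp0 / warp1 pre-warp tensors from frame 0 / frame 1 to the target
    time (e.g. via the splatting sketched earlier); the concatenation is
    what the synthesis network would consume."""
    ctx0, ctx1 = extract_context(frame0), extract_context(frame1)
    parts = [warp0(frame0), warp0(ctx0), warp1(frame1), warp1(ctx1)]
    return torch.cat(parts, dim=1)   # (N, 2 * (3 + 64), H, W)
```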
Our method employs a deep, fully convolutional neural network that takes two input frames and simultaneously estimates pairs of 1D kernels for all pixels.
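Since each pair of 1D kernels stands in for a full 2D kernel (their outer product), applying them amounts to a per-pixel separable local convolution. A minimal sketch, with the kernel size inferred from the predicted kernels and zero padding at the borders (all names illustrative):

```python
import torch
import torch.nn.functional as F

def apply_separable_kernels(frame, k_v, k_h):
    """Apply per-pixel separable kernels to `frame`.
    frame: (N, C, H, W); k_v, k_h: (N, K, H, W) vertical / horizontal kernels.
    Each output pixel is sum_{i,j} k_v[i] * k_h[j] * patch[i, j], i.e. the
    equivalent 2D kernel is the outer product of the two 1D kernels."""
    N, C, H, W = frame.shape
    K = k_v.shape[1]
    pad = K // 2
    patches = F.unfold(F.pad(frame, [pad] * 4), kernel_size=K)  # (N, C*K*K, H*W)
    patches = patches.view(N, C, K, K, H, W)
    return torch.einsum("ncklhw,nkhw,nlhw->nchw", patches, k_v, k_h)
```

The interpolated frame is then the sum of this operation applied to both input frames with their respective kernel pairs.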
Ranked #8 on Video Frame Interpolation on Middlebury