CaesarNeRF explicitly models pose differences among reference views to combine scene-level semantic representations, yielding a calibrated, holistic understanding of the scene.
Large Language Models (LLMs) have transformed the landscape of artificial intelligence, but their enormous size presents significant challenges in terms of computational cost.
A central challenge of video prediction is that the system must reason about objects' future motions from image frames while simultaneously maintaining the consistency of their appearances across frames.
To search for an optimal sub-network within a general deep neural network (DNN), existing neural architecture search (NAS) methods typically rely on handcrafting a search space beforehand.
We propose the second generation of Only-Train-Once (OTOv2), which automatically trains and compresses a general DNN only once from scratch, producing a more compact model with competitive performance and no fine-tuning.
To our knowledge, RSOD is the first quantitatively evaluated and graded snowy object detection (OD) dataset.
Space-time video super-resolution (STVSR) is the task of interpolating videos with both Low Frame Rate (LFR) and Low Resolution (LR) to produce High-Frame-Rate (HFR) and High-Resolution (HR) counterparts.
Ranked #2 on Space-time Video Super-resolution on Vimeo90K-Medium
Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices.
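As a concrete illustration of the idea (a minimal sketch, not any particular paper's method), structured pruning removes whole structural units such as convolutional filters rather than individual weights, so the pruned layer genuinely shrinks. The snippet below ranks output filters by L1 norm and keeps only the strongest ones; the function name and keep-ratio criterion are illustrative assumptions:

```python
import numpy as np

def prune_filters(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Structured-pruning sketch: drop entire output filters with the
    smallest L1 norms, shrinking the layer instead of sparsifying it.

    weights: (out_channels, in_channels, kH, kW) convolution kernel.
    Returns only the kept filters; downstream layers would shrink to match.
    """
    # L1 norm of each output filter, flattened over (in_channels, kH, kW)
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    # Indices of the largest-norm filters, kept in original order
    keep_idx = np.sort(np.argsort(norms)[-n_keep:])
    return weights[keep_idx]

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w_pruned = prune_filters(w, keep_ratio=0.5)  # 4 of 8 filters remain
```

Because whole filters disappear, the resulting tensor maps directly onto a smaller dense convolution, which is what makes structured pruning attractive for resource-constrained devices.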
DNN-based frame interpolation, which generates intermediate frames given two consecutive frames, typically relies on heavy model architectures with a huge number of features, preventing deployment on systems with limited resources, e.g., mobile devices.
Ranked #1 on Video Frame Interpolation on Middlebury (LPIPS metric)
To our knowledge, this is the most complete dataset for super-resolution, ISP, and image quality enhancement.
We have validated our approach on four recognized datasets (three synthetic and one real-world).
We propose a deep fully convolutional neural network with a new type of layer, named the median layer, to restore images contaminated by salt-and-pepper (s&p) noise.
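The intuition behind a median layer can be sketched as a sliding-window median applied per channel, analogous to a 'same'-padded convolution but with a median in place of the weighted sum. The NumPy snippet below is a minimal illustration under that assumption; the function name, window size, and padding mode are not taken from the paper:

```python
import numpy as np

def median_layer(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Apply a k x k sliding-window median to a 2-D image.

    Reflect-padding keeps the output the same shape as the input,
    mimicking a 'same'-padded convolutional layer.
    """
    pad = k // 2
    padded = np.pad(x, pad, mode="reflect")
    h, w = x.shape
    out = np.empty((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

# Salt-and-pepper noise consists of isolated extreme pixels; the local
# median replaces them while leaving smooth regions almost unchanged.
img = np.full((5, 5), 10.0)
img[2, 2] = 255.0  # a "salt" pixel
denoised = median_layer(img, k=3)
```

This robustness to outlier pixels is what makes a median operation a natural building block for s&p denoising, where a plain convolution would smear the corrupted values into their neighbors.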