Video Alignment
33 papers with code • 2 benchmarks • 4 datasets
Most implemented papers
Time-Contrastive Networks: Self-Supervised Learning from Video
While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single third-person demonstration by a human.
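The rough intuition: TCN pulls together embeddings of the same moment seen from different viewpoints and pushes apart frames that are nearby in appearance but distant in time. Below is a minimal Python sketch of such a multi-view triplet loss; the function name, margin, and negative-sampling scheme are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def multiview_triplet_loss(emb_view1, emb_view2, margin=0.2, min_gap=10):
    """emb_view1, emb_view2: (T, D) embeddings of two time-synchronized views.
    Assumes T > min_gap so a temporally distant negative exists."""
    T = emb_view1.shape[0]
    t = torch.randint(0, T, (1,)).item()
    anchor = emb_view1[t]
    positive = emb_view2[t]  # same moment, different viewpoint
    # Negative: a frame from the anchor's own viewpoint, far away in time.
    candidates = [s for s in range(T) if abs(s - t) >= min_gap]
    s = candidates[torch.randint(0, len(candidates), (1,)).item()]
    negative = emb_view1[s]
    d_pos = (anchor - positive).pow(2).sum()
    d_neg = (anchor - negative).pow(2).sum()
    return F.relu(d_pos - d_neg + margin)
```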
Learning from Video and Text via Large-Scale Discriminative Clustering
Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks.
Temporal Cycle-Consistency Learning
We introduce a self-supervised representation learning method based on the task of temporal alignment between videos.
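The core mechanism is a differentiable cycle: a frame of one video hops to its soft nearest neighbor in the other video and back, and the loss penalizes landing away from the starting frame. A minimal sketch of the regression form of this cycle-back loss, with assumed variable names and temperature:

```python
import torch
import torch.nn.functional as F

def cycle_back_regression_loss(u, v, temperature=0.1):
    """u: (N, D) and v: (M, D) per-frame embeddings of two videos."""
    # Soft nearest neighbor of each frame of u inside v.
    alpha = F.softmax(-torch.cdist(u, v) / temperature, dim=1)       # (N, M)
    v_tilde = alpha @ v                                              # (N, D)
    # Cycle back: soft nearest neighbor of each soft-matched frame inside u.
    beta = F.softmax(-torch.cdist(v_tilde, u) / temperature, dim=1)  # (N, N)
    # The expected return index should equal the starting index.
    idx = torch.arange(u.shape[0], dtype=u.dtype, device=u.device)
    mu = beta @ idx                                                  # (N,)
    return ((mu - idx) ** 2).mean()
```

In training, u and v would come from the same frame encoder applied to two videos of the same action, so minimizing the loss encourages temporally aligned embeddings.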
View-Invariant Probabilistic Embedding for Human Pose
Depictions of similar human body configurations can vary with changing viewpoints.
View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose
Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions to assess algorithm performance.
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Sora's high motion intensity and long, consistent videos have significantly impacted the field of video generation, attracting unprecedented attention.
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers
This paper considers a learnable approach for comparing and aligning videos.
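One classical building block for this kind of comparison, which learnable temporal match kernels generalize, is scoring every temporal shift between two videos at once via FFT cross-correlation of frame descriptors. The sketch below shows only that basic trick under assumed shapes; it is not LAMV's actual layers.

```python
import numpy as np

def best_temporal_shift(x, y):
    """x, y: (T, D) per-frame descriptors, zero-padded to a common length T.
    Returns the circular shift of y that best matches x, plus all scores."""
    X = np.fft.rfft(x, axis=0)
    Y = np.fft.rfft(y, axis=0)
    # Circular cross-correlation over time, summed over descriptor dims.
    scores = np.fft.irfft(X * np.conj(Y), n=x.shape[0], axis=0).sum(axis=1)
    return int(np.argmax(scores)), scores
```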
Dynamic Temporal Alignment of Speech to Lips
The alignment is based on deep audio-visual features that map the lip video and the speech signal to a shared representation.
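Given per-frame features in a shared space, the alignment itself can be recovered with dynamic programming. Below is a plain dynamic-time-warping sketch over embedding distances; it illustrates the general alignment step rather than the paper's exact formulation.

```python
import numpy as np

def dtw_align(a, b):
    """a: (N, D), b: (M, D) embedded sequences; returns (total cost, warp path)."""
    N, M = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    acc = np.full((N + 1, M + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # match
                acc[i - 1, j],      # skip a frame of a
                acc[i, j - 1],      # skip a frame of b
            )
    # Backtrack to recover the frame-to-frame correspondence.
    path, i, j = [], N, M
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return acc[N, M], path[::-1]
```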
Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video
Our method learns a general skill embedding that is independent of the task context by using an adversarial loss.
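A common pattern for stripping task context out of an embedding is domain-adversarial training with gradient reversal: a discriminator tries to recover the task identity from the embedding, and the reversed gradient drives the encoder to discard exactly that information. The sketch below shows this generic pattern; the architectures, dimensions, and number of tasks are assumptions, not ASN's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

encoder = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 32))
task_discriminator = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))

def adversarial_task_loss(features, task_labels):
    """features: (B, 512) clip features; task_labels: (B,) task-context ids."""
    z = encoder(features)
    logits = task_discriminator(GradReverse.apply(z))
    # The discriminator learns to identify the task; the reversed gradient
    # pushes the encoder toward a task-independent skill embedding.
    return F.cross_entropy(logits, task_labels)
```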