Video Alignment

20 papers with code • 2 benchmarks • 4 datasets

Latest papers with no code

Scaling Up Video Summarization Pretraining with Large Language Models

no code yet • 4 Apr 2024

Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem.

The Effects of Short Video-Sharing Services on Video Copy Detection

no code yet • 26 Mar 2024

From the experimental results at the segment level and the video level, we can see three effects: "Segment-level VCD in short video-sharing services is more difficult than in general video-sharing services", "Video-level VCD in short video-sharing services is easier than in general video-sharing services", and "The video alignment component mainly suppresses the detection performance in short video-sharing services".

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

no code yet • 18 Mar 2024

To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability and compatibility.

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing

no code yet • 10 Mar 2024

By leveraging the self-consistency property of consistency models (CMs), we eliminate the need for time-consuming inversion or additional condition extraction, reducing editing time.

Towards A Better Metric for Text-to-Video Generation

no code yet • 15 Jan 2024

Experiments on the TVGE dataset demonstrate the superiority of the proposed T2VScore in offering a better metric for text-to-video generation.

STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment

no code yet • 12 Oct 2023

Continuously learning a variety of audio-video semantics over time is crucial for audio-related reasoning tasks in our ever-evolving world.

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

no code yet • ICCV 2023

Nonetheless, the objective of the text-to-video retrieval task is to capture the complementary audio and video information that is pertinent to the text query rather than simply achieving better audio and video alignment.

ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer

no code yet • 26 Jun 2023

However, most previous works treat the live stream as a whole item and explore the click-through rate (CTR) prediction framework at the item level, neglecting the dynamic changes that occur even within the same live room.

Learning to Ground Instructional Articles in Videos through Narrations

no code yet • ICCV 2023

To deal with the scarcity of labeled data at scale, we source the step descriptions from a language knowledge base (wikiHow) containing instructional articles for a large variety of procedural tasks.

Learning by Aligning 2D Skeleton Sequences in Time

no code yet • 31 May 2023

This paper presents a self-supervised temporal video alignment framework which is useful for several fine-grained human activity understanding applications.