Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

22 Sep 2018 · Meera Hahn, Nataniel Ruiz, Jean-Baptiste Alayrac, Ivan Laptev, James M. Rehg

Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision. The task is challenging due to the difficulty of bridging the semantic gap between the visual and natural language domains...
