The problem is only exacerbated by a lack of segmentation in transcripts of audio/video recordings.
Recent literature adds extractive summaries as guidance for abstractive summarization models to provide hints of salient content and achieves better performance.
While several COPs can be formulated as the prioritization of input items, as is common in the information retrieval, it has not been fully explored how the learning-to-rank techniques can be incorporated into deep RL for COPs.
With the explosive growth of livestream broadcasting, there is an urgent need for new summarization technology that enables us to create a preview of streamed content and tap into this wealth of knowledge.
In order to come up with a better representation and capturing of long term spatio-temporal relationships, we propose three variants of Self-Attention Network (SAN), namely, SAN-V1, SAN-V2 and SAN-V3.
Ranked #46 on Skeleton Based Action Recognition on NTU RGB+D
Emerged as one of the best performing techniques for extractive summarization, determinantal point processes select the most probable set of sentences to form a summary according to a probability measure defined by modeling sentence prominence and pairwise repulsion.
The video based CNN works have focused on effective ways to fuse appearance and motion networks, but they typically lack utilizing temporal information over video frames.
The most important obstacles facing multi-document summarization include excessive redundancy in source descriptions and the looming shortage of training data.