15 papers with code • 1 benchmark • 4 datasets
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos.
In this thesis, we focus on video action understanding problems from an online, real-time processing point of view.
The main reason is that the large number of nodes (i.e., video frames) makes it hard for GCNs to capture and model temporal relations in videos.
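The scalability issue above can be made concrete with a minimal sketch (illustrative only, not from any of the cited papers): a single graph-convolution layer where each node is a video frame. With T frames, a dense frame graph has T × T adjacency entries, so cost grows quadratically with video length.

```python
import numpy as np

def graph_conv(x, adj, weight):
    """One GCN layer: normalize adjacency, aggregate neighbors, project.
    x: (T, d) frame features, adj: (T, T) adjacency, weight: (d, d_out)."""
    # Symmetric normalization D^{-1/2} A D^{-1/2}
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ x @ weight, 0.0)  # ReLU activation

# Hypothetical sizes for illustration: 1000 frames, 64-dim features.
T, d = 1000, 64
x = np.random.randn(T, d).astype(np.float32)
adj = np.ones((T, T), dtype=np.float32)  # fully connected frame graph: T*T = 1,000,000 entries
w = np.random.randn(d, d).astype(np.float32)
out = graph_conv(x, adj, w)
print(out.shape)  # (1000, 64)
```

Even this single layer touches a million adjacency entries for a 1000-frame clip, which is why frame-level GCNs struggle to model long-range temporal relations directly.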
However, there remains a lack of studies that extend action composition and leverage multiple viewpoints and multiple modalities of data for representation learning.
Fine-grained action recognition is attracting increasing attention due to the emerging demand for specific action understanding in real-world applications, whereas data for rare fine-grained categories is very limited.