…These annotations are generated by various state-of-the-art language models (LLMs) and include detailed descriptions of the activities being performed, the count of people present, and their specific poses
1 PAPER • NO BENCHMARKS YET