Human Interaction Recognition
8 papers with code • 8 benchmarks • 8 datasets
Human Interaction Recognition (HIR) is a field of study that develops computer algorithms to detect and recognize human interactions in videos, images, and other multimedia content. The goal of HIR is to automatically identify and analyze social interactions between people, including their body language and facial expressions.
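As a toy illustration of the simplest possible interaction cue, the sketch below flags a "close interaction" between two people from 2D pose keypoints. The keypoint layout, the function names, and the distance threshold are all illustrative assumptions, not taken from any specific HIR paper; real models learn far richer relational features.

```python
# Toy sketch: flag a "close interaction" between two people from 2D pose
# keypoints. Layout and threshold are illustrative assumptions only.
import math

def min_joint_distance(pose_a, pose_b):
    """Smallest Euclidean distance between any joint of person A and any joint of person B."""
    return min(math.dist(ja, jb) for ja in pose_a for jb in pose_b)

def is_close_interaction(pose_a, pose_b, threshold=0.5):
    """A crude proximity cue; learned HIR models replace this with relational reasoning."""
    return min_joint_distance(pose_a, pose_b) < threshold

# Example: two 3-joint skeletons in normalized image coordinates.
a = [(0.2, 0.5), (0.25, 0.6), (0.3, 0.7)]
b = [(0.35, 0.7), (0.6, 0.6), (0.7, 0.5)]
print(is_close_interaction(a, b))  # True: the closest joints are 0.05 apart
```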
Most implemented papers
Slow-Fast Auditory Streams For Audio Recognition
We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs.
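The slow/fast idea can be sketched minimally on a spectrogram-like input: the fast stream keeps full temporal resolution, while the slow stream subsamples frames to trade temporal detail for context. The stride value and shapes below are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of a slow/fast stream split on a spectrogram-like input.
# Stride and shapes are illustrative assumptions, not the paper's settings.

def split_streams(spectrogram, slow_stride=4):
    """spectrogram: list of time frames, each a list of frequency-bin energies."""
    fast = spectrogram                 # full frame rate: fine temporal detail
    slow = spectrogram[::slow_stride]  # subsampled frames: coarser, wider context
    return slow, fast

# 16 frames x 8 frequency bins of dummy energy values.
spec = [[float(t + f) for f in range(8)] for t in range(16)]
slow, fast = split_streams(spec)
print(len(slow), len(fast))  # 4 16
```

In the actual architecture each stream feeds its own convolutional pathway before fusion; here only the temporal-resolution split is shown.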
Interaction Relational Network for Mutual Action Recognition
Our solution is able to achieve state-of-the-art performance on the traditional interaction recognition datasets SBU and UT, and also on the mutual actions from the large-scale dataset NTU RGB+D.
Two-person Graph Convolutional Network for Skeleton-based Human Interaction Recognition
To overcome the above shortcoming, we introduce a novel unified two-person graph to represent inter-body and intra-body correlations between joints.
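A unified two-person graph can be sketched as a single adjacency matrix over 2*N joints. The intra-body edge list below is a simplified, hypothetical skeleton, and the inter-body scheme (linking each joint to its counterpart on the other body) is one plausible choice; the paper's actual graph construction may differ.

```python
# Sketch: unified two-person skeleton graph as a (2*N x 2*N) adjacency matrix.
# Intra-body edges use a toy skeleton; inter-body edges link corresponding
# joints across the two bodies. Both choices are illustrative assumptions.

N_JOINTS = 5  # toy skeleton size; real skeletons (e.g. NTU RGB+D) use 25 joints
INTRA_EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]  # illustrative bone connections

def build_two_person_adjacency():
    size = 2 * N_JOINTS
    adj = [[0] * size for _ in range(size)]
    for person in (0, 1):
        offset = person * N_JOINTS
        for i, j in INTRA_EDGES:           # intra-body correlations
            adj[offset + i][offset + j] = 1
            adj[offset + j][offset + i] = 1
    for j in range(N_JOINTS):              # inter-body correlations
        adj[j][N_JOINTS + j] = 1
        adj[N_JOINTS + j][j] = 1
    return adj

adj = build_two_person_adjacency()
print(sum(sum(row) for row in adj))  # 26: (4+4 intra + 5 inter edges) x 2 for symmetry
```

A graph convolutional network would then propagate joint features over this matrix, so inter-body and intra-body correlations are handled in one pass.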
Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition
To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations.
Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition
Recognizing interactive actions, including hand-to-hand interaction and human-to-human interaction, has attracted increasing attention for various applications in the field of video analysis and human-robot interaction.
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
We categorize the key skeletal-temporal relations for action recognition into a total of four distinct types.
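One way to picture four skeletal-temporal relation types is as the cross product of a joint split and a frame split. The grouping below (near vs. distant joints crossed with near vs. distant frames) and its thresholds are a toy, illustrative reading, not the paper's exact partitioning scheme.

```python
# Illustrative partition of (joint, frame) index pairs into four
# skeletal-temporal relation types: {local, distant} joints x {local, distant}
# frames. Thresholds and labels are toy assumptions, not the paper's scheme.

def relation_type(j1, t1, j2, t2, joint_radius=1, frame_radius=2):
    near_joint = abs(j1 - j2) <= joint_radius
    near_frame = abs(t1 - t2) <= frame_radius
    if near_joint and near_frame:
        return "local-joint/local-frame"
    if near_joint:
        return "local-joint/distant-frame"
    if near_frame:
        return "distant-joint/local-frame"
    return "distant-joint/distant-frame"

print(relation_type(0, 0, 1, 1))   # local-joint/local-frame
print(relation_type(0, 0, 5, 10))  # distant-joint/distant-frame
```

A transformer could then restrict each attention head to one relation type, so every head specializes in one kind of skeletal-temporal dependency.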
Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents
We introduce the concept of "empathic grounding" in conversational agents as an extension of Clark's conceptualization of grounding in conversation in which the grounding criterion includes listener empathy for the speaker's affective state.
CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition
To this end, we introduce a Convex Hull Adaptive Shift based multi-Entity action recognition method (CHASE), which mitigates inter-entity distribution gaps and unbiases subsequent backbones.