MPHOI-72 is a multi-person human-object interaction dataset that can be used for a wide variety of HOI/activity recognition and pose estimation/object tracking tasks. The dataset is challenging due to many body occlusions among the humans and objects. It consists of 72 videos captured from 3 different angles at 30 fps, with totally 26,383 frames and an average length of 12 seconds. It involves 5 humans performing in pairs, 6 object types, 3 activities and 13 sub-activities. The dataset includes color video, depth video, human skeletons, human and object bounding boxes.