FixMyPose is a dataset for automated pose correction. It consists of natural-language descriptions, in English and Hindi, of how to correct a "current" pose so that it matches a "target" pose. The collected descriptions have interesting linguistic properties, such as egocentric relations to objects in the environment and analogous references, so models need an understanding of spatial relations and commonsense knowledge about postures.
Further, to help avoid bias in models trained on it, the dataset is balanced across characters with diverse demographics, who perform a variety of movements in several interior environments (e.g., homes, offices).
This dataset introduces the pose-correctional-captioning task and its reverse, the target-pose-retrieval task. In the correctional-captioning task, models must generate a description of how to move from the current pose image to the target pose image; in the retrieval task, models must select the correct target pose image given the current pose image and the correctional description.
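
The two tasks can be pictured as a pair of function signatures. The following is a minimal Python sketch under stated assumptions, not the dataset's actual schema or loader: the `PoseExample` record, its field names, the `Captioner` and `Scorer` types, and the `retrieve_target` helper are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PoseExample:
    """One illustrative record; field names are assumptions, not the real schema."""
    current_image: str  # path or ID of the "current" pose image
    target_image: str   # path or ID of the "target" pose image
    correction: str     # description of how to move from current to target
    language: str       # e.g. "en" or "hi"


# Captioning task: (current image, target image) -> correctional description.
Captioner = Callable[[str, str], str]

# Scorer for retrieval: (current image, description, candidate image) -> score,
# where a higher score means the candidate is a better match.
Scorer = Callable[[str, str, str], float]


def retrieve_target(
    current_image: str,
    correction: str,
    candidates: Sequence[str],
    score: Scorer,
) -> str:
    """Retrieval task: return the candidate image that the scorer ranks highest."""
    if not candidates:
        raise ValueError("need at least one candidate image")
    return max(candidates, key=lambda cand: score(current_image, correction, cand))
```

In this framing, a captioning model is any callable matching `Captioner`, and a retrieval model is any `Scorer` plugged into `retrieve_target` together with a list of candidate images that contains the true target among distractors.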