1 code implementation • 24 Feb 2025 • Hogun Kee, Wooseok Oh, Minjae Kang, Hyemin Ahn, Songhwai Oh
In this paper, we present the tidiness score-guided Monte Carlo tree search (TSMCTS), a novel framework designed to address the tabletop tidying up problem using only an RGB-D camera.
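To make the idea concrete, here is a minimal sketch of tidiness-score-guided tree search under strong assumptions: a toy tidiness_score() stands in for the learned RGB-D tidiness predictor, and the discrete "snap an object to the grid" action space is purely illustrative, not the authors' formulation.

```python
import math
import random

def tidiness_score(arrangement):
    # Stand-in for the learned tidiness network that would score an RGB-D
    # observation; here, objects close to integer grid positions score higher.
    return 1.0 / (1.0 + sum(abs(x - round(x)) for x, _ in arrangement))

def legal_moves(arrangement):
    # Hypothetical discrete action space: snap one object onto the grid.
    return [(i, float(round(x)), y) for i, (x, y) in enumerate(arrangement)]

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def tsmcts(root_state, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: add one child per legal placement.
        for i, nx, ny in legal_moves(node.state):
            child_state = list(node.state)
            child_state[i] = (nx, ny)
            node.children.append(Node(child_state, parent=node))
        leaf = random.choice(node.children) if node.children else node
        # Evaluation: the tidiness score replaces a random rollout.
        reward = tidiness_score(leaf.state)
        # Backpropagation: update statistics up to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state

print(tsmcts([(0.3, 1.0), (1.7, 2.0)]))
```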
no code implementations • 20 Sep 2024 • Xiyana Figuera, Soogeun Park, Hyemin Ahn
However, since random robot poses can result in extreme and infeasible human poses, we propose an additional technique that filters out extreme poses by exploiting a human body prior trained on a large amount of human pose data.
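The filtering step described above can be pictured with a small, self-contained sketch. A multivariate Gaussian fitted to a bank of human poses stands in for the learned body prior; pose_bank, the 63-dimensional pose vectors, and the rejection quantile are all illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
pose_bank = rng.normal(size=(10_000, 63))           # plausible human poses (toy data)
candidates = rng.normal(scale=3.0, size=(100, 63))  # robot-induced human poses

# Fit a Gaussian to the pose bank as a stand-in for the learned body prior.
mean = pose_bank.mean(axis=0)
cov = np.cov(pose_bank, rowvar=False) + 1e-6 * np.eye(63)
inv_cov = np.linalg.inv(cov)

def log_prior(pose):
    # Unnormalized Gaussian log-density: higher means "more human-like".
    d = pose - mean
    return -0.5 * d @ inv_cov @ d

scores = np.array([log_prior(p) for p in candidates])
threshold = np.quantile(scores, 0.2)   # drop the 20% most extreme poses
feasible = candidates[scores >= threshold]
print(f"kept {len(feasible)} of {len(candidates)} candidate poses")
```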
no code implementations • 30 May 2024 • Hyemin Ahn
Our framework works in two stages: (1) training a reward model that perceives the relationship between optical flow (visual rhythm) and music from human dance videos, and (2) training the non-humanoid dancer with reinforcement learning based on that reward model.
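A minimal sketch of this two-stage recipe is given below, assuming toy feature dimensions, a pairwise ranking loss for the reward model, and a REINFORCE-style policy update; none of the module names or shapes come from the paper.

```python
import torch
import torch.nn as nn

FLOW_DIM, MUSIC_DIM, ACT_DIM = 32, 16, 4

class RewardModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FLOW_DIM + MUSIC_DIM, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, flow, music):
        return self.net(torch.cat([flow, music], dim=-1)).squeeze(-1)

reward_model = RewardModel()
r_opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Stage 1: fit the reward model on (optical flow, music) pairs from human
# dance videos; matched pairs should outscore shuffled (mismatched) pairs.
for _ in range(200):
    flow, music = torch.randn(64, FLOW_DIM), torch.randn(64, MUSIC_DIM)
    mismatch = music[torch.randperm(64)]
    loss = -torch.log(torch.sigmoid(
        reward_model(flow, music) - reward_model(flow, mismatch))).mean()
    r_opt.zero_grad()
    loss.backward()
    r_opt.step()

# Stage 2: train the non-humanoid dancer with REINFORCE on the learned reward.
policy = nn.Sequential(nn.Linear(MUSIC_DIM, 64), nn.ReLU(), nn.Linear(64, ACT_DIM))
p_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
flow_from_action = nn.Linear(ACT_DIM, FLOW_DIM)  # stand-in for rendering + optical flow

for _ in range(200):
    music = torch.randn(64, MUSIC_DIM)
    dist = torch.distributions.Normal(policy(music), 1.0)
    action = dist.sample()
    with torch.no_grad():
        reward = reward_model(flow_from_action(action), music)
    pg_loss = -(dist.log_prob(action).sum(-1) * reward).mean()
    p_opt.zero_grad()
    pg_loss.backward()
    p_opt.step()
```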
1 code implementation • 5 Apr 2024 • Gawon Choi, Hyemin Ahn
This leads us to pose a question: if small LMs can be trained to reason in chains within a single domain, would even small LMs be good task planners for robots?
no code implementations • 14 Aug 2023 • Esteve Valls Mascaro, Hyemin Ahn, Dongheui Lee
Experimental results show that our model successfully forecasts human motion on the Human3.6M dataset.
Ranked #12 on Human Pose Forecasting on Human3.6M
1 code implementation • 6 Jun 2023 • Zhifan Ni, Esteve Valls Mascaró, Hyemin Ahn, Dongheui Lee
Understanding the human-object interactions (HOIs) from a video is essential to fully comprehend a visual scene.
Ranked #2 on Human-Object Interaction Anticipation on VidHOI
no code implementations • 28 Feb 2023 • Hyemin Ahn, Esteve Valls Mascaro, Dongheui Lee
As many researchers have observed the fruitfulness of the recent diffusion probabilistic model, its effectiveness in image generation is now actively studied.
no code implementations • 16 Feb 2023 • Esteve Valls Mascaro, Shuo Ma, Hyemin Ahn, Dongheui Lee
In addition, our model is tested in conditions where the human motion is severely occluded, demonstrating its robustness in reconstructing and predicting 3D human motion in a highly noisy environment.
1 code implementation • 25 Jul 2022 • Esteve Valls Mascaro, Hyemin Ahn, Dongheui Lee
Our framework first extracts two levels of human information from the N observed videos of human actions through a Hierarchical Multi-task MLP Mixer (H3M); a brief sketch of this hierarchical extraction follows this entry.
Ranked #5 on Long Term Action Anticipation on Ego4D
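The sketch below illustrates the hierarchical, multi-task idea with a standard MLP-Mixer block and two readout heads (a per-segment head and a video-level head); the layer sizes, the number of segments, and the head semantics are assumptions, not the published H3M architecture.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Standard MLP-Mixer block: token mixing followed by channel mixing."""
    def __init__(self, tokens, channels, hidden=128):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(channels), nn.LayerNorm(channels)
        self.token_mlp = nn.Sequential(nn.Linear(tokens, hidden), nn.GELU(),
                                       nn.Linear(hidden, tokens))
        self.channel_mlp = nn.Sequential(nn.Linear(channels, hidden), nn.GELU(),
                                         nn.Linear(hidden, channels))

    def forward(self, x):  # x: (batch, tokens, channels)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))

class HierarchicalMixer(nn.Module):
    """Two-level, multi-task readout over N observed video segments."""
    def __init__(self, n_segments=8, channels=256, n_seg_classes=20, n_vid_classes=5):
        super().__init__()
        self.low = MixerBlock(n_segments, channels)    # segment-level information
        self.high = MixerBlock(n_segments, channels)   # video-level information
        self.segment_head = nn.Linear(channels, n_seg_classes)
        self.video_head = nn.Linear(channels, n_vid_classes)

    def forward(self, feats):  # feats: (batch, N, channels), one token per segment
        low = self.low(feats)
        high = self.high(low)
        return self.segment_head(low), self.video_head(high.mean(dim=1))

model = HierarchicalMixer()
seg_logits, vid_logits = model(torch.randn(2, 8, 256))
print(seg_logits.shape, vid_logits.shape)  # (2, 8, 20) and (2, 5)
```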
no code implementations • 11 Mar 2021 • Sungjoon Choi, Min Jae Song, Hyemin Ahn, Joohyung Kim
In this paper, we present self-supervised shared latent embedding (S3LE), a data-driven motion retargeting method that enables the generation of natural motions in humanoid robots from motion capture data or RGB videos.
1 code implementation • ICCV 2021 • Hyemin Ahn, Dongheui Lee
In this paper, we propose the Hierarchical Action Segmentation Refiner (HASR), which can refine temporal action segmentation results from various models by understanding the overall context of a given video in a hierarchical way (a sketch of this refinement idea follows this entry).
Ranked #13 on Action Segmentation on 50 Salads
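Below is a hedged sketch of segment-level refinement in the spirit described above: frame features from any backbone are pooled per predicted segment, a video-level context is built over the segments, and each segment is re-classified given that context. The GRU encoder, feature sizes, and class count are assumptions, not the published HASR model.

```python
import torch
import torch.nn as nn

class SegmentRefiner(nn.Module):
    def __init__(self, feat_dim=64, n_classes=19):
        super().__init__()
        self.video_encoder = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, frame_feats, segments):
        # frame_feats: (T, feat_dim); segments: (start, end) pairs from any backbone.
        seg_feats = torch.stack([frame_feats[s:e].mean(dim=0) for s, e in segments])
        # Encode the segment sequence into a single video-level context vector.
        _, video_ctx = self.video_encoder(seg_feats.unsqueeze(0))   # (1, 1, feat_dim)
        ctx = video_ctx.squeeze(0).expand(len(segments), -1)
        # Re-classify each segment conditioned on the overall video context.
        return self.classifier(torch.cat([seg_feats, ctx], dim=-1))

refiner = SegmentRefiner()
frame_feats = torch.randn(300, 64)                  # e.g. backbone features for 300 frames
segments = [(0, 120), (120, 210), (210, 300)]       # initial segmentation to refine
print(refiner(frame_feats, segments).argmax(dim=-1))
```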
no code implementations • 16 Dec 2020 • Hyemin Ahn, Obin Kwon, Kyoungdo Kim, Jaeyeon Jeong, Howoong Jun, Hongjung Lee, Dongheui Lee, Songhwai Oh
We also present a relevant dataset and a model that can serve as a baseline, and show that our model trained on the proposed dataset can also be applied to the real world using CycleGAN.
no code implementations • 11 Nov 2019 • Hyemin Ahn, Jaehun Kim, Kihyun Kim, Songhwai Oh
The trained dance pose generator, which is a generative autoregressive model, is able to synthesize a dance sequence longer than 5,000 pose frames.
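A minimal sketch of such autoregressive generation is shown below, where each predicted pose is fed back as the input for the next frame; the GRU cell, the pose and music dimensionalities, and the conditioning scheme are assumptions for illustration, not the published generator.

```python
import torch
import torch.nn as nn

POSE_DIM, MUSIC_DIM, HIDDEN = 51, 32, 128

class DancePoseGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.cell = nn.GRUCell(POSE_DIM + MUSIC_DIM, HIDDEN)
        self.out = nn.Linear(HIDDEN, POSE_DIM)

    @torch.no_grad()
    def generate(self, music, n_frames=5000):
        # music: (n_frames, MUSIC_DIM); each generated pose is fed back
        # as part of the input at the next time step.
        pose = torch.zeros(POSE_DIM)
        hidden = torch.zeros(HIDDEN)
        poses = []
        for t in range(n_frames):
            step_in = torch.cat([pose, music[t]]).unsqueeze(0)
            hidden = self.cell(step_in, hidden.unsqueeze(0)).squeeze(0)
            pose = self.out(hidden)
            poses.append(pose)
        return torch.stack(poses)                  # (n_frames, POSE_DIM)

gen = DancePoseGenerator()
sequence = gen.generate(torch.randn(5000, 32))
print(sequence.shape)                              # torch.Size([5000, 51])
```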
2 code implementations • 28 May 2018 • Hyemin Ahn, Sungjoon Choi, Nuri Kim, Geonho Cha, Songhwai Oh
To handle the inherent ambiguity in human language commands, a suitable question which can resolve the ambiguity is generated.
1 code implementation • 15 Oct 2017 • Hyemin Ahn, Timothy Ha, Yunho Choi, Hwiyeon Yoo, Songhwai Oh
We demonstrate that the network can generate human-like actions which can be transferred to a Baxter robot, such that the robot performs an action based on a provided sentence.