We sampled, modified, and recorded 2,541 dialogues from the open-domain dialogue dataset DailyDialog, each long enough to represent the context of the dialogue.
We also use a point refinement module based on 3D sparse convolution to fuse information from both the LiDAR range image and point cloud representations and to reduce artifacts on object borders.
This paper tackles the problem of human motion prediction, which consists of forecasting future body poses from historically observed sequences.
In our model, medical text annotation is introduced to compensate for the quality deficiency in image data.
Ranked #1 on Medical Image Segmentation on MoNuSeg
Then, given a monocular RGB video of this subject, our method integrates information from both the image observation and the avatar prior, and accordingly reconstructs high-fidelity 3D textured models with dynamic details regardless of the visibility.
We introduce the problem of disentangling time-lapse sequences in a way that allows separate, after-the-fact control of overall trends, cyclic effects, and random effects in the images, and describe a technique based on data-driven generative models that achieves this goal.
In this work, we present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation.
The goal of multi-object tracking (MOT) is to detect and track all the objects in a scene while keeping a unique identifier for each object.
Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)