1 code implementation • 3 Dec 2024 • Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, Kathrina Wu, Qin Lin, Junkun Yuan, Yanxin Long, Aladdin Wang, Andong Wang, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Hongmei Wang, Jacob Song, Jiawang Bai, Jianbing Wu, Jinbao Xue, Joey Wang, Kai Wang, Mengyang Liu, Pengyu Li, Shuai Li, Weiyan Wang, Wenqing Yu, Xinchi Deng, Yang Li, Yi Chen, Yutao Cui, Yuanbo Peng, Zhentao Yu, Zhiyu He, Zhiyong Xu, Zixiang Zhou, Zunnan Xu, Yangyu Tao, Qinglin Lu, Songtao Liu, Dax Zhou, Hongfa Wang, Yong Yang, Di Wang, Yuhong Liu, Jie Jiang, Caesar Zhong
In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models.
1 code implementation • 31 Jul 2024 • Zhenghao Zhang, Junchao Liao, Menghao Li, Zuozhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang
The Trajectory Extractor (TE) encodes arbitrary trajectories into hierarchical spacetime motion patches with a 3D video compression network.
1 code implementation • 21 Mar 2024 • Shenhao Zhu, Junming Leo Chen, Zuozhuo Dai, Qingkun Su, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu, Siyu Zhu
In this study, we introduce a methodology for human image animation that leverages a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.
1 code implementation • 18 Mar 2024 • Zhenghao Zhang, Zuozhuo Dai, Long Qin, Weizhi Wang
Large-scale text-to-video models have shown remarkable abilities, but their direct application in video editing remains challenging due to limited available datasets.
no code implementations • CVPR 2024 • Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao
Moreover, the explicit deformation modeling of discretized Gaussian points enables ultra-fast training and rendering of a 4D scene, at speeds comparable to those of the original 3D Gaussian Splatting (3DGS) designed for static 3D reconstruction.
1 code implementation • 21 Nov 2023 • Zuozhuo Dai, Zhenghao Zhang, Yao Yao, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang
Image animation is a key task in computer vision that aims to generate dynamic visual content from a static image.
no code implementations • 14 Jul 2023 • Zuozhuo Dai, Fangtao Shao, Qingkun Su, Zilong Dong, Siyu Zhu
In the second stage, we propose a novel decoupled video-text cross-attention module to capture fine-grained multimodal information in the spatial and temporal dimensions.
1 code implementation • 22 May 2023 • Zhenghao Zhang, Shengfan Zhang, Zhichao Wei, Zuozhuo Dai, Siyu Zhu
Extensive experiments on the DAVIS2017-unsupervised and YoutubeVIS19&21 datasets demonstrate the superior performance of UVOSAM without mask supervision compared to existing mask-supervised methods, as well as its ability to generalize to weakly-annotated video datasets.
no code implementations • 20 Jan 2023 • Zhenghao Zhang, Fangtao Shao, Zuozhuo Dai, Siyu Zhu
In this paper, we observe that temporal information is also important, and we propose TAFormer to aggregate spatio-temporal features in both the transformer encoder and decoder.
no code implementations • 23 May 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan
In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.
1 code implementation • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan
While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRFs optimization.
Ranked #1 on Depth Prediction on Matterport3D
no code implementations • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan
Estimating accurate depth from a single image is challenging because the problem is inherently ambiguous and ill-posed.
1 code implementation • CVPR 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan
In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.
1 code implementation • 24 Mar 2021 • Xiaodong Gu, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Chengzhou Tang, Zilong Dong, Ping Tan
There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques.
4 code implementations • 22 Mar 2021 • Zuozhuo Dai, Guangyuan Wang, Weihao Yuan, Xiaoli Liu, Siyu Zhu, Ping Tan
Thus, our method can solve the problem of cluster inconsistency and be applicable to larger data sets.
Ranked #1 on Unsupervised Person Re-Identification on PersonX
no code implementations • 17 Oct 2020 • Rakesh Shrestha, Zhiwen Fan, Qingkun Su, Zuozhuo Dai, Siyu Zhu, Ping Tan
Deep learning based 3D shape generation methods generally utilize latent features extracted from color images to encode the semantics of objects and guide the shape generation process.
4 code implementations • CVPR 2020 • Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, Ping Tan
Deep multi-view stereo (MVS) and stereo matching approaches generally construct 3D cost volumes to regularize and regress the output depth or disparity.
Ranked #13 on Point Clouds on Tanks and Temples
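The cost-volume idea in the entry above can be illustrated with a toy single-scanline stereo example. This is a minimal sketch under stated assumptions, not the method proposed in that entry. The matching cost is assumed to be the absolute intensity difference. The disparity is regressed with a soft-argmin over the disparity axis, a common choice in deep stereo networks. The function names `build_cost_volume` and `soft_argmin` are hypothetical.

```python
import math

def build_cost_volume(left, right, max_disp):
    """Cost volume for one scanline: cost[d][x] = |left[x] - right[x - d]|.

    Assumption: absolute-difference matching cost. Columns where x - d falls
    outside the right image get an infinite cost.
    """
    width = len(left)
    volume = []
    for d in range(max_disp + 1):
        row = []
        for x in range(width):
            row.append(abs(left[x] - right[x - d]) if x - d >= 0 else math.inf)
        volume.append(row)
    return volume

def soft_argmin(volume, x):
    """Regress a sub-pixel disparity at column x as sum_d d * softmax(-cost)[d]."""
    costs = [volume[d][x] for d in range(len(volume))]
    weights = [math.exp(-c) for c in costs]  # math.exp(-inf) == 0.0
    total = sum(weights)  # d = 0 is always valid, so total > 0 for small costs
    return sum(d * w for d, w in enumerate(weights)) / total
```

For example, if `right` is `left` shifted by two pixels, the regressed disparity at a textured column comes out close to 2. Soft-argmin is used here instead of a hard winner-take-all argmin because it is differentiable, which is what lets a network regress depth or disparity end to end.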
5 code implementations • ICCV 2019 • Zuozhuo Dai, Mingqiang Chen, Xiaodong Gu, Siyu Zhu, Ping Tan
In this paper, we propose the Batch DropBlock (BDB) Network, a two-branch network composed of a conventional ResNet-50 as the global branch and a feature-dropping branch.
Ranked #8 on Person Re-Identification on Market-1501-C
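The feature-dropping branch in the entry above is built around Batch DropBlock. During training, it zeroes the same randomly chosen region of every feature map in a batch. The sketch below illustrates only that dropping step, using plain nested lists shaped [batch][channel][height][width]. It makes several assumptions:

- The function name `batch_drop_block` is hypothetical.
- The region size `drop_h` x `drop_w` is an assumed parameter, not the paper's setting.
- The surrounding ResNet-50 branches and pooling are omitted.

```python
import random

def batch_drop_block(features, drop_h, drop_w, rng=random):
    """Zero the same random drop_h x drop_w region in every map of a batch.

    features: nested lists shaped [batch][channel][height][width].
    Returns a new nested list; the input is left unchanged.
    """
    height = len(features[0][0])
    width = len(features[0][0][0])
    # One region is sampled per batch (not per sample or channel), so every
    # feature map in the batch loses exactly the same spatial block.
    top = rng.randint(0, height - drop_h)
    left = rng.randint(0, width - drop_w)
    out = []
    for sample in features:
        new_sample = []
        for fmap in sample:
            new_map = []
            for y, row in enumerate(fmap):
                new_map.append([
                    0.0 if top <= y < top + drop_h and left <= x < left + drop_w else v
                    for x, v in enumerate(row)
                ])
            new_sample.append(new_map)
        out.append(new_sample)
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the dropped region reproducible. In a real network this would run only at training time, on tensors rather than lists.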