no code implementations • 1 Oct 2024 • Laura Bravo-Sánchez, Jaewoo Heo, Zhenzhen Weng, Kuan-Chieh Wang, Serena Yeung-Levy
Social dynamics in close human interactions pose significant challenges for Human Mesh Estimation (HME), particularly due to the complexity of physical contacts and the scarcity of training data.
no code implementations • 15 Aug 2024 • Zeyu Wang, Zhenzhen Weng, Serena Yeung-Levy
Humans continuously perceive and process visual signals.
1 code implementation • 5 Jul 2024 • Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao
Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities.
no code implementations • 26 Feb 2024 • Zeyu Wang, Zhenzhen Weng, Serena Yeung-Levy
Conventional approaches to human mesh recovery predominantly employ a region-based strategy.
no code implementations • 22 Jan 2024 • Zhenzhen Weng, Jingyuan Liu, Hao Tan, Zhan Xu, Yang Zhou, Serena Yeung-Levy, Jimei Yang
We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image.
no code implementations • CVPR 2023 • Zhenzhen Weng, Alexander S. Gorban, Jingwei Ji, Mahyar Najibi, Yin Zhou, Dragomir Anguelov
We show that by training on a large training set from the Waymo Open Dataset without any human-annotated keypoints, we are able to achieve reasonable performance compared to the fully supervised approach.
no code implementations • 25 May 2023 • Zhenzhen Weng, Zeyu Wang, Serena Yeung
Recent advancements in text-to-image generation have enabled significant progress in zero-shot 3D shape generation.
1 code implementation • 16 Mar 2023 • Zhenzhen Weng, Laura Bravo-Sánchez, Serena Yeung-Levy
Recent text-to-image generative models have exhibited remarkable abilities in generating high-fidelity and photo-realistic images.
no code implementations • CVPR 2023 • Kuan-Chieh Wang, Zhenzhen Weng, Maria Xenochristou, João Pedro Araújo, Jeffrey Gu, Karen Liu, Serena Yeung
Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection.
1 code implementation • 28 Dec 2022 • Kuan-Chieh Wang, Zhenzhen Weng, Maria Xenochristou, Joao Pedro Araujo, Jeffrey Gu, C. Karen Liu, Serena Yeung
Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection.
1 code implementation • 21 Jun 2022 • Zhenzhen Weng, Kuan-Chieh Wang, Angjoo Kanazawa, Serena Yeung
The ability to perceive 3D human bodies from a single image has a multitude of applications ranging from entertainment and robotics to neuroscience and healthcare.
no code implementations • CVPR 2021 • Zhenzhen Weng, Mehmet Giray Ogut, Shai Limonchik, Serena Yeung
Instance segmentation is an active topic in computer vision that is usually solved using supervised learning approaches over very large datasets composed of object-level masks.
Ranked #5 on Novel Object Detection on LVIS v1.0 val
1 code implementation • CVPR 2021 • Zhenzhen Weng, Serena Yeung
Indeed, from a single image of a person placed in an indoor scene, we as humans are adept at resolving ambiguities of the human pose and room layout through our knowledge of the physical laws and prior perception of the plausible object and human poses.
2 code implementations • NeurIPS 2019 • Vincent S. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré
In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes.