Search Results for author: Rujie Wu

Found 3 papers, 2 papers with code

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

no code implementations • 18 Mar 2024 • Yue Fan, Xiaojian Ma, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li

We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relations in lengthy videos.

Video Understanding

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

1 code implementation • 16 Oct 2023 • Rujie Wu, Xiaojian Ma, Zhenliang Zhang, Wei Wang, Qing Li, Song-Chun Zhu, Yizhou Wang

We also conceived a neuro-symbolic reasoning approach that reconciles LLMs and VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems.

Few-Shot Learning, Logical Reasoning

Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

1 code implementation • 22 Jul 2022 • Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, Yizhou Wang

While the voxel-based methods have achieved promising results for multi-person 3D pose estimation from multi-cameras, they suffer from heavy computation burdens, especially for large scenes.

3D Multi-Person Pose Estimation, 3D Pose Estimation
