Search Results for author: Rujie Wu

Found 3 papers, 2 papers with code

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

no code implementations • 18 Mar 2024 • Yue Fan, Xiaojian Ma, Rujie Wu, Yuntao Du, Jiaqi Li, Zhi Gao, Qing Li

We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relations in lengthy videos.

Video Understanding


Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

1 code implementation • 16 Oct 2023 • Rujie Wu, Xiaojian Ma, Zhenliang Zhang, Wei Wang, Qing Li, Song-Chun Zhu, Yizhou Wang

We even conceived a neuro-symbolic reasoning approach that reconciles LLMs & VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems.

Ranked #1 on Visual Reasoning on Bongard-OpenWorld

Few-Shot Learning, Logical Reasoning


Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

1 code implementation • 22 Jul 2022 • Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, Yizhou Wang

While the voxel-based methods have achieved promising results for multi-person 3D pose estimation from multi-cameras, they suffer from heavy computation burdens, especially for large scenes.

Ranked #5 on 3D Multi-Person Pose Estimation on Campus

3D Multi-Person Pose Estimation, 3D Pose Estimation


