Search Results for author: Weichen Zhang

Found 12 papers, 3 papers with code

UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces

no code implementations • 8 Mar 2025 • Baining Zhao, Jianjie Fang, Zichao Dai, Ziyou Wang, Jirong Zha, Weichen Zhang, Chen Gao, Yue Wang, Jinqiang Cui, Xinlei Chen, Yong Li

Large multimodal models exhibit remarkable intelligence, yet their embodied cognitive abilities during motion in open-ended urban 3D space remain to be explored.

Benchmarking, counterfactual, +1

Understanding and Evaluating Hallucinations in 3D Visual Language Models

no code implementations • 18 Feb 2025 • Ruiying Peng, Kaiyuan Li, Weichen Zhang, Chen Gao, Xinlei Chen, Yong Li

Recently, 3D-LLMs, which combine point-cloud encoders with large models, have been proposed to tackle complex tasks in embodied intelligence and scene understanding.

Diversity, Scene Understanding

EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment

no code implementations • 12 Oct 2024 • Chen Gao, Baining Zhao, Weichen Zhang, Jinzhu Mao, Jun Zhang, Zhiheng Zheng, Fanhang Man, Jianjie Fang, Zile Zhou, Jinqiang Cui, Xinlei Chen, Yong Li

To address it, in this paper, we construct a benchmark platform for embodied intelligence evaluation in real-world city environments.

GUI Action Narrator: Where and When Did That Action Take Place?

no code implementations • 19 Jun 2024 • Qinchen Wu, Difei Gao, Kevin Qinghong Lin, Zhuoyu Wu, Xiangwu Guo, Peiran Li, Weichen Zhang, Hengxu Wang, Mike Zheng Shou

The advent of Multimodal LLMs has significantly enhanced image OCR recognition capabilities, making GUI automation a viable reality for increasing efficiency in digital tasks.

Optical Character Recognition (OCR), Video Captioning

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

1 code implementation • 20 Dec 2023 • Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou

Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity.

Language Modelling, Large Language Model

MA-NeRF: Motion-Assisted Neural Radiance Fields for Face Synthesis from Sparse Images

no code implementations • 17 Jun 2023 • Weichen Zhang, Xiang Zhou, Yukang Cao, Wensen Feng, Chun Yuan

We improve from NeRF and propose a novel framework that, by leveraging the parametric 3DMM models, can reconstruct a high-fidelity drivable face avatar and successfully handle the unseen expressions.

Face Generation, NeRF, +1

Towards Arbitrary Text-driven Image Manipulation via Space Alignment

no code implementations • 25 Jan 2023 • Yunpeng Bai, Zihan Zhong, Chao Dong, Weichen Zhang, Guowei Xu, Chun Yuan

Then, the text input can be directly accessed into the StyleGAN space and be used to find the semantic shift according to the text description.

Attribute, Image Manipulation

SRDAN: Scale-Aware and Range-Aware Domain Adaptation Network for Cross-Dataset 3D Object Detection

1 code implementation • CVPR 2021 • Weichen Zhang, Wen Li, Dong Xu

In this work, we propose a new cross-dataset 3D object detection method named Scale-aware and Range-aware Domain Adaptation Network (SRDAN).

3D Object Detection, Domain Adaptation, +2

Collaborative and Adversarial Network for Unsupervised Domain Adaptation

1 code implementation • CVPR 2018 • Weichen Zhang, Wanli Ouyang, Wen Li, Dong Xu

In this paper, we propose a new unsupervised domain adaptation approach called Collaborative and Adversarial Network (CAN) through domain-collaborative and domain-adversarial training of neural networks.

Unsupervised Domain Adaptation
