Search Results for author: Letian Zhang

Found 9 papers, 3 papers with code

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

1 code implementation • 18 Jun 2024 • Bingchen Zhao, Yongshuo Zong, Letian Zhang, Timothy Hospedales

The advancement of large language models (LLMs) has significantly broadened the scope of applications in natural language processing, with multi-modal LLMs extending these capabilities to integrate and interpret visual data.

ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

1 code implementation • 13 Jun 2024 • Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille

A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images.

Image Captioning, Linear Probing, Object-Level 3D Awareness, +2

IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment

no code implementations • 10 Dec 2023 • Letian Zhang, Ming Li, Chen Chen, Jie Xu

This poses a paradox as the necessary camera pose must be estimated from the entire dataset, even though the data arrives sequentially and future chunks are inaccessible.

Incremental Learning, Knowledge Distillation

E$^3$Pose: Energy-Efficient Edge-assisted Multi-camera System for Multi-human 3D Pose Estimation

no code implementations • 21 Jan 2023 • Letian Zhang, Jie Xu

To achieve this goal, E$^3$Pose incorporates an attention-based LSTM that predicts the occlusion information of each camera view, guiding camera selection before any camera processes the images of a scene, and runs a camera selection algorithm based on the Lyapunov optimization framework to make long-term adaptive selection decisions.

3D Pose Estimation

Autodidactic Neurosurgeon: Collaborative Deep Inference for Mobile Edge Intelligence via Online Learning

no code implementations • 2 Feb 2021 • Letian Zhang, Lixing Chen, Jie Xu

The basic idea of this system is to partition a deep neural network (DNN) into a front-end part running on the mobile device and a back-end part running on the edge server, with the key challenge being how to locate the optimal partition point to minimize the end-to-end inference delay.

Decision Making, object-detection, +1
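
The partition-point idea in the snippet above can be illustrated with a small sketch. This is not the paper's method: the title indicates the paper learns the partition point online, whereas the sketch assumes, purely for illustration, that per-layer latencies on the device and the edge server, the size of each layer's output, and the link bandwidth are all known in advance. The function name `best_partition` and all of its parameters are hypothetical. It scans every cut point and returns the one with the smallest end-to-end delay (device compute + transfer + server compute).

```python
from typing import Sequence, Tuple


def best_partition(
    device_ms: Sequence[float],   # assumed: latency of each layer on the mobile device
    server_ms: Sequence[float],   # assumed: latency of each layer on the edge server
    output_kb: Sequence[float],   # assumed: size of each layer's output
    input_kb: float,              # assumed: size of the raw input
    bandwidth_kb_per_ms: float,   # assumed: uplink bandwidth
) -> Tuple[int, float]:
    """Return (p, delay): layers [0, p) run on the device, layers [p, n) on the server."""
    n = len(device_ms)
    if len(server_ms) != n or len(output_kb) != n:
        raise ValueError("per-layer sequences must have the same length")
    if bandwidth_kb_per_ms <= 0:
        raise ValueError("bandwidth must be positive")

    best_p, best_delay = 0, float("inf")
    device_prefix = 0.0                    # device time for layers [0, p)
    server_suffix = float(sum(server_ms))  # server time for layers [p, n)

    for p in range(n + 1):
        if p == n:
            sent_kb = 0.0                  # everything runs on-device; nothing is uploaded
        elif p == 0:
            sent_kb = input_kb             # everything runs on the server; upload raw input
        else:
            sent_kb = output_kb[p - 1]     # upload the activation produced by layer p-1
        delay = device_prefix + sent_kb / bandwidth_kb_per_ms + server_suffix
        if delay < best_delay:
            best_p, best_delay = p, delay
        if p < n:                          # move layer p from the server to the device
            device_prefix += device_ms[p]
            server_suffix -= server_ms[p]
    return best_p, best_delay


# Three-layer toy profile (made-up numbers): cutting after layer 2 is cheapest
# because that layer's output is small and the last layer is slow on the device.
p, delay = best_partition([4, 6, 10], [1, 1, 2], [50, 5, 1], input_kb=100, bandwidth_kb_per_ms=1.0)
print(p, delay)
```

In practice the latencies and bandwidth vary over time and are not known up front, which is what makes locating the partition point a challenge; the exhaustive scan only shows what is being optimized.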
