Search Results for author: Yiming Zhao

Found 22 papers, 9 papers with code

CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments

1 code implementation • 2 Mar 2025 • Mingcong Lei, Ge Wang, Yiming Zhao, Zhixin Mai, Qing Zhao, Yao Guo, Zhen Li, Shuguang Cui, Yatong Han, Jinke Ren

To address these limitations in dynamic environments, we propose Closed-Loop Embodied Agent (CLEA) -- a novel architecture incorporating four specialized open-source LLMs with functional decoupling for closed-loop task management.

Task Planning

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

no code implementations • 25 Feb 2025 • Yifan Pu, Yiming Zhao, Zhicong Tang, Ruihong Yin, Haoxing Ye, Yuhui Yuan, Dong Chen, Jianmin Bao, Sirui Zhang, Yanbin Wang, Lin Liang, Lijuan Wang, Ji Li, Xiu Li, Zhouhui Lian, Gao Huang, Baining Guo

In this paper, we introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images based on a global text prompt and an anonymous region layout.

Image Generation

STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning

no code implementations • 14 Feb 2025 • Mingcong Lei, Yiming Zhao, Ge Wang, Zhixin Mai, Shuguang Cui, Yatong Han, Jinke Ren

A key objective of embodied intelligence is enabling agents to perform long-horizon tasks in dynamic environments while maintaining robust decision-making and adaptability.

Decision Making, Spatial Reasoning, +1

Neural-Polyptych: Content Controllable Painting Recreation for Diverse Genres

no code implementations • 29 Sep 2024 • Yiming Zhao, Dewen Guo, Zhouhui Lian, Yue Gao, Jianhong Han, Jie Feng, Guoping Wang, Bingfeng Zhou, Sheng Li

To bridge the gap between artists and non-specialists, we present a unified framework, Neural-Polyptych, to facilitate the creation of expansive, high-resolution paintings by seamlessly incorporating interactive hand-drawn sketches with fragments from original paintings.

EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision

no code implementations • 3 Sep 2024 • Yiming Zhao, Taein Kwon, Paul Streli, Marc Pollefeys, Christian Holz

However, estimating these interactions from an egocentric camera perspective is challenging, largely due to the lack of comprehensive datasets that provide both accurate hand poses on contacting surfaces and detailed annotations of pressure information.

Benchmarking, Mixed Reality, +1

Pano2Room: Novel View Synthesis from a Single Indoor Panorama

1 code implementation • 21 Aug 2024 • Guo Pu, Yiming Zhao, Zhouhui Lian

The key idea is to initially construct a preliminary mesh from the input panorama, and iteratively refine this mesh using a panoramic RGBD inpainter while collecting photo-realistic 3D-consistent pseudo novel views.

Novel View Synthesis

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

no code implementations • 14 Jun 2024 • Zeyu Liu, Weicong Liang, Yiming Zhao, Bohan Chen, Lin Liang, Lijuan Wang, Ji Li, Yuhui Yuan

With the combination of these techniques, we deliver a powerful customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic graphic generation model, Glyph-SDXL-v2, that can support accurate spelling in 10 different languages.

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

1 code implementation • 8 Dec 2023 • Yiming Zhao, Zhouhui Lian

Text-to-Image (T2I) generation methods based on diffusion models have garnered significant attention in the last few years.

Image Generation, Scene Text Editing

Segment Anything Model-guided Collaborative Learning Network for Scribble-supervised Polyp Segmentation

no code implementations • 1 Dec 2023 • Yiming Zhao, Tao Zhou, Yunqi Gu, Yi Zhou, Yizhe Zhang, Ye Wu, Huazhu Fu

Specifically, we first propose a Cross-level Enhancement and Aggregation Network (CEA-Net) for weakly-supervised polyp segmentation.

Segmentation, Weakly supervised segmentation

Human from Blur: Human Pose Tracking from Blurry Images

no code implementations • ICCV 2023 • Yiming Zhao, Denys Rozumnyi, Jie Song, Otmar Hilliges, Marc Pollefeys, Martin R. Oswald

The key idea is to tackle the inverse problem of image deblurring by modeling the forward problem with a 3D human model, a texture map, and a sequence of poses to describe human motion.

Deblurring, Image Deblurring, +2

VQNet 2.0: A New Generation Machine Learning Framework that Unifies Classical and Quantum

no code implementations • 9 Jan 2023 • Huanyu Bian, Zhilong Jia, Menghan Dou, Yuan Fang, Lei LI, Yiming Zhao, Hanchao Wang, Zhaohui Zhou, Wei Wang, Wenyu Zhu, Ye Li, Yang Yang, Weiming Zhang, Nenghai Yu, Zhaoyun Chen, Guoping Guo

Therefore, based on VQNet 1.0, we further propose VQNet 2.0, a new generation of unified classical and quantum machine learning framework that supports hybrid optimization.

Quantum Machine Learning, Unity

A Near Sensor Edge Computing System for Point Cloud Semantic Segmentation

no code implementations • 12 Jul 2022 • Lin Bai, Yiming Zhao, Xinming Huang

In this system, an FPGA-based deep learning accelerator core (DPU) is placed next to the LiDAR sensor to perform point cloud pre-processing and run the segmentation neural network.

Autonomous Driving, Decision Making, +3

Automatic Expert Selection for Multi-Scenario and Multi-Task Search

no code implementations • 28 May 2022 • Xinyu Zou, Zhi Hu, Yiming Zhao, Xuchu Ding, Zhongyi Liu, Chenliang Li, Aixin Sun

At each multi-scenario/multi-task layer, a novel expert selection algorithm is proposed to automatically identify scenario-/task-specific and shared experts for each input.

Multi-Task Learning

FIDNet: LiDAR Point Cloud Semantic Segmentation with Fully Interpolation Decoding

1 code implementation • 8 Sep 2021 • Yiming Zhao, Lin Bai, Xinming Huang

In this paper, we propose a new projection-based LiDAR semantic segmentation pipeline that consists of a novel network structure and an efficient post-processing step.

LIDAR Semantic Segmentation, Robust 3D Semantic Segmentation, +1

Enabling 3D Object Detection with a Low-Resolution LiDAR

no code implementations • 4 May 2021 • Lin Bai, Yiming Zhao, Xinming Huang

Light Detection And Ranging (LiDAR) has been widely used in autonomous vehicles for perception and localization.

3D Object Detection, Autonomous Driving, +3

Deep Lucas-Kanade Homography for Multimodal Image Alignment

1 code implementation • CVPR 2021 • Yiming Zhao, Xinming Huang, Ziming Zhang

With those properties, directly updating the Lucas-Kanade algorithm on our feature maps will precisely align image pairs with large appearance changes.

A Surface Geometry Model for LiDAR Depth Completion

1 code implementation • 17 Apr 2021 • Yiming Zhao, Lin Bai, Ziming Zhang, Xinming Huang

Therefore, it is assumed those pixels share the same surface with the nearest LiDAR point, and their respective depth can be estimated as the nearest LiDAR depth value plus a residual error.

Depth Completion, Self-Supervised Learning
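
The abstract snippet for this entry describes a pixel's depth as the nearest LiDAR depth value plus a residual error. A minimal sketch of that arithmetic follows. It is not the authors' implementation: the function names, the (x, y, depth) tuple layout, the use of squared image-space distance to pick the nearest point, and the residual passed in as a plain number are all assumptions made for illustration. How the paper obtains the residual is not stated in this listing.

```python
def nearest_lidar_depth(pixel, lidar_points):
    """Return the depth of the sparse LiDAR point closest to `pixel`.

    `lidar_points` is a hypothetical list of (x, y, depth) tuples in
    image coordinates; closeness is squared Euclidean distance in the image.
    """
    px, py = pixel
    nearest = min(
        lidar_points,
        key=lambda p: (p[0] - px) ** 2 + (p[1] - py) ** 2,
    )
    return nearest[2]


def estimate_depth(pixel, lidar_points, residual):
    """Depth estimate = nearest LiDAR depth + residual error.

    `residual` is supplied by the caller here (an assumption for this sketch).
    """
    return nearest_lidar_depth(pixel, lidar_points) + residual


# Two sparse LiDAR returns and one pixel without a direct measurement.
lidar_points = [(10, 10, 5.0), (40, 12, 8.0)]
depth = estimate_depth((12, 11), lidar_points, residual=0.25)
print(depth)
```

The pixel at (12, 11) lies closest to the return at (10, 10), so its estimate starts from that point's depth and is then shifted by the residual.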

A CNN Accelerator on FPGA Using Depthwise Separable Convolution

no code implementations • 3 Sep 2018 • Lin Bai, Yiming Zhao, Xinming Huang

The state-of-the-art CNNs, such as MobileNetV2 and Xception, adopt depthwise separable convolution to replace the standard convolution for embedded platforms.
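
To illustrate why depthwise separable convolution suits embedded platforms, the sketch below compares weight counts for a standard convolution and its depthwise separable replacement (a per-channel k x k depthwise filter followed by a 1 x 1 pointwise convolution). The function names and the layer sizes are hypothetical, bias terms are ignored, and nothing here is taken from the paper's accelerator design.

```python
def standard_conv_weights(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_weights(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored).

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_weights(3, 32, 64)
separable = separable_conv_weights(3, 32, 64)
print(standard, separable, round(standard / separable, 2))
```

For this example layer the separable form needs 2,336 weights against 18,432 for the standard convolution, roughly an eightfold reduction.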
