Search Results for author: Zhantao Yang

Found 6 papers, 1 paper with code

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

No code implementations · 4 Dec 2024 · Ruili Feng, Han Zhang, Zhantao Yang, Jie Xiao, Zhilei Shu, Zhiheng Liu, Andy Zheng, Yukun Huang, Yu Liu, Hongyang Zhang

We present The Matrix, the first foundational realistic world simulator capable of generating continuous 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives, enabling immersive exploration of richly dynamic environments.

Tasks: Zero-shot Generalization

Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce

No code implementations · 28 Oct 2024 · Zhantao Yang, Han Zhang, Fangyi Chen, Anudeepsekhar Bolimera, Marios Savvides

For e-commerce, an efficient and low-cost method for automatically constructing knowledge graphs is the foundation for enabling a variety of downstream applications.

Tasks: Benchmarking, graph construction, and 3 more

BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

No code implementations · 3 Jul 2024 · Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

This paper presents Bag-of-Concept Graph (BACON), which lets models with limited linguistic abilities benefit from the capabilities of Vision Language Models (VLMs) and boosts downstream tasks such as detection, visual question answering (VQA), and image generation.

Tasks: Image Generation, Question Answering, and 1 more

RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection

1 code implementation · 30 May 2024 · Fangyi Chen, Han Zhang, Zhantao Yang, Hao Chen, Kai Hu, Marios Savvides

Open-vocabulary object detection (OVD) requires robust modeling of the relationship between image regions and their semantics, which can be learned from large numbers of region-text pairs.

Ranked #12 on Open Vocabulary Object Detection on LVIS v1.0 (using extra training data)

Tasks: Image Captioning, Image Inpainting, and 4 more

Lipschitz Singularities in Diffusion Models

No code implementations · 20 Jun 2023 · Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng

Diffusion models, which sample images by integrating stochastic differential equations, have emerged as a dominant class of generative models.

Dimensionality-Varying Diffusion Process

No code implementations · CVPR 2023 · Han Zhang, Ruili Feng, Zhantao Yang, Lianghua Huang, Yu Liu, Yifei Zhang, Yujun Shen, Deli Zhao, Jingren Zhou, Fan Cheng

Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension.

Tasks: Image Generation
