no code implementations • 4 Dec 2024 • Ruili Feng, Han Zhang, Zhantao Yang, Jie Xiao, Zhilei Shu, Zhiheng Liu, Andy Zheng, Yukun Huang, Yu Liu, Hongyang Zhang
We present The Matrix, the first foundational realistic world simulator capable of generating continuous 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives, enabling immersive exploration of richly dynamic environments.
no code implementations • 28 Oct 2024 • Zhantao Yang, Han Zhang, Fangyi Chen, Anudeepsekhar Bolimera, Marios Savvides
For e-commerce, an efficient and low-cost automated knowledge graph construction method is the foundation of enabling various successful downstream applications.
no code implementations • 3 Jul 2024 • Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng
This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation.
1 code implementation • 30 May 2024 • Fangyi Chen, Han Zhang, Zhantao Yang, Hao Chen, Kai Hu, Marios Savvides
Open-vocabulary object detection (OVD) requires solid modeling of the region-semantic relationship, which could be learned from massive region-text pairs.
Ranked #12 on
Open Vocabulary Object Detection
on LVIS v1.0
(using extra training data)
no code implementations • 20 Jun 2023 • Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng
Diffusion models, which employ stochastic differential equations to sample images through integrals, have emerged as a dominant class of generative models.
no code implementations • CVPR 2023 • Han Zhang, Ruili Feng, Zhantao Yang, Lianghua Huang, Yu Liu, Yifei Zhang, Yujun Shen, Deli Zhao, Jingren Zhou, Fan Cheng
Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension.