no code implementations • 4 Jan 2025 • Kaiyuan Tian, Yani Chi, Yufan Zhou, An Liu
This paper proposes a two-timescale approach for joint MU uplink channel estimation and localization in MIMO-OFDM systems, which fully captures the spatial characteristics of MUs.
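To give a rough feel for the two-timescale idea (a toy sketch, not the paper's actual algorithm), the snippet below performs a fast per-frame least-squares channel estimate while slowly refining a single spatial parameter, here an angle of arrival, across frames; the single-path model, names, and values are all illustrative assumptions.

```python
import numpy as np

# Illustrative two-timescale loop (toy single-path model): fast per-frame
# channel estimates, slow refinement of a spatial parameter across frames.
def steering(theta, n_ant):
    # Uniform-linear-array steering vector, half-wavelength spacing.
    return np.exp(1j * np.pi * np.arange(n_ant) * np.sin(theta))

rng = np.random.default_rng(0)
n_ant, theta_true, theta_hat = 8, 0.3, 0.0
grid = np.linspace(-np.pi / 2, np.pi / 2, 181)
for _ in range(50):                                 # frames
    h = steering(theta_true, n_ant)                 # true channel
    y = h + 0.1 * (rng.standard_normal(n_ant) + 1j * rng.standard_normal(n_ant))
    h_hat = y                                       # fast timescale: LS estimate (unit pilot)
    scores = [abs(steering(g, n_ant).conj() @ h_hat) for g in grid]
    theta_hat = 0.9 * theta_hat + 0.1 * grid[int(np.argmax(scores))]  # slow timescale
print(f"slow-timescale angle estimate: {theta_hat:.3f} (true {theta_true})")
```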
1 code implementation • 20 Dec 2024 • Shijie Zhou, Ruiyi Zhang, Yufan Zhou, Changyou Chen
Large multimodal models still struggle with text-rich images because of inadequate training data.
no code implementations • 17 Dec 2024 • Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing.
no code implementations • 13 Dec 2024 • Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Nanxuan Zhao, Jing Shi, Tong Sun
SUGAR achieves state-of-the-art results in identity preservation, video dynamics, and video-text alignment for subject-driven video customization, demonstrating its effectiveness over previous methods.
no code implementations • 10 Dec 2024 • Mingxi Lei, Chunwei Ma, Meng Ding, Yufan Zhou, Ziyun Huang, Jinhui Xu
Deep learning models often struggle to generalize when deployed on real-world data, due to distributional shift from the training data.
no code implementations • 2 Nov 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Tong Yu, Franck Dernoncourt, Jiuxiang Gu, Ryan A. Rossi, Changyou Chen, Tong Sun
In this work, we present a novel framework named LoRA-Contextualizing Adaptation of Large multimodal models (LoCAL), which broadens the capabilities of any LMM to support long-document understanding.
1 code implementation • 9 Oct 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Jennifer Healey, Jiuxiang Gu, Zhiqiang Xu, Changyou Chen
Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces.
1 code implementation • 4 Oct 2024 • Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang
Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding; however, they struggle with fine-grained temporal grounding.
1 code implementation • 26 Aug 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Ryan Rossi, Jiuxiang Gu, Changyou Chen
Large multimodal models (LMMs) have demonstrated impressive capabilities in understanding various types of images, including text-rich images.
no code implementations • 27 Jul 2024 • Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun
Large multimodal language models have demonstrated impressive capabilities in understanding and manipulating images.
no code implementations • 24 Jul 2024 • An Liu, Yufan Zhou, Wenkang Xu
We investigate the problem of recovering a structured sparse signal from a linear observation model with an uncertain dynamic grid in the sensing matrix.
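To make the observation model concrete, here is a toy sketch in which all names and values are illustrative assumptions rather than the paper's setup: a sparse signal x is measured through a sensing matrix whose atoms depend on uncertain grid parameters, and a naive matched filter against the nominal grid is used for recovery (a method like the paper's would additionally refine the grid itself).

```python
import numpy as np

# Toy version of the observation model: y = A(theta) @ x + noise, where x is
# sparse and the columns of A depend on uncertain grid parameters theta.
rng = np.random.default_rng(0)
m, n, k = 32, 64, 3                        # measurements, grid size, sparsity

def sensing_matrix(theta):
    # Fourier-like atoms evaluated at (possibly perturbed) grid points.
    return np.exp(2j * np.pi * np.outer(np.arange(m), theta)) / np.sqrt(m)

theta_nominal = np.arange(n) / n                         # assumed uniform grid
theta_true = theta_nominal + rng.normal(0, 1e-3, n)      # dynamic grid offsets

x = np.zeros(n, dtype=complex)
x[rng.choice(n, k, replace=False)] = 1.0                 # structured sparse signal
y = sensing_matrix(theta_true) @ x + 0.01 * rng.standard_normal(m)

# Naive recovery that ignores the grid uncertainty: matched filtering
# against the nominal grid.
corr = np.abs(sensing_matrix(theta_nominal).conj().T @ y)
print("top correlations at indices:", np.argsort(corr)[-k:])
```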
no code implementations • 17 Jun 2024 • Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang
Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image.
no code implementations • 13 Jun 2024 • Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun
Our dataset is 5 times the size of the previous largest dataset, yet its cost is tens of thousands of GPU hours lower.
1 code implementation • CVPR 2024 • Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun
In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of multimodal large language models.
no code implementations • 16 Feb 2024 • Zihao Lin, Mohammad Beigi, Hongxuan Li, Yufan Zhou, Yuxiang Zhang, Qifan Wang, Wenpeng Yin, Lifu Huang
Our in-depth study advocates more careful use of ME in real-world scenarios.
1 code implementation • 7 Feb 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Rajiv Jain, Zhiqiang Xu, Ryan Rossi, Changyou Chen
Controllable layout generation refers to the process of creating a plausible visual arrangement of elements within a graphic design (e.g., document and web designs) with constraints representing design intentions.
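One plausible way to encode such constrained inputs as a data structure, shown purely as an illustrative assumption rather than the paper's actual interface, is sketched below: elements with optional fixed bounding boxes, plus constraints capturing design intentions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical input representation for controllable layout generation.
@dataclass
class Element:
    category: str                  # e.g., "title", "image", "button"
    bbox: Optional[Tuple[float, float, float, float]] = None  # (x, y, w, h); None = to be generated

@dataclass
class LayoutSpec:
    canvas: Tuple[float, float] = (1.0, 1.0)
    elements: List[Element] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)      # design intentions

spec = LayoutSpec(
    elements=[Element("title"), Element("image", bbox=(0.1, 0.3, 0.8, 0.4))],
    constraints=["title above image", "keep all elements within margins"],
)
```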
1 code implementation • CVPR 2024 • Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun
Some existing methods do not require fine-tuning, but their performance is unsatisfactory.
2 code implementations • 29 Jun 2023 • Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun
Instruction tuning unlocks the superior capability of Large Language Models (LLMs) to interact with humans.
1 code implementation • 23 May 2023 • Yufan Zhou, Ruiyi Zhang, Tong Sun, Jinhui Xu
However, generating images of a novel concept provided by a user input image remains a challenging task.
1 code implementation • 9 May 2023 • Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Yufan Zhou, Guoyin Wang, Yiran Chen
This repository offers a foundational framework for exploring federated fine-tuning of LLMs using heterogeneous instructions across diverse categories.
no code implementations • 25 Dec 2022 • Yufan Zhou, Haiwei Dong, Abdulmotaleb El Saddik
In this paper, we study the task of 3D human pose estimation from depth images.
1 code implementation • CVPR 2023 • Yufan Zhou, Bingchen Liu, Yizhe Zhu, Xiao Yang, Changyou Chen, Jinhui Xu
Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion.
Ranked #3 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
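The paper's exact formulation is not reproduced here; as an assumption-laden sketch of what a shifted initialization and transition step can look like, the snippet below recenters a standard DDPM reverse step around a prior mean `mu` (standing in for a CLIP-derived embedding mean) and initializes from N(mu, I) instead of N(0, I).

```python
import torch

# Illustrative only: a DDPM reverse step whose stationary distribution is
# shifted toward a prior mean mu; the paper's parameterization may differ.
def shifted_ddpm_step(x_t, eps_pred, alpha_t, alpha_bar_t, mu):
    # Standard DDPM posterior mean, recentered around mu instead of 0.
    mean = x_t - mu - (1 - alpha_t) / (1 - alpha_bar_t) ** 0.5 * eps_pred
    mean = mean / alpha_t ** 0.5 + mu
    return mean + (1 - alpha_t) ** 0.5 * torch.randn_like(x_t)

mu = torch.full((4,), 0.5)           # stand-in for a CLIP prior mean
x = mu + torch.randn(4)              # initialization: N(mu, I), not N(0, I)
x = shifted_ddpm_step(x, torch.zeros(4), alpha_t=0.99, alpha_bar_t=0.5, mu=mu)
```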
no code implementations • 25 Oct 2022 • Yufan Zhou, Chunyuan Li, Changyou Chen, Jianfeng Gao, Jinhui Xu
The method's low requirements yield high flexibility and usability: it benefits a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning, and it can be applied to different models, including generative adversarial networks (GANs) and diffusion models.
no code implementations • CVPR 2022 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality text-image pairs.
no code implementations • 7 Dec 2021 • Yufan Zhou, Chunyuan Li, Changyou Chen, Jinhui Xu
With rapidly growing model complexity and data volume, training deep generative models (DGMs) for better performance has become an increasingly important challenge.
3 code implementations • 27 Nov 2021 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
Ranked #2 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
no code implementations • 10 May 2021 • Yufan Zhou, Changyou Chen, Jinhui Xu
Learning high-dimensional distributions is an important yet challenging problem in machine learning with applications in various domains.
no code implementations • 7 Feb 2021 • Yufan Zhou, Zhenyi Wang, Jiayi Xian, Changyou Chen, Jinhui Xu
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS, and 2) solving the adaptation analytically based on NTK theory.
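A minimal sketch of "solving the adaptation analytically": with a fixed kernel, the inner loop of meta-learning reduces to closed-form kernel ridge regression instead of gradient steps. The RBF kernel below is an illustrative stand-in for the NTK, and all names are assumptions.

```python
import numpy as np

# Closed-form inner-loop "adaptation" via kernel ridge regression.
def rbf_kernel(A, B, gamma=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

rng = np.random.default_rng(0)
X_support = rng.normal(size=(10, 3))                 # task support set
y_support = X_support @ np.array([1.0, -2.0, 0.5])   # toy regression targets
X_query = rng.normal(size=(5, 3))

lam = 1e-2                                           # ridge regularizer
K = rbf_kernel(X_support, X_support)
alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_support)
y_query = rbf_kernel(X_query, X_support) @ alpha     # analytic adapted predictor
```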
no code implementations • ICLR 2021 • Yufan Zhou, Zhenyi Wang, Jiayi Xian, Changyou Chen, Jinhui Xu
Within this paradigm, we introduce two meta learning algorithms in RKHS, which no longer need an explicit inner-loop adaptation as in the MAML framework.
no code implementations • ICLR 2021 • Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin
Large-scale language models have recently demonstrated impressive empirical performance.
no code implementations • NeurIPS 2020 • Yufan Zhou, Changyou Chen, Jinhui Xu
Manifold learning is a fundamental problem in machine learning with numerous applications.
no code implementations • 16 May 2020 • Yufan Zhou, Jiayi Xian, Changyou Chen, Jinhui Xu
We then propose feature aggregation as the composition of the original neighbor-based kernel and a learnable kernel to encode feature similarities in a feature space.
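A minimal sketch of such a composition, assuming a graph adjacency matrix as the neighbor-based kernel and an RBF similarity with a tunable bandwidth as the feature-space kernel (both illustrative choices, not the paper's exact construction):

```python
import numpy as np

# Compose a neighbor-based kernel (adjacency) with a feature-space kernel,
# then aggregate node features under the composed, row-normalized kernel.
rng = np.random.default_rng(0)
n, d = 6, 4
A = (rng.random((n, n)) < 0.4).astype(float)         # neighbor-based kernel
A = np.maximum(A, A.T)
np.fill_diagonal(A, 1.0)                             # symmetric, self-loops
H = rng.normal(size=(n, d))                          # node features

gamma = 0.5                                          # tunable bandwidth
dists = ((H[:, None, :] - H[None, :, :]) ** 2).sum(-1)
K_feat = np.exp(-gamma * dists)                      # feature-similarity kernel

W = A * K_feat                                       # composed kernel
W = W / W.sum(axis=1, keepdims=True)                 # row-normalize
H_aggregated = W @ H                                 # feature aggregation
```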
1 code implementation • AAAI 2019 • Zhenyi Wang, Ping Yu, Yang Zhao, Ruiyi Zhang, Yufan Zhou, Junsong Yuan, Changyou Chen
In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions on a latent space of action sequences with much lower dimensionality.
Ranked #4 on Human action generation on NTU RGB+D 2D
no code implementations • 2 Dec 2019 • Yufan Zhou, Changyou Chen, Jinhui Xu
Learning with kernels is an important concept in machine learning.