Search Results for author: Yufan Zhou

Found 34 papers, 13 papers with code

Scattering Environment Aware Joint Multi-user Channel Estimation and Localization with Spatially Reused Pilots

no code implementations • 4 Jan 2025 • Kaiyuan Tian, Yani Chi, Yufan Zhou, An Liu

This paper proposes a two-timescale approach for joint multi-user (MU) uplink channel estimation and localization in MIMO-OFDM systems, which fully captures the spatial characteristics of MUs.

A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation

1 code implementation • 20 Dec 2024 • Shijie Zhou, Ruiyi Zhang, Yufan Zhou, Changyou Chen

Large multimodal models still struggle with text-rich images because of inadequate training data.

Image Captioning

Numerical Pruning for Efficient Autoregressive Models

no code implementations • 17 Dec 2024 • Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing.

Decoder, Image Generation

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

no code implementations • 13 Dec 2024 • Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Nanxuan Zhao, Jing Shi, Tong Sun

Compared to previous methods, SUGAR achieves state-of-the-art results in identity preservation, video dynamics, and video-text alignment for subject-driven video customization, demonstrating the effectiveness of our proposed method.

TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram

no code implementations • 10 Dec 2024 • Mingxi Lei, Chunwei Ma, Meng Ding, Yufan Zhou, Ziyun Huang, Jinhui Xu

Deep learning models often struggle to generalize when deployed on real-world data, due to distributional shift from the training data.

Test-time Adaptation

LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding

no code implementations • 2 Nov 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Tong Yu, Franck Dernoncourt, Jiuxiang Gu, Ryan A. Rossi, Changyou Chen, Tong Sun

In this work, we present a novel framework named LoRA-Contextualizing Adaptation of Large multimodal models (LoCAL), which broadens the capabilities of any LMM to support long-document understanding.

Document Understanding, Question Answering, +1

TextLap: Customizing Language Models for Text-to-Layout Planning

1 code implementation • 9 Oct 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Jennifer Healey, Jiuxiang Gu, Zhiqiang Xu, Changyou Chen

Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces.

Image Generation, Natural Language Understanding

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

1 code implementation • 4 Oct 2024 • Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang

Video Large Language Models (Video-LLMs) have demonstrated remarkable capabilities in coarse-grained video understanding; however, they struggle with fine-grained temporal grounding.

Dense Video Captioning, Sentence, +1

MMR: Evaluating Reading Ability of Large Multimodal Models

1 code implementation • 26 Aug 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Ryan Rossi, Jiuxiang Gu, Changyou Chen

Large multimodal models (LMMs) have demonstrated impressive capabilities in understanding various types of images, including text-rich images.

Font Recognition, MMR total, +4

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models

no code implementations • 27 Jul 2024 • Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun

Large multimodal language models have demonstrated impressive capabilities in understanding and manipulating images.

Language Modeling, Language Modelling, +2

Subspace Constrained Variational Bayesian Inference for Structured Compressive Sensing with a Dynamic Grid

no code implementations • 24 Jul 2024 • An Liu, Yufan Zhou, Wenkang Xu

We investigate the problem of recovering a structured sparse signal from a linear observation model with an uncertain dynamic grid in the sensing matrix.

Bayesian Inference, Compressive Sensing

ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models

no code implementations • 17 Jun 2024 • Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image.

Disentanglement, Image Generation

TRINS: Towards Multimodal Language Models that Can Read

1 code implementation • CVPR 2024 • Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun

In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of multimodal large language models.

Language Modeling, Language Modelling, +2

Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints

1 code implementation • 7 Feb 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Rajiv Jain, Zhiqiang Xu, Ryan Rossi, Changyou Chen

Controllable layout generation refers to the process of creating a plausible visual arrangement of elements within a graphic design (e.g., document and web designs) with constraints representing design intentions.

Layout Design

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

2 code implementations • 29 Jun 2023 • Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun

Instruction tuning unlocks the superior capability of Large Language Models (LLM) to interact with humans.

16k, Image Captioning, +3

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

1 code implementation • 23 May 2023 • Yufan Zhou, Ruiyi Zhang, Tong Sun, Jinhui Xu

However, generating images of a novel concept provided by a user's input image is still a challenging task.

Text-to-Image Generation

Towards Building the Federated GPT: Federated Instruction Tuning

1 code implementation • 9 May 2023 • Jianyi Zhang, Saeed Vahidian, Martin Kuo, Chunyuan Li, Ruiyi Zhang, Tong Yu, Yufan Zhou, Guoyin Wang, Yiran Chen

This repository offers a foundational framework for exploring federated fine-tuning of LLMs using heterogeneous instructions across diverse categories.

Federated Learning

Shifted Diffusion for Text-to-image Generation

1 code implementation • CVPR 2023 • Yufan Zhou, Bingchen Liu, Yizhe Zhu, Xiao Yang, Changyou Chen, Jinhui Xu

Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion.

Zero-Shot Text-to-Image Generation
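
The abstract above only hints at the mechanism, so here is a minimal, hypothetical sketch of the "shifted" idea: start the reverse diffusion from a Gaussian centered near the CLIP image-embedding manifold rather than at the origin. The function names, and using an embedding mean as the shift, are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_shift(clip_image_embeddings):
    # Crude stand-in for the learned shift: the mean of pre-computed
    # CLIP image embeddings, so sampling starts near the embedding manifold.
    return clip_image_embeddings.mean(axis=0)

def sample_init(mu, n):
    # Draw x_T ~ N(mu, I) instead of the standard N(0, I) used in a
    # vanilla diffusion prior.
    return mu + rng.standard_normal((n, mu.shape[0]))

# Toy data: 1000 unit-norm vectors standing in for CLIP image embeddings.
emb = rng.standard_normal((1000, 512))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

mu = estimate_shift(emb)
x_T = sample_init(mu, n=4)  # starting points for the reverse process
print(x_T.shape)            # (4, 512)
```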

Lafite2: Few-shot Text-to-Image Generation

no code implementations • 25 Oct 2022 • Yufan Zhou, Chunyuan Li, Changyou Chen, Jianfeng Gao, Jinhui Xu

The proposed method's low data requirement yields high flexibility and usability: it can benefit a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning, and it can be applied to different models, including generative adversarial networks (GANs) and diffusion models.

Retrieval, Text-to-Image Generation

Towards Language-Free Training for Text-to-Image Generation

no code implementations • CVPR 2022 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun

One of the major challenges in training text-to-image generation models is the need for a large number of high-quality text-image pairs.

Zero-Shot Text-to-Image Generation

A Generic Approach for Enhancing GANs by Regularized Latent Optimization

no code implementations • 7 Dec 2021 • Yufan Zhou, Chunyuan Li, Changyou Chen, Jinhui Xu

With rapidly growing model complexity and data volume, training deep generative models (DGMs) for better performance has become an increasingly important challenge.

Image Inpainting, Text-Guided Image Editing, +1

Learning High-Dimensional Distributions with Latent Neural Fokker-Planck Kernels

no code implementations • 10 May 2021 • Yufan Zhou, Changyou Chen, Jinhui Xu

Learning high-dimensional distributions is an important yet challenging problem in machine learning with applications in various domains.

Vocal Bursts Intensity Prediction

Meta-Learning with Neural Tangent Kernels

no code implementations • 7 Feb 2021 • Yufan Zhou, Zhenyi Wang, Jiayi Xian, Changyou Chen, Jinhui Xu

We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.

Meta-Learning
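
As a rough illustration of point 2) above (analytic adaptation), the inner loop can be replaced by closed-form kernel ridge regression under a fixed kernel. The paper uses the NTK; this sketch substitutes an RBF kernel, and all names are hypothetical.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Pairwise RBF kernel; the paper would use the NTK here instead.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def adapt_and_predict(X_s, y_s, X_q, lam=1e-3):
    # Closed-form "inner loop": kernel ridge regression on the task's
    # support set, then evaluation on the query set -- no gradient steps.
    K = rbf(X_s, X_s)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_s)), y_s)
    return rbf(X_q, X_s) @ alpha

rng = np.random.default_rng(0)
X_s = rng.uniform(-3, 3, (10, 1))      # support inputs for one task
y_s = np.sin(X_s).ravel()              # support targets
X_q = np.linspace(-3, 3, 50)[:, None]  # query inputs
preds = adapt_and_predict(X_s, y_s, X_q)
```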

Meta-Learning in Reproducing Kernel Hilbert Space

no code implementations • ICLR 2021 • Yufan Zhou, Zhenyi Wang, Jiayi Xian, Changyou Chen, Jinhui Xu

Within this paradigm, we introduce two meta-learning algorithms in RKHS that no longer need explicit inner-loop adaptation as in the MAML framework.

Meta-Learning

Graph Neural Networks with Composite Kernels

no code implementations • 16 May 2020 • Yufan Zhou, Jiayi Xian, Changyou Chen, Jinhui Xu

We then propose feature aggregation as the composition of the original neighbor-based kernel and a learnable kernel to encode feature similarities in a feature space.

Graph Attention
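
To make the aggregation idea concrete, the following hypothetical sketch composes a neighbor-based kernel (the adjacency matrix) with a feature-similarity kernel and aggregates node features with the result. The cosine kernel and the elementwise composition are assumptions for illustration; the paper's feature kernel is learnable.

```python
import numpy as np

def feature_kernel(X):
    # Cosine similarity between node features, shifted to [0, 1] so the
    # aggregation weights stay nonnegative (stand-in for a learnable kernel).
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 0.5 * (1.0 + Xn @ Xn.T)

def composite_aggregate(A, X):
    # Compose the neighbor-based kernel (adjacency) with the feature
    # kernel elementwise, row-normalize, and aggregate node features.
    K = A * feature_kernel(X)
    K = K / K.sum(axis=1, keepdims=True)
    return K @ X

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.5).astype(float)
np.fill_diagonal(A, 1.0)       # include self-loops
X = rng.standard_normal((5, 8))
H = composite_aggregate(A, X)  # aggregated features, shape (5, 8)
```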

Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions

1 code implementation • AAAI 2019 • Zhenyi Wang, Ping Yu, Yang Zhao, Ruiyi Zhang, Yufan Zhou, Junsong Yuan, Changyou Chen

In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions on a latent space of action sequences with much lower dimensionality.

Action Generation, Decoder, +1
