Search Results for author: Yutong Feng

Found 30 papers, 8 papers with code

In-Context LoRA for Diffusion Transformers

1 code implementation31 Oct 2024 Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Huanzhang Dou, Chen Liang, Yutong Feng, Yu Liu, Jingren Zhou

While task-specific in terms of tuning data, our framework remains task-agnostic in architecture and pipeline, offering a powerful tool for the community and providing valuable insights for further research on product-level task-agnostic generation systems.

Image Generation

Group Diffusion Transformers are Unsupervised Multitask Learners

no code implementations19 Oct 2024 Lianghua Huang, Wei Wang, Zhi-Fan Wu, Huanzhang Dou, Yupeng Shi, Yutong Feng, Chen Liang, Yu Liu, Jingren Zhou

In this work, we introduce Group Diffusion Transformers (GDTs), a novel framework that unifies diverse visual generation tasks by redefining them as a group generation problem.

Colorization Style Transfer

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control

no code implementations17 Oct 2024 Yujie Wei, Shiwei Zhang, Hangjie Yuan, Xiang Wang, Haonan Qiu, Rui Zhao, Yutong Feng, Feng Liu, Zhizhong Huang, Jiaxin Ye, Yingya Zhang, Hongming Shan

In this paper, we present DreamVideo-2, a zero-shot video customization framework capable of generating videos with a specific subject and motion trajectory, guided by a single image and a bounding box sequence, respectively, and without the need for test-time fine-tuning.

Video Generation

Zero-shot Image Editing with Reference Imitation

no code implementations11 Jun 2024 Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao

Image editing serves as a practical yet challenging task considering the diverse demands from users, where one of the hardest parts is to precisely describe how the edited image should look like.

Semantic correspondence

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

no code implementations25 Mar 2024 Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo

This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt.

Face Swapping Instruction Following +1

Spatio-Temporal Field Neural Networks for Air Quality Inference

no code implementations2 Mar 2024 Yutong Feng, Qiongyan Wang, Yutong Xia, Junlin Huang, Siru Zhong, Yuxuan Liang

The air quality inference problem aims to utilize historical data from a limited number of observation sites to infer the air quality index at an unknown location.

Air Quality Inference

Check Locate Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations CVPR 2024 Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.

Text-to-Image Generation

LivePhoto: Real Image Animation with Text-guided Motion Control

no code implementations5 Dec 2023 Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao

In particular, considering the facts that (1) text can only describe motions roughly (e. g., regardless of the moving speed) and (2) text may include both content and motion descriptions, we introduce a motion intensity estimation module as well as a text re-weighting module to reduce the ambiguity of text-to-motion mapping.

Image Animation Text-to-Video Generation +1

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

1 code implementation CVPR 2024 Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou

Existing text-to-image (T2I) diffusion models usually struggle in interpreting complex prompts, especially those with quantity, object-attribute binding, and multi-subject descriptions.

Attribute Denoising +1

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations27 Nov 2023 Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.

Text-to-Image Generation

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

no code implementations CVPR 2024 Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang

Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle in decoupling actions from context features, including appearance.

Text-to-Image Generation

Incentive Mechanism Design for Unbiased Federated Learning with Randomized Client Participation

no code implementations17 Apr 2023 Bing Luo, Yutong Feng, Shiqiang Wang, Jianwei Huang, Leandros Tassiulas

Incentive mechanism is crucial for federated learning (FL) when rational clients do not have the same interests in the global model as the server.

Federated Learning

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

1 code implementation CVPR 2024 Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, Donglin Wang

Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs.

Compositional Zero-Shot Learning Object

ViM: Vision Middleware for Unified Downstream Transferring

no code implementations ICCV 2023 Yutong Feng, Biao Gong, Jianwen Jiang, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

ViM consists of a zoo of lightweight plug-in modules, each of which is independently learned on a midstream dataset with a shared frozen backbone.

Grow and Merge: A Unified Framework for Continuous Categories Discovery

no code implementations9 Oct 2022 Xinwei Zhang, Jianwen Jiang, Yutong Feng, Zhi-Fan Wu, Xibin Zhao, Hai Wan, Mingqian Tang, Rong Jin, Yue Gao

Although a number of studies are devoted to novel category discovery, most of them assume a static setting where both labeled and unlabeled data are given at once for finding new categories.

Self-Supervised Learning

Rethinking Supervised Pre-training for Better Downstream Transferring

no code implementations ICLR 2022 Yutong Feng, Jianwen Jiang, Mingqian Tang, Rong Jin, Yue Gao

Though for most cases, the pre-training stage is conducted based on supervised methods, recent works on self-supervised pre-training have shown powerful transferability and even outperform supervised pre-training on multiple downstream tasks.

Open-Ended Question Answering

Weakly-Supervised Temporal Action Localization Through Local-Global Background Modeling

no code implementations20 Jun 2021 Xiang Wang, Zhiwu Qing, Ziyuan Huang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Yuanjie Shao, Nong Sang

Then our proposed Local-Global Background Modeling Network (LGBM-Net) is trained to localize instances by using only video-level labels based on Multi-Instance Learning (MIL).

Weakly-supervised Learning Weakly-supervised Temporal Action Localization +1

Relation Modeling in Spatio-Temporal Action Localization

no code implementations15 Jun 2021 Yutong Feng, Jianwen Jiang, Ziyuan Huang, Zhiwu Qing, Xiang Wang, Shiwei Zhang, Mingqian Tang, Yue Gao

This paper presents our solution to the AVA-Kinetics Crossover Challenge of ActivityNet workshop at CVPR 2021.

Ranked #4 on Spatio-Temporal Action Localization on AVA-Kinetics (using extra training data)

Action Detection Relation +2

A Stronger Baseline for Ego-Centric Action Detection

1 code implementation13 Jun 2021 Zhiwu Qing, Ziyuan Huang, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Changxin Gao, Marcelo H. Ang Jr, Nong Sang

This technical report analyzes an egocentric video action detection method we used in the 2021 EPIC-KITCHENS-100 competition hosted in CVPR2021 Workshop.

Action Detection

Incremental Learning on Growing Graphs

no code implementations1 Jan 2021 Yutong Feng, Jianwen Jiang, Yue Gao

To tackle this problem, we introduce incremental graph learning (IGL), a general framework to formulate the learning on growing graphs in an incremental manner, where traditional graph learning method could be deployed as a basic model.

Graph Learning Incremental Learning +2

Event Stream Super-Resolution via Spatiotemporal Constraint Learning

no code implementations ICCV 2021 Siqi Li, Yutong Feng, Yipeng Li, Yu Jiang, Changqing Zou, Yue Gao

Therefore, it is imperative to explore the algorithm of event stream super-resolution, which is a non-trivial task due to the sparsity and strong spatio-temporal correlation of the events from an event camera.

Image Reconstruction Philosophy +1

Design of High-Frequency Trading Algorithm Based on Machine Learning

no code implementations21 Dec 2019 Boyue Fang, Yutong Feng

Based on iterative optimization and activation function in deep learning, we proposed a new analytical framework of high-frequency trading information, that reduced structural loss in the assembly of Volume-synchronized probability of Informed Trading ($VPIN$), Generalized Autoregressive Conditional Heteroscedasticity (GARCH) and Support Vector Machine (SVM) to make full use of the order book information.

Trading and Market Microstructure

MeshNet: Mesh Neural Network for 3D Shape Representation

2 code implementations28 Nov 2018 Yutong Feng, Yifan Feng, Haoxuan You, Xibin Zhao, Yue Gao

However, there is little effort on using mesh data in recent years, due to the complexity and irregularity of mesh data.

3D Shape Classification 3D Shape Representation +2

Cannot find the paper you are looking for? You can Submit a new open access paper.