Search Results for author: Minbin Huang

Found 6 papers, 6 papers with code

CAPE: Context-Adaptive Positional Encoding for Length Extrapolation

1 code implementation • 23 May 2024 • Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization.

Paper
Code

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

1 code implementation • 14 May 2024 • Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, Jianchen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Mingtao Chen, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin Lu

For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images.

Image Generation Language Modelling +2

2,261

Paper
Code

TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

1 code implementation • 29 Apr 2024 • Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

To address this issue, we introduce TheaterGen, a training-free framework that integrates large language models (LLMs) and text-to-image (T2I) models to provide the capability of multi-turn image generation.

Denoising Image Generation +2

Paper
Code

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

1 code implementation • 13 Mar 2024 • Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper.

Prompt Engineering Text-to-Image Generation

2,261

Paper
Code

Boosting Visual-Language Models by Exploiting Hard Samples

1 code implementation • 9 May 2023 • Haonan Wang, Minbin Huang, Runhui Huang, Lanqing Hong, Hang Xu, Tianyang Hu, Xiaodan Liang, Zhenguo Li, Hong Cheng, Kenji Kawaguchi

In this work, we present HELIP, a cost-effective strategy tailored to enhance the performance of existing CLIP models without the need for training a model from scratch or collecting additional data.

Retrieval Zero-Shot Learning

Paper
Code

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search

1 code implementation • CVPR 2022 • Minbin Huang, Zhijian Huang, Changlin Li, Xin Chen, Hang Xu, Zhenguo Li, Xiaodan Liang

It is able to find top 0. 16\% and 0. 29\% architectures on average on two search spaces under the budget of only 50 models.

Neural Architecture Search Relation

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.