Search Results for author: Chuan Li

Found 15 papers, 7 papers with code

A 2D Semantic-Aware Position Encoding for Vision Transformers

no code implementations14 May 2025 Xi Chen, Shiyang Zhou, Muqi Huang, Jiaxu Feng, Yun Xiong, Kun Zhou, Biao Yang, Yuhui Zhang, Huishuai Bao, Sijia Peng, Chuan Li, Feng Shi

Traditional approaches like absolute position encoding and relative position encoding primarily focus on 1D linear position relationships, often neglecting the semantic similarity between distant yet contextually related patches.

Position Semantic Similarity +2
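The snippet above contrasts 1D position encodings with the 2D structure of a ViT patch grid. As a point of reference only (not the paper's semantic-aware method, whose details are not given here), a plain fixed 2D sinusoidal encoding that splits channels between row and column coordinates can be sketched as:

```python
import numpy as np

def pos_encoding_2d(h, w, dim):
    """Fixed 2D sinusoidal position encoding for an h x w patch grid.

    Half the channels encode the row index, half the column index,
    each with standard sinusoidal frequencies. This is a baseline
    sketch, not the paper's semantic-aware encoding.
    """
    assert dim % 4 == 0, "dim must be divisible by 4"
    d = dim // 2                                       # channels per axis
    freqs = 1.0 / (10000 ** (np.arange(0, d, 2) / d))  # (d/2,)

    def encode(pos):                                   # (n,) -> (n, d)
        angles = pos[:, None] * freqs[None, :]
        return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

    rows = encode(np.arange(h, dtype=np.float64))      # (h, d)
    cols = encode(np.arange(w, dtype=np.float64))      # (w, d)
    pe = np.zeros((h, w, dim))
    pe[:, :, :d] = rows[:, None, :]  # row code broadcast across columns
    pe[:, :, d:] = cols[None, :, :]  # column code broadcast across rows
    return pe.reshape(h * w, dim)    # one vector per flattened patch
```

Patches in the same row share the first half of their encoding and patches in the same column share the second half, which is exactly the 2D locality that a purely 1D scheme discards.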

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

no code implementations28 Apr 2025 Chia-Yu Hung, Qi Sun, Pengfei Hong, Amir Zadeh, Chuan Li, U-Xuan Tan, Navonil Majumder, Soujanya Poria

Existing Visual-Language-Action (VLA) models have shown promising performance in zero-shot scenarios, demonstrating impressive task execution and reasoning capabilities.

Task Planning Vision-Language-Action +1

Goku: Flow Based Video Generative Foundation Models

no code implementations7 Feb 2025 Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu

This paper introduces Goku, a state-of-the-art family of joint image-and-video generation models leveraging rectified flow Transformers to achieve industry-leading performance.

Text-to-Image Generation Video Generation

Scalable Language Models with Posterior Inference of Latent Thought Vectors

no code implementations3 Feb 2025 Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu

We propose a novel family of language models, Latent-Thought Language Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space.

Decoder Language Modeling +2

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

1 code implementation30 Dec 2024 Chia-Yu Hung, Navonil Majumder, Zhifeng Kong, Ambuj Mehrish, Amir Ali Bagherzadeh, Chuan Li, Rafael Valle, Bryan Catanzaro, Soujanya Poria

We introduce TangoFlux, an efficient Text-to-Audio (TTA) generative model with 515M parameters, capable of generating up to 30 seconds of 44.1 kHz audio in just 3.7 seconds on a single A40 GPU.

Audio Generation
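TangoFlux is trained with flow matching. Its exact objective is not reproduced in the snippet above, but the standard conditional flow-matching setup such models build on can be sketched as follows (the straight-line "rectified" path below is an assumption consistent with common flow-matching practice, not a detail taken from the paper):

```python
import numpy as np

def flow_matching_pair(x1, rng):
    """Build one flow-matching training pair (x_t, target velocity).

    The probability path linearly interpolates noise x0 into data x1:
        x_t = (1 - t) * x0 + t * x1,   target v = x1 - x0.
    A network v_theta(x_t, t) is regressed onto v with an MSE loss.
    """
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise sample
    t = rng.uniform()                   # random time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1
    v = x1 - x0                         # straight-line velocity target
    return t, xt, v

def fm_loss(v_pred, v_target):
    """Mean-squared flow-matching regression loss."""
    return float(np.mean((v_pred - v_target) ** 2))
```

At sampling time, integrating the learned velocity field from t = 0 to t = 1 (e.g. with a few Euler steps) transports noise to data, which is what makes such models fast at inference.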

Utilizing Large Language Models for Natural Interface to Pharmacology Databases

no code implementations26 Jul 2023 Hong Lu, Chuan Li, Yinheng Li, Jie Zhao

The drug development process necessitates that pharmacologists undertake various tasks, such as reviewing literature, formulating hypotheses, designing experiments, and interpreting results.

Language Modeling +1

clip2latent: Text driven sampling of a pre-trained StyleGAN using denoising diffusion and CLIP

2 code implementations5 Oct 2022 Justin N. M. Pinkney, Chuan Li

We introduce a new method to efficiently create text-to-image models from a pre-trained CLIP and StyleGAN.

Denoising

NPRportrait 1.0: A Three-Level Benchmark for Non-Photorealistic Rendering of Portraits

no code implementations1 Sep 2020 Paul L. Rosin, Yu-Kun Lai, David Mould, Ran Yi, Itamar Berger, Lars Doyle, Seungyong Lee, Chuan Li, Yong-Jin Liu, Amir Semmo, Ariel Shamir, Minjung Son, Holger Winnemöller

Despite the recent upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer, the state of performance evaluation in this field is limited, especially compared to the norms in the computer vision and machine learning communities.

Style Transfer

RenderNet: A deep convolutional network for differentiable rendering from 3D shapes

1 code implementation NeurIPS 2018 Thu Nguyen-Phuoc, Chuan Li, Stephen Balaban, Yong-Liang Yang

We present RenderNet, a differentiable rendering convolutional network with a novel projection unit that can render 2D images from 3D shapes.

Inverse Rendering
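RenderNet's projection unit is a learned layer; in the paper it is implemented with trainable weights over the depth dimension. The core idea it enables, replacing a hard, non-differentiable visibility test with a smooth reduction along each viewing ray, can be illustrated with this simplified, non-learned stand-in (the softmax blend is an assumption for illustration, not the paper's actual unit):

```python
import numpy as np

def soft_project(voxels, sharpness=10.0):
    """Differentiable projection of a voxel grid to a 2D image.

    voxels: (D, H, W) occupancy values in [0, 1], depth axis first.
    A hard arg-max over depth (pick the nearest occupied voxel) has
    zero gradient almost everywhere; blending values along each ray
    with softmax weights keeps the result close to the arg-max while
    letting gradients flow back to every voxel.
    """
    w = np.exp(sharpness * voxels)
    w = w / w.sum(axis=0, keepdims=True)  # softmax over the depth axis
    return (w * voxels).sum(axis=0)       # (H, W) projected image
```

Raising `sharpness` moves the blend closer to a hard visibility test at the cost of steeper gradients, the usual trade-off in soft relaxations of discrete rendering operations.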

Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks

2 code implementations15 Apr 2016 Chuan Li, Michael Wand

This paper proposes Markovian Generative Adversarial Networks (MGANs), a method for training generative neural networks for efficient texture synthesis.

Style Transfer Texture Synthesis

Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis

7 code implementations CVPR 2016 Chuan Li, Michael Wand

This paper studies a combination of generative Markov random field (MRF) models and discriminatively trained deep convolutional neural networks (dCNNs) for synthesizing 2D images.

Image Generation Texture Synthesis
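The MRF term in this method matches each "neural patch" of the synthesized image to its nearest neighbour among the style image's neural patches (by normalized cross-correlation) and penalizes their squared distance. A minimal sketch of that energy on plain vectors, standing in for patches extracted from dCNN feature maps, looks like this:

```python
import numpy as np

def best_match(patch, candidates):
    """Index of the candidate with the highest normalized
    cross-correlation to `patch` (the nearest-neighbour rule used
    to pair synthesized patches with style patches)."""
    p = patch / (np.linalg.norm(patch) + 1e-8)
    c = candidates / (np.linalg.norm(candidates, axis=1, keepdims=True) + 1e-8)
    return int(np.argmax(c @ p))

def mrf_loss(content_patches, style_patches):
    """Sum of squared distances between each synthesized patch and
    its best-matching style patch -- the MRF energy being minimized.
    Sketch only: real patches come from VGG-style feature maps."""
    loss = 0.0
    for p in content_patches:
        q = style_patches[best_match(p, style_patches)]
        loss += float(np.sum((p - q) ** 2))
    return loss
```

In the full method this energy is combined with a dCNN content term and minimized by back-propagation through the feature extractor, so the patch statistics of the style image are enforced locally rather than through global feature correlations.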
