Search Results for author: Bin Fu

Found 26 papers, 10 papers with code

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

1 code implementation • 8 Mar 2024 • XiWei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu

Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation.

241

Computing Threshold Circuits with Void Reactions in Step Chemical Reaction Networks

no code implementations • 13 Feb 2024 • Rachel Anderson, Alberto Avila, Bin Fu, Timothy Gomez, Elise Grizzell, Aiden Massie, Gourab Mukhopadhyay, Adrian Salinas, Robert Schweller, Evan Tomai, Tim Wylie

We introduce a new model of step Chemical Reaction Networks (step CRNs), motivated by the step-wise addition of materials in standard lab procedures.

AppAgent: Multimodal Agents as Smartphone Users

no code implementations • 21 Dec 2023 • Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu

Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks.

Navigate

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

1 code implementation • 21 Dec 2023 • Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu

This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs.

502

FaceStudio: Put Your Face Everywhere in Seconds

no code implementations • 5 Dec 2023 • Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu

This study investigates identity-preserving image synthesis, an intriguing task in image generation that seeks to maintain a subject's identity while adding a personalized, stylistic touch.

Image Generation

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

no code implementations • 27 Nov 2023 • Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

Next, we introduce ChartLlama, a multi-modal large language model that we've trained using our created dataset.

Language Modelling • Large Language Model

SAM-Med3D

1 code implementation • 23 Oct 2023 • Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao

These issues can hardly be addressed by fine-tuning SAM on medical data because the original 2D structure of SAM neglects 3D spatial information.

3D Architecture • Image Segmentation +1

344

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

no code implementations • ICCV 2023 • Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen

In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.

Monocular Depth Estimation

Deformation Robust Text Spotting with Geometric Prior

no code implementations • 31 Aug 2023 • Xixuan Hao, Aozhong Zhang, Xianze Meng, Bin Fu

Based on this database, we develop a deformation robust text spotting method (DR TextSpotter) to solve the recognition problem of complex deformation of characters in different fonts.

Text Detection • Text Spotting +1

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

1 code implementation • 20 Aug 2023 • Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei

However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models.

Ranked #51 on Visual Question Answering on MM-Vet

Visual Question Answering

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

1 code implementation • NeurIPS 2023 • Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, Shenghua Gao

We present a novel alignment-before-generation approach to tackle the challenging task of generating general 3D shapes based on 2D images or texts.

3D Shape Generation

270

StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

2 code implementations • 30 May 2023 • Chi Zhang, YiWen Chen, Yijun Fu, Zhenglin Zhou, Gang Yu, Billzb Wang, Bin Fu, Tao Chen, Guosheng Lin, Chunhua Shen

The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models.

3D Generation • Attribute +1

497

Neural Transformation Fields for Arbitrary-Styled Font Generation

1 code implementation • CVPR 2023 • Bin Fu, Junjun He, Jianjun Wang, Yu Qiao

Few-shot font generation (FFG), which aims to generate font images from a few samples, has become an emerging topic in recent years owing to its academic and commercial value.

Disentanglement • Font Generation

Executing your Commands via Motion Diffusion in Latent Space

1 code implementation • CVPR 2023 • Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu

We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors.

Ranked #2 on Motion Synthesis on HumanAct12

Motion Synthesis

503

Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

no code implementations • 7 Nov 2022 • Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Jiaqi Li, Yiran Wang, Zihao Huang, Zhiguo Cao, Marcos V. Conde, Denis Sapozhnikov, Byeong Hyun Lee, Dongwon Park, Seongmin Hong, Joonhee Lee, Seunggyu Lee, Se Young Chun

Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks.

Bokeh Effect Rendering • Depth Estimation +3

Learning Variational Motion Prior for Video-based Motion Capture

no code implementations • 27 Oct 2022 • Xin Chen, Zhuo Su, Lingbo Yang, Pei Cheng, Lan Xu, Bin Fu, Gang Yu

To improve the generalization capacity of prior space, we propose a transformer-based variational autoencoder pretrained over marker-based 3D mocap data, with a novel style-mapping block to boost the generation quality.

Pose Estimation

Hierarchical Normalization for Robust Monocular Depth Estimation

no code implementations • 18 Oct 2022 • Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen

In this paper, we address monocular depth estimation with deep neural networks.

Monocular Depth Estimation

GenText: Unsupervised Artistic Text Generation via Decoupled Font and Texture Manipulation

no code implementations • 20 Jul 2022 • Qirui Huang, Bin Fu, Aozhong Zhang, Yu Qiao

Specifically, our current work incorporates three different stages, stylization, destylization, and font transfer, respectively, into a unified platform with a single powerful encoder network and two separate style generator networks, one for font transfer, the other for stylization and destylization.

Style Transfer • Text Style Transfer

Enhancing Quality of Pose-varied Face Restoration with Local Weak Feature Sensing and GAN Prior

no code implementations • 28 May 2022 • Kai Hu, Yu Liu, Renhe Liu, Wei Lu, Gang Yu, Bin Fu

In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of input images, which can better preserve the original facial features and avoid excessive fantasy.

Blind Face Restoration • Super-Resolution

Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment

no code implementations • 10 Oct 2021 • Haichao Zhang, Youcheng Ben, Weixi Zhang, Tao Chen, Gang Yu, Bin Fu

Recent face reenactment works are limited by the coarse reference landmarks, leading to unsatisfactory identity preserving performance due to the distribution gap between the manipulated landmarks and those sampled from a real person.

Face Reenactment

Shuffle Transformer with Feature Alignment for Video Face Parsing

no code implementations • 16 Jun 2021 • Rui Zhang, Yang Han, Zilong Huang, Pei Cheng, Guozhong Luo, Gang Yu, Bin Fu

This is a short technical report introducing the solution of Team TCParser for the Short-video Face Parsing Track of the 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021.

Face Parsing

Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer

4 code implementations • 7 Jun 2021 • Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu

In this work, we revisit the spatial shuffle as an efficient way to build connections among windows.

Ranked #45 on Semantic Segmentation on ADE20K val

Image Classification • object-detection +3

1,671

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

no code implementations • 17 May 2021 • Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, Jin-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, Zhenyu Li, Xianming Liu, Junjun Jiang, Wei-Chi Chen, Shayan Joya, Huanhuan Fan, Zhaobing Kang, Ang Li, Tianpeng Feng, Yang Liu, Chuannan Sheng, Jian Yin, Fausto T. Benavide

While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference.

Depth Estimation

Software-Defined Edge Computing: A New Architecture Paradigm to Support IoT Data Analysis

no code implementations • 22 Apr 2021 • Di Wu, XiaoFeng Xie, Xiang Ni, Bin Fu, Hanhui Deng, Haibo Zeng, Zhijin Qin

We further present an experiment on data anomaly detection in this architecture, and the comparison between two architectures for ECG diagnosis.

Anomaly Detection • Edge-computing

A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges

no code implementations • 26 Jul 2020 • Bin Fu, Yunqi Qiu, Chengguang Tang, Yang Li, Haiyang Yu, Jian Sun

Question Answering (QA) over Knowledge Base (KB) aims to automatically answer natural language questions via well-structured relation information between entities stored in knowledge bases.

Information Retrieval • Question Answering +2

Deep & Cross Network for Ad Click Predictions

16 code implementations • 17 Aug 2017 • Ruoxi Wang, Bin Fu, Gang Fu, Mingliang Wang

Feature engineering has been the key to the success of many prediction models.

Click-Through Rate Prediction • Feature Engineering

7,338
