Search Results for author: Ziheng Wu

Found 12 papers, 5 papers with code

Valley2: Exploring Multimodal Models with Scalable Vision-Language Design

1 code implementation • 10 Jan 2025 • Ziheng Wu, Zhenghao Chen, Ruipu Luo, Can Zhang, Yuan Gao, Zhentao He, Xian Wang, Haoran Lin, Minghui Qiu

Recently, vision-language models have made remarkable progress, demonstrating outstanding capabilities in various tasks such as image captioning and video understanding.

Image Captioning, Language Modeling, +4

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

no code implementations • 14 Jun 2024 • Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, Xiaoxiang Zhou, Rong Fan, Ruihan Chen, Wenchao Guo, Jianpeng Zhang, Tony C. W. Mok, Zi Li, Le Lu, Dehai Lang, Xiaoqiang Li, Guofu Wang, Wei Lu, Zhengxing Huang, Minfeng Xu, HongKun Zhang

Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins.

Anatomy, Specificity

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis

no code implementations • 12 Nov 2023 • Tingfeng Cao, Chengyu Wang, Bingyan Liu, Ziheng Wu, Jinhui Zhu, Jun Huang

Then, to ensure that our generated prompts can generate more beautiful images, we further propose a Reinforcement Learning with Visual AI Feedback technique to fine-tune our model to maximize the reward values of the generated prompts, where the reward values are calculated based on the PickScore and the Aesthetic Scores.

Prompt Engineering, Text-to-Image Generation

Hierarchical Side-Tuning for Vision Transformers

no code implementations • 9 Oct 2023 • Weifeng Lin, Ziheng Wu, Wentao Yang, Mingxin Huang, Jun Huang, Lianwen Jin

In this paper, we introduce Hierarchical Side-Tuning (HST), an innovative PETL method facilitating the transfer of ViT models to diverse downstream tasks.

Image Classification, Instance Segmentation, +5

EasyPhoto: Your Smart AI Photo Generator

2 code implementations • 7 Oct 2023 • Ziheng Wu, Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Xing Shi, Jun Huang

By training a digital doppelganger of a specific user ID using 5 to 20 relevant images, the finetuned model (according to the trained LoRA model) allows for the generation of AI photos using arbitrary templates.

DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion

no code implementations • 21 Sep 2023 • Zhenzhen Chu, Jiayu Chen, Cen Chen, Chengyu Wang, Ziheng Wu, Jun Huang, Weining Qian

Position-aware global tokens also contain the position information of the image, which makes our model better for vision tasks.

Image Classification, object-detection, +3

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

1 code implementation • 28 Aug 2023 • Yang Liu, Cheng Yu, Lei Shang, Yongyi He, Ziheng Wu, Xingjun Wang, Chao Xu, Haoyu Xie, Weida Wang, Yuze Zhao, Lin Zhu, Chen Cheng, Weitao Chen, Yuan YAO, Wenmeng Zhou, Jiaqi Xu, Qiang Wang, Yingda Chen, Xuansong Xie, Baigui Sun

In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models and a rich set of face-related perceptual understanding models (e.g., face detection, deep face embedding extraction, and facial attribute recognition), to tackle the aforementioned challenges and to generate truthful personalized portraits, with only a handful of portrait images as input.

Attribute, Personalized Image Generation, +2

Scale-Aware Modulation Meet Transformer

1 code implementation • ICCV 2023 • Weifeng Lin, Ziheng Wu, Jiayu Chen, Jun Huang, Lianwen Jin

Specifically, SMT with 11.5M / 2.4 GFLOPs and 32M / 7.7 GFLOPs can achieve 82.2% and 84.3% top-1 accuracy on ImageNet-1K, respectively.

object-detection, Object Detection, +1

SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering

no code implementations • 4 Apr 2023 • Xinyao Shu, ShiYang Yan, Xu Yang, Ziheng Wu, Zhongfeng Chen, Zhenyu Lu

Unfortunately, language bias is a common problem in VQA, which refers to the model generating answers only by associating with the questions while ignoring the visual content, resulting in biased results.

counterfactual, Metric Learning, +2

YOLOX-PAI: An Improved YOLOX, Stronger and Faster than YOLOv6

3 code implementations • 27 Aug 2022 • Ziheng Wu, Xinyi Zou, Wenmeng Zhou, Jun Huang

We develop an all-in-one computer vision toolbox named EasyCV to facilitate the use of various SOTA computer vision methods.

object-detection, Object Detection

Elastic-Link for Binarized Neural Network

no code implementations • 19 Dec 2021 • Jie Hu, Ziheng Wu, Vince Tan, Zhilin Lu, Mengze Zeng, Enhua Wu

For example, we raise the top-1 accuracy of binarized ResNet26 from 57.9% to 64.0%.

Binarization
