Search Results for author: Ziyong Feng

Found 21 papers, 11 papers with code

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

no code implementations · 20 Nov 2024 · Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai

To advance these approaches, this paper introduces an Organ-Regional Information Driven (ORID) framework which can effectively integrate multi-modal information and reduce the influence of noise from unrelated organs.

Decoder, Graph Neural Network

Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension

1 code implementation · 18 Oct 2024 · Yin Xie, Kaicheng Yang, Ninghua Yang, Weimo Deng, Xiangzi Dai, Tiancheng Gu, Yumeng Wang, Xiang An, Yongle Zhao, Ziyong Feng, Jiankang Deng

Then, we conceptualize visual tokens as analogous to a "foreign language" for the LLMs and propose a mixed attention mechanism with bidirectional visual attention and unidirectional textual attention to comprehensively enhance the understanding of visual tokens.

Caption Generation
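The mixed attention idea in the Croc snippet can be illustrated with a small attention-mask builder. This is a minimal sketch, not Croc's released code. It assumes a single token sequence in which each token is flagged as visual or textual, lets every token attend causally (to itself and earlier tokens), and additionally lets each visual token attend to every other visual token in the sequence. The function name `mixed_attention_mask` is hypothetical.

```python
from typing import List


def mixed_attention_mask(is_visual: List[bool]) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if query token i may attend to key token j.

    Hypothetical sketch: textual attention is unidirectional (causal), while
    visual tokens also attend to every other visual token (bidirectional).
    """
    n = len(is_visual)
    mask: List[List[bool]] = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i  # standard left-to-right LLM attention
            both_visual = is_visual[i] and is_visual[j]  # bidirectional among visual tokens
            row.append(causal or both_visual)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Layout: two text tokens, three visual tokens, two text tokens.
    layout = [False, False, True, True, True, False, False]
    for row in mixed_attention_mask(layout):
        print("".join("1" if allowed else "." for allowed in row))
```

In the printed grid, rows 2 to 4 (the visual tokens) see columns 2 to 4 in both directions, while each text row sees only columns up to its own position.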

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

no code implementations · 18 Aug 2024 · Kaicheng Yang, Tiancheng Gu, Xiang An, Haiqiang Jiang, Xiangzi Dai, Ziyong Feng, Weidong Cai, Jiankang Deng

In this paper, we introduce CLIP-CID, a novel distillation mechanism that effectively transfers knowledge from a large vision-language foundation model to a smaller model.

Knowledge Distillation, Transfer Learning

VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling

1 code implementation · 2 Aug 2024 · Qian Zhang, Xiangzi Dai, Ninghua Yang, Xiang An, Ziyong Feng, Xingyu Ren

However, the original VAR model is constrained to class-conditioned synthesis, relying solely on textual captions for guidance.

Image Generation

Multi-label Cluster Discrimination for Visual Representation Learning

1 code implementation · 24 Jul 2024 · Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng

In this paper, we propose a novel Multi-Label Cluster Discrimination method named MLCD to enhance representation learning.

Ranked #2 on Visual Question Answering (VQA) on DocVQA test (using extra training data)

Contrastive Learning, Image-text Retrieval

High-Fidelity Facial Albedo Estimation via Texture Quantization

no code implementations · 19 Jun 2024 · Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng

In this paper, we present a novel facial albedo reconstruction model, HiFiAlbedo, which recovers the albedo map directly from a single image without the need for captured albedo data.

3D Face Reconstruction, Quantization

RWKV-CLIP: A Robust Vision-Language Representation Learner

2 code implementations · 11 Jun 2024 · Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng

Contrastive Language-Image Pre-training (CLIP) has significantly improved performance in various vision-language tasks by expanding the dataset with image-text pairs obtained from websites.

Image-text Retrieval, Representation Learning
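The CLIP pre-training mentioned in the RWKV-CLIP snippet optimizes a symmetric contrastive (InfoNCE-style) loss over a batch of matched image-text pairs: image i should be most similar to text i, and text i to image i. The sketch below is a generic, minimal version in pure Python, not RWKV-CLIP's code. The fixed `temperature` of 0.07 is an assumption (CLIP treats it as a learnable parameter), and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable -log(softmax(logits)[target])."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def clip_contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; image i and text i form the only positive pair."""
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row should select its own column.
    image_to_text = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should select its own row.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(_cross_entropy(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_contrastive_loss(images, images))        # matched pairs: small loss
    print(clip_contrastive_loss(images, images[::-1]))  # mismatched pairs: large loss
```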

1st Place Solution to the 1st SkatingVerse Challenge

no code implementations · 22 Apr 2024 · Tao Sun, Yuanzi Fu, Kaicheng Yang, Jian Wu, Ziyong Feng

This paper presents the winning solution for the 1st SkatingVerse Challenge.

Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models

no code implementations · 28 Mar 2024 · Jiaxing Chen, Yuxuan Liu, Dehu Li, Xiang An, Weimo Deng, Ziyong Feng, Yongle Zhao, Yin Xie

P2G utilizes the tool-usage potential of MLLMs to employ expert agents for on-the-fly grounding of reasoning into critical visual and textual elements in images, thereby enabling deliberate reasoning through multimodal prompting.

Instruction Following, Visual Reasoning

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

no code implementations · 20 Mar 2024 · Siying Cui, Jia Guo, Xiang An, Jiankang Deng, Yongle Zhao, Xinyu Wei, Ziyong Feng

Leveraging Stable Diffusion for the generation of personalized portraits has emerged as a powerful and noteworthy tool, enabling users to create high-fidelity, custom character avatars based on their specific prompts.

Diversity, Image Generation

ALIP: Adaptive Language-Image Pre-training with Synthetic Caption

1 code implementation · ICCV 2023 · Kaicheng Yang, Jiankang Deng, Xiang An, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu

However, the presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.

Image-text Retrieval, Representation Learning

Unicom: Universal and Compact Representation Learning for Image Retrieval

3 code implementations · 12 Apr 2023 · Xiang An, Jiankang Deng, Kaicheng Yang, Jaiwei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu

To further enhance the low-dimensional feature representation, we randomly select partial feature dimensions when calculating the similarities between embeddings and class-wise prototypes.

Image Retrieval, Metric Learning
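The random partial-dimension similarity described in the Unicom snippet can be sketched as follows. This illustrates an assumed reading, not the Unicom implementation: one random subset of feature dimensions is drawn per call and shared across all class-wise prototypes, and cosine similarity (including the norms) is computed over those dimensions only. `partial_dim_cosine` and `keep_ratio` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple


def partial_dim_cosine(embedding: Sequence[float],
                       prototypes: List[Sequence[float]],
                       keep_ratio: float,
                       rng: random.Random) -> Tuple[List[float], List[int]]:
    """Cosine similarity to each prototype using a random subset of feature dimensions.

    Hypothetical sketch: one subset is drawn per call and shared by all prototypes.
    Returns the similarities and the selected dimension indices.
    """
    dim = len(embedding)
    k = min(dim, max(1, round(dim * keep_ratio)))  # number of dimensions to keep
    dims = sorted(rng.sample(range(dim), k))

    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        # Dot product and norms restricted to the selected dimensions.
        dot = sum(a[d] * b[d] for d in dims)
        norm_a = math.sqrt(sum(a[d] ** 2 for d in dims))
        norm_b = math.sqrt(sum(b[d] ** 2 for d in dims))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    return [cosine(embedding, p) for p in prototypes], dims


if __name__ == "__main__":
    rng = random.Random(0)
    emb = [0.5, -1.0, 2.0, 0.3, 1.2, -0.7]
    protos = [emb, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
    sims, dims = partial_dim_cosine(emb, protos, keep_ratio=0.5, rng=rng)
    print(dims, sims)
```

Because a fresh subset is drawn on every call, repeated training steps see different slices of the feature vector, which is one way to read the snippet's goal of strengthening low-dimensional representations.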

Killing Two Birds with One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC

6 code implementations · 28 Mar 2022 · Xiang An, Jiankang Deng, Jia Guo, Ziyong Feng, Xuhan Zhu, Jing Yang, Tongliang Liu

In each iteration, positive class centers and a random subset of negative class centers are selected to compute the margin-based softmax loss.

Face Recognition, Face Verification
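The center-sampling step in the Partial FC snippet can be sketched in plain Python: all positive class centers present in the batch are kept, random negative centers fill the rest, and the loss is computed over the sampled centers only. This is a hedged sketch under stated assumptions, not the authors' implementation. `sample_rate` (the fraction of class centers kept per iteration), the helper names, and the use of a CosFace-style additive cosine margin (scale `s`, margin `m`) as the margin-based softmax are illustrative choices.

```python
import math
import random
from typing import List, Sequence


def sample_class_centers(batch_labels: Sequence[int], num_classes: int,
                         sample_rate: float, rng: random.Random) -> List[int]:
    """Keep every positive class in the batch plus randomly chosen negatives.

    The number of sampled centers is max(#positives, int(num_classes * sample_rate)).
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    positives = sorted(set(batch_labels))
    num_sampled = max(len(positives), int(num_classes * sample_rate))
    positive_set = set(positives)
    negatives = [c for c in range(num_classes) if c not in positive_set]
    return positives + rng.sample(negatives, num_sampled - len(positives))


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def margin_softmax_loss(embedding: Sequence[float], label: int,
                        centers: List[Sequence[float]], sampled: List[int],
                        s: float = 64.0, m: float = 0.4) -> float:
    """CosFace-style loss evaluated only over the sampled class centers."""
    logits = []
    for c in sampled:
        cos = _cosine(embedding, centers[c])
        # Subtract the margin from the target class only, then scale by s.
        logits.append(s * (cos - m) if c == label else s * cos)
    target = sampled.index(label)
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    rng = random.Random(0)
    num_classes, dim = 1000, 16
    centers = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]
    sampled = sample_class_centers([3, 17, 3], num_classes, sample_rate=0.25, rng=rng)
    print(len(sampled))  # 250 of the 1000 centers
    emb = [x + rng.gauss(0.0, 0.1) for x in centers[3]]
    print(margin_softmax_loss(emb, 3, centers, sampled))
```

Computing logits against a quarter of the centers cuts the cost of the classification layer proportionally, which is the efficiency argument the title alludes to.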

Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC

1 code implementation · CVPR 2022 · Xiang An, Jiankang Deng, Jia Guo, Ziyong Feng, Xuhan Zhu, Jing Yang, Tongliang Liu

In each iteration, positive class centers and a random subset of negative class centers are selected to compute the margin-based softmax loss.

Face Recognition

Sequentially Aggregated Convolutional Networks

1 code implementation · 27 Nov 2018 · Yiwen Huang, Rihui Wu, Pinglai Ou, Ziyong Feng

We thus exploit the aggregation nature of shortcut connections at a finer architectural level and place them within wide convolutional layers.

General Classification, Image Classification

DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images

5 code implementations · 24 May 2016 · Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Ziyong Feng

In this paper, we develop a novel unified framework called DeepText for text region proposal generation and text detection in natural images via a fully convolutional neural network (CNN).

Region Proposal, Text Classification

Fully Convolutional Recurrent Network for Handwritten Chinese Text Recognition

no code implementations · 18 Apr 2016 · Zecheng Xie, Zenghui Sun, Lianwen Jin, Ziyong Feng, Shuye Zhang

This paper proposes an end-to-end framework, namely fully convolutional recurrent network (FCRN) for handwritten Chinese text recognition (HCTR).

Handwriting Recognition, Handwritten Chinese Text Recognition

A new humanlike facial attractiveness predictor with cascaded fine-tuning deep learning model

no code implementations · 8 Nov 2015 · Jie Xu, Lianwen Jin, Lingyu Liang, Ziyong Feng, Duorui Xie

This paper proposes a deep learning method to address the challenging problem of facial attractiveness prediction.

Facial Beauty Prediction

Improved Deep Convolutional Neural Network For Online Handwritten Chinese Character Recognition using Domain-Specific Knowledge

no code implementations · 28 May 2015 · Weixin Yang, Lianwen Jin, Zecheng Xie, Ziyong Feng

Deep convolutional neural networks (DCNNs) have achieved great success in various computer vision and pattern recognition applications, including those for handwritten Chinese character recognition (HCCR).

Diversity

DropSample: A New Training Method to Enhance Deep Convolutional Neural Networks for Large-Scale Unconstrained Handwritten Chinese Character Recognition

no code implementations · 20 May 2015 · Weixin Yang, Lianwen Jin, DaCheng Tao, Zecheng Xie, Ziyong Feng

Inspired by the theory of Leitner's learning box from the field of psychology, we propose DropSample, a new method for training deep convolutional neural networks (DCNNs), and apply it to large-scale online handwritten Chinese character recognition (HCCR).
