Search Results for author: Zhiyuan Zhao

Found 28 papers, 18 papers with code

FGAseg: Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation

1 code implementation · 1 Jan 2025 · Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

The core of FGAseg is a Pixel-Level Alignment module that employs a cross-modal attention mechanism and a text-pixel alignment loss to refine the coarse-grained alignment from CLIP, achieving finer-grained pixel-text semantic alignment.

Open-Vocabulary Semantic Segmentation +1

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

1 code implementation · 10 Dec 2024 · Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao, Jin Shi, Fan Wu, Pei Chu, Minghao Liu, Zhenxiang Li, Chao Xu, Bo Zhang, Botian Shi, Zhongying Tu, Conghui He

Document content extraction is crucial in computer vision, especially for meeting the high-quality data needs of large language models (LLMs) and retrieval-augmented generation (RAG) technologies.

Attribute, Benchmarking +2

TSI-Bench: Benchmarking Time Series Imputation

4 code implementations · 18 Jun 2024 · Wenjie Du, Jun Wang, Linglong Qian, Yiyuan Yang, Zina Ibrahim, Fanxing Liu, Zepu Wang, Haoxin Liu, Zhiyuan Zhao, Yingjie Zhou, Wenjia Wang, Kaize Ding, Yuxuan Liang, B. Aditya Prakash, Qingsong Wen

Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings.

Benchmarking, Deep Learning +3

Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

no code implementations · 5 Jun 2024 · Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple character animation and body occlusion.

Image Animation, Video Generation

DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data

1 code implementation · 28 May 2024 · Bin Wang, Linke Ouyang, Fan Wu, Wenchang Ning, Xiao Han, Zhiyuan Zhao, Jiahui Peng, Yiying Jiang, Dahua Lin, Conghui He

In the era of artificial intelligence, the diversity of data modalities and annotation formats often renders data unusable directly, requiring understanding and format conversion before it can be used by researchers or developers with different needs.

Diversity

U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation

1 code implementation · 24 May 2024 · Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.

Segmentation, Semantic Segmentation

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

1 code implementation · 28 Nov 2023 · Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from associated images.

Hallucination, MME

Performative Time-Series Forecasting

1 code implementation · 9 Oct 2023 · Zhiyuan Zhao, Alexander Rodriguez, B. Aditya Prakash

Time-series forecasting is a critical challenge in various domains and has witnessed substantial progress in recent years.

Time Series, Time Series Forecasting

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

1 code implementation · 25 Aug 2023 · Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Despite the great advances of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes it hard for current MLLMs to further improve their capabilities under the guidance of evaluation results at a relatively low human cost.

Benchmarking

PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks

1 code implementation · 21 Jul 2023 · Zhiyuan Zhao, Xueying Ding, B. Aditya Prakash

Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs).

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

no code implementations · 24 Oct 2022 · Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo

In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details.

Speech Enhancement

Exploring Effective Knowledge Transfer for Few-shot Object Detection

1 code implementation · 5 Oct 2022 · Zhiyuan Zhao, Qingjie Liu, Yunhong Wang

For the high-shot regime, we propose to use the knowledge learned from ImageNet as guidance for the feature learning in the fine-tuning stage, which will implicitly align the distributions of the novel classes.

Few-Shot Object Detection, Object +2

An Anchor-Free Detector for Continuous Speech Keyword Spotting

no code implementations · 9 Aug 2022 · Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo

Continuous Speech Keyword Spotting (CSKWS) is the task of detecting predefined keywords in continuous speech.

Keyword Spotting, Object Detection +1

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

no code implementations · 28 Jun 2022 · Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo

In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody.

Sentence

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

1 code implementation · 12 Sep 2021 · Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng

Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript.

Decoder, Text to Speech +1

Beating the Standard Quantum Limit under Ambient Conditions with Solid-State Spins

no code implementations · 28 Jan 2021 · Tianyu Xie, Zhiyuan Zhao, Xi Kong, Wenchao Ma, Mengqi Wang, Xiangyu Ye, Pei Yu, Zhiping Yang, Shaoyi Xu, Pengfei Wang, Ya Wang, Fazhan Shi, Jiangfeng Du

However, it has not been realized in solid-state spin systems at ambient conditions, owing to its intrinsic complexity for the preparation and survival of pure and entangled quantum states.

Quantum Physics

A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View

no code implementations · 29 Sep 2020 · Zhiyuan Zhao, Tao Han, Junyu Gao, Qi Wang, Xuelong Li

Drone footage can be applied to dynamic traffic monitoring, object detection and tracking, and other vision tasks.

Crowd Counting, Density Estimation +1
