1 code implementation • 1 Jan 2025 • Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li
The core of FGAseg is a Pixel-Level Alignment module that employs a cross-modal attention mechanism and a text-pixel alignment loss to refine the coarse-grained alignment from CLIP, achieving finer-grained pixel-text semantic alignment.
Open-Vocabulary Semantic Segmentation
1 code implementation • 10 Dec 2024 • Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao, Jin Shi, Fan Wu, Pei Chu, Minghao Liu, Zhenxiang Li, Chao Xu, Bo Zhang, Botian Shi, Zhongying Tu, Conghui He
Document content extraction is crucial in computer vision, especially for meeting the high-quality data needs of large language models (LLMs) and retrieval-augmented generation (RAG) technologies.
2 code implementations • 16 Oct 2024 • Zhiyuan Zhao, Hengrui Kang, Bin Wang, Conghui He
Pre-training on the resulting DocSynth-300K dataset significantly improves fine-tuning performance across various document types.
Ranked #2 on Document Layout Analysis on D4LA
2 code implementations • 27 Sep 2024 • Bin Wang, Chao Xu, Xiaomeng Zhao, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Rui Xu, Kaiwen Liu, Yuan Qu, FuKai Shang, Bo Zhang, Liqun Wei, Zhihao Sui, Wei Li, Botian Shi, Yu Qiao, Dahua Lin, Conghui He
Document content analysis has been a crucial research area in computer vision.
1 code implementation • 15 Aug 2024 • Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Yuan Yuan
Text has become the predominant form of communication on social media, embedding a wealth of emotional nuances.
4 code implementations • 18 Jun 2024 • Wenjie Du, Jun Wang, Linglong Qian, Yiyuan Yang, Zina Ibrahim, Fanxing Liu, Zepu Wang, Haoxin Liu, Zhiyuan Zhao, Yingjie Zhou, Wenjia Wang, Kaize Ding, Yuxuan Liang, B. Aditya Prakash, Qingsong Wen
Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings.
1 code implementation • 13 Jun 2024 • Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, B. Aditya Prakash
In this paper, we aim to alleviate the inherent OOD problem in TSF via invariant learning.
2 code implementations • 12 Jun 2024 • Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, B. Aditya Prakash
To overcome this obstacle, we introduce Time-MMD, the first multi-domain, multimodal time series dataset covering 9 primary data domains.
no code implementations • 5 Jun 2024 • Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo
While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple character animation and body occlusion.
1 code implementation • 28 May 2024 • Bin Wang, Linke Ouyang, Fan Wu, Wenchang Ning, Xiao Han, Zhiyuan Zhao, Jiahui Peng, Yiying Jiang, Dahua Lin, Conghui He
In the era of artificial intelligence, the diversity of data modalities and annotation formats often means data cannot be used directly, requiring understanding and format conversion before it can be used by researchers or developers with different needs.
1 code implementation • 24 May 2024 • Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li
To address this issue, we leverage the inherent capabilities of the model itself to discover the optimal equilibrium in multimodal fusion and introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
no code implementations • 25 Feb 2024 • Haoxin Liu, Zhiyuan Zhao, Jindong Wang, Harshavardhan Kamarthi, B. Aditya Prakash
Time-series forecasting (TSF) finds broad applications in real-world scenarios.
no code implementations • 30 Nov 2023 • Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong
Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications.
1 code implementation • 28 Nov 2023 • Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He
Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from associated images.
1 code implementation • 9 Oct 2023 • Zhiyuan Zhao, Alexander Rodriguez, B. Aditya Prakash
Time-series forecasting is a critical challenge in various domains and has witnessed substantial progress in recent years.
2 code implementations • 26 Sep 2023 • Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.
Ranked #9 on Visual Question Answering (VQA) on InfiMM-Eval
1 code implementation • 25 Aug 2023 • Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He
Despite the great advances of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes it hard for current MLLMs to further improve their capability under the guidance of evaluation results at a relatively low human cost.
1 code implementation • 21 Jul 2023 • Zhiyuan Zhao, Xueying Ding, B. Aditya Prakash
Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs).
no code implementations • 12 Apr 2023 • Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo
Filler words like "um" or "uh" are common in spontaneous speech.
no code implementations • 24 Oct 2022 • Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo
In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details.
1 code implementation • 5 Oct 2022 • Zhiyuan Zhao, Qingjie Liu, Yunhong Wang
For the high-shot regime, we propose to use the knowledge learned from ImageNet as guidance for the feature learning in the fine-tuning stage, which will implicitly align the distributions of the novel classes.
no code implementations • 9 Aug 2022 • Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo
Continuous Speech Keyword Spotting (CSKWS) is the task of detecting predefined keywords in continuous speech.
no code implementations • 28 Jun 2022 • Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo
In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody.
1 code implementation • 12 Sep 2021 • Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng
Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript.
1 code implementation • 19 Jul 2021 • Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, QinGhua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint.
no code implementations • 3 Feb 2021 • Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.
no code implementations • 28 Jan 2021 • Tianyu Xie, Zhiyuan Zhao, Xi Kong, Wenchao Ma, Mengqi Wang, Xiangyu Ye, Pei Yu, Zhiping Yang, Shaoyi Xu, Pengfei Wang, Ya Wang, Fazhan Shi, Jiangfeng Du
However, it has not been realized in solid-state spin systems at ambient conditions, owing to its intrinsic complexity for the preparation and survival of pure and entangled quantum states.
Quantum Physics
no code implementations • 29 Sep 2020 • Zhiyuan Zhao, Tao Han, Junyu Gao, Qi Wang, Xuelong Li
Drone footage can be applied to dynamic traffic monitoring, object detection and tracking, and other vision tasks.