no code implementations • 28 Mar 2025 • Huixiang Zhen, Xiaotong Li, Wenyin Gong, Ling Wang, Xiangyun Hu
This indicator simultaneously considers convergence, diversity, and distribution, improving the efficiency of identifying promising candidate solutions and thereby significantly boosting algorithm performance.
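The entry does not spell out the indicator's exact form; purely as an illustration of combining the three criteria, here is a minimal sketch in which hypothetical convergence, diversity, and distribution terms are aggregated into a single ranking score. All formulas below are assumptions, not the published indicator.

```python
import numpy as np

def indicator_scores(candidates, archive, weights=(1.0, 1.0, 1.0)):
    """Rank candidates by a composite indicator (hypothetical sketch).

    `archive` holds objective vectors of solutions evaluated so far
    (shape (n, m), n >= 2); `candidates` is an iterable of objective
    vectors predicted by a surrogate. The three terms below are
    illustrative stand-ins for the paper's criteria, not its formulas.
    """
    w_conv, w_div, w_dist = weights
    ideal = archive.min(axis=0)  # component-wise best seen so far
    scores = []
    for x in candidates:
        dists = np.sort(np.linalg.norm(archive - x, axis=1))
        convergence = np.linalg.norm(x - ideal)   # lower is better
        diversity = dists[0]                      # distance to nearest archived point
        distribution = dists[1] - dists[0]        # favors filling sparse regions
        scores.append(-w_conv * convergence + w_div * diversity + w_dist * distribution)
    return np.array(scores)  # higher = more promising candidate
```

Candidates with the highest composite score would then be the ones spent on expensive true evaluations.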
no code implementations • 14 Feb 2025 • Venkatesh Sivaraman, Anika Vaishampayan, Xiaotong Li, Brian R Buck, Ziyong Ma, Richard D Boyce, Adam Perer
Temporal predictive models have the potential to improve decisions in health care, public services, and other domains, yet they often fail to effectively support decision-makers.
1 code implementation • 10 Feb 2025 • Haiwen Diao, Xiaotong Li, Yufeng Cui, Yueze Wang, Haoge Deng, Ting Pan, Wenxuan Wang, Huchuan Lu, Xinlong Wang
Existing encoder-free vision-language models (VLMs) are rapidly narrowing the performance gap with their encoder-based counterparts, highlighting the promise of unified multimodal systems with structural simplicity and efficient deployment.
4 code implementations • 24 Oct 2024 • Shuhao Gu, Jialing Zhang, Siyuan Zhou, Kevin Yu, Zhaohu Xing, Liangdong Wang, Zhou Cao, Jintao Jia, Zhuoyi Zhang, YiXuan Wang, Zhenchong Hu, Bo-Wen Zhang, Jijie Li, Dong Liang, Yingli Zhao, Songjing Wang, Yulong Ao, Yiming Ju, Huanhuan Ma, Xiaotong Li, Haiwen Diao, Yufeng Cui, Xinlong Wang, Yaoqi Liu, Fangxiang Feng, Guang Liu
Despite the availability of several open-source multimodal datasets, limitations in the scale and quality of open-source instruction data hinder the performance of VLMs trained on these datasets, leading to a significant gap compared to models trained on closed-source data.
Ranked #4 on Image Generation on TextAtlasEval
no code implementations • 10 Oct 2024 • Xiang Zhuang, Keyan Ding, Tianwen Lyu, Yinuo Jiang, Xiaotong Li, Zhuoyi Xiang, Zeyuan Wang, Ming Qin, Kehua Feng, Jike Wang, Qiang Zhang, Huajun Chen
Understanding and designing biomolecules, such as proteins and small molecules, is central to advancing drug discovery, synthetic biology, and enzyme engineering.
no code implementations • 8 Oct 2024 • Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan
Extensive experiments demonstrate the effectiveness of the proposed framework in both image reconstruction and downstream machine vision tasks such as object detection, segmentation, and facial landmark detection, achieving superior perceptual quality compared to state-of-the-art methods.
1 code implementation • 11 Jul 2024 • Xiaotong Li, Fan Zhang, Haiwen Diao, Yueze Wang, Xinlong Wang, Ling-Yu Duan
To facilitate cutting-edge research of MLLMs on comprehensive vision perception, we propose Perceptual Fusion, which uses a low-budget but highly effective caption engine to produce complete and accurate image descriptions (a rough sketch of the fusion idea follows this entry).
Ranked #127 on Visual Question Answering on MM-Vet
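As a rough, hypothetical sketch of the fusion idea described above: gather signals from off-the-shelf perception experts and hand them to an MLLM caption engine. The helper names `run_ocr`, `run_detector`, and `caption_mllm` are placeholders, not the paper's actual components.

```python
def fuse_caption(image, caption_mllm, run_ocr, run_detector):
    """Fuse expert perception signals into one dense caption (sketch)."""
    ocr_text = run_ocr(image)        # recognized scene text, a string
    boxes = run_detector(image)      # [(label, (x1, y1, x2, y2)), ...]
    hints = "\n".join(f"- {label} at {box}" for label, box in boxes)
    prompt = (
        "Describe this image completely and accurately.\n"
        f"Detected objects:\n{hints}\n"
        f"Scene text: {ocr_text}\n"
    )
    return caption_mllm(image, prompt)  # dense, detailed description
```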
1 code implementation • 17 Jun 2024 • Haiwen Diao, Yufeng Cui, Xiaotong Li, Yueze Wang, Huchuan Lu, Xinlong Wang
Training pure VLMs that accept seamless vision and language inputs, i.e., without vision encoders, remains challenging and rarely explored.
1 code implementation • 26 Jan 2024 • Qiang Zhang, Keyang Ding, Tianwen Lyv, Xinda Wang, Qingyu Yin, Yiwen Zhang, Jing Yu, Yuhao Wang, Xiaotong Li, Zhuoyi Xiang, Kehua Feng, Xiang Zhuang, Zeyuan Wang, Ming Qin, Mengyao Zhang, Jinlu Zhang, Jiyu Cui, Tao Huang, Pengju Yan, Renjun Xu, Hongyang Chen, Xiaolin Li, Xiaohui Fan, Huabin Xing, Huajun Chen
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence.
no code implementations • CVPR 2024 • Zixuan Hu, Xiaotong Li, Shixiang Tang, Jun Liu, Yichun Hu, Ling-Yu Duan
The remarkable success of the "pretrain-then-finetune" paradigm has led to a proliferation of available pre-trained models for vision tasks.
no code implementations • 5 Oct 2023 • Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, Huajun Chen
To address this challenge, we propose InstructProtein, an innovative LLM that possesses bidirectional generation capabilities in both human and protein languages: (i) taking a protein sequence as input to predict its textual function description and (ii) using natural language to prompt protein sequence generation.
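A minimal sketch of the two generation directions the entry describes; the prompt templates and `generate` interface are illustrative assumptions, not InstructProtein's actual format or API.

```python
def describe_protein(generate, sequence: str) -> str:
    """Direction (i): protein sequence -> textual function description."""
    return generate(f"Describe the function of this protein: {sequence}")

def design_protein(generate, instruction: str) -> str:
    """Direction (ii): natural-language instruction -> protein sequence."""
    return generate(f"Generate a protein sequence that {instruction}")

# Usage with any text-generation callable `model.generate`:
#   describe_protein(model.generate, "MKTAYIAKQRQISFVKSHFSRQ")
#   design_protein(model.generate, "binds ATP and functions as a kinase")
```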
1 code implementation • ICCV 2023 • Xiaotong Li, Zixuan Hu, Yixiao Ge, Ying Shan, Ling-Yu Duan
The experimental results on 10 downstream tasks and 12 self-supervised models demonstrate that our approach can seamlessly integrate into existing ranking techniques and enhance their performances, revealing its effectiveness for the model selection task and its potential for understanding the mechanism in transfer learning.
1 code implementation • 16 Jan 2023 • Xiaotong Li, Zixuan Hu, Jun Liu, Yixiao Ge, Yongxing Dai, Ling-Yu Duan
In this paper, we improve the network generalization ability by modeling domain shifts with uncertainty (DSU), i.e., characterizing the feature statistics as uncertain distributions during training.
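As a rough illustration of treating feature statistics as distributions, the sketch below re-draws each sample's channel-wise mean and standard deviation from Gaussians whose spread is estimated across the mini-batch. It is a simplification: details such as where the module sits in the network and how often it is applied are omitted.

```python
import torch

def perturb_statistics(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Re-draw channel-wise feature statistics from estimated Gaussians.

    x: a (B, C, H, W) feature map at training time. Each sample's mean
    and standard deviation are replaced by values sampled around the
    originals, with spread estimated from their batch-level variance.
    """
    mu = x.mean(dim=[2, 3], keepdim=True)                 # (B, C, 1, 1)
    sig = (x.var(dim=[2, 3], keepdim=True) + eps).sqrt()

    # Uncertainty of the statistics themselves, over the mini-batch.
    sig_mu = (mu.var(dim=0, keepdim=True) + eps).sqrt()   # (1, C, 1, 1)
    sig_sig = (sig.var(dim=0, keepdim=True) + eps).sqrt()

    new_mu = mu + torch.randn_like(mu) * sig_mu
    new_sig = sig + torch.randn_like(sig) * sig_sig
    # Re-normalize the features with the perturbed statistics.
    return (x - mu) / sig * new_sig + new_mu
```

Implementations of this idea typically apply the perturbation stochastically at several network stages; the sketch shows only the core statistic resampling.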
1 code implementation • 19 May 2022 • Kun Yi, Yixiao Ge, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, XiaoHu Qie
From contrastive learning to masked image modeling (MIM), self-supervised visual representation learning methods do not differ significantly in essence: all come down to designing proper pretext tasks for vision dictionary look-up.
1 code implementation • 29 Mar 2022 • Xiaotong Li, Yixiao Ge, Kun Yi, Zixuan Hu, Ying Shan, Ling-Yu Duan
Image BERT pre-training with masked image modeling (MIM) has become a popular practice for self-supervised representation learning.
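For background on the MIM objective this entry refers to, here is a generic BEiT-style sketch in which masked patches are predicted as discrete visual token ids. All interfaces are illustrative assumptions, and the entry's specific improvements over this baseline are not shown.

```python
import torch
import torch.nn.functional as F

def mim_step(encoder, tokenizer, head, images, mask):
    """One generic BEiT-style masked image modeling step (sketch).

    `tokenizer` maps images to discrete visual token ids (B, N);
    `encoder` embeds the patch sequence with masked positions hidden,
    returning (B, N, D); `head` predicts token logits (B, N, V);
    `mask` is a (B, N) boolean tensor marking the masked patches.
    """
    with torch.no_grad():
        target_ids = tokenizer(images)   # ground-truth visual tokens
    features = encoder(images, mask)
    logits = head(features)
    # Cross-entropy only over the masked patch positions.
    return F.cross_entropy(logits[mask], target_ids[mask])
```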
1 code implementation • ICLR 2022 • Xiaotong Li, Yongxing Dai, Yixiao Ge, Jun Liu, Ying Shan, Ling-Yu Duan
In this paper, we improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
no code implementations • CVPR 2021 • Yongxing Dai, Xiaotong Li, Jun Liu, Zekun Tong, Ling-Yu Duan
Specifically, we propose a decorrelation loss that makes the source-domain networks (experts) preserve the diversity and discriminability of individual domains' characteristics.
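The entry names a decorrelation loss without giving its form; one plausible instantiation, offered purely as an assumption, penalizes the squared per-sample cosine similarity between different experts' features.

```python
import torch

def decorrelation_loss(expert_feats, eps: float = 1e-6) -> torch.Tensor:
    """Penalize pairwise similarity between experts' features (assumed form).

    expert_feats: list of (B, D) feature batches, one per source-domain
    expert. Minimizing the squared per-sample cosine similarity pushes
    the experts toward mutually decorrelated representations.
    """
    normed = [f / (f.norm(dim=1, keepdim=True) + eps) for f in expert_feats]
    loss = torch.zeros(())
    pairs = 0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            loss = loss + (normed[i] * normed[j]).sum(dim=1).pow(2).mean()
            pairs += 1
    return loss / max(pairs, 1)
```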