1 code implementation • 10 Jun 2025 • Shiji Zhao, Chi Chen, Ranjie Duan, Xizhe Wang, Xingxing Wei
Adversarial Training (AT) is widely recognized as an effective approach to enhance the adversarial robustness of Deep Neural Networks.
1 code implementation • 2 Jun 2025 • Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, JianMing Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan YAO, Zhiyuan Liu, Maosong Sun
The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability.
no code implementations • 30 May 2025 • Cheng Zeng, Xiatian Qi, Chi Chen, Kai Sun, Wangle Zhang, Yuxuan Liu, Yan Meng, Bisheng Yang
Transformers have seldom been employed in point cloud roof plane instance segmentation, the focus of this study, and existing superpoint Transformers suffer from limited performance due to their reliance on low-quality superpoints.
1 code implementation • 27 May 2025 • Fuwen Luo, Shengfeng Lou, Chi Chen, Ziyue Wang, Chenliang Li, Weizhou Shen, Jiyue Guo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu
Video temporal understanding is crucial for multimodal large language models (MLLMs) to reason over events in videos.
1 code implementation • 26 May 2025 • Dairu Liu, Ziyue Wang, Minyuan Ruan, Fuwen Luo, Chi Chen, Peng Li, Yang Liu
Images usually convey richer detail than text, but often include redundant information which potentially downgrades multimodal reasoning performance.
1 code implementation • 17 May 2025 • Xuanle Zhao, Xuexin Liu, Haoyue Yang, Xianzhen Luo, Fanhu Zeng, Jianling Li, Qi Shi, Chi Chen
In contrast, small-scale models, including chart-domain models, struggle both with following editing instructions and generating overall chart images, underscoring the need for further development in this area.
1 code implementation • 10 May 2025 • Xinyue Lou, You Li, Jinan Xu, Xiangyu Shi, Chi Chen, Kaiyu Huang
To address this gap, we conduct a comprehensive and systematic safety evaluation of 11 MLRMs across 5 benchmarks and unveil prevalent safety degradation phenomena in most advanced models.
1 code implementation • CVPR 2025 • Yiyang Du, Xiaochen Wang, Chi Chen, Jiabo Ye, Yiru Wang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Zhifang Sui, Maosong Sun, Yang Liu
Specifically, we first design a mapping function between models to apply model merging to MLLMs with different architectures.
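Model merging of the kind mentioned above is often illustrated with plain weight-space interpolation. The sketch below is a generic illustration under that simplification, not the paper's mapping function: it assumes the two parameter dictionaries already share aligned names.

```python
# Minimal sketch of weight-space model merging via linear interpolation.
# Generic illustration only; the paper's mapping function between different
# architectures is more involved. Floats stand in for weight tensors.

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Interpolate two parameter dicts: alpha * a + (1 - alpha) * b."""
    assert sd_a.keys() == sd_b.keys(), "parameter names must already be aligned"
    return {name: alpha * sd_a[name] + (1 - alpha) * sd_b[name]
            for name in sd_a}

# Toy example with scalar "weights".
model_a = {"layer.weight": 1.0, "layer.bias": 0.0}
model_b = {"layer.weight": 3.0, "layer.bias": 2.0}
merged = merge_state_dicts(model_a, model_b, alpha=0.5)
print(merged)  # {'layer.weight': 2.0, 'layer.bias': 1.0}
```

With real models the same loop would run over framework state dicts, which is why aligning parameter names (or designing a mapping function across architectures, as above) is the hard part.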
1 code implementation • 17 Mar 2025 • Xinyu Ma, Ziyang Ding, Zhicong Luo, Chi Chen, Zonghao Guo, Derek F. Wong, Xiaoyi Feng, Maosong Sun
Human experts excel at fine-grained visual discrimination by leveraging domain knowledge to refine perceptual features, a capability that remains underdeveloped in current Multimodal Large Language Models (MLLMs).
1 code implementation • 16 Mar 2025 • Xiaoying Zhang, Da Peng, YiPeng Zhang, Zonghao Guo, Chengyue Wu, Chi Chen, Wei Ke, Helen Meng, Maosong Sun
These curated samples are subsequently used for large-scale multimodal pre-training, completing a self-learning cycle that strengthens the model's cognitive foundation.
1 code implementation • 13 Mar 2025 • Xingfei Wei, Qiankun Mo, Chi Chen, Mark Bathe, Rigoberto Hernandez
Artificial intelligence (AI) models remain an emerging strategy to accelerate materials design and development.
1 code implementation • 13 Mar 2025 • Ziyue Wang, Yurui Dong, Fuwen Luo, Minyuan Ruan, Zhili Cheng, Chi Chen, Peng Li, Yang Liu
However, existing evaluations primarily assess final task completion, often reducing assessment to isolated abilities such as visual grounding and visual question answering.
1 code implementation • 11 Jan 2025 • Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun
Two challenges persist: (1) low executability and poor restoration of chart details in the generated code, and (2) a lack of large-scale and diverse training data.
no code implementations • 10 Jan 2025 • You Li, Heyu Huang, Chi Chen, Kaiyu Huang, Chao Huang, Zonghao Guo, Zhiyuan Liu, Jinan Xu, Yuhua Li, Ruixuan Li, Maosong Sun
The recent advancement of Multimodal Large Language Models (MLLMs) has significantly improved their fine-grained perception of single images and general comprehension across multiple images.
no code implementations • 9 Jan 2025 • Shiji Zhao, Ranjie Duan, Fengxiang Wang, Chi Chen, Caixin Kang, Jialing Tao, Yuefeng Chen, Hui Xue, Xingxing Wei
Despite achieving some progress, they have a low attack success rate on commercial closed-source MLLMs.
1 code implementation • 18 Dec 2024 • YiPeng Zhang, Yifan Liu, Zonghao Guo, Yidan Zhang, Xuesong Yang, Chi Chen, Jun Song, Bo Zheng, Yuan YAO, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun
To address this issue, we present LLaVA-UHD v2, an advanced MLLM centered around a Hierarchical window transformer that enables capturing diverse visual granularity by constructing and integrating a high-resolution feature pyramid.
1 code implementation • 6 Nov 2024 • Junming Lin, Zheng Fang, Chi Chen, Zihao Wan, Fuwen Luo, Peng Li, Yang Liu, Maosong Sun
In this paper, we introduce StreamingBench, the first comprehensive benchmark designed to evaluate the streaming video understanding capabilities of MLLMs.
no code implementations • 21 Oct 2024 • Zhongchen Deng, Zhechen Yang, Chi Chen, Cheng Zeng, Yan Meng, Bisheng Yang
Based on EfficientSAM, a fast version of SAM, we propose a plane instance segmentation network called PlaneSAM, which can fully integrate the information of the RGB bands (spectral bands) and the D band (geometric band), thereby improving the effectiveness of plane instance segmentation in a multimodal manner.
1 code implementation • 7 Oct 2024 • Ziyue Wang, Chi Chen, Fuwen Luo, Yurui Dong, Yuanchi Zhang, Yuzhuang Xu, Xiaolong Wang, Peng Li, Yang Liu
Active perception, a crucial human capability, involves setting a goal based on the current understanding of the environment and performing actions to achieve that goal.
1 code implementation • 15 Aug 2024 • Wenxuan Li, Qin Zou, Chi Chen, Bo Du, Long Chen, Jian Zhou, Hongkai Yu
3D object detection in driving scenarios faces the challenge of complex road environments, which can lead to the loss or incompleteness of key features, thereby affecting perception performance.
1 code implementation • 7 Apr 2024 • Xingyu Su, Xiaojie Zhu, Yang Li, Yong Li, Chi Chen, Paulo Esteves-Veríssimo
Amidst the surge in deep learning-based password guessing models, the challenges of generating high-quality passwords and reducing duplicates persist.
1 code implementation • 4 Apr 2024 • Houzhe Wang, Xiaojie Zhu, Chi Chen, Paulo Esteves-Veríssimo
To address the challenge of low validity in existing machine unlearning algorithms, we propose a novel loss function.
no code implementations • 21 Feb 2024 • Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu
Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language.
1 code implementation • 20 Feb 2024 • Chi Chen, Yiyang Du, Zheng Fang, Ziyue Wang, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu
In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.
1 code implementation • 19 Feb 2024 • Ziyue Wang, Chi Chen, Yiqi Zhu, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu
With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks.
1 code implementation • 20 Nov 2023 • Ziyue Wang, Chi Chen, Peng Li, Yang Liu
Large Language Models (LLMs) demonstrate impressive reasoning ability and the maintenance of world knowledge not only in natural language tasks, but also in some vision-language tasks such as open-domain knowledge-based visual question answering (OK-VQA).
1 code implementation • 25 Aug 2023 • Chi Chen, Ruoyu Qin, Fuwen Luo, Xiaoyue Mi, Peng Li, Maosong Sun, Yang Liu
However, existing visual instruction tuning methods only utilize image-language instruction data to align the language and image modalities, lacking a more fine-grained cross-modal alignment.
no code implementations • 8 Jun 2023 • Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, He Zhang, Shidi Tang, Hongxia Hao, Peiran Jin, Chi Chen, Frank Noé, Haiguang Liu, Tie-Yan Liu
In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
no code implementations • 24 May 2023 • Chi Chen, Peng Li, Maosong Sun, Yang Liu
Weakly supervised vision-and-language pre-training (WVLP), which learns cross-modal representations with limited cross-modal supervision, has been shown to effectively reduce the data cost of pre-training while maintaining decent performance on downstream tasks.
1 code implementation • 24 Feb 2022 • Ruiqi Ma, Chi Chen, Bisheng Yang, Deren Li, Haiping Wang, Yangzi Cong, Zongtian Hu
At present, anchor-based and anchor-free models that use LiDAR point clouds for 3D object detection rely on a center assigner strategy to infer 3D bounding boxes.
no code implementations • 11 Mar 2021 • Wan-Ping Chan, Jyun-Hong Chen, Wei-Lun Chou, Wen-Yuan Chen, Hao-Yu Liu, Hsiao-Ching Hu, Chien-Chung Jeng, Jie-Ren Li, Chi Chen, Shiuan-Yeh Chen
Strong coupling between light and matter is the foundation of promising quantum photonic devices such as deterministic single photon sources, single atom lasers and photonic quantum gates, each of which consists of an atom and a photonic cavity.
Optics, Quantum Physics
1 code implementation • 4 Feb 2021 • Chi Chen, Shyue Ping Ong
Here we leverage the transfer learning concept and the graph network deep learning framework and develop the AtomSets machine learning framework for consistent high model accuracy at both small and large materials data.
Feature Engineering, Transfer Learning, Materials Science
1 code implementation • ACL 2021 • Chi Chen, Maosong Sun, Yang Liu
Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks.
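A common baseline for the word alignment task described above extracts links from a source-target similarity matrix by intersecting row-wise and column-wise argmaxes. The following is a minimal sketch of that standard intersection heuristic, not this paper's method:

```python
# Bidirectional-argmax word alignment from a similarity matrix.
# Generic baseline heuristic, not the method of the paper above.

def align(sim):
    """sim[i][j]: similarity of source word i and target word j.
    Keep (i, j) only if j is i's best target AND i is j's best source."""
    n, m = len(sim), len(sim[0])
    best_tgt = [max(range(m), key=lambda j: sim[i][j]) for i in range(n)]
    best_src = [max(range(n), key=lambda i: sim[i][j]) for j in range(m)]
    return sorted((i, j) for i in range(n) for j in range(m)
                  if best_tgt[i] == j and best_src[j] == i)

# Toy 3x3 similarity matrix, e.g. cosine similarities of word embeddings.
sim = [[0.9, 0.1, 0.2],
       [0.2, 0.8, 0.3],
       [0.1, 0.4, 0.7]]
print(align(sim))  # [(0, 0), (1, 1), (2, 2)]
```

The intersection makes the alignment conservative: a link survives only when both directions agree, which trades recall for precision.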
no code implementations • 15 Oct 2020 • Chi Chen, Xin Peng, Zhenchang Xing, Jun Sun, Xin Wang, Yifan Zhao, Wenyun Zhao
APIRec-CST is a deep learning model that combines the API usage with the text information in the source code based on an API Context Graph Network and a Code Token Network that simultaneously learn structural and textual features for API recommendation.
3 code implementations • 9 May 2020 • Chi Chen, Yunxing Zuo, Weike Ye, Xiangguo Li, Shyue Ping Ong
Predicting the properties of a material from the arrangement of its atoms is a fundamental goal in materials science.
Materials Science, Disordered Systems and Neural Networks
3 code implementations • Chem. Mater. 2018 • Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, Shyue Ping Ong
Similarly, we show that MEGNet models trained on $\sim$60,000 crystals in the Materials Project substantially outperform prior ML models in the prediction of the formation energies, band gaps and elastic moduli of crystals, achieving better than DFT accuracy over a much larger data set.
Ranked #5 on Formation Energy on Materials Project
Drug Discovery, Formation Energy, Materials Science, Computational Physics
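The graph-network approach behind models like MEGNet can be illustrated, very schematically, by one round of message passing over an atom graph followed by a readout. The sketch below replaces the learned neural update functions with simple averaging; it shows the structure of such models, not MEGNet itself.

```python
# Schematic single message-passing step on an atom graph, plus a mean readout.
# Learned updates are replaced by plain averaging: an illustration of the
# graph-network structure used by models like MEGNet, not MEGNet itself.

def message_pass(node_feats, edges):
    """Each node's new feature = mean of itself and its neighbours."""
    neigh = {i: [node_feats[i]] for i in node_feats}
    for i, j in edges:            # undirected bonds
        neigh[i].append(node_feats[j])
        neigh[j].append(node_feats[i])
    return {i: sum(v) / len(v) for i, v in neigh.items()}

def readout(node_feats):
    """Graph-level prediction: mean over node features."""
    return sum(node_feats.values()) / len(node_feats)

# Toy 3-atom chain; scalar features stand in for feature vectors.
feats = {0: 1.0, 1: 2.0, 2: 3.0}
bonds = [(0, 1), (1, 2)]
print(readout(message_pass(feats, bonds)))  # 2.0
```

In a real model each update is a trained network and edge and global state are updated too; stacking several such steps lets information propagate across the whole crystal graph before the readout predicts a property.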
no code implementations • 6 Nov 2017 • Chen Zheng, Kiran Mathew, Chi Chen, Yiming Chen, Hanmei Tang, Alan Dozier, Joshua J. Kas, Fernando D. Vila, John J. Rehr, Louis F. J. Piper, Kristin Persson, Shyue Ping Ong
We report the development of XASdb, a large database of computed reference X-ray absorption spectra (XAS), and a novel Ensemble-Learned Spectra IdEntification (ELSIE) algorithm for the matching of spectra.
Materials Science
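Spectrum matching of the kind ELSIE performs against a reference database can be caricatured as nearest-neighbour search under a similarity measure. The sketch below uses cosine similarity on spectra already sampled on a common energy grid; the names in the toy database are hypothetical, and this is not the ensemble-learned pipeline itself.

```python
import math

# Toy spectrum matching: pick the reference spectrum most similar to a query
# under cosine similarity. A stand-in for database lookup, not the ELSIE
# algorithm; database entries here are hypothetical examples.

def cosine(a, b):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query, database):
    """database: dict name -> spectrum sampled on the same energy grid."""
    return max(database, key=lambda name: cosine(query, database[name]))

db = {"phase_a": [0.1, 0.8, 0.3],
      "phase_b": [0.5, 0.2, 0.9]}
print(best_match([0.1, 0.7, 0.4], db))  # phase_a
```

An ensemble approach like ELSIE would combine several such similarity measures and preprocessing choices rather than relying on a single one.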