Search Results for author: Chi Chen

Found 37 papers, 27 papers with code

Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation

1 code implementation · 10 Jun 2025 · Shiji Zhao, Chi Chen, Ranjie Duan, Xizhe Wang, Xingxing Wei

Adversarial Training (AT) is widely recognized as an effective approach to enhance the adversarial robustness of Deep Neural Networks.

Adversarial Robustness Fairness +1

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

1 code implementation · 2 Jun 2025 · Zhong Zhang, Yaxi Lu, Yikun Fu, Yupeng Huo, Shenzhi Yang, Yesai Wu, Han Si, Xin Cong, Haotian Chen, Yankai Lin, Jie Xie, Wei Zhou, Wang Xu, Yuanheng Zhang, Zhou Su, Zhongwu Zhai, Xiaoming Liu, Yudong Mei, JianMing Xu, Hongyan Tian, Chongyi Wang, Chi Chen, Yuan YAO, Zhiyuan Liu, Maosong Sun

The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability.

AI Agent Diversity +1

SPPSFormer: High-quality Superpoint-based Transformer for Roof Plane Instance Segmentation from Point Clouds

no code implementations · 30 May 2025 · Cheng Zeng, Xiatian Qi, Chi Chen, Kai Sun, Wangle Zhang, Yuxuan Liu, Yan Meng, Bisheng Yang

Transformers have seldom been employed in point cloud roof plane instance segmentation, which is the focus of this study, and existing superpoint Transformers suffer from limited performance due to their use of low-quality superpoints.

Data Augmentation Instance Segmentation +2

Visual Abstract Thinking Empowers Multimodal Reasoning

1 code implementation · 26 May 2025 · Dairu Liu, Ziyue Wang, Minyuan Ruan, Fuwen Luo, Chi Chen, Peng Li, Yang Liu

Images usually convey richer detail than text but often include redundant information, which can degrade multimodal reasoning performance.

Multimodal Reasoning Relational Reasoning +1

ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing

1 code implementation · 17 May 2025 · Xuanle Zhao, Xuexin Liu, Haoyue Yang, Xianzhen Luo, Fanhu Zeng, Jianling Li, Qi Shi, Chi Chen

In contrast, small-scale models, including chart-domain models, struggle both with following editing instructions and generating overall chart images, underscoring the need for further development in this area.

Chart Understanding

Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model

1 code implementation · 10 May 2025 · Xinyue Lou, You Li, Jinan Xu, Xiangyu Shi, Chi Chen, Kaiyu Huang

To address this gap, we conduct a comprehensive and systematic safety evaluation of 11 MLRMs across 5 benchmarks and unveil prevalent safety degradation phenomena in most advanced models.

Safety Alignment

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

1 code implementation · 17 Mar 2025 · Xinyu Ma, Ziyang Ding, Zhicong Luo, Chi Chen, Zonghao Guo, Derek F. Wong, Xiaoyi Feng, Maosong Sun

Human experts excel at fine-grained visual discrimination by leveraging domain knowledge to refine perceptual features, a capability that remains underdeveloped in current Multimodal Large Language Models (MLLMs).

Domain Generalization Multimodal Reasoning +1

Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition

1 code implementation · 16 Mar 2025 · Xiaoying Zhang, Da Peng, YiPeng Zhang, Zonghao Guo, Chengyue Wu, Chi Chen, Wei Ke, Helen Meng, Maosong Sun

These curated samples are subsequently used for large-scale multimodal pre-training, completing a self-learning cycle that strengthens the model's cognitive foundation.

Caption Generation Image Captioning +2

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

1 code implementation · 13 Mar 2025 · Ziyue Wang, Yurui Dong, Fuwen Luo, Minyuan Ruan, Zhili Cheng, Chi Chen, Peng Li, Yang Liu

However, existing evaluations primarily assess final task completion, often reducing the assessment to isolated abilities such as visual grounding and visual question answering.

Multimodal Reasoning Question Answering +3

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

1 code implementation · 11 Jan 2025 · Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun

Existing MLLMs face two challenges in chart-to-code generation: (1) low executability and poor restoration of chart details in the generated code, and (2) a lack of large-scale, diverse training data.

Chart Understanding Code Generation +4

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

no code implementations · 10 Jan 2025 · You Li, Heyu Huang, Chi Chen, Kaiyu Huang, Chao Huang, Zonghao Guo, Zhiyuan Liu, Jinan Xu, Yuhua Li, Ruixuan Li, Maosong Sun

The recent advancement of Multimodal Large Language Models (MLLMs) has significantly improved their fine-grained perception of single images and general comprehension across multiple images.

Form Image Comprehension +1

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

1 code implementation · 18 Dec 2024 · YiPeng Zhang, Yifan Liu, Zonghao Guo, Yidan Zhang, Xuesong Yang, Chi Chen, Jun Song, Bo Zheng, Yuan YAO, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

To address this issue, we present LLaVA-UHD v2, an advanced MLLM centered on a hierarchical window transformer that captures diverse visual granularity by constructing and integrating a high-resolution feature pyramid.

Attribute Text Generation

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

1 code implementation · 6 Nov 2024 · Junming Lin, Zheng Fang, Chi Chen, Zihao Wan, Fuwen Luo, Peng Li, Yang Liu, Maosong Sun

In this paper, we introduce StreamingBench, the first comprehensive benchmark designed to evaluate the streaming video understanding capabilities of MLLMs.

Image Comprehension Video Understanding

PlaneSAM: Multimodal Plane Instance Segmentation Using the Segment Anything Model

no code implementations · 21 Oct 2024 · Zhongchen Deng, Zhechen Yang, Chi Chen, Cheng Zeng, Yan Meng, Bisheng Yang

Based on EfficientSAM, a fast version of SAM, we propose a plane instance segmentation network called PlaneSAM, which can fully integrate the information of the RGB bands (spectral bands) and the D band (geometric band), thereby improving the effectiveness of plane instance segmentation in a multimodal manner.

Instance Segmentation Plane Instance Segmentation +2

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models

1 code implementation · 7 Oct 2024 · Ziyue Wang, Chi Chen, Fuwen Luo, Yurui Dong, Yuanchi Zhang, Yuzhuang Xu, Xiaolong Wang, Peng Li, Yang Liu

Active perception, a crucial human capability, involves setting a goal based on the current understanding of the environment and performing actions to achieve that goal.

Question Answering Visual Question Answering

Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement

1 code implementation · 15 Aug 2024 · Wenxuan Li, Qin Zou, Chi Chen, Bo Du, Long Chen, Jian Zhou, Hongkai Yu

3D object detection in driving scenarios faces the challenge of complex road environments, which can lead to the loss or incompleteness of key features, thereby affecting perception performance.

3D Object Detection Autonomous Driving +2

PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer

1 code implementation · 7 Apr 2024 · Xingyu Su, Xiaojie Zhu, Yang Li, Yong Li, Chi Chen, Paulo Esteves-Veríssimo

Amidst the surge in deep learning-based password guessing models, challenges of generating high-quality passwords and reducing duplicate passwords persist.

Goldfish: An Efficient Federated Unlearning Framework

1 code implementation · 4 Apr 2024 · Houzhe Wang, Xiaojie Zhu, Chi Chen, Paulo Esteves-Veríssimo

To address the challenge of low validity in existing machine unlearning algorithms, we propose a novel loss function.

Knowledge Distillation Machine Unlearning

Model Composition for Multimodal Large Language Models

1 code implementation · 20 Feb 2024 · Chi Chen, Yiyang Du, Zheng Fang, Ziyue Wang, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

In this paper, we propose a new paradigm that composes existing MLLMs into a new model that retains the modal understanding capabilities of each original model.

model

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

1 code implementation · 19 Feb 2024 · Ziyue Wang, Chi Chen, Yiqi Zhu, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks.

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions

1 code implementation · 20 Nov 2023 · Ziyue Wang, Chi Chen, Peng Li, Yang Liu

Large Language Models (LLMs) demonstrate impressive reasoning ability and retain extensive world knowledge, not only in natural language tasks but also in some vision-language tasks such as open-domain knowledge-based visual question answering (OK-VQA).

Question Answering Visual Question Answering +1

Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models

1 code implementation · 25 Aug 2023 · Chi Chen, Ruoyu Qin, Fuwen Luo, Xiaoyue Mi, Peng Li, Maosong Sun, Yang Liu

However, existing visual instruction tuning methods only utilize image-language instruction data to align the language and image modalities, lacking a more fine-grained cross-modal alignment.

cross-modal alignment Position

Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning

no code implementations · 8 Jun 2023 · Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, He Zhang, Shidi Tang, Hongxia Hao, Peiran Jin, Chi Chen, Frank Noé, Haiguang Liu, Tie-Yan Liu

In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.

Deep Learning

Weakly Supervised Vision-and-Language Pre-training with Relative Representations

no code implementations · 24 May 2023 · Chi Chen, Peng Li, Maosong Sun, Yang Liu

Weakly supervised vision-and-language pre-training (WVLP), which learns cross-modal representations with limited cross-modal supervision, has been shown to effectively reduce the data cost of pre-training while maintaining decent performance on downstream tasks.

Retrieval

CG-SSD: Corner Guided Single Stage 3D Object Detection from LiDAR Point Cloud

1 code implementation · 24 Feb 2022 · Ruiqi Ma, Chi Chen, Bisheng Yang, Deren Li, Haiping Wang, Yangzi Cong, Zongtian Hu

At present, the anchor-based or anchor-free models that use LiDAR point clouds for 3D object detection use the center assigner strategy to infer the 3D bounding boxes.

3D Object Detection Object +1

Efficient DNA-driven nanocavities for approaching quasi-deterministic strong coupling to a few fluorophores

no code implementations · 11 Mar 2021 · Wan-Ping Chan, Jyun-Hong Chen, Wei-Lun Chou, Wen-Yuan Chen, Hao-Yu Liu, Hsiao-Ching Hu, Chien-Chung Jeng, Jie-Ren Li, Chi Chen, Shiuan-Yeh Chen

Strong coupling between light and matter is the foundation of promising quantum photonic devices such as deterministic single photon sources, single atom lasers and photonic quantum gates, which consist of an atom and a photonic cavity.

Optics Quantum Physics

AtomSets -- A Hierarchical Transfer Learning Framework for Small and Large Materials Datasets

1 code implementation · 4 Feb 2021 · Chi Chen, Shyue Ping Ong

Here we leverage the transfer learning concept and the graph network deep learning framework to develop the AtomSets machine learning framework, which achieves consistently high model accuracy on both small and large materials datasets.

Feature Engineering Transfer Learning Materials Science

Mask-Align: Self-Supervised Neural Word Alignment

1 code implementation · ACL 2021 · Chi Chen, Maosong Sun, Yang Liu

Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks.

Machine Translation Translation +1

Holistic Combination of Structural and Textual Code Information for Context based API Recommendation

no code implementations · 15 Oct 2020 · Chi Chen, Xin Peng, Zhenchang Xing, Jun Sun, Xin Wang, Yifan Zhao, Wenyun Zhao

APIRec-CST is a deep learning model that combines the API usage with the text information in the source code based on an API Context Graph Network and a Code Token Network that simultaneously learn structural and textual features for API recommendation.

Learning Properties of Ordered and Disordered Materials from Multi-fidelity Data

3 code implementations · 9 May 2020 · Chi Chen, Yunxing Zuo, Weike Ye, Xiangguo Li, Shyue Ping Ong

Predicting the properties of a material from the arrangement of its atoms is a fundamental goal in materials science.

Materials Science Disordered Systems and Neural Networks

Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals

3 code implementations · Chem. Mater. 2018 · Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, Shyue Ping Ong

Similarly, we show that MEGNet models trained on ~60,000 crystals in the Materials Project substantially outperform prior ML models in predicting the formation energies, band gaps and elastic moduli of crystals, achieving better than DFT accuracy over a much larger data set.

Drug Discovery Formation Energy Materials Science Computational Physics

Automated Generation and Ensemble-Learned Matching of X-ray Absorption Spectra

no code implementations · 6 Nov 2017 · Chen Zheng, Kiran Mathew, Chi Chen, Yiming Chen, Hanmei Tang, Alan Dozier, Joshua J. Kas, Fernando D. Vila, John J. Rehr, Louis F. J. Piper, Kristin Persson, Shyue Ping Ong

We report the development of XASdb, a large database of computed reference X-ray absorption spectra (XAS), and a novel Ensemble-Learned Spectra IdEntification (ELSIE) algorithm for the matching of spectra.

Materials Science
