Search Results for author: Chengyou Jia

Found 14 papers, 1 paper with code

PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning

no code implementations • 17 Feb 2025 • Xinyu Zhang, Yuxuan Dong, Yanrui Wu, Jiaxing Huang, Chengyou Jia, Basura Fernando, Mike Zheng Shou, Lingling Zhang, Jun Liu

These findings position PhysReason as a novel and comprehensive benchmark for evaluating physics-based reasoning capabilities in large language models.

ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting

no code implementations • 26 Nov 2024 • Chengyou Jia, Changliang Xia, Zhuohang Dang, Weijia Wu, Hangwei Qian, Minnan Luo

Despite the significant advancements in text-to-image (T2I) generative models, users often face a trial-and-error challenge in practical scenarios.

Text-to-Image Generation

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

1 code implementation • 30 Oct 2024 • Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao

Existing efforts in building GUI agents heavily rely on the availability of robust commercial Vision-Language Models (VLMs) such as GPT-4o and GeminiProVision.

Natural Language Visual Grounding

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

no code implementations • 24 Oct 2024 • Chengyou Jia, Minnan Luo, Zhuohang Dang, Qiushi Sun, Fangzhi Xu, Junlin Hu, Tianbao Xie, Zhiyong Wu

Comprehensive quantitative and qualitative results further demonstrate AgentStore's ability to enhance agent systems in both generalization and specialization, underscoring its potential for developing the specialized generalist computer assistant.

Disentangled Noisy Correspondence Learning

no code implementations • 10 Aug 2024 • Zhuohang Dang, Minnan Luo, Jihong Wang, Chengyou Jia, Haochen Han, Herun Wan, Guang Dai, Xiaojun Chang, Jingdong Wang

Moreover, although intuitive, directly applying previous cross-modal disentanglement methods suffers from limited noise tolerance and disentanglement efficacy.

cross-modal alignment, Cross-Modal Retrieval, +2

Disentangled Representation Learning with Transmitted Information Bottleneck

no code implementations • 3 Nov 2023 • Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Jihong Wang, Xiaojun Chang, Jingdong Wang

Encoding only the task-related information from the raw data, i.e., disentangled representation learning, can greatly contribute to the robustness and generalizability of models.

Disentanglement, Variational Inference

PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement

no code implementations • 20 Sep 2023 • Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Jingdong Wang

Dominant Person Search methods aim to localize and recognize query persons in a unified network, which jointly optimizes two sub-tasks, i.e., pedestrian detection and Re-IDentification (ReID).

Denoising, Pedestrian Detection, +2

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

no code implementations • 20 Aug 2023 • Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang

Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls.

Diversity Form +1

Multi-Modality Multi-Scale Cardiovascular Disease Subtypes Classification Using Raman Image and Medical History

no code implementations • 18 Apr 2023 • Bo Yu, Hechang Chen, Chengyou Jia, Hongren Zhou, Lele Cong, Xiankai Li, Jianhui Zhuang, Xianling Cong

Second, a probability matrix and a weight matrix are used to enhance the classification capacity by combining the RS and medical history data in the multi-modality data fusion module.

Deep Learning, Specificity

Disentangled Generation with Information Bottleneck for Few-Shot Learning

no code implementations • 29 Nov 2022 • Zhuohang Dang, Jihong Wang, Minnan Luo, Chengyou Jia, Caixia Yan, Qinghua Zheng

To address these challenges, we propose a novel Information Bottleneck (IB) based Disentangled Generation Framework for few-shot learning (FSL), termed DisGenIB, that can simultaneously guarantee the discrimination and diversity of generated samples.

Disentanglement, Diversity, +1
