Search Results for author: Jiaxing Huang

Found 61 papers, 34 papers with code

Airalogy: AI-empowered universal data digitization for research automation

1 code implementation 23 Jun 2025 Zijie Yang, Qiji Zhou, Fang Guo, Sijie Zhang, Yexun Xi, Jinglei Nie, Yudian Zhu, Liping Huang, Chou Wu, Yonghe Xia, Xiaoyu Ma, Yingming Pu, Panzhong Lu, Junshu Pan, Mingtao Chen, Tiannan Guo, Yanmei Dou, Hongyu Chen, Anping Zeng, Jiaxing Huang, Tian Xu, Yue Zhang

In this study, we address these challenges by developing Airalogy (https://airalogy.com), the world's first AI- and community-driven platform that balances universality and standardization for digitizing research data across multiple disciplines.

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval

no code implementations 26 May 2025 Rong-Cheng Tu, Wenhao Sun, Hanzhe You, Yingjie Wang, Jiaxing Huang, Li Shen, DaCheng Tao

Zero-Shot Composed Image Retrieval (ZS-CIR) aims to retrieve target images given a compositional query, consisting of a reference image and a modifying text, without relying on annotated training data.

Contrastive Learning Image Retrieval +3

R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search

1 code implementation 22 May 2025 Yibo Wang, Li Shen, Huanjin Yao, Tiansheng Huang, Rui Liu, Naiqiang Tan, Jiaxing Huang, Kai Zhang, DaCheng Tao

Chain-of-Thought (CoT) reasoning enhances large language models (LLMs) by enabling step-by-step problem-solving, yet its extension to Long-CoT introduces substantial computational overhead due to increased token length.

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

2 code implementations 22 May 2025 Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, DaCheng Tao, Jiaxing Huang

To this end, we propose Share-GRPO, a novel RL approach that tackles these issues by exploring and sharing diverse reasoning trajectories over an expanded question space.

Reinforcement Learning (RL)

Training-Free Text-Guided Image Editing with Visual Autoregressive Model

1 code implementation 31 Mar 2025 YuFei Wang, Lanqing Guo, Zhihao LI, Jiaxing Huang, Pichao Wang, Bihan Wen, Jian Wang

Text-guided image editing is an essential task that enables users to modify images through natural language descriptions.

Text-Guided Image Editing

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

1 code implementation 17 Mar 2025 Jingyi Zhang, Jiaxing Huang, Huanjin Yao, Shunyu Liu, Xikun Zhang, Shijian Lu, DaCheng Tao

Recent studies generally enhance MLLMs' reasoning capabilities via supervised fine-tuning on high-quality chain-of-thought reasoning data, which often leads models to merely imitate successful reasoning paths without understanding what the wrong reasoning paths are.

Reasoning with Reinforced Functional Token Tuning

1 code implementation 19 Feb 2025 Kongcheng Zhang, Qi Yao, Baisheng Lai, Jiaxing Huang, Wenkai Fang, DaCheng Tao, Mingli Song, Shunyu Liu

Specifically, RFTT comprises two phases: (1) supervised fine-tuning performs prompt-driven tree search to obtain self-generated training data annotated with functional tokens, which warms up the model to learn these tokens for reasoning; and (2) online reinforcement learning further allows the model to explore different reasoning pathways through functional token sampling without relying on prompts, thereby facilitating effective self-improvement for functional reasoning.

Math

PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning

no code implementations 17 Feb 2025 Xinyu Zhang, Yuxuan Dong, Yanrui Wu, Jiaxing Huang, Chengyou Jia, Basura Fernando, Mike Zheng Shou, Lingling Zhang, Jun Liu

These findings position PhysReason as a novel and comprehensive benchmark for evaluating physics-based reasoning capabilities in large language models.

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

2 code implementations 24 Dec 2024 Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, DaCheng Tao

Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question.

SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing

no code implementations 28 Nov 2024 Rong-Cheng Tu, Wenhao Sun, Zhao Jin, Jingyi Liao, Jiaxing Huang, DaCheng Tao

While open-source video generation and editing models have made significant progress, individual models are typically limited to specific tasks, failing to meet the diverse needs of users.

Intent Recognition Model Selection +1

A Survey on Vision Autoregressive Model

no code implementations 13 Nov 2024 Kai Jiang, Jiaxing Huang

Autoregressive models have demonstrated great performance in natural language processing (NLP) with impressive scalability, adaptability and generalizability.

3D Generation Benchmarking +6

Open-Vocabulary Object Detection via Language Hierarchy

no code implementations 27 Oct 2024 Jiaxing Huang, Jingyi Zhang, Kai Jiang, Shijian Lu

LHST expands the image-level labels with language hierarchy and enables co-regularization between the expanded labels and self-training.

Object Detection +2

Historical Test-time Prompt Tuning for Vision Foundation Models

no code implementations 27 Oct 2024 Jingyi Zhang, Jiaxing Huang, Xiaoqin Zhang, Ling Shao, Shijian Lu

Test-time prompt tuning, which learns prompts online with unlabelled test samples during the inference stage, has demonstrated great potential by learning effective prompts on-the-fly without requiring any task-specific annotations.

Image Classification +4

LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models

1 code implementation 13 Oct 2024 Han Qiu, Jiaxing Huang, Peng Gao, Qin Qi, Xiaoqin Zhang, Ling Shao, Shijian Lu

Several benchmarks have been created to gauge the hallucination levels of MLLMs, by either raising discriminative questions about the existence of objects or introducing LLM evaluators to score the generated text from MLLMs.

Hallucination Hallucination Evaluation +1

A Survey on Evaluation of Multimodal Large Language Models

no code implementations 28 Aug 2024 Jiaxing Huang, Jingyi Zhang

Multimodal Large Language Models (MLLMs) mimic the human perception and reasoning system by integrating powerful Large Language Models (LLMs) with various modality encoders (e.g., vision, audio), positioning LLMs as the "brain" and various modality encoders as sensory organs.

AI Agent Survey

Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures

1 code implementation 20 Jul 2024 Jiaxing Huang, Yanfeng Zhou, Yaoru Luo, Guole Liu, Heng Guo, Ge Yang

A fundamental property of such structures is their topological self-similarity, which can be quantified by fractal features such as fractal dimension (FD).

Decoder Segmentation

Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans

1 code implementation 22 Mar 2024 Heng Guo, Jianfeng Zhang, Jiaxing Huang, Tony C. W. Mok, Dazhou Guo, Ke Yan, Le Lu, Dakai Jin, Minfeng Xu

Therefore, we propose two key technical developments: 1) a progressively and spatially aligned prompt encoding method to effectively encode click prompts in local 3D space; and 2) a cross-patch prompt scheme to capture more 3D spatial context, which is beneficial for reducing the editing workloads when interactively prompting on large organs.

Image Segmentation Interactive Segmentation +3

Masked AutoDecoder is Effective Multi-Task Vision Generalist

1 code implementation CVPR 2024 Han Qiu, Jiaxing Huang, Peng Gao, Lewei Lu, Xiaoqin Zhang, Shijian Lu

Inspired by the success of general-purpose models in NLP, recent studies attempt to unify different vision tasks in the same sequence format and employ autoregressive Transformers for sequence prediction.

LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors

no code implementations 7 Feb 2024 Sheng Jin, Xueying Jiang, Jiaxing Huang, Lewei Lu, Shijian Lu

This paper presents DVDet, a Descriptor-Enhanced Open Vocabulary Detector that introduces conditional context prompts and hierarchical textual descriptors that enable precise region-text alignment as well as open-vocabulary detection training in general.

Image Classification +3

Domain Adaptation for Large-Vocabulary Object Detectors

no code implementations 13 Jan 2024 Kai Jiang, Jiaxing Huang, Weiying Xie, Jie Lei, Yunsong Li, Ling Shao, Shijian Lu

Large-vocabulary object detectors (LVDs) aim to detect objects of many categories; they learn super objectness features and can locate objects accurately when applied to various downstream data.

Domain Adaptation Knowledge Graphs +2

Learning to Prompt Segment Anything Models

no code implementations 9 Jan 2024 Jiaxing Huang, Kai Jiang, Jingyi Zhang, Han Qiu, Lewei Lu, Shijian Lu, Eric Xing

SAMs work with two types of prompts including spatial prompts (e.g., points) and semantic prompts (e.g., texts), which work together to prompt SAMs to segment anything on downstream datasets.

Image Segmentation Prompt Learning +2

Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey

no code implementations 27 Dec 2023 Jiaxing Huang, Jingyi Zhang, Kai Jiang, Han Qiu, Shijian Lu

Traditional computer vision generally solves each single task independently by a dedicated model with the task instruction implicitly designed in the model architecture, giving rise to two limitations: (1) it leads to task-specific models, which require multiple models for different tasks and restrict the potential synergies from diverse tasks; (2) it leads to a pre-defined and fixed model interface that has limited interactivity and adaptability in following users' task instructions.

Instruction Following Survey

Domain Generalization via Balancing Training Difficulty and Model Capability

no code implementations ICCV 2023 Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu

Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model.

Data Augmentation Domain Generalization

Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation

no code implementations 29 Jun 2023 Jiaxing Huang, Jingyi Zhang, Han Qiu, Sheng Jin, Shijian Lu

Traditional domain adaptation assumes the same vocabulary across source and target domains, which often struggles with limited transfer flexibility and efficiency while handling target domains with different vocabularies.

Unsupervised Domain Adaptation

Vision-Language Models for Vision Tasks: A Survey

1 code implementation 3 Apr 2023 Jingyi Zhang, Jiaxing Huang, Sheng Jin, Shijian Lu

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks (DNNs) training, and they usually train a DNN for each single visual recognition task, leading to a laborious and time-consuming visual recognition paradigm.

Benchmarking Knowledge Distillation +2

3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds

1 code implementation CVPR 2023 Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

In addition, we design a domain randomization technique that alternatively randomizes the geometry styles of point clouds and aggregates their embeddings, ultimately leading to a generalizable model that can improve 3DSS under various adverse weather effectively.

3D Semantic Segmentation Autonomous Driving

Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

1 code implementation 28 Jul 2022 Gongjie Zhang, Zhipeng Luo, Jiaxing Huang, Shijian Lu, Eric P. Xing

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection.

Object Detection +1

Contextual Text Block Detection towards Scene Text Understanding

no code implementations 26 Jul 2022 Chuhui Xue, Jiaxing Huang, Shijian Lu, Changhu Wang, Song Bai

We formulate the new setup by a dual detection task which first detects integral text units and then groups them into a CTB.

Text Classification +2

Domain Adaptive Video Segmentation via Temporal Pseudo Supervision

1 code implementation 6 Jul 2022 Yun Xing, Dayan Guan, Jiaxing Huang, Shijian Lu

Specifically, we design cross-frame pseudo labelling to provide pseudo supervision from previous video frames while learning from the augmented current video frames.

Segmentation Semantic Segmentation +2

Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation

1 code implementation CVPR 2022 Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu

We build the balanced subclass distributions by clustering pixels of each original class into multiple subclasses of similar sizes, which provide class-balanced pseudo supervision to regularize the class-biased segmentation.

Segmentation Semi-Supervised Semantic Segmentation

Unsupervised Point Cloud Representation Learning with Deep Neural Networks: A Survey

1 code implementation 28 Feb 2022 Aoran Xiao, Jiaxing Huang, Dayan Guan, Xiaoqin Zhang, Shijian Lu, Ling Shao

The convergence of point cloud and DNNs has led to many deep point cloud models, largely trained under the supervision of large-scale and densely-labelled point cloud data.

Autonomous Driving Representation Learning

GenCo: Generative Co-training for Generative Adversarial Networks with Limited Data

1 code implementation 4 Oct 2021 Kaiwen Cui, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, Fangneng Zhan, Shijian Lu

Specifically, we design GenCo, a Generative Co-training network that mitigates the discriminator over-fitting issue by introducing multiple complementary discriminators that provide diverse supervision from multiple distinctive views in training.

Data Augmentation Image Generation

Contextual Text Detection

no code implementations 29 Sep 2021 Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Song Bai, Changhu Wang

This paper presents Contextual Text Detection, a new setup that detects contextual text blocks for better understanding of texts in scenes.

Text Detection

Domain Adaptive Video Segmentation via Temporal Consistency Regularization

1 code implementation ICCV 2021 Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu

This paper presents DA-VSN, a domain adaptive video segmentation network that addresses domain gaps in videos by temporal consistency regularization (TCR) for consecutive frames of target-domain videos.

Segmentation Unsupervised Domain Adaptation +1

Transfer Learning from Synthetic to Real LiDAR Point Cloud for Semantic Segmentation

1 code implementation 12 Jul 2021 Aoran Xiao, Jiaxing Huang, Dayan Guan, Fangneng Zhan, Shijian Lu

Extensive experiments show that SynLiDAR provides a high-quality data source for studying 3D transfer and the proposed PCT achieves superior point cloud translation consistently across the three setups.

3D Unsupervised Domain Adaptation Data Augmentation +5

Spectral Unsupervised Domain Adaptation for Visual Recognition

no code implementations CVPR 2022 Jingyi Zhang, Jiaxing Huang, Zichen Tian, Shijian Lu

Second, it introduces multi-view spectral learning that learns useful unsupervised representations by maximizing mutual information among multiple ST-generated spectral views of each target sample.

Image Classification +4

RDA: Robust Domain Adaptation via Fourier Adversarial Attacking

1 code implementation ICCV 2021 Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

With FAA-generated samples, the training can continue the 'random walk' and drift into an area with a flat loss landscape, leading to more robust domain adaptation.

Unsupervised Domain Adaptation

Semi-Supervised Domain Adaptation via Adaptive and Progressive Feature Alignment

no code implementations 5 Jun 2021 Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

We position the few labeled target samples as references that gauge the similarity between source and target features and guide adaptive inter-domain alignment for learning more similar source features.

Domain Adaptation Image Classification +5

Category Contrast for Unsupervised Domain Adaptation in Visual Tasks

1 code implementation CVPR 2022 Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu, Ling Shao

In this work, we explore the idea of instance contrastive learning in unsupervised domain adaptation (UDA) and propose a novel Category Contrast technique (CaCo) that introduces semantic priors on top of instance discrimination for visual UDA tasks.

Contrastive Learning Representation Learning +1

I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition

no code implementations 18 May 2021 Chuhui Xue, Jiaxing Huang, Wenqing Zhang, Shijian Lu, Changhu Wang, Song Bai

The first task focuses on image-to-character (I2C) mapping, which detects a set of character candidates from images based on different alignments of visual features in a non-sequential way.

Decoder Scene Text Recognition

DA-DETR: Domain Adaptive Detection Transformer with Information Fusion

no code implementations CVPR 2023 Jingyi Zhang, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, Xiaoqin Zhang, Shijian Lu

DA-DETR introduces a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains.

Domain Adaptation Object +3

MLAN: Multi-Level Adversarial Network for Domain Adaptive Semantic Segmentation

no code implementations 24 Mar 2021 Jiaxing Huang, Dayan Guan, Shijian Lu, Aoran Xiao

Recent progress in domain adaptive semantic segmentation demonstrates the effectiveness of adversarial learning (AL) in unsupervised domain adaptation.

Image-to-Image Translation Semantic Segmentation +2

FSDR: Frequency Space Domain Randomization for Domain Generalization

1 code implementation CVPR 2021 Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

It has been studied widely by domain randomization that transfers source images to different styles in spatial space for learning domain-agnostic features.

Domain Generalization

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

1 code implementation CVPR 2021 Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

The inter-task regularization exploits the complementary nature of instance segmentation and semantic segmentation and uses it as a constraint for better feature alignment across domains.

Domain Adaptation Instance Segmentation +2

FPS-Net: A Convolutional Fusion Network for Large-Scale LiDAR Point Cloud Segmentation

1 code implementation 1 Mar 2021 Aoran Xiao, Xiaofei Yang, Shijian Lu, Dayan Guan, Jiaxing Huang

Specifically, we design a residual dense block with multiple receptive fields as a building block in the encoder which preserves detailed information in each modality and learns hierarchical modality-specific and fused features effectively.

3D Semantic Segmentation Decoder +3

Uncertainty-Aware Unsupervised Domain Adaptation in Object Detection

3 code implementations 27 Feb 2021 Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu, Yanpeng Cao

Specifically, we design an uncertainty metric that assesses the alignment of each sample and adjusts the strength of adversarial learning for well-aligned and poorly-aligned samples adaptively.

Object Detection +2

Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation

1 code implementation ECCV 2020 Jiaxing Huang, Shijian Lu, Dayan Guan, Xiaobing Zhang

Recent advances in unsupervised domain adaptation for semantic segmentation have shown great potential to relieve the demand for expensive per-pixel annotations.

Relation Segmentation +2

Hierarchy Composition GAN for High-fidelity Image Synthesis

no code implementations 12 May 2019 Fangneng Zhan, Jiaxing Huang, Shijian Lu

Despite the rapid progress of generative adversarial networks (GANs) in image synthesis in recent years, existing image synthesis approaches work in either the geometry domain or the appearance domain alone, which often introduces various synthesis artifacts.

Image Generation Vocal Bursts Intensity Prediction
