Search Results for author: JianGuo Zhang

Found 66 papers, 30 papers with code

h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective

1 code implementation 22 Jun 2025 Wenjian Huang, Guiping Cao, Jiahao Xia, Jingkun Chen, Hao Wang, JianGuo Zhang

In this study, we summarize and categorize previous works into three general strategies: intuitively designed methods, binning-based methods, and methods based on formulations of ideal calibration.

scoring rule

LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback

no code implementations 2 Jun 2025 Thai Hoang, Kung-Hsiang Huang, Shirley Kokane, JianGuo Zhang, Zuxin Liu, Ming Zhu, Jake Grigsby, Tian Lan, Michael S Ryoo, Chien-Sheng Wu, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Large Action Models (LAMs) for AI Agents offer incredible potential but face challenges due to the need for high-quality training data, especially for multi-step tasks that involve planning, executing tool calls, and responding to feedback.

Large Language Model

WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

no code implementations 2 May 2025 Daoan Zhang, Che Jiang, Ruoshi Xu, Biaoxiang Chen, Zijian Jin, Yutian Lu, JianGuo Zhang, Liang Yong, Jiebo Luo, Shengda Luo

Recent advances in text-to-image (T2I) generation have achieved impressive results, yet existing models still struggle with prompts that require rich world knowledge and implicit reasoning: both of which are critical for producing semantically accurate, coherent, and contextually appropriate images in real-world scenarios.

Text-to-Image Generation +1

Mitigating Knowledge Discrepancies among Multiple Datasets for Task-agnostic Unified Face Alignment

no code implementations 28 Mar 2025 Jiahao Xia, Min Xu, Wenjian Huang, JianGuo Zhang, Haimin Zhang, Chunxia Xiao

To explicitly align these mean shapes on an interpretable plane based on their semantics, each shape is then incorporated with a group of semantic alignment embeddings.

Face Alignment

Unsupervised Patch-GAN with Targeted Patch Ranking for Fine-Grained Novelty Detection in Medical Imaging

no code implementations 29 Jan 2025 Jingkun Chen, Guang Yang, Xiao Zhang, Jingchao Peng, Tianlu Zhang, JianGuo Zhang, Jungong Han, Vicente Grau

Detecting novel anomalies in medical imaging is challenging due to the limited availability of labeled data for rare abnormalities, which often display high variability and subtlety.

Novelty Detection

Continual Learning for Segment Anything Model Adaptation

1 code implementation 9 Dec 2024 Jinglong Yang, Yichen Wu, Jun Cen, Wenjian Huang, Hong Wang, JianGuo Zhang

Driven by the practical need, in this paper, we first propose a novel Continual SAM adaptation (CoSAM) benchmark with 8 different task domains and carefully analyze the limitations of the existing SAM one-step adaptation methods in the continual segmentation scenario.

Continual Learning model

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

1 code implementation 7 Dec 2024 Zixian Ma, JianGuo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Caiming Xiong, Ranjay Krishna, Silvio Savarese

While open-source multi-modal language models perform well on simple question answering tasks, they often fail on complex questions that require multiple capabilities, such as fine-grained recognition, visual grounding, and reasoning, and that demand multi-step solutions.

Depth Estimation Mathematical Reasoning +4

Spatial-Temporal Search for Spiking Neural Networks

no code implementations 24 Oct 2024 Kaiwei Che, Zhaokun Zhou, Li Yuan, JianGuo Zhang, Yonghong Tian, Luziwei Leng

Drawing inspiration from the heterogeneity of biological neural networks, we propose a differentiable approach to optimize SNN on both spatial and temporal dimensions.

Image Classification +1

PRACT: Optimizing Principled Reasoning and Acting of LLM Agent

no code implementations 24 Oct 2024 Zhiwei Liu, Weiran Yao, JianGuo Zhang, Rithesh Murthy, Liangwei Yang, Zuxin Liu, Tian Lan, Ming Zhu, Juntao Tan, Shirley Kokane, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data.

SPikE-SSM: A Sparse, Precise, and Efficient Spiking State Space Model for Long Sequences Learning

no code implementations 7 Oct 2024 Yan Zhong, Ruoyu Zhao, Chao Wang, Qinghai Guo, JianGuo Zhang, Zhichao Lu, Luziwei Leng

However, applying the highly capable SSMs to SNNs for long sequences learning poses three major challenges: (1) The membrane potential is determined by the past spiking history of the neuron, leading to reduced efficiency for sequence modeling in parallel computing scenarios.

Computational Efficiency State Space Models

An Enhanced Federated Prototype Learning Method under Domain Shift

no code implementations 27 Sep 2024 Liang Kuang, Kuangpu Guo, Jian Liang, JianGuo Zhang

Federated Learning (FL) allows collaborative machine learning training without sharing private data.

Federated Learning

Learning Brain Tumor Representation in 3D High-Resolution MR Images via Interpretable State Space Models

1 code implementation 12 Sep 2024 Qingqiao Hu, Daoan Zhang, Jiebo Luo, Zhenyu Gong, Benedikt Wiestler, JianGuo Zhang, Hongwei Bran Li

Learning meaningful and interpretable representations from high-dimensional volumetric magnetic resonance (MR) images is essential for advancing personalized medicine.

Self-Supervised Learning State Space Models

xLAM: A Family of Large Action Models to Empower AI Agent Systems

1 code implementation 5 Sep 2024 JianGuo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Awalgaonkar, Rithesh Murthy, Eric Hu, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

By releasing the xLAM series, we aim to advance the performance of open-source LLMs for autonomous AI agents, potentially accelerating progress and democratizing access to high-performance models for agent tasks.

AI Agent

Unsupervised Part Discovery via Dual Representation Alignment

1 code implementation 15 Aug 2024 Jiahao Xia, Wenjian Huang, Min Xu, JianGuo Zhang, Haimin Zhang, Ziyu Sheng, Dong Xu

Object parts serve as crucial intermediate representations in various downstream tasks, but part-level representation learning still has not received as much attention as other vision tasks.

Representation Learning Unsupervised Part Discovery

Personalized Multi-task Training for Recommender System

no code implementations 31 Jul 2024 Liangwei Yang, Zhiwei Liu, JianGuo Zhang, Rithesh Murthy, Shelby Heinecke, Huan Wang, Caiming Xiong, Philip S. Yu

In the vast landscape of internet information, recommender systems (RecSys) have become essential for guiding users through a sea of choices aligned with their preferences.

Multi-Task Learning Recommendation Systems +1

Evolutionary Spiking Neural Networks: A Survey

no code implementations 18 Jun 2024 Shuaijie Shen, Rui Zhang, Chao Wang, Renzhuo Huang, Aiersi Tuerhong, Qinghai Guo, Zhichao Lu, JianGuo Zhang, Luziwei Leng

Spiking neural networks (SNNs) are gaining increasing attention as potential computationally efficient alternatives to traditional artificial neural networks (ANNs).

Survey

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

no code implementations 12 Jun 2024 Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, JianGuo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization.

Benchmarking Model Compression +1

Gradient-Guided Modality Decoupling for Missing-Modality Robustness

1 code implementation 26 Feb 2024 Hao Wang, Shengda Luo, Guosheng Hu, JianGuo Zhang

In aid of this indicator, we present a novel Gradient-guided Modality Decoupling (GMD) method to decouple the dependency on dominating modalities.

Sentiment Analysis

AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

1 code implementation 23 Feb 2024 Zhiwei Liu, Weiran Yao, JianGuo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K. Choubey, Tian Lan, Jason Wu, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese

Thus, we open-source a new AI agent library, AgentLite, which simplifies this process by offering a lightweight, user-friendly platform for innovating LLM agent reasoning, architectures, and applications with ease.

AI Agent

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

2 code implementations 23 Feb 2024 JianGuo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Ming Zhu, Juntao Tan, Thai Hoang, Zuxin Liu, Liangwei Yang, Yihao Feng, Shirley Kokane, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong

It meticulously standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training.

Using Left and Right Brains Together: Towards Vision and Language Planning

no code implementations 16 Feb 2024 Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, JianGuo Zhang

Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks.

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

no code implementations 30 Dec 2023 Yao Wan, Yang He, Zhangqian Bi, JianGuo Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin, Philip S. Yu

We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models.

Deep Learning Representation Learning +1

Video Understanding with Large Language Models: A Survey

1 code implementation 29 Dec 2023 Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Pinxin Liu, Mingqian Feng, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.

Survey Video Understanding

DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation

no code implementations 18 Dec 2023 Yu Wang, Zhiwei Liu, JianGuo Zhang, Weiran Yao, Shelby Heinecke, Philip S. Yu

With our principle, we managed to outperform GPT-Turbo-3.5 on three datasets using 7b models, e.g., Vicuna-7b and Openchat-7b, on NDCG@10.

In-Context Learning Sequential Recommendation

Semi-supervised Semantic Segmentation via Boosting Uncertainty on Unlabeled Data

no code implementations 30 Nov 2023 Daoan Zhang, Yunhao Luo, JianGuo Zhang

We first figure out that the distribution gap between labeled and unlabeled datasets cannot be ignored, even though the two datasets are sampled from the same distribution.

Segmentation Semi-Supervised Semantic Segmentation

Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

no code implementations 16 Aug 2023 JianGuo Zhang, Stephen Roller, Kun Qian, Zhiwei Liu, Rui Meng, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging sophisticated natural language understanding and natural language generation capabilities of pre-trained models.

Natural Language Understanding Retrieval +1

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

1 code implementation 4 Aug 2023 Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performances over time.

Language Modelling

Cross Contrasting Feature Perturbation for Domain Generalization

2 code implementations ICCV 2023 Chenming Li, Daoan Zhang, Wenjian Huang, JianGuo Zhang

Domain generalization (DG) aims to learn a robust model from source domains that generalizes well to unseen target domains.

Domain Generalization

Strip-MLP: Efficient Token Interaction for Vision MLP

1 code implementation ICCV 2023 Guiping Cao, Shengda Luo, Wenjian Huang, Xiangyuan Lan, Dongmei Jiang, YaoWei Wang, JianGuo Zhang

Finally, based on the Strip MLP layer, we propose a novel Local Strip Mixing Module (LSMM) to boost the token interaction power in the local region.

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

1 code implementation 19 Jul 2023 JianGuo Zhang, Kun Qian, Zhiwei Liu, Shelby Heinecke, Rui Meng, Ye Liu, Zhou Yu, Huan Wang, Silvio Savarese, Caiming Xiong

Despite advancements in conversational AI, language models encounter challenges to handle diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness.

Conversational Recommendation Diversity +4

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

no code implementations 11 Jul 2023 Daoan Zhang, Weitong Zhang, Yu Zhao, JianGuo Zhang, Bing He, Chenchen Qin, Jianhua Yao

Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge.

Binary Classification DNA analysis +1

When SAM Meets Sonar Images

1 code implementation 25 Jun 2023 Lin Wang, Xiufen Ye, Liqiang Zhu, Weijie Wu, JianGuo Zhang, Huiming Xing, Chao Hu

Notably, there is a lack of research on the application of SAM to sonar imaging.

Segmentation Semantic Segmentation

Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference

1 code implementation 21 Jun 2023 Boyan Li, Luziwei Leng, Shuaijie Shen, Kaixuan Zhang, JianGuo Zhang, Jianxing Liao, Ran Cheng

As a result, we establish an efficient multi-stage spiking MLP network that blends effectively global receptive fields with local feature extraction for comprehensive spike-based computation.

Image Classification

Zero-shot Item-based Recommendation via Multi-task Product Knowledge Graph Pre-Training

no code implementations 12 May 2023 Ziwei Fan, Zhiwei Liu, Shelby Heinecke, JianGuo Zhang, Huan Wang, Caiming Xiong, Philip S. Yu

This paper presents a novel paradigm for the Zero-Shot Item-based Recommendation (ZSIR) task, which pre-trains a model on product knowledge graph (PKG) to refine the item features from PLMs.

Recommendation Systems

Feature Alignment and Uniformity for Test Time Adaptation

1 code implementation CVPR 2023 Shuai Wang, Daoan Zhang, Zipei Yan, JianGuo Zhang, Rui Li

Test time adaptation (TTA) aims to adapt deep neural networks when receiving out of distribution test domain samples.

Domain Generalization Image Segmentation +3

Bootstrap The Original Latent: Learning a Private Model from a Black-box Model

no code implementations 7 Mar 2023 Shuai Wang, Daoan Zhang, JianGuo Zhang, Weiwei Zhang, Rui Li

In this paper, considering the balance of data/model privacy of model owners and user needs, we propose a new setting called Back-Propagated Black-Box Adaptation (BPBA) for users to better train their private models via the guidance of the back-propagated results of a Black-box foundation/source model.

model

Aggregation of Disentanglement: Reconsidering Domain Variations in Domain Generalization

no code implementations 5 Feb 2023 Daoan Zhang, Mingkai Chen, Chenming Li, Lingyun Huang, JianGuo Zhang

Different from learning domain invariant features from source domains, we decouple the input images into Domain Expert Features and noise.

Contrastive Learning Disentanglement +1

A Domain-specific Perceptual Metric via Contrastive Self-supervised Representation: Applications on Natural and Medical Images

no code implementations 3 Dec 2022 Hongwei Bran Li, Chinmay Prabhakar, Suprosanna Shit, Johannes Paetzold, Tamaz Amiranashvili, JianGuo Zhang, Daniel Rueckert, Juan Eugenio Iglesias, Benedikt Wiestler, Bjoern Menze

We find that in the natural image domain, CSR behaves on par with the supervised one on several perceptual tests as a metric, and in the medical domain, CSR better quantifies perceptual similarity concerning the experts' ratings.

Image Generation

Rethinking Alignment and Uniformity in Unsupervised Semantic Segmentation

no code implementations 26 Nov 2022 Daoan Zhang, Chenming Li, Haoquan Li, Wenjian Huang, Lingyun Huang, JianGuo Zhang

Experimental results on multiple semantic segmentation benchmarks show that our unsupervised segmentation framework specializes in catching semantic representations, which outperforms all the unpretrained and even several pretrained methods.

Representation Learning Segmentation +2

Partial Least Square Regression via Three-factor SVD-type Manifold Optimization for EEG Decoding

no code implementations 9 Aug 2022 Wanguang Yin, Zhichao Liang, JianGuo Zhang, Quanying Liu

To this end, we propose a new method to solve the partial least square regression, named PLSR via optimization on bi-Grassmann manifold (PLSRbiGr).

EEG EEG Decoding +3

Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

1 code implementation CVPR 2022 Jiahao Xia, Weiwei Qu, Wenjian Huang, JianGuo Zhang, Xi Wang, Min Xu

The SLPT generates the representation of each single landmark from a local patch and aggregates them by an adaptive inherent relation based on the attention mechanism.

Face Alignment Relation +1

Discrete Time Convolution for Fast Event-Based Stereo

1 code implementation CVPR 2022 Kaixuan Zhang, Kaiwei Che, JianGuo Zhang, Jie Cheng, Ziyang Zhang, Qinghai Guo, Luziwei Leng

Inspired by continuous dynamics of biological neuron models, we propose a novel encoding method for sparse events - continuous time convolution (CTC) - which learns to model the spatial feature of the data with intrinsic dynamics.

Depth Estimation Stereo Matching

Detect Faces Efficiently: A Survey and Evaluations

3 code implementations 3 Dec 2021 Yuantao Feng, Shiqi Yu, Hanyang Peng, Yan-ran Li, JianGuo Zhang

However, with the tremendous increase in images and videos with variations in face scale, appearance, expression, occlusion and pose, traditional face detectors are challenged to detect various "in the wild" faces.

Deep Learning Face Detection +5

Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation

no code implementations EACL 2021 Ye Liu, Yao Wan, JianGuo Zhang, Wenting Zhao, Philip Yu

In this paper, we claim that the syntactic and semantic structures among natural language are critical for non-autoregressive machine translation and can further improve the performance.

Machine Translation Translation

Deep Class-Specific Affinity-Guided Convolutional Network for Multimodal Unpaired Image Segmentation

no code implementations 5 Jan 2021 Jingkun Chen, Wenqi Li, Hongwei Li, JianGuo Zhang

Our affinity matrix does not depend on spatial alignments of the visual features and thus allows us to train with unpaired, multimodal inputs.

Image Segmentation Medical Image Segmentation +2

Sign-Agnostic Implicit Learning of Surface Self-Similarities for Shape Modeling and Reconstruction from Raw Point Clouds

no code implementations CVPR 2021 Wenbin Zhao, Jiabao Lei, Yuxin Wen, JianGuo Zhang, Kui Jia

Motivated from a universal phenomenon that self-similar shape patterns of local surface patches repeat across the entire surface of an object, we aim to push forward the data-driven strategies and propose to learn a local implicit surface network for a shared, adaptive modeling of the entire surface for a direct surface reconstruction from raw point cloud; we also enhance the leveraging of surface self-similarities by improving correlations among the optimized latent codes of individual surface patches.

Surface Reconstruction

Deep CNNs for HEp-2 Cells Classification: A Cross-specimen Analysis

no code implementations 20 Apr 2016 Hongwei Li, Wei-Shi Zheng, JianGuo Zhang

Automatic classification of Human Epithelial Type-2 (HEp-2) cells staining patterns is an important and yet a challenging problem.

Classification General Classification
