Search Results for author: Tong Zhang

Found 460 papers, 159 papers with code

Speeding up Transformer Decoding via an Attention Refinement Network

1 code implementation COLING 2022 Kaixin Wu, Yue Zhang, Bojie Hu, Tong Zhang

Extensive experiments on ten WMT machine translation tasks show that the proposed model is on average 1.35x faster (with almost no decrease in BLEU) than the state-of-the-art inference implementation.

Machine Translation NMT +1

Toward Knowledge-Enriched Conversational Recommendation Systems

no code implementations NLP4ConvAI (ACL) 2022 Tong Zhang, Yong liu, Boyang Li, Peixiang Zhong, Chen Zhang, Hao Wang, Chunyan Miao

Conversational Recommendation Systems recommend items through language-based interactions with users. In order to generate naturalistic conversations and effectively utilize knowledge graphs (KGs) containing background information, we propose a novel Bag-of-Entities loss, which encourages the generated utterances to mention concepts related to the item being recommended, such as the genre or director of a movie.

Conversational Recommendation Knowledge Graphs +2

Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training

1 code implementation 13 May 2025 Yangyi Chen, Hao Peng, Tong Zhang, Heng Ji

PRIOR introduces a reference model, a text-only large language model (LLM) trained on the captions without image inputs, to weight each token based on its probability for LVLM training.

Hallucination Large Language Model

HuB: Learning Extreme Humanoid Balance

no code implementations 12 May 2025 Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath, Yang Gao

The human body demonstrates exceptional motor capabilities, such as standing steadily on one foot or performing a high kick with the leg raised over 1.5 meters, both requiring precise balance control.

Humanoid Control

DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems

no code implementations 11 May 2025 Tong Zhang, Fenghua Shao, Runsheng Zhang, Yifan Zhuang, Liuqingqing Yang

It can accurately capture and track the user's gesture trajectory and is superior to traditional tracking methods in terms of real-time performance and accuracy.

Gesture Recognition multimodal interaction +1

RM-R1: Reward Modeling as Reasoning

1 code implementation 5 May 2025 Xiusi Chen, Gaotang Li, Ziqi Wang, Bowen Jin, Cheng Qian, Yu Wang, Hongru Wang, Yu Zhang, Denghui Zhang, Tong Zhang, Hanghang Tong, Heng Ji

The training of RM-R1 consists of two key stages: (1) distillation of high-quality reasoning chains and (2) reinforcement learning with verifiable rewards.

Math Reinforcement Learning (RL)

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

1 code implementation 5 May 2025 Jiarui Yao, Yifan Hao, Hanning Zhang, Hanze Dong, Wei Xiong, Nan Jiang, Tong Zhang

Chain-of-thought (CoT) reasoning in large language models (LLMs) can be formalized as a latent variable problem, where the model needs to generate intermediate reasoning steps.

Mathematical Reasoning

Multi-Step Consistency Models: Fast Generation with Theoretical Guarantees

no code implementations 2 May 2025 Nishant Jain, Xunpeng Huang, Yian Ma, Tong Zhang

Additionally, under minimal assumptions on the data distribution (an increasingly common setting in recent diffusion model analyses), we show that a similar KL convergence guarantee can be obtained, with the number of steps scaling as $ O\left(d \log\left(\frac{d}{\varepsilon}\right)\right) $.

Capturing Conditional Dependence via Auto-regressive Diffusion Models

no code implementations 30 Apr 2025 Xunpeng Huang, Yujin Han, Difan Zou, Yian Ma, Tong Zhang

On the other hand, when there is no obvious conditional dependence across patches of the data, AR diffusion does not outperform DDPM.

Video Generation

STCL:Curriculum learning Strategies for deep learning image steganography models

1 code implementation 24 Apr 2025 Fengchun Liu, Tong Zhang, Chunying Zhang

Aiming at the problems of poor quality of steganographic images and slow network convergence of image steganography models based on deep learning, this paper proposes a Steganography Curriculum Learning training strategy (STCL) for deep learning image steganography models.

Deep Learning Image Steganography +2

MMHCL: Multi-Modal Hypergraph Contrastive Learning for Recommendation

1 code implementation 23 Apr 2025 Xu Guo, Tong Zhang, Fuyun Wang, Xudong Wang, Xiaoya Zhang, Xin Liu, Zhen Cui

For a comprehensive information exploration from user-product relations, we construct two hypergraphs, i.e., a user-to-user (u2u) hypergraph and an item-to-item (i2i) hypergraph, to mine shared preferences among users and intricate multimodal semantic resemblance among items, respectively.

Contrastive Learning Hypergraph Contrastive Learning +1

CLPSTNet: A Progressive Multi-Scale Convolutional Steganography Model Integrating Curriculum Learning

1 code implementation 23 Apr 2025 Fengchun Liu, Tong Zhang, Chunying Zhang

In recent years, a large number of works have introduced Convolutional Neural Networks (CNNs) into image steganography, replacing traditional steganography methods built on hand-crafted features and prior knowledge design with methods in which neural networks autonomously learn information embedding.

Image Steganography SSIM +1

EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery

no code implementations 17 Apr 2025 Wei zhang, Miaoxin Cai, Yaqian Ning, Tong Zhang, Yin Zhuang, He Chen, Jun Li, Xuerui Mao

Recent advances in the visual-language area have developed natural multi-modal large language models (MLLMs) for spatial reasoning through visual prompting.

Large Language Model Multi-Task Learning +2

Multi-Modal Hypergraph Enhanced LLM Learning for Recommendation

no code implementations 13 Apr 2025 Xu Guo, Tong Zhang, Yuanzhi Wang, Chenxu Wang, Fuyun Wang, Xudong Wang, Xiaoya Zhang, Xin Liu, Zhen Cui

To this end, we propose a novel framework, Hypergraph Enhanced LLM Learning for multimodal Recommendation (HeLLM), designed to equip LLMs with the capability to capture intricate higher-order semantic correlations by fusing graph-level contextual signals with sequence-level behavioral patterns.

Contrastive Learning Multimodal Recommendation

AdaptRec: A Self-Adaptive Framework for Sequential Recommendations with Large Language Models

no code implementations 6 Apr 2025 Tong Zhang

AdaptRec employs a two-phase user selection mechanism -- User Similarity Retrieval and Self-Adaptive User Selection -- to efficiently identify relevant user sequences in large-scale datasets from multi-metric evaluation.

Retrieval Sequential Recommendation

Refining CLIP's Spatial Awareness: A Visual-Centric Perspective

no code implementations 3 Apr 2025 Congpei Qiu, Yanhao Wu, Wei Ke, Xiuxiu Bai, Tong Zhang

Contrastive Language-Image Pre-training (CLIP) excels in global alignment with language but exhibits limited sensitivity to spatial information, leading to strong performance in zero-shot classification tasks but underperformance in tasks requiring precise spatial understanding.

Zero-Shot Learning

VGRP-Bench: Visual Grid Reasoning Puzzle Benchmark for Large Vision-Language Models

no code implementations 29 Mar 2025 Yufan Ren, Konstantinos Tertikas, Shalini Maiti, Junlin Han, Tong Zhang, Sabine Süsstrunk, Filippos Kokkinos

Our results reveal that even the state-of-the-art LVLMs struggle with these puzzles, highlighting fundamental limitations in their puzzle-solving capabilities.

Logical Reasoning

ASGO: Adaptive Structured Gradient Optimization

no code implementations 26 Mar 2025 Kang An, Yuxing Liu, Rui Pan, Shiqian Ma, Donald Goldfarb, Tong Zhang

Training deep neural networks (DNNs) is a structured optimization problem, because the parameters are naturally represented by matrices and tensors rather than simple vectors.

Language Modeling Language Modelling

FACE: Few-shot Adapter with Cross-view Fusion for Cross-subject EEG Emotion Recognition

no code implementations 24 Mar 2025 Haiqi Liu, C. L. Philip Chen, Tong Zhang

This article introduces the few-shot adapter with a cross-view fusion method called FACE for cross-subject EEG emotion recognition, which leverages dynamic multi-view fusion and effective subject-specific adaptation.

Domain Adaptation EEG +2

Dynamic Topic Analysis in Academic Journals using Convex Non-negative Matrix Factorization Method

no code implementations 23 Mar 2025 Yang Yang, Tong Zhang, Jian Wu, Lijie Su

In Stage 2, a convex optimization algorithm refines the dynamic topic structure using the convex NMF (cNMF) model, further enhancing topic integration and stability.

Generating Multimodal Driving Scenes via Next-Scene Prediction

no code implementations 19 Mar 2025 Yanhao Wu, Haoyang Zhang, Tianwei Lin, Lichao Huang, Shujie Luo, Rui Wu, Congpei Qiu, Wei Ke, Tong Zhang

Generative models in Autonomous Driving (AD) enable diverse scene creation, yet existing methods fall short by only capturing a limited range of modalities, restricting the capability of generating controllable scenes for comprehensive evaluation of AD systems.

Autonomous Driving multimodal generation +2

Monte Carlo Diffusion for Generalizable Learning-Based RANSAC

no code implementations 12 Mar 2025 Jiale Wang, Chen Zhao, Wei Ke, Tong Zhang

Random Sample Consensus (RANSAC) is a fundamental approach for robustly estimating parametric models from noisy data.

Diversity

ROCM: RLHF on consistency models

no code implementations 8 Mar 2025 Shivanshu Shekhar, Tong Zhang

Diffusion models have revolutionized generative modeling in continuous domains like image, audio, and video synthesis.

Policy Gradient Methods

Integrating network pharmacology, metabolomics, and gut microbiota analysis to explore the effects of Jinhong tablets on chronic superficial gastritis

no code implementations 6 Mar 2025 Lihao Xiao, Tingyu Zhang, Yun Liu, Chayanis Sutcharitchan, Qingyuan Liu, Xiaoxue Fan, Jian Feng, Huifang Gao, Tong Zhang, Shao Li

Differential metabolites in plasma were determined by untargeted metabolomics, and gut microbiota diversity/composition in fecal and cecal samples was assessed via 16S rRNA sequencing.

SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection

no code implementations 5 Mar 2025 Yi-Fan Lu, Xian-Ling Mao, Tian Lan, Tong Zhang, Yu-Shi Zhu, Heyan Huang

To address these two problems above, we propose a scalable and reliable Semantic-level Evaluation framework for Open domain Event detection (SEOE) by constructing a more representative evaluation benchmark and introducing a semantic evaluation metric.

Event Detection Semantic Similarity +1

MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

1 code implementation 5 Mar 2025 Ruida Wang, Rui Pan, Yuxin Li, Jipeng Zhang, Yizhen Jia, Shizhe Diao, Renjie Pi, Junjie Hu, Tong Zhang

To solve these issues, we propose MA-LoT, a Multi-Agent Lean-based Long Chain-of-Thought framework and, to the best of our knowledge, the first multi-agent framework for Lean4 theorem proving that balances high-level NL reasoning and FL verification in Long CoT.

Automated Theorem Proving Transfer Learning

FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4

no code implementations 5 Mar 2025 Jiarui Yao, Ruida Wang, Tong Zhang

To the best of our knowledge, it is the first framework that utilizes Lean4 to enhance LLMs' NL math reasoning ability.

Answer Selection Math +1

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation

no code implementations 4 Mar 2025 Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu

However, in most existing methods for LLM alignment, all tokens in the response are optimized using a sparse, response-level reward or preference annotation.

Language Modeling Language Modelling

MICINet: Multi-Level Inter-Class Confusing Information Removal for Reliable Multimodal Classification

no code implementations 27 Feb 2025 Tong Zhang, Shu Shen, C. L. Philip Chen

MICINet achieves the reliable removal of both types of noise by unifying them into the concept of Inter-class Confusing Information (ICI) and eliminating it at both global and individual levels.

Self-rewarding correction for mathematical reasoning

1 code implementation 26 Feb 2025 Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, Tong Zhang

We study self-rewarding reasoning large language models (LLMs), which can simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback.

Mathematical Reasoning

Reverse Markov Learning: Multi-Step Generative Models for Complex Distributions

no code implementations 19 Feb 2025 Xinwei Shen, Nicolai Meinshausen, Tong Zhang

We propose a framework that defines a general forward process transitioning from the target distribution to a known distribution (e.g., Gaussian) and then learns a reverse Markov process using multiple engression models.

Dimensionality Reduction

Crime Forecasting: A Spatio-temporal Analysis with Deep Learning Models

no code implementations 11 Feb 2025 Li Mao, Wei Du, Shuo Wen, Qi Li, Tong Zhang, Wei Zhong

We conducted a comparative analysis to assess the effects of various data sequences, including raw and binned data, on the prediction errors of four deep learning forecasting models.

Deep Learning Prediction

Logarithmic Regret for Online KL-Regularized Reinforcement Learning

no code implementations 11 Feb 2025 Heyang Zhao, Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

Recent advances in Reinforcement Learning from Human Feedback (RLHF) have shown that KL-regularization plays a pivotal role in improving the efficiency of RL fine-tuning for large language models (LLMs).

reinforcement-learning Reinforcement Learning

Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability

no code implementations 9 Feb 2025 Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Tong Zhang, Quanquan Gu

KL-regularized policy optimization has become a workhorse in learning-based decision making, while its theoretical understanding is still very limited.

Multi-Armed Bandits

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

no code implementations 5 Feb 2025 Boyao Wang, Rui Pan, Shizhe Diao, Xingyuan Pan, Jipeng Zhang, Renjie Pi, Tong Zhang

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices.

Language Modeling Language Modelling +2

Volumetric Temporal Texture Synthesis for Smoke Stylization using Neural Cellular Automata

no code implementations 5 Feb 2025 Dongqing Wang, Ehsan Pajouheshgar, Yitao Xu, Tong Zhang, Sabine Süsstrunk

Artistic stylization of 3D volumetric smoke data is still a challenge in computer graphics due to the difficulty of ensuring spatiotemporal consistency given a reference style image while staying within reasonable time and computational resources.

Style Transfer Texture Synthesis

Catoni Contextual Bandits are Robust to Heavy-tailed Rewards

no code implementations 4 Feb 2025 Chenlu Ye, Yujia Jin, Alekh Agarwal, Tong Zhang

When the variance of the reward at each round is known, we use a variance-weighted regression approach and establish a regret bound that depends only on the cumulative reward variance and logarithmically on the reward range $R$ as well as the number of rounds $T$.

Multi-Armed Bandits

Reformulation is All You Need: Addressing Malicious Text Features in DNNs

no code implementations 2 Feb 2025 Yi Jiang, Oubo Ma, Yong Yang, Tong Zhang, Shouling Ji

Human language encompasses a wide range of intricate and diverse implicit features, which attackers can exploit to launch adversarial or backdoor attacks, compromising DNN models for NLP tasks.

All

Online-BLS: An Accurate and Efficient Online Broad Learning System for Data Stream Classification

no code implementations 28 Jan 2025 Chunyu Lei, Guang-Ze Chen, C. L. Philip Chen, Tong Zhang

Different from employing existing incremental broad learning algorithms for online learning tasks, which tend to incur degraded accuracy and expensive online update overhead, we design an effective weight estimation algorithm and an efficient online updating strategy to remedy the above two deficiencies, respectively.

Divergence-Augmented Policy Optimization

1 code implementation NeurIPS 2019 Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang

In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data.

Atari Games Deep Reinforcement Learning +2

RAG-Reward: Optimizing RAG with Reward Modeling and RLHF

no code implementations 22 Jan 2025 Hanning Zhang, Juntong Song, Juno Zhu, Yuanhao Wu, Tong Zhang, Cheng Niu

Using RAG-Reward, we train reward models and apply reinforcement learning with human feedback (RLHF) to improve LLMs' effectiveness in RAG.

Benchmarking Hallucination +4

Multi-QuAD: Multi-Level Quality-Adaptive Dynamic Network for Reliable Multimodal Classification

no code implementations 19 Dec 2024 Shu Shen, C. L. Philip Chen, Tong Zhang

This paper finds that existing reliable multimodal classification methods not only fail to provide robust estimation of data quality, but also lack dynamic networks for sample-specific depth and parameters to achieve reliable inference.

Informativeness Parameter Prediction

Entropy-Regularized Process Reward Model

1 code implementation 15 Dec 2024 Hanning Zhang, Pengcheng Wang, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang

Our theoretical analysis shows that we could derive the optimal reward model from the initial policy sampling.

GSM8K Math +3

MatchDiffusion: Training-free Generation of Match-cuts

1 code implementation 27 Nov 2024 Alejandro Pardo, Fabio Pizzati, Tong Zhang, Alexander Pondaven, Philip Torr, Juan Camilo Perez, Bernard Ghanem

Match-cuts are powerful cinematic tools that create seamless transitions between scenes, delivering strong visual and metaphorical connections.

Denoising

Scaling Mesh Generation via Compressive Tokenization

1 code implementation 11 Nov 2024 Haohan Weng, Zibo Zhao, Biwen Lei, Xianghui Yang, Jian Liu, Zeqiang Lai, Zhuo Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, Tong Zhang, Shenghua Gao, C. L. Philip Chen

We propose a compressive yet effective mesh representation, Blocked and Patchified Tokenization (BPT), facilitating the generation of meshes exceeding 8k faces.

8k

Fox-1 Technical Report

no code implementations 8 Nov 2024 Zijian Hu, Jipeng Zhang, Rui Pan, Zhaozhuo Xu, Shanshan Han, Han Jin, Alay Dilipbhai Shah, Dimitris Stripelis, Yuhang Yao, Salman Avestimehr, Chaoyang He, Tong Zhang

Aiming to improve the pre-training efficiency, the Fox-1-1.6B model introduces a novel 3-stage data curriculum across all the training data with 2K-8K sequence length.

2k 8k +1

Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs

no code implementations 7 Nov 2024 Yide Ran, Zhaozhuo Xu, Yuhang Yao, Zijian Hu, Shanshan Han, Han Jin, Alay Dilipbhai Shah, Jipeng Zhang, Dimitris Stripelis, Tong Zhang, Salman Avestimehr, Chaoyang He

The rapid advancement of Large Language Models (LLMs) has led to their increased integration into mobile devices for personalized assistance, which enables LLMs to call external API functions to enhance their performance.

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

no code implementations 7 Nov 2024 Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang

To understand the fundamental distinction between policy learning objectives with KL-regularization and ones without KL-regularization, we are the first to theoretically demonstrate the power of KL-regularization by providing a sharp analysis for KL-regularized contextual bandits and RLHF, revealing an $\mathcal{O}(1 / \epsilon)$ sample complexity when $\epsilon$ is sufficiently small.

Multi-Armed Bandits Reinforcement Learning (RL)

SEE-DPO: Self Entropy Enhanced Direct Preference Optimization

no code implementations 6 Nov 2024 Shivanshu Shekhar, Shreyas Singh, Tong Zhang

Direct Preference Optimization (DPO) has been successfully used to align large language models (LLMs) according to human preferences, and more recently it has also been applied to improving the quality of text-to-image diffusion models.

Diversity Image Generation +1

Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

1 code implementation 31 Oct 2024 Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, Mathieu Salzmann

In this paper, we address this overfitting issue by introducing Self-Ensembling Gaussian Splatting (SE-GS).

3DGS Novel View Synthesis

Unlocking Comics: The AI4VA Dataset for Visual Understanding

1 code implementation 27 Oct 2024 Peter Grönquist, Deblina Bhattacharjee, Bahar Aydemir, Baran Ozaydin, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

This dataset is a crucial component of the AI4VA Workshop Challenges (https://sites.google.com/view/ai4vaeccv2024), where we specifically explore depth and saliency.

Depth Estimation Saliency Detection +1

Fully First-Order Methods for Decentralized Bilevel Optimization

no code implementations 25 Oct 2024 Xiaoyu Wang, Xuxing Chen, Shiqian Ma, Tong Zhang

This paper focuses on decentralized stochastic bilevel optimization (DSBO) where agents only communicate with their neighbors.

Bilevel Optimization

Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code

no code implementations 24 Oct 2024 Jipeng Zhang, Jianshu Zhang, Yuanzhe Li, Renjie Pi, Rui Pan, Runtao Liu, Ziqiang Zheng, Tong Zhang

The underlying cause of this issue is the gap between natural language and programming language (NL-PL Gap), which is especially pronounced in LRPLs due to limited aligned data.

General Knowledge In-Context Learning

MPDS: A Movie Posters Dataset for Image Generation with Diffusion Model

no code implementations 22 Oct 2024 Meng Xu, Tong Zhang, Fuyun Wang, Yi Lei, Xin Liu, Zhen Cui

To our knowledge, MPDS is the first image-text pair dataset dedicated to posters, comprising 373k+ image-text pairs and 8k+ actor images (covering 4k+ actors).

4k 8k +1

Transcriptome and Redox Proteome Reveal Temporal Scales of Carbon Metabolism Regulation in Model Cyanobacteria Under Light Disturbance

no code implementations 12 Oct 2024 Connah G. M. Johnson, Zachary Johnson, Liam S. Mackey, Xiaolu Li, Natalie C. Sadler, Tong Zhang, Wei-Jun Qian, Pavlo Bohutskyi, Song Feng, Margaret S. Cheung

We develop a systems approach based on an energy-landscape concept to differentiate interactions involving redox activities and conformational changes of proteins and nucleic acids interactions in multi-layered protein-DNA regulatory networks under light disturbance.

Physics-informed machine learning

Personalized Visual Instruction Tuning

1 code implementation 9 Oct 2024 Renjie Pi, Jianshu Zhang, Tianyang Han, Jipeng Zhang, Rui Pan, Tong Zhang

In this paper, we introduce Personalized Visual Instruction Tuning (PVIT), a novel data curation and training framework designed to enable MLLMs to identify target individuals within an image and engage in personalized and coherent dialogues.

Image Generation

Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference

no code implementations 30 Sep 2024 Ke Yi, Zengke Liu, Jianwei Zhang, Chengyuan Li, Tong Zhang, Junyang Lin, Jingren Zhou

Based on observing activations from large language models, outliers can be classified into channel-wise and spike outliers.

Quantization

From Lists to Emojis: How Format Bias Affects Model Alignment

no code implementations 18 Sep 2024 Xuanchang Zhang, Wei Xiong, Lichang Chen, Tianyi Zhou, Heng Huang, Tong Zhang

In this work, we extend the study of biases in preference learning beyond the commonly recognized length bias, offering a comprehensive analysis of a wider range of format biases.

Chatbot

Data Augmentation via Latent Diffusion for Saliency Prediction

1 code implementation 11 Sep 2024 Bahar Aydemir, Deblina Bhattacharjee, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

We propose a novel data augmentation method for deep saliency prediction that edits natural images while preserving the complexity and variability of real-world scenes.

Data Augmentation Diversity +2

On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

no code implementations 5 Sep 2024 Yong Lin, Skyler Seto, Maartje ter Hoeve, Katherine Metcalf, Barry-John Theobald, Xuan Wang, Yizhe Zhang, Chen Huang, Tong Zhang

These findings highlight that DPORM has limited generalization ability and substantiate the integration of an explicit reward model in iterative DPO approaches.

Building Math Agents with Multi-Turn Iterative Preference Learning

no code implementations 4 Sep 2024 Wei Xiong, Chengshuai Shi, Jiaming Shen, Aviv Rosenberg, Zhen Qin, Daniele Calandriello, Misha Khalman, Rishabh Joshi, Bilal Piot, Mohammad Saleh, Chi Jin, Tong Zhang, Tianqi Liu

Recent studies have shown that large language models' (LLMs) mathematical problem-solving capabilities can be enhanced by integrating external tools, such as code interpreters, and employing multi-turn Chain-of-Thought (CoT) reasoning.

GSM8K Math +3

A Homogeneous Graph Neural Network for Precoding and Power Allocation in Scalable Wireless Networks

no code implementations 30 Aug 2024 Mingjun Sun, Shaochuan Wu, Haojie Wang, Yuanwei Liu, Guoyu Li, Tong Zhang

Lastly, using ICGNN as the core algorithm, we tailor the neural network's input and output for specific problem requirements and validate its performance in two scenarios: 1) in cellular networks, we develop a matrix-inverse-free multi-user multi-input multi-output (MU-MIMO) precoding scheme using the conjugate gradient (CG) method, adaptable to varying user and antenna numbers; 2) in a cell-free network, facing dynamic variations in the number of users served by APs, the number of APs serving each user, and the number of antennas per AP, we propose a universal power allocation scheme.

Graph Neural Network

Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic

2 code implementations 24 Aug 2024 Yifei He, Yuzheng Hu, Yong Lin, Tong Zhang, Han Zhao

Our algorithm works in two steps: i) Localization: identify tiny ($1\%$ of the total parameters) localized regions in the finetuned models containing essential skills for the downstream tasks, and ii) Stitching: reintegrate only these essential regions back into the pretrained model for task synergy.

Model Compression Task Arithmetic

Practical Video Object Detection via Feature Selection and Aggregation

1 code implementation 29 Jul 2024 Yuheng Shi, Tong Zhang, Xiaojie Guo

In principle, the detection in a certain frame of a video can benefit from information in other frames.

Ranked #1 on Video Object Detection on ImageNet VID (using extra training data)

feature selection Object +2

Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning

no code implementations 24 Jul 2024 Shuang Qiu, Dake Zhang, Rui Yang, Boxiang Lyu, Tong Zhang

This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions.

Multi-Objective Reinforcement Learning reinforcement-learning +1

SINDER: Repairing the Singular Defects of DINOv2

1 code implementation 23 Jul 2024 Haoqi Wang, Tong Zhang, Mathieu Salzmann

Vision Transformer models trained on large-scale datasets, although effective, often exhibit artifacts in the patch token they extract.

Depth Estimation

TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data

1 code implementation 21 Jul 2024 Jipeng Zhang, Yaxuan Qin, Renjie Pi, Weizhong Zhang, Rui Pan, Tong Zhang

Achieving this goal poses non-trivial challenges: 1) obtaining accurate data representations that reflect the training samples' quality, 2) accounting for the diverse nature of instruction datasets, and 3) ensuring the efficiency of the coreset selection algorithm for large models.

EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing

1 code implementation 18 Jul 2024 Wei zhang, Miaoxin Cai, Tong Zhang, Jun Li, Yin Zhuang, Xuerui Mao

Specifically, a shared visual encoding method is developed to establish the spatial pattern interpretation relationships between the multi-scale representations of input images and various visual prompts.

Instruction Following Language Modeling +4

SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization

no code implementations 17 Jul 2024 Rui Xie, Asad Ul Haq, Linsen Ma, Krystal Sun, Sanchari Sen, Swagath Venkataramani, Liu Liu, Tong Zhang

Recent studies have revealed that, during inference on generative AI models such as transformers, the importance of different weights exhibits substantial context-dependent variations.

Quantization

Enhancing Stochastic Optimization for Statistical Efficiency Using ROOT-SGD with Diminishing Stepsize

no code implementations 15 Jul 2024 Tong Zhang, Chris Junchi Li

In this paper, we revisit ROOT-SGD, an innovative method for stochastic optimization to bridge the gap between stochastic optimization and statistical efficiency.

Computational Efficiency Stochastic Optimization

Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density

no code implementations 11 Jul 2024 Shuangqi Li, Chen Liu, Tong Zhang, Hieu Le, Sabine Süsstrunk, Mathieu Salzmann

We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity.

Diversity

Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

no code implementations 10 Jul 2024 Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang

We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.

Decision Making Offline RL +3

Coherent and Multi-modality Image Inpainting via Latent Space Optimization

1 code implementation 10 Jul 2024 Lingzhi Pan, Tong Zhang, Bingyuan Chen, Qi Zhou, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.

Denoising Image Inpainting

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

1 code implementation 3 Jul 2024 Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang

However, due to the scarcity of aligned NL and Formal Language (FL) theorem-proving data, most modern LLMs exhibit suboptimal performance. This scarcity results in a paucity of methodologies for training LLMs and techniques to fully utilize their capabilities in composing formal proofs.

Automated Theorem Proving Code Generation +2

ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

no code implementations 28 Jun 2024 Rui Pan, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Xiaoyu Wang, Tong Zhang

Bilevel optimization has shown its utility across various machine learning settings, yet most algorithms in practice require second-order information, making it challenging to scale them up.

Bilevel Optimization

AdaGrad under Anisotropic Smoothness

no code implementations 21 Jun 2024 Yuxing Liu, Rui Pan, Tong Zhang

Despite the huge success in practice, their theoretical advantages over classical gradient methods with uniform step sizes across all coordinates (e.g., SGD) have not been fully understood, especially in the large batch-size setting commonly used in practice.

Instruction Following

Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation

1 code implementation15 Jun 2024 Tong Zhang, Yingdong Hu, Jiacheng You, Yang Gao

SGRv2 excels in RLBench tasks with keyframe control using merely 5 demonstrations and surpasses the RVT baseline in 23 of 26 tasks.

Imitation Learning Inductive Bias +1

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

2 code implementations14 Jun 2024 Rui Yang, Ruomeng Ding, Yong Lin, huan zhang, Tong Zhang

Reward models trained on human preference data have been proven to effectively align Large Language Models (LLMs) with human intent within the framework of reinforcement learning from human feedback (RLHF).

Language Modeling Language Modelling +1

AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer

no code implementations12 Jun 2024 Yitao Xu, Tong Zhang, Sabine Süsstrunk

In this paper, we propose Adaptor Neural Cellular Automata (AdaNCA) for Vision Transformers that uses NCA as plug-and-play adaptors between ViT layers, thus enhancing ViT's performance and robustness against adversarial samples as well as out-of-distribution inputs.

Image Classification

Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

1 code implementation11 Jun 2024 Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang

Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval.

Hallucination Image Retrieval +1

PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

no code implementations27 May 2024 Haohan Weng, Yikai Wang, Tong Zhang, C. L. Philip Chen, Jun Zhu

Generating compact and sharply detailed 3D meshes poses a significant challenge for current 3D generative models.

Faster Sampling via Stochastic Gradient Proximal Sampler

no code implementations27 May 2024 Xunpeng Huang, Difan Zou, Yi-An Ma, Hanze Dong, Tong Zhang

Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems.

Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

no code implementations26 May 2024 Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yi-An Ma, Tong Zhang

To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs.

Denoising

Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances

no code implementations22 May 2024 Licheng Shen, Ho Ngai Chow, Lingyun Wang, Tong Zhang, Mengqiu Wang, Yuxing Han

In this paper, we present Gaussian Time Machine (GTM), which models the time-dependent attributes of Gaussian primitives with discrete time embedding vectors decoded by a lightweight Multi-Layer Perceptron (MLP).

3DGS 3D Reconstruction +2

RLHF Workflow: From Reward Modeling to Online RLHF

3 code implementations13 May 2024 Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang

We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature.

Chatbot HumanEval +3

SC-HVPPNet: Spatial and Channel Hybrid-Attention Video Post-Processing Network with CNN and Transformer

no code implementations23 Apr 2024 Tong Zhang, Wenxue Cui, Shaohui Liu, Feng Jiang

Convolutional Neural Network (CNN) and Transformer have attracted much attention recently for video post-processing (VPP).

Video Restoration

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

1 code implementation CVPR 2024 Yanhao Wu, Tong Zhang, Wei Ke, Congpei Qiu, Sabine Susstrunk, Mathieu Salzmann

Subsequently, we introduce a context-aware feature learning strategy, which encodes object patterns without relying on their specific context by aggregating object features across various scenes.

Object Scene Understanding +1

Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

no code implementations4 Apr 2024 Miao Lu, Han Zhong, Tong Zhang, Jose Blanchet

Unlike previous work, which relies on a generative model or a pre-collected offline dataset enjoying good coverage of the deployment environment, we tackle robust RL via interactive data collection, where the learner interacts with the training environment only and refines the policy through trial and error.

Reinforcement Learning (RL)

On the Benefits of Over-parameterization for Out-of-Distribution Generalization

no code implementations26 Mar 2024 Yifan Hao, Yong Lin, Difan Zou, Tong Zhang

We demonstrate that in this scenario, further increasing the model's parameterization can significantly reduce the OOD loss.

Out-of-Distribution Generalization

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

1 code implementation26 Mar 2024 Rui Pan, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang

Attempting to complement this deficiency, we investigate the layerwise properties of LoRA on fine-tuning tasks and observe an unexpected but consistent skewness of weight norms across different layers.

GSM8K Language Modeling +5

DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects

1 code implementation CVPR 2024 Chen Zhao, Tong Zhang, Zheng Dang, Mathieu Salzmann

Determining the relative pose of a previously unseen object between two images is pivotal to the success of generalizable object pose estimation.

Natural Language Understanding Object +1

A Sober Look at the Robustness of CLIPs to Spurious Features

no code implementations18 Mar 2024 Qizhou Wang, Yong Lin, Yongqiang Chen, Ludwig Schmidt, Bo Han, Tong Zhang

Large vision language models, such as CLIP, demonstrate more impressive robustness to spurious features than single-modal models trained on ImageNet.

Benchmarking

Gradient based Feature Attribution in Explainable AI: A Technical Review

no code implementations15 Mar 2024 Yongjie Wang, Tong Zhang, Xu Guo, Zhiqi Shen

Due to the lack of a rigorous definition of explainable AI (XAI), a plethora of research related to explainability, interpretability, and transparency has been developed to explain and analyze the model from various perspectives.

Autonomous Driving

Desigen: A Pipeline for Controllable Design Template Generation

1 code implementation CVPR 2024 Haohan Weng, Danqing Huang, Yu Qiao, Zheng Hu, Chin-Yew Lin, Tong Zhang, C. L. Philip Chen

In this paper, we present Desigen, an automatic template creation pipeline which generates background images as well as harmonious layout elements over the background.

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

no code implementations13 Mar 2024 Renjie Pi, Tianyang Han, Wei Xiong, Jipeng Zhang, Runtao Liu, Rui Pan, Tong Zhang

To mitigate this issue, we propose Bootstrapped Preference Optimization (BPO), which conducts preference learning with datasets containing negative responses bootstrapped from the model itself.

Language Modeling Language Modelling +3

Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation

no code implementations13 Mar 2024 ZiCheng Zhang, Tong Zhang, Yi Zhu, Jianzhuang Liu, Xiaodan Liang, Qixiang Ye, Wei Ke

To mitigate these issues, we propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information. Specifically, we leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.

Decoder Language Modeling +3

Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation

no code implementations11 Mar 2024 Tong Zhang, Chen Huang, Yang Deng, Hongru Liang, Jia Liu, Zujie Wen, Wenqiang Lei, Tat-Seng Chua

We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system's objectives.

User Simulation

An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling

no code implementations10 Mar 2024 Xunpeng Huang, Hanze Dong, Difan Zou, Tong Zhang

Along this line, Freund et al. (2022) suggest that the modified Langevin algorithm with prior diffusion is able to converge dimension independently for strongly log-concave target distributions.

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

1 code implementation28 Feb 2024 Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang

Additionally, DPA models user preferences as directions (i.e., unit vectors) in the reward space to achieve user-dependent preference control.

EntailE: Introducing Textual Entailment in Commonsense Knowledge Graph Completion

no code implementations15 Feb 2024 Ying Su, Tianqing Fang, Huiru Xiao, Weiqi Wang, Yangqiu Song, Tong Zhang, Lei Chen

In this paper, we propose to adopt textual entailment to find implicit entailment relations between CSKG nodes, to effectively densify the subgraph connecting nodes within the same conceptual class, which indicates a similar level of plausibility.

graph construction Knowledge Graph Embedding +1

Online Iterative Reinforcement Learning from Human Feedback with General Preference Model

1 code implementation11 Feb 2024 Chenlu Ye, Wei Xiong, Yuheng Zhang, Hanze Dong, Nan Jiang, Tong Zhang

We investigate Reinforcement Learning from Human Feedback (RLHF) in the context of a general preference oracle.

The Instinctive Bias: Spurious Images lead to Illusion in MLLMs

1 code implementation6 Feb 2024 Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang

In this paper, we identify a typical class of inputs that baffles MLLMs, which consist of images that are highly relevant but inconsistent with answers, causing MLLMs to suffer from visual illusion.

Hallucination

PipeNet: Question Answering with Semantic Pruning over Knowledge Graphs

1 code implementation31 Jan 2024 Ying Su, Jipeng Zhang, Yangqiu Song, Tong Zhang

To facilitate the evaluation of pruned subgraphs, we also propose a graph attention network (GAT) based module to reason with the subgraph data.

Graph Attention Knowledge Graphs +1

EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

1 code implementation30 Jan 2024 Wei zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Xuerui Mao

Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain.

Image Comprehension Instruction Following +3

General Flow as Foundation Affordance for Scalable Robot Learning

1 code implementation21 Jan 2024 Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao

Therefore, we propose to utilize 3D flow, which represents the future trajectories of 3D points on objects of interest, as an ideal prediction target.

Prediction

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

no code implementations19 Jan 2024 Yifan Hao, Tong Zhang

Recent empirical and theoretical studies have established the generalization capabilities of large machine learning models that are trained to (approximately or exactly) fit noisy data.

Adversarial Robustness

Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo

no code implementations12 Jan 2024 Xunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, Tong Zhang

Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution to the standard Gaussian, utilizing a non-parametric score estimation.

MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

1 code implementation5 Jan 2024 Renjie Pi, Tianyang Han, Jianshu Zhang, Yueqi Xie, Rui Pan, Qing Lian, Hanze Dong, Jipeng Zhang, Tong Zhang

The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs.

Safety Alignment

RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models

3 code implementations31 Dec 2023 Cheng Niu, Yuanhao Wu, Juno Zhu, Siliang Xu, Kashun Shum, Randy Zhong, Juntong Song, Tong Zhang

Retrieval-augmented generation (RAG) has become a main technique for alleviating hallucinations in large language models (LLMs).

Hallucination RAG +1

Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise

no code implementations22 Dec 2023 Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang

This means SGD with heavy-ball momentum is useful in the large-batch settings such as distributed machine learning or federated learning, where a smaller number of iterations can significantly reduce the number of communication rounds, leading to acceleration in practice.

Federated Learning

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

3 code implementations18 Dec 2023 Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang

We investigate its behavior in three distinct settings -- offline, online, and hybrid -- and propose efficient algorithms with finite-sample theoretical guarantees.

Language Modeling Language Modelling +1

Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning

no code implementations29 Nov 2023 Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, Yang Gao

In this study, we are interested in imbuing robots with the capability of physically-grounded task planning.

Task Planning

Beyond Pixels: Exploring Human-Readable SVG Generation for Simple Images with Vision Language Models

no code implementations27 Nov 2023 Tong Zhang, Haoyang Liu, Peiyan Zhang, Yuxuan Cheng, Haohan Wang

Our method focuses on producing SVGs that are both accurate and simple, aligning with human readability and understanding.

Vector Graphics

R-Tuning: Instructing Large Language Models to Say `I Don't Know'

1 code implementation16 Nov 2023 Hanning Zhang, Shizhe Diao, Yong Lin, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, Tong Zhang

This approach is formalized by first identifying the disparity in knowledge encompassed by pre-trained parameters compared to that of instruction tuning data.

Hallucination Sentence

Plum: Prompt Learning using Metaheuristic

1 code implementation14 Nov 2023 Rui Pan, Shuo Xing, Shizhe Diao, Wenhe Sun, Xiang Liu, Kashun Shum, Renjie Pi, Jipeng Zhang, Tong Zhang

Since the emergence of large language models, prompt learning has become a popular method for optimizing and customizing these models.

Image Generation Prompt Learning

PerceptionGPT: Effectively Fusing Visual Perception into LLM

no code implementations CVPR 2024 Renjie Pi, Lewei Yao, Jiahui Gao, Jipeng Zhang, Tong Zhang

In this paper, we present a novel end-to-end framework named PerceptionGPT, which efficiently and effectively equips the VLLMs with visual perception abilities by leveraging the representation power of LLMs' token embedding.

CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer

1 code implementation11 Nov 2023 Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, Xiaohui Xie

Reconstructing personalized animatable head avatars has significant implications in the fields of AR/VR.

Neural Rendering

Mesh Neural Cellular Automata

no code implementations6 Nov 2023 Ehsan Pajouheshgar, Yitao Xu, Alexander Mordvintsev, Eyvind Niklasson, Tong Zhang, Sabine Süsstrunk

We propose Mesh Neural Cellular Automata (MeshNCA), a method that directly synthesizes dynamic textures on 3D meshes without requiring any UV maps.

Texture Synthesis

Corruption-Robust Offline Reinforcement Learning with General Function Approximation

1 code implementation NeurIPS 2023 Chenlu Ye, Rui Yang, Quanquan Gu, Tong Zhang

Notably, under the assumption of single policy coverage and the knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal{O}(\zeta (C(\widehat{\mathcal{F}},\mu)n)^{-1})$ due to the corruption.

Offline RL reinforcement-learning +2

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption

2 code implementations19 Oct 2023 Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang

Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies from offline datasets without the need for costly or unsafe interactions with the environment.

Offline RL Q-Learning +3

3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation

no code implementations5 Oct 2023 Chen Zhao, Tong Zhang, Mathieu Salzmann

Our goal then is to estimate the relative object pose between this reference view and a query image that depicts the object in a different pose.

Object Pose Estimation

Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

no code implementations4 Oct 2023 Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao

We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.

Quantization reinforcement-learning +1

Spurious Feature Diversification Improves Out-of-distribution Generalization

no code implementations29 Sep 2023 Yong Lin, Lu Tan, Yifan Hao, Honam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang

Contrary to the conventional wisdom that focuses on learning invariant features for better OOD performance, our findings suggest that incorporating a large number of diverse spurious features weakens their individual contributions, leading to improved overall OOD generalization performance.

Out-of-Distribution Generalization

May I Ask a Follow-up Question? Understanding the Benefits of Conversations in Neural Network Explainability

no code implementations25 Sep 2023 Tong Zhang, X. Jessie Yang, Boyang Li

With this paper, we investigate if free-form conversations can enhance users' comprehension of static explanations, improve acceptance and trust in the explanation methods, and facilitate human-AI collaboration.

Decision Making

MEDL-U: Uncertainty-aware 3D Automatic Annotation based on Evidential Deep Learning

1 code implementation18 Sep 2023 Helbert Paat, Qing Lian, Weilong Yao, Tong Zhang

In this paper, we present the first approach that addresses the inherent ambiguities present in pseudo labels by introducing an Evidential Deep Learning (EDL) based uncertainty estimation framework.

3D Object Detection object-detection

Mitigating the Alignment Tax of RLHF

1 code implementation12 Sep 2023 Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan YAO, Tong Zhang

Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different alignment-forgetting trade-offs, we propose Heterogeneous Model Averaging (HMA) to heterogeneously find various combination ratios of model layers.

Common Sense Reasoning Continual Learning

UniKG: A Benchmark and Universal Embedding for Large-Scale Knowledge Graphs

1 code implementation11 Sep 2023 Yide Qiu, Shaoxiang Ling, Tong Zhang, Bo Huang, Zhen Cui

To perform effective learning on the large-scale UniKG, two key measures are taken: (i) a semantic alignment strategy for multi-attribute entities, which projects the feature description of multi-attribute nodes into a common embedding space to facilitate node aggregation in a large receptive field; and (ii) a novel plug-and-play anisotropy propagation module (APM) that learns effective multi-hop anisotropy propagation kernels, extending methods for large-scale homogeneous graphs to heterogeneous graphs.

Attribute Graph Learning +3

Integrated Robotics Networks with Co-optimization of Drone Placement and Air-Ground Communications

no code implementations9 Sep 2023 Menghao Hu, Tong Zhang, Shuai Wang, Guoliang Li, Yingyang Chen, Qiang Li, Gaojie Chen

Terrestrial robots, i.e., unmanned ground vehicles (UGVs), and aerial robots, i.e., unmanned aerial vehicles (UAVs), operate in separate spaces.

Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

no code implementations5 Sep 2023 Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan YAO, Tong Zhang

Our proposed method, COPS (unCertainty based OPtimal Sub-sampling), is designed to minimize the expected loss of a model trained on subsampled data.

Active Learning Deep Learning

Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

1 code implementation16 Aug 2023 Jianyu Wen, Chenhao Wu, Tong Zhang, Yixuan Yu, Piotr Swierczynski

In this paper, we propose a 2-stage low-light image enhancement method called Self-Reference Deep Adaptive Curve Estimation (Self-DACE).

Denoising Low-Light Image Enhancement

Reverse Diffusion Monte Carlo

no code implementations5 Jul 2023 Xunpeng Huang, Hanze Dong, Yifan Hao, Yi-An Ma, Tong Zhang

We propose a Monte Carlo sampler from the reverse diffusion process.

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

1 code implementation21 Jun 2023 Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang

As the number of available foundation models and specialized tasks keeps growing, the job of training scientific language models becomes highly nontrivial.

A Universal Semantic-Geometric Representation for Robotic Manipulation

1 code implementation18 Jun 2023 Tong Zhang, Yingdong Hu, Hanchen Cui, Hang Zhao, Yang Gao

To this end, we present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning.

3D geometry Robot Manipulation +1

Structure-Sensitive Graph Dictionary Embedding for Graph Classification

no code implementations18 Jun 2023 Guangbu Liu, Tong Zhang, Xudong Wang, Wenting Zhao, Chuanwei Zhou, Zhen Cui

Instead of a plain use of a base graph dictionary, we propose the variational graph dictionary adaptation (VGDA) to generate a personalized dictionary (named adapted graph dictionary) for catering to each input graph.

Graph Classification Variational Inference

Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

1 code implementation18 Jun 2023 Yifan Zhao, Tong Zhang, Jia Li, Yonghong Tian

Recent progress in this setting assumes that the base knowledge and novel query samples are distributed in the same domains, which is usually infeasible for realistic applications.

cross-domain few-shot learning

Customizing General-Purpose Foundation Models for Medical Report Generation

no code implementations9 Jun 2023 Bang Yang, Asif Raza, Yuexian Zou, Tong Zhang

In this work, we propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs), in computer vision and natural language processing with a specific focus on medical report generation.

Medical Report Generation Transfer Learning

Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

1 code implementation8 Jun 2023 Shizhe Diao, Tianyang Xu, Ruijia Xu, Jiawei Wang, Tong Zhang

Pre-trained language models (PLMs) demonstrate excellent abilities to understand texts in the generic domain while struggling in a specific domain.

Domain Adaptation

What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?

1 code implementation30 May 2023 Rui Yang, Yong Lin, Xiaoteng Ma, Hao Hu, Chongjie Zhang, Tong Zhang

In this paper, we study out-of-distribution (OOD) generalization of offline GCRL both theoretically and empirically to identify factors that are important.

Imitation Learning Offline RL

InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields

no code implementations CVPR 2024 Dongqing Wang, Tong Zhang, Alaa Abboud, Sabine Süsstrunk

We propose InNeRF360, an automatic system that accurately removes text-specified objects from 360-degree Neural Radiance Fields (NeRF).

3D Inpainting NeRF +1

DetGPT: Detect What You Need via Reasoning

1 code implementation23 May 2023 Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang

Overall, our proposed paradigm and DetGPT demonstrate the potential for more sophisticated and intuitive interactions between humans and machines.

Autonomous Driving Object +2

Adapter Learning in Pretrained Feature Extractor for Continual Learning of Diseases

1 code implementation18 Apr 2023 Wentao Zhang, Yujun Huang, Tong Zhang, Qingsong Zou, Wei-Shi Zheng, Ruixuan Wang

In particular, updating an intelligent diagnosis system with training data of new diseases would cause catastrophic forgetting of old disease knowledge.

Continual Learning

Hierarchical Interactive Reconstruction Network For Video Compressive Sensing

no code implementations15 Apr 2023 Tong Zhang, Wenxue Cui, Chen Hui, Feng Jiang

Deep network-based image and video Compressive Sensing (CS) has attracted increasing attention in recent years.

Compressive Sensing Video Compressive Sensing

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

1 code implementation13 Apr 2023 Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang

Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discards those that exhibit undesired behavior, and subsequently enhances the model by fine-tuning on these filtered samples.

Ethics

Crowd Counting with Sparse Annotation

no code implementations12 Apr 2023 Shiwei Zhang, Zhengzheng Wang, Qing Liu, Fei Wang, Wei Ke, Tong Zhang

This paper presents a new annotation method called Sparse Annotation (SA) for crowd counting, which reduces human labeling efforts by sparsely labeling individuals in an image.

Crowd Counting

ConvBLS: An Effective and Efficient Incremental Convolutional Broad Learning System for Image Classification

no code implementations1 Apr 2023 Chunyu Lei, C. L. Philip Chen, Jifeng Guo, Tong Zhang

Third, the TSMS feature fusion layer is proposed to extract more effective multi-scale features through the integration of CF layers and CE layers.

Image Classification Incremental Learning

De-coupling and De-positioning Dense Self-supervised Learning

1 code implementation29 Mar 2023 Congpei Qiu, Tong Zhang, Wei Ke, Mathieu Salzmann, Sabine Süsstrunk

Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.

Data Augmentation Object +5

NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects

no code implementations ICCV 2023 Dongqing Wang, Tong Zhang, Sabine Süsstrunk

We propose NEMTO, the first end-to-end neural rendering pipeline to model 3D transparent objects with complex geometry and unknown indices of refraction.

Image Matting Neural Rendering +2

Environment Invariant Linear Least Squares

1 code implementation6 Mar 2023 Jianqing Fan, Cong Fang, Yihong Gu, Tong Zhang

To the best of our knowledge, this paper is the first to realize statistically efficient invariance learning in the general linear model.

Causal Inference regression +2

PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium

no code implementations2 Mar 2023 Shihong Ding, Hanze Dong, Cong Fang, Zhouchen Lin, Tong Zhang

To circumvent this difficulty, we examine the problem of identifying a mixed Nash equilibrium, where strategies are randomized and characterized by probability distributions over continuous domains.

Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data

2 code implementations24 Feb 2023 Kashun Shum, Shizhe Diao, Tong Zhang

However, most CoT studies rely on carefully designed human-annotated rationale chains to prompt LLMs, posing challenges for real-world applications where labeled data is available without rationale chains.

Arithmetic Reasoning Language Modelling

Active Prompting with Chain-of-Thought for Large Language Models

2 code implementations23 Feb 2023 Shizhe Diao, Pengcheng Wang, Yong Lin, Rui Pan, Xiang Liu, Tong Zhang

For this purpose, we propose a solution to the key problem of determining which questions are the most important and helpful ones to annotate from a pool of task-specific queries.

Active Learning Zero-Shot Learning

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

no code implementations21 Feb 2023 Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.

Computational Efficiency Decision Making +1

Hashtag-Guided Low-Resource Tweet Classification

1 code implementation20 Feb 2023 Shizhe Diao, Sedrick Scott Keh, Liangming Pan, Zhiliang Tian, Yan Song, Tong Zhang

Social media classification tasks (e.g., tweet sentiment analysis, tweet stance detection) are challenging because social media posts are typically short, informal, and ambiguous.

Classification Sentiment Analysis +1

On the Convergence of Federated Averaging with Cyclic Client Participation

no code implementations6 Feb 2023 Yae Jee Cho, Pranay Sharma, Gauri Joshi, Zheng Xu, Satyen Kale, Tong Zhang

Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL).

Federated Learning

History-Aware Hierarchical Transformer for Multi-session Open-domain Dialogue System

no code implementations2 Feb 2023 Tong Zhang, Yong liu, Boyang Li, Zhiwei Zeng, Pengwei Wang, Yuan You, Chunyan Miao, Lizhen Cui

HAHT maintains a long-term memory of historical conversations and utilizes historical information to understand the current conversation context and generate well-informed and context-relevant responses.

ADAPT: Action-aware Driving Caption Transformer

1 code implementation1 Feb 2023 Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, Jingjing Liu

To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision making step of autonomous vehicular control and action.

Autonomous Driving Decision Making

Learning in POMDPs is Sample-Efficient with Hindsight Observability

no code implementations31 Jan 2023 Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang

POMDPs capture a broad class of decision-making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.

Decision Making Scheduling

Probabilistic Bilevel Coreset Selection

no code implementations 24 Jan 2023 Xiao Zhou, Renjie Pi, Weizhong Zhang, Yong Lin, Tong Zhang

The goal of coreset selection in supervised learning is to produce a weighted subset of the data such that training only on the subset achieves performance similar to training on the entire dataset.

Bilevel Optimization Continual Learning

Model Agnostic Sample Reweighting for Out-of-Distribution Learning

1 code implementation 24 Jan 2023 Xiao Zhou, Yong Lin, Renjie Pi, Weizhong Zhang, Renzhe Xu, Peng Cui, Tong Zhang

The overfitting issue is addressed by considering a bilevel formulation to search for the sample reweighting, in which the generalization complexity depends on the search space of sample weights instead of the model size.

TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction

1 code implementation 5 Jan 2023 Bahar Aydemir, Ludo Hoffstetter, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

Deep saliency prediction algorithms complement object recognition features; they typically rely on additional information, such as scene context, semantic relationships, gaze direction, and object dissimilarity.

Object Object Recognition +2

TempSAL - Uncovering Temporal Information for Deep Saliency Prediction

1 code implementation CVPR 2023 Bahar Aydemir, Ludo Hoffstetter, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

Deep saliency prediction algorithms complement object recognition features; they typically rely on additional information, such as scene context, semantic relationships, gaze direction, and object dissimilarity.

Object Object Recognition +2

DSI2I: Dense Style for Unpaired Image-to-Image Translation

1 code implementation 26 Dec 2022 Baran Ozaydin, Tong Zhang, Sabine Süsstrunk, Mathieu Salzmann

To stylize the source content with the exemplar style, we extract unsupervised cross-domain semantic correspondences and warp the exemplar style to the source content.

Image-to-Image Translation Translation

VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction

1 code implementation CVPR 2023 Yufan Ren, Fangjinhua Wang, Tong Zhang, Marc Pollefeys, Sabine Süsstrunk

The success of Neural Radiance Fields (NeRF) in novel view synthesis has inspired researchers to propose neural implicit scene reconstruction.

NeRF Novel View Synthesis

VO$Q$L: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation

no code implementations 12 Dec 2022 Alekh Agarwal, Yujia Jin, Tong Zhang

We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards.

Q-Learning regression +1

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

no code implementations 12 Dec 2022 Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$.

Multi-Armed Bandits Reinforcement Learning (RL)

On Robust Observer Design for System Motion on SE(3) Using Onboard Visual Sensors

no code implementations 29 Nov 2022 Tong Zhang, Ying Tan, Xiang Chen, Zike Lei

The key design idea for this observer is to estimate the visible set and identify the mis-identified features from the measurements.

Particle-based Variational Inference with Preconditioned Functional Gradient Flow

no code implementations 25 Nov 2022 Hanze Dong, Xi Wang, Yong Lin, Tong Zhang

With the popularity of Stein variational gradient descent (SVGD), the focus of particle-based VI algorithms has been on the properties of functions in a Reproducing Kernel Hilbert Space (RKHS) to approximate the gradient flow.

Variational Inference

Normalizing Flow with Variational Latent Representation

1 code implementation 21 Nov 2022 Hanze Dong, Shizhe Diao, Weizhong Zhang, Tong Zhang

The resulting method is significantly more powerful than the standard normalizing flow approach for generating data distributions with multiple modes.

FAF: A novel multimodal emotion recognition approach integrating face, body and text

no code implementations 20 Nov 2022 Zhongyu Fang, Aoyun He, Qihui Yu, Baopeng Gao, Weiping Ding, Tong Zhang, Lei Ma

In this paper, we develop a large multimodal emotion dataset, named the "HED" dataset, to facilitate the emotion recognition task, and accordingly propose a multimodal emotion recognition method.

Multimodal Emotion Recognition

GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

no code implementations 3 Nov 2022 Han Zhong, Wei Xiong, Sirui Zheng, LiWei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang

The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values, and (ii) we set the log-likelihood function to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning.

Decision Making Reinforcement Learning (RL)
