Search Results for author: Weitong Zhang

Found 20 papers, 4 papers with code

Settling Constant Regrets in Linear Markov Decision Processes

no code implementations16 Apr 2024 Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu

To the best of our knowledge, Cert-LSVI-UCB is the first algorithm to achieve a constant, instance-dependent, high-probability regret bound in RL with linear function approximation for infinite runs without relying on prior distribution assumptions.

Reinforcement Learning (RL)

Causal Graph ODE: Continuous Treatment Effect Modeling in Multi-agent Dynamical Systems

no code implementations29 Feb 2024 Zijie Huang, Jeehyun Hwang, Junkai Zhang, Jinwoo Baik, Weitong Zhang, Dominik Wodarz, Yizhou Sun, Quanquan Gu, Wei Wang

Real-world multi-agent systems are often dynamic and continuous, where the agents co-evolve and undergo changes in their trajectories and interactions over time.

counterfactual Decision Making

Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance

no code implementations13 Feb 2024 Linxi Zhao, Yihe Deng, Weitong Zhang, Quanquan Gu

The advancement of Large Vision-Language Models (LVLMs) has increasingly highlighted the critical issue of their tendency to hallucinate non-existing objects in the images.

Hallucination

Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

3 code implementations7 Nov 2023 Yihe Deng, Weitong Zhang, Zixiang Chen, Quanquan Gu

While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped.
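The paper's one-step idea can be sketched as a tiny prompt-construction helper. This is an illustrative assumption: the exact instruction wording used by the authors may differ from the string below.

```python
def rephrase_and_respond(question: str) -> str:
    """Build a one-step Rephrase-and-Respond style prompt: ask the model to
    rephrase/expand the question itself before answering it.
    The instruction wording here is illustrative, not necessarily verbatim."""
    return f'"{question}"\nRephrase and expand the question, and respond.'

prompt = rephrase_and_respond("Was Abraham Lincoln born in an even month?")
```

The prompt is then sent to the LLM as a single turn, so the model both clarifies and answers the question in one pass.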

M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization

1 code implementation17 Jul 2023 Che Liu, Sibo Cheng, Chen Chen, Mengyun Qiao, Weitong Zhang, Anand Shah, Wenjia Bai, Rossella Arcucci

The proposed method, named Medical vision-language pre-training with Frozen language models and Latent spAce Geometry optimization (M-FLAG), leverages a frozen language model for training stability and efficiency and introduces a novel orthogonality loss to harmonize the latent space geometry.

Image Classification Language Modelling +3
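A generic orthogonality penalty on latent features can be sketched as below; this is a common formulation (push the feature Gram matrix toward the identity) and only a stand-in for M-FLAG's exact loss, which the snippet does not specify.

```python
import numpy as np

def orthogonality_loss(Z: np.ndarray) -> float:
    """Illustrative orthogonality penalty on a batch of latent vectors
    Z (shape: batch x dim): penalize the squared Frobenius distance
    between the feature Gram matrix and the identity.
    A generic sketch, not the exact M-FLAG formulation."""
    d = Z.shape[1]
    gram = (Z.T @ Z) / Z.shape[0]  # dim x dim second-moment matrix
    return float(np.linalg.norm(gram - np.eye(d), ord="fro") ** 2)
```

Features whose (scaled) Gram matrix is the identity incur zero loss, while collinear features are penalized, which is one way to regularize latent space geometry during pre-training.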

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

no code implementations11 Jul 2023 Daoan Zhang, Weitong Zhang, Yu Zhao, JianGuo Zhang, Bing He, Chenchen Qin, Jianhua Yao

Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge.

Binary Classification DNA analysis +1

Pay Attention to the Atlas: Atlas-Guided Test-Time Adaptation Method for Robust 3D Medical Image Segmentation

no code implementations2 Jul 2023 Jingjie Guo, Weitong Zhang, Matthew Sinclair, Daniel Rueckert, Chen Chen

In addition, different from most existing TTA methods which restrict the adaptation to batch normalization blocks in the segmentation network only, we further exploit the use of channel and spatial attention blocks for improved adaptability at test time.

Image Segmentation Medical Image Segmentation +4
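The batch-normalization-only baseline that the snippet says most TTA methods use can be sketched as a running-statistics update on the test batch; the names and momentum convention below are assumptions, and the paper's contribution (atlas guidance, attention-block adaptation) is not shown.

```python
import numpy as np

def adapt_batchnorm_stats(x, running_mean, running_var, momentum=0.1):
    """Test-time adaptation sketch: move BatchNorm running statistics
    toward the statistics of the current test batch.
    x: array of shape (batch, channels)."""
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var
```

With `momentum=1.0` the statistics are replaced wholesale by the test batch, which is the most aggressive variant of this baseline.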

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

no code implementations15 May 2023 Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu

Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$.

Open-Ended Question Answering reinforcement-learning +1

DynamicKD: An Effective Knowledge Distillation via Dynamic Entropy Correction-Based Distillation for Gap Optimizing

no code implementations9 May 2023 Songling Zhu, Ronghua Shang, Bo Yuan, Weitong Zhang, Yangyang Li, Licheng Jiao

This paper proposes a novel knowledge distillation algorithm based on dynamic entropy correction to reduce the gap by adjusting the student instead of the teacher.

Knowledge Distillation
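For context, the vanilla knowledge-distillation loss that such methods build on can be sketched as a temperature-softened KL divergence between teacher and student outputs; the paper's dynamic entropy correction adjusts the student on top of a loss of this form, and is not reproduced here.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Vanilla KD loss: KL(teacher || student) on temperature-softened
    distributions, scaled by T^2 as in standard distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((T ** 2) * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student matches the teacher exactly and grows with the capacity gap between the two distributions.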

A Multi-objective Complex Network Pruning Framework Based on Divide-and-conquer and Global Performance Impairment Ranking

no code implementations28 Mar 2023 Ronghua Shang, Songling Zhu, Yinan Wu, Weitong Zhang, Licheng Jiao, Songhua Xu

To this end, a multi-objective complex network pruning framework based on divide-and-conquer and global performance impairment ranking (EMO-DIR) is proposed in this paper.

Model Compression Network Pruning

Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs

no code implementations17 Mar 2023 Junkai Zhang, Weitong Zhang, Quanquan Gu

The sample complexity of our algorithm only has a polylogarithmic dependence on the planning horizon and therefore is "horizon-free".

Reinforcement Learning (RL)

On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

no code implementations16 Mar 2023 Weitong Zhang, Jiafan He, Zhiyuan Fan, Quanquan Gu

We show that, when the misspecification level $\zeta$ is dominated by $\tilde O (\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/\Delta)$ as in the well-specified setting up to logarithmic factors.

Multi-Armed Bandits

Learning Neural Contextual Bandits Through Perturbed Rewards

no code implementations ICLR 2022 Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvements over their classical counterparts.

Computational Efficiency Multi-Armed Bandits +1
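The core idea of exploration via reward perturbation can be sketched with a linear model standing in for the neural network: add noise to the observed rewards and refit, so the perturbed estimates drive exploration without explicit confidence bounds. The ridge form and parameter names below are assumptions.

```python
import numpy as np

def fit_on_perturbed_rewards(X, r, sigma=1.0, lam=1.0, rng=None):
    """Reward-perturbation sketch: add i.i.d. Gaussian noise to observed
    rewards r, then solve a ridge regression on the noisy targets.
    X: (n, d) contexts; returns the perturbed parameter estimate."""
    rng = rng or np.random.default_rng(0)
    r_tilde = r + sigma * rng.standard_normal(len(r))
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ r_tilde)
```

With `sigma=0` this reduces to plain ridge regression; a positive `sigma` injects the randomness that replaces UCB-style bonuses.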

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

no code implementations NeurIPS 2021 Weitong Zhang, Dongruo Zhou, Quanquan Gu

By constructing a special class of linear Mixture MDPs, we also prove that for any reward-free algorithm, it needs to sample at least $\tilde \Omega(H^2d\epsilon^{-2})$ episodes to obtain an $\epsilon$-optimal policy.

Model-based Reinforcement Learning reinforcement-learning +1

Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL

no code implementations22 Jun 2021 Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu

For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space and achieves gap-dependent sample complexity.

Offline RL reinforcement-learning +2

A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods

no code implementations NeurIPS 2020 Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu

In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d.

Vocal Bursts Valence Prediction

Neural Thompson Sampling

2 code implementations ICLR 2021 Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.

Multi-Armed Bandits Thompson Sampling
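One round of Thompson Sampling in the linear special case can be sketched as follows; NeuralTS itself samples each arm's estimated reward from a Gaussian built on neural-network features, so this linear version is only an illustration of the sampling principle.

```python
import numpy as np

def thompson_sample_arm(contexts, mu, Sigma, rng=None):
    """One round of linear Thompson Sampling: draw a parameter vector
    from the posterior N(mu, Sigma) and play the arm whose context
    maximizes the sampled reward. contexts: (num_arms, d)."""
    rng = rng or np.random.default_rng()
    theta = rng.multivariate_normal(mu, Sigma)
    return int(np.argmax(contexts @ theta))
```

Randomness in the posterior draw, rather than an explicit bonus term, is what balances exploration and exploitation here.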

A Finite Time Analysis of Two Time-Scale Actor Critic Methods

no code implementations4 May 2020 Yue Wu, Weitong Zhang, Pan Xu, Quanquan Gu

In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d.

Vocal Bursts Valence Prediction

Characters Detection on Namecard with faster RCNN

no code implementations27 Jul 2018 Weitong Zhang

We apply Faster R-CNN to the detection of characters on name cards. To address the small amount of available data and the imbalance between classes, we designed data augmentation and a 'fake' data generator to produce more data for training the network.

Data Augmentation
