Search Results for author: Ruiqi Zhang

Found 11 papers, 1 paper with code

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

no code implementations • 8 Apr 2024 • Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei

LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks.

Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement

no code implementations • 24 Feb 2024 • Ruiqi Zhang, Yuexiang Zhai, Andrea Zanette

Surprisingly, in this work, we demonstrate that even in such a data-starved setting it may still be possible to find a policy competitive with the optimal one.

Decision Making, Multi-Armed Bandits

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization

no code implementations • 22 Feb 2024 • Ruiqi Zhang, Jingfeng Wu, Peter L. Bartlett

We study the in-context learning (ICL) ability of a Linear Transformer Block (LTB) that combines a linear attention component and a linear multi-layer perceptron (MLP) component.

In-Context Learning

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition

no code implementations • 18 Feb 2024 • Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao

Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment.

Spreeze: High-Throughput Parallel Reinforcement Learning Framework

no code implementations • 11 Dec 2023 • Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang

While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, the excessively burdensome communication frameworks hinder the attainment of the hardware's limit for final throughput and training effects on a single desktop.

Reinforcement Learning (RL)

Explicifying Neural Implicit Fields for Efficient Dynamic Human Avatar Modeling via a Neural Explicit Surface

no code implementations • 7 Aug 2023 • Ruiqi Zhang, Jie Chen, Qiang Wang

This paper proposes a technique for efficiently modeling dynamic humans by explicifying the implicit neural fields via a Neural Explicit Surface (NES).

Computational Efficiency

Trained Transformers Learn Linear Models In-Context

no code implementations • 16 Jun 2023 • Ruiqi Zhang, Spencer Frei, Peter L. Bartlett

We show that although gradient flow succeeds at finding a global minimum in this setting, the trained transformer is still brittle under mild covariate shifts.

In-Context Learning, Regression

NDF: Neural Deformable Fields for Dynamic Human Modelling

1 code implementation • 19 Jul 2022 • Ruiqi Zhang, Jie Chen

However, the learned canonical representation is static and the current design of the deformation fields is not able to represent large movements or detailed geometry changes.

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

no code implementations • 10 Feb 2022 • Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

We approach this problem using the Z-estimation theory and establish the following results: (i) the FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; (ii) the finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator.

Off-policy evaluation

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

no code implementations • 31 Jan 2022 • Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.

A paradigm system for strong correlation and charge transfer competition

no code implementations • 4 Mar 2021 • James W Furness, Ruiqi Zhang, Jianwei Sun

In chemistry and condensed matter physics, the solution of simple paradigm systems, such as the hydrogen atom and the uniform electron gas, plays a critical role in understanding electron behaviors and developing electronic structure methods.

Chemical Physics
