Search Results for author: Nan Jiang

Found 129 papers, 42 papers with code

An Exact Solver for Satisfiability Modulo Counting with Probabilistic Circuits

no code implementations 2 Mar 2025 Jinzhao Li, Nan Jiang, Yexiang Xue

We propose KOCO-SMC, an integrated exact SMC solver that efficiently tracks lower and upper bounds in the probabilistic inference process.

Computational Efficiency

Self-rewarding correction for mathematical reasoning

1 code implementation 26 Feb 2025 Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, Tong Zhang

We study self-rewarding reasoning large language models (LLMs), which simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback.
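
As a minimal sketch of the generate-evaluate-correct loop described above (assuming a hypothetical `llm_generate` prompt-to-completion callable; the prompts and names are illustrative, not the paper's recipe):

```python
def self_rewarding_answer(llm_generate, problem, max_rounds=3):
    """Generate a solution, self-evaluate it, and self-correct if judged wrong.
    `llm_generate` is any prompt -> completion callable (hypothetical); the
    paper trains a single model to produce both the reasoning and the judgment.
    """
    answer = llm_generate(f"Solve step by step:\n{problem}")
    for _ in range(max_rounds):
        verdict = llm_generate(
            f"Problem:\n{problem}\nProposed solution:\n{answer}\n"
            "Is this solution correct? Reply VERIFIED or WRONG."
        )
        if "VERIFIED" in verdict:  # the model rewards its own output
            return answer
        answer = llm_generate(  # self-correction attempt
            f"Problem:\n{problem}\nA previous attempt was judged wrong:\n"
            f"{answer}\nGive a corrected step-by-step solution."
        )
    return answer
```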

Mathematical Reasoning

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

no code implementations 24 Feb 2025 Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu

Reinforcement learning from human feedback (RLHF) has demonstrated remarkable effectiveness in aligning large language models (LLMs) with human preferences.

RLOMM: An Efficient and Robust Online Map Matching Framework with Reinforcement Learning

no code implementations 5 Feb 2025 Minxiao Chen, Haitao Yuan, Nan Jiang, Zhihan Zheng, Sai Wu, Ao Zhou, Shangguang Wang

To improve efficiency, we begin by modeling the online map matching problem as an Online Markov Decision Process (OMDP) based on its inherent characteristics.

Contrastive Learning Representation Learning

MLLM-as-a-Judge for Image Safety without Human Labeling

no code implementations 31 Dec 2024 Zhenting Wang, Shuming Hu, Shiyu Zhao, Xiaowen Lin, Felix Juefei-Xu, Zhuowei Li, Ligong Han, Harihar Subramanyam, Li Chen, Jianfa Chen, Nan Jiang, Lingjuan Lyu, Shiqing Ma, Dimitris N. Metaxas, Ankit Jain

To address these challenges, we propose an MLLM-based method that includes objectifying safety rules, assessing the relevance between rules and images, making quick judgments based on debiased token probabilities with logically complete yet simplified precondition chains for safety rules, and conducting more in-depth reasoning with cascaded chain-of-thought processes when necessary.

Image Generation

GAS: Generative Auto-bidding with Post-training Search

no code implementations 22 Dec 2024 Yewen Li, Shuai Mao, Jingtong Gao, Nan Jiang, Yunjian Xu, Qingpeng Cai, Fei Pan, Peng Jiang, Bo An

We use weak-to-strong search alignment by training small critics for different preferences and an MCTS-inspired search to refine the model's output.

Computational Efficiency Sequential Decision Making

GameArena: Evaluating LLM Reasoning through Live Computer Games

no code implementations 9 Dec 2024 Lanxiang Hu, Qiyu Li, Anze Xie, Nan Jiang, Ion Stoica, Haojian Jin, Hao Zhang

For the first time, GameArena enables the collection of step-by-step LLM reasoning data in the wild.

Chatbot

Commit0: Library Generation from Scratch

1 code implementation 2 Dec 2024 Wenting Zhao, Nan Jiang, Celine Lee, Justin T Chiu, Claire Cardie, Matthias Gallé, Alexander M Rush

As a benchmark, Commit0 is designed to move beyond static one-shot code generation towards agents that must process long-form natural language specifications, adapt to multi-stage feedback, and generate code with complex dependencies.

Benchmarking Code Generation

Can Language Models Replace Programmers? REPOCOD Says 'Not Yet'

1 code implementation 29 Oct 2024 Shanchao Liang, Yiran Hu, Nan Jiang, Lin Tan

In our evaluations of ten LLMs, none of the models achieve more than 30% pass@1 on REPOCOD, indicating the necessity of building stronger LLMs that can help developers in real-world software development.

Code Completion Code Generation +1

WAFFLE: Multi-Modal Model for Automated Front-End Development

1 code implementation 24 Oct 2024 Shanchao Liang, Nan Jiang, Shangshu Qian, Lin Tan

Web development involves turning UI designs into functional webpages, which can be difficult for both beginners and experienced developers due to the complexity of HTML's hierarchical structures and styles.

Code Generation SSIM

Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity

no code implementations 23 Oct 2024 Philip Amortila, Dylan J. Foster, Nan Jiang, Akshay Krishnamurthy, Zakaria Mhammedi

Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple.

reinforcement-learning Reinforcement Learning

Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code

no code implementations 13 Oct 2024 Nan Jiang, Qi Li, Lin Tan, Tianyi Zhang

Despite their success, large language models (LLMs) face the critical challenge of hallucinations, generating plausible but incorrect content.

Code Generation Hallucination +3

Autonomous Character-Scene Interaction Synthesis from Text Instruction

no code implementations 4 Oct 2024 Nan Jiang, Zimo He, Zi Wang, Hongjie Li, Yixin Chen, Siyuan Huang, Yixin Zhu

Synthesizing human motions in 3D environments, particularly those with complex activities such as locomotion, hand-reaching, and human-object interaction, presents substantial demands for user-defined waypoints and stage transitions.

Human-Object Interaction Detection

SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models

no code implementations 28 Sep 2024 Yi Wu, Zikang Xiong, Yiran Hu, Shreyash S. Iyengar, Nan Jiang, Aniket Bera, Lin Tan, Suresh Jagannathan

We demonstrate the effectiveness and generalizability of SELP across different robot agents and tasks, including drone navigation and robot manipulation.

Drone navigation Robot Manipulation +1

Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection

1 code implementation 26 Sep 2024 Pengfei Cai, Yan Song, Nan Jiang, Qing Gu, Ian McLoughlin

A significant challenge in sound event detection (SED) is the effective utilization of unlabeled data, given the limited availability of labeled data due to high annotation costs.

Event Detection Representation Learning +2

LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement

no code implementations 21 Sep 2024 Nan Jiang, Shanchao Liang, Chengxiao Wang, Jiannan Wang, Lin Tan

Portable Document Format (PDF) files are dominantly used for storing and disseminating scientific research, legal documents, and tax information.

Fault localization

Active Symbolic Discovery of Ordinary Differential Equations via Phase Portrait Sketching

1 code implementation 2 Sep 2024 Nan Jiang, Md Nasim, Yexiang Xue

The symbolic discovery of Ordinary Differential Equations (ODEs) from trajectory data plays a pivotal role in AI-driven scientific discovery.

Active Learning scientific discovery

A Tighter Convergence Proof of Reverse Experience Replay

1 code implementation 30 Aug 2024 Nan Jiang, Jinzhao Li, Yexiang Xue

In reinforcement learning, Reverse Experience Replay (RER) is a recently proposed algorithm that attains better sample complexity than the classic experience replay method.
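
To illustrate the reverse-order idea (a minimal tabular sketch, not the paper's algorithm or analysis), a stored trajectory can be replayed backward so that reward information propagates in a single pass:

```python
def reverse_experience_replay_update(Q, trajectory, actions, alpha=0.1, gamma=0.99):
    """One pass of tabular Q-learning updates over a trajectory in reverse
    temporal order, so later rewards immediately inform earlier states.

    Q          : dict mapping (state, action) -> estimated value
    trajectory : list of (state, action, reward, next_state, done) tuples
    actions    : sequence of all actions (for the max over next actions)
    """
    for s, a, r, s_next, done in reversed(trajectory):
        bootstrap = 0.0 if done else max(Q.get((s_next, b), 0.0) for b in actions)
        td_error = r + gamma * bootstrap - Q.get((s, a), 0.0)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q
```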

Urban Traffic Accident Risk Prediction Revisited: Regionality, Proximity, Similarity and Sparsity

1 code implementation 29 Jul 2024 Minxiao Chen, Haitao Yuan, Nan Jiang, Zhifeng Bao, Shangguang Wang

In particular, it should adequately consider the regional background, accurately capture both spatial proximity and semantic similarity, and effectively address the sparsity of traffic accidents.

Semantic Similarity Semantic Textual Similarity

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

no code implementations 17 Jul 2024 Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang, Siyuan Huang

Existing 3D human object interaction (HOI) datasets and models simply align global descriptions with the long HOI sequence, while lacking a detailed understanding of intermediate states and the transitions between states.

Human-Object Interaction Detection Language Modelling +1

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

no code implementations 30 Jun 2024 Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

Specifically, we formulate the problem as a two-player game and propose a novel online algorithm, iterative Nash policy optimization (INPO).

LeDex: Training LLMs to Better Self-Debug and Explain Code

no code implementations 28 May 2024 Nan Jiang, Xiaopeng Li, Shiqi Wang, Qiang Zhou, Soneya Binta Hossain, Baishakhi Ray, Varun Kumar, Xiaofei Ma, Anoop Deoras

We thus propose an automated pipeline to collect a high-quality dataset for code explanation and refinement by generating a number of explanations and refinement trajectories from the LLM itself or a larger teacher model and filtering via execution verification.

Code Generation Reinforcement Learning (RL)

RLHF Workflow: From Reward Modeling to Online RLHF

3 code implementations 13 May 2024 Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang

We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature.

Chatbot HumanEval +3

Robust Semi-supervised Learning by Wisely Leveraging Open-set Data

no code implementations 11 May 2024 Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan

Open-set Semi-supervised Learning (OSSL) considers a realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which can cause performance degradation in conventional SSL models.

Learning Theory

PhyRecon: Physically Plausible Neural Scene Reconstruction

no code implementations 25 Apr 2024 Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

In this paper, we introduce PHYRECON, the first approach to leverage both differentiable rendering and differentiable physics simulation to learn implicit surface representations.

3D Reconstruction Multi-View 3D Reconstruction

A Note on Loss Functions and Error Compounding in Model-based Reinforcement Learning

no code implementations 15 Apr 2024 Nan Jiang

This note clarifies some confusions (and perhaps throws out more) around model-based reinforcement learning and their theoretical understanding in the context of deep RL.

Model-based Reinforcement Learning

STMGF: An Effective Spatial-Temporal Multi-Granularity Framework for Traffic Forecasting

no code implementations 8 Apr 2024 Zhengyang Zhao, Haitao Yuan, Nan Jiang, Minxiao Chen, Ning Liu, Zengxiang Li

Accurate traffic prediction is a challenging task in intelligent transportation due to the spatial-temporal aspects of road networks.

Traffic Prediction

Towards Effective Next POI Prediction: Spatial and Semantic Augmentation with Remote Sensing Data

no code implementations 22 Mar 2024 Nan Jiang, Haitao Yuan, Jianing Si, Minxiao Chen, Shangguang Wang

The next point-of-interest (POI) prediction is a significant task in location-based services, yet its complexity arises from the consolidation of spatial and semantic intent.

Prediction

RouterBench: A Benchmark for Multi-LLM Routing System

2 code implementations 18 Mar 2024 Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay

To bridge this gap, we present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies.

Scaling Up Dynamic Human-Scene Interaction Modeling

no code implementations CVPR 2024 Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang

Confronting the challenges of data scarcity and advanced motion synthesis in human-scene interaction modeling, we introduce the TRUMANS dataset alongside a novel HSI motion synthesis method.

Motion Synthesis

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

no code implementations 22 Feb 2024 Yuheng Zhang, Nan Jiang

We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon.

Off-policy evaluation

Online Iterative Reinforcement Learning from Human Feedback with General Preference Model

1 code implementation 11 Feb 2024 Chenlu Ye, Wei Xiong, Yuheng Zhang, Hanze Dong, Nan Jiang, Tong Zhang

We investigate Reinforcement Learning from Human Feedback (RLHF) in the context of a general preference oracle.

Vertical Symbolic Regression via Deep Policy Gradient

1 code implementation 1 Feb 2024 Nan Jiang, Md Nasim, Yexiang Xue

We propose Vertical Symbolic Regression using Deep Policy Gradient (VSR-DPG) and demonstrate that VSR-DPG can recover ground-truth equations involving multiple input variables, significantly beyond both deep reinforcement learning-based approaches and previous VSR variants.

Decision Making Deep Reinforcement Learning +3

Harnessing Density Ratios for Online Reinforcement Learning

no code implementations 18 Jan 2024 Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie

The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of the possibility for a unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other.

Offline RL reinforcement-learning +1

Vertical Symbolic Regression

no code implementations 19 Dec 2023 Nan Jiang, Md Nasim, Yexiang Xue

The first few steps in vertical discovery are significantly cheaper than the horizontal path, as their search is in reduced hypothesis spaces involving a small set of variables.

regression scientific discovery +1

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

3 code implementations 18 Dec 2023 Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang

We investigate its behavior in three distinct settings -- offline, online, and hybrid -- and propose efficient algorithms with finite-sample theoretical guarantees.

Language Modeling Language Modelling +1

Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning

no code implementations 22 Nov 2023 Nan Jiang, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Lin Tan, Xiangyu Zhang, Petr Babkin

Binary code analysis is the foundation of crucial tasks in the security domain; thus building effective binary analysis techniques is more important than ever.

Code Translation Compiler Optimization +3

Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture

no code implementations 1 Nov 2023 Yixin Chen, Junfeng Ni, Nan Jiang, Yaowei Zhang, Yixin Zhu, Siyuan Huang

Reconstructing detailed 3D scenes from single-view images remains a challenging task due to limitations in existing approaches, which primarily focus on geometric shape recovery, overlooking object appearances and fine shape details.

3D Object Reconstruction 3D Reconstruction +5

Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability

1 code implementation 12 Oct 2023 Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick

We also measure each architecture's predisposition towards in-context learning when presented with the option to memorize rather than leverage in-context examples.

Causal Language Modeling In-Context Learning +2

Solving Satisfiability Modulo Counting for Symbolic and Statistical AI Integration With Provable Guarantees

1 code implementation 16 Sep 2023 Jinzhao Li, Nan Jiang, Yexiang Xue

Solving SMC is challenging because of its highly intractable nature ($\text{NP}^{\text{PP}}$-complete), as it couples statistical inference with symbolic reasoning.

Decision Making

Racing Control Variable Genetic Programming for Symbolic Regression

1 code implementation 13 Sep 2023 Nan Jiang, Yexiang Xue

A selection scheme similar to that used in selecting good symbolic equations in the genetic programming process is implemented to ensure that promising experiment schedules eventually win over the average ones.

Deep Reinforcement Learning regression +1

Mitigating the Alignment Tax of RLHF

1 code implementation 12 Sep 2023 Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan YAO, Tong Zhang

Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different alignment-forgetting trade-offs, we propose Heterogeneous Model Averaging (HMA), which heterogeneously finds various combination ratios of model layers.

Common Sense Reasoning Continual Learning

Marginalized Importance Sampling for Off-Environment Policy Evaluation

no code implementations 4 Sep 2023 Pulkit Katdare, Nan Jiang, Katherine Driggs-Campbell

This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world.

Reinforcement Learning (RL)

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

no code implementations 25 Jul 2023 Philip Amortila, Nan Jiang, Csaba Szepesvári

Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation.

Off-policy evaluation

How Effective Are Neural Networks for Fixing Security Vulnerabilities

1 code implementation 29 May 2023 Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, Sameena Shah

The results call for innovations to enhance automated Java vulnerability repair such as creating larger vulnerability repair training data, tuning LLMs with such data, and applying code simplification transformation to facilitate vulnerability repair.

Code Completion Program Repair

Symbolic Regression via Control Variable Genetic Programming

1 code implementation 25 May 2023 Nan Jiang, Yexiang Xue

CVGP starts by fitting simple expressions involving a small set of independent variables using genetic programming, under controlled experiments where other variables are held as constants.

regression scientific discovery +1

Word Embeddings Are Steers for Language Models

1 code implementation 22 May 2023 Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji

In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles.

Language Modeling Language Modelling +1

Explaining RL Decisions with Trajectories

2 code implementations 6 May 2023 Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian

To do so, we encode trajectories in offline training data individually as well as collectively (encoding a set of trajectories).

Attribute continuous-control +5

Adversarial Model for Offline Reinforcement Learning

no code implementations NeurIPS 2023 Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng

We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.

model reinforcement-learning +2

Offline Learning in Markov Games with General Function Approximation

no code implementations 6 Feb 2023 Yuheng Zhang, Yu Bai, Nan Jiang

We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as Nash equilibrium and (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game.

Multi-agent Reinforcement Learning Reinforcement Learning (RL)

Reinforcement Learning in Low-Rank MDPs with Density Features

no code implementations 4 Feb 2023 Audrey Huang, Jinglin Chen, Nan Jiang

As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage.

reinforcement-learning Reinforcement Learning +2

KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair

1 code implementation 3 Feb 2023 Nan Jiang, Thibaud Lutellier, Yiling Lou, Lin Tan, Dan Goldwasser, Xiangyu Zhang

KNOD has two major novelties, including (1) a novel three-stage tree decoder, which directly generates Abstract Syntax Trees of patched code according to the inherent tree structure, and (2) a novel domain-rule distillation, which leverages syntactic and semantic rules and teacher-student distributions to explicitly inject the domain knowledge into the decoding procedure during both the training and inference phases.

Decoder Program Repair

Full-Body Articulated Human-Object Interaction

1 code implementation ICCV 2023 Nan Jiang, Tengyu Liu, Zhexuan Cao, Jieming Cui, Zhiyuan Zhang, Yixin Chen, He Wang, Yixin Zhu, Siyuan Huang

By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation to tackle the estimation of articulated object poses and shapes during whole-body interactions.

Action Recognition Human-Object Interaction Detection +3

Information Bottleneck-Inspired Type Based Multiple Access for Remote Estimation in IoT Systems

no code implementations 19 Dec 2022 Meiyi Zhu, Chunyan Feng, Caili Guo, Nan Jiang, Osvaldo Simeone

Type-based multiple access (TBMA) is a semantics-aware multiple access protocol for remote inference.

Decoder

Learning Markov Random Fields for Combinatorial Structures via Sampling through Lovász Local Lemma

1 code implementation 1 Dec 2022 Nan Jiang, Yi Gu, Yexiang Xue

Contrastive divergence is then applied to separate these samples from those in the training set.

LEMMA valid

ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data

no code implementations 8 Nov 2022 Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng

We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage.

Offline RL

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions

no code implementations 27 Oct 2022 Audrey Huang, Nan Jiang

Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios).

Off-policy evaluation

The Role of Coverage in Online Reinforcement Learning

no code implementations 9 Oct 2022 Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade

Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.

Efficient Exploration Offline RL +3

On the Value of Behavioral Representations for Dense Retrieval

no code implementations 11 Aug 2022 Nan Jiang, Dhivya Eswaran, Choon Hui Teo, Yexiang Xue, Yesh Dattatreya, Sujay Sanghavi, Vishy Vishwanathan

We consider text retrieval within dense representational space in real-world settings such as e-commerce search where (a) document popularity and (b) diversity of queries associated with a document have a skewed distribution.

Text Retrieval

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

1 code implementation NeurIPS 2023 Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.

Off-policy evaluation

A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation

no code implementations 18 Jul 2022 Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster

Towards establishing the minimal amount of expert queries needed, we show that, in the same setting, any learner whose exploration budget is polynomially-bounded (in terms of $d, H,$ and $|\mathcal{A}|$) will require at least $\tilde\Omega(\sqrt{d})$ oracle calls to recover a policy competing with the expert's value function.

Imitation Learning Reinforcement Learning (RL)

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL

no code implementations 21 Jun 2022 Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions.

Reinforcement Learning (RL)

Interaction-Grounded Learning with Action-inclusive Feedback

no code implementations 16 Jun 2022 Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.

Brain Computer Interface

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

1 code implementation 25 May 2022 Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu

We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where users can be divided into two groups based on their different tolerance for exploration risks and should be treated separately.

reinforcement-learning Reinforcement Learning (RL)

Adaptable Semantic Compression and Resource Allocation for Task-Oriented Communications

no code implementations 19 Apr 2022 Chuanhong Liu, Caili Guo, Yang Yang, Nan Jiang

To solve the problem, both compression ratio and resource allocation are optimized for the task-oriented communication system to maximize the success probability of tasks.

Semantic Compression

Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps

no code implementations 25 Mar 2022 Jinglin Chen, Nan Jiang

We consider a challenging theoretical problem in offline reinforcement learning (RL): obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under only realizability-type assumptions for the function approximators.

Offline RL Reinforcement Learning (RL)

VRConvMF: Visual Recurrent Convolutional Matrix Factorization for Movie Recommendation

no code implementations 16 Feb 2022 Zhu Wang, Honglong Chen, Zhe Li, Kai Lin, Nan Jiang, Feng Xia

Fortunately, context-aware recommender systems can alleviate the sparsity problem by making use of some auxiliary information, such as the information of both the users and items.

Descriptive Movie Recommendation +1

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

no code implementations 9 Feb 2022 Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).

Offline RL reinforcement-learning +2

Adversarially Trained Actor Critic for Offline Reinforcement Learning

3 code implementations 5 Feb 2022 Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.

continuous-control Continuous Control +5

A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

1 code implementation 12 Nov 2021 Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang

In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy's value and the observed data distribution.

Off-policy evaluation

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

1 code implementation NeurIPS 2021 Siyuan Zhang, Nan Jiang

How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL) -- which is crucial for hyperparameter tuning -- is an important open question.

Off-policy evaluation Open-Ended Question Answering +2

A Fast Randomized Algorithm for Massive Text Normalization

no code implementations 6 Oct 2021 Nan Jiang, Chen Luo, Vihan Lakshman, Yesh Dattatreya, Yexiang Xue

In addition, FLAN does not require any annotated data or supervised learning.

A Spectral Approach to Off-Policy Evaluation for POMDPs

no code implementations 22 Sep 2021 Yash Nair, Nan Jiang

We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes, where the evaluation policy depends only on observable variables but the behavior policy depends on latent states (Tennenholtz et al., 2020a).

Causal Identification Off-policy evaluation

Bellman-consistent Pessimism for Offline Reinforcement Learning

no code implementations NeurIPS 2021 Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

The use of pessimism, when reasoning about datasets lacking exhaustive exploration, has recently gained prominence in offline reinforcement learning.

reinforcement-learning Reinforcement Learning +1

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

no code implementations NeurIPS 2021 Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai

This offline result is the first that matches the sample complexity lower bound in this setting, and resolves a recent open question in offline RL.

Offline RL Open-Ended Question Answering +3

On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction

no code implementations 2 Jun 2021 Jiawei Huang, Nan Jiang

In this paper, we study the convergence properties of off-policy policy improvement algorithms with state-action density ratio correction under function approximation setting, where the objective function is formulated as a max-max-min optimization problem.

Minimax Model Learning

no code implementations 2 Mar 2021 Cameron Voloshin, Nan Jiang, Yisong Yue

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning.

model Model-based Reinforcement Learning +2

CURE: Code-Aware Neural Machine Translation for Automatic Program Repair

1 code implementation 26 Feb 2021 Nan Jiang, Thibaud Lutellier, Lin Tan

Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes.

Machine Translation NMT +2

SM+: Refined Scale Match for Tiny Person Detection

no code implementations 6 Feb 2021 Nan Jiang, Xuehui Yu, Xiaoke Peng, Yuqi Gong, Zhenjun Han

Detecting tiny objects (e.g., less than 20 x 20 pixels) in large-scale images is an important yet open problem.

Human Detection

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

no code implementations 5 Feb 2021 Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie

We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.

Off-policy evaluation reinforcement-learning

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

no code implementations 3 Feb 2021 Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári

We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.

Open-Ended Question Answering

Experimental demonstration of memory-enhanced scaling for entanglement connection of quantum repeater segments

no code implementations 21 Jan 2021 Yunfei Pu, Sheng Zhang, Yukai Wu, Nan Jiang, Wei Chang, Chang Li, Luming Duan

The experimental realization of entanglement connection of two quantum repeater segments with efficient memory-enhanced scaling demonstrates a key advantage of the quantum repeater protocol and marks a cornerstone towards future large-scale quantum networks.

Quantum Physics

Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking

1 code implementation 21 Jan 2021 Nan Jiang, Kuiran Wang, Xiaoke Peng, Xuehui Yu, Qiang Wang, Junliang Xing, Guorong Li, Jian Zhao, Guodong Guo, Zhenjun Han

The release of such a large-scale dataset could be a useful initial step in research on tracking UAVs.

Quantifying Spatial Homogeneity of Urban Road Networks via Graph Neural Networks

1 code implementation 1 Jan 2021 Jiawei Xue, Nan Jiang, Senwei Liang, Qiyuan Pang, Takahiro Yabe, Satish V. Ukkusuri, Jianzhu Ma

We apply the method to 11,790 urban road networks across 30 cities worldwide to measure the spatial homogeneity of road networks within each city and across different cities.

When Counterpoint Meets Chinese Folk Melodies

1 code implementation NeurIPS 2020 Nan Jiang, Sheng Jin, Zhiyao Duan, ChangShui Zhang

An interaction reward model is trained on the duets formed from outer parts of Bach chorales to model counterpoint interaction, while a style reward model is trained on monophonic melodies of Chinese folk songs to model melodic patterns.

A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting

no code implementations 2 Nov 2020 Philip Amortila, Nan Jiang, Tengyang Xie

Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good feature coverage in the finite-horizon case.

reinforcement-learning Reinforcement Learning +1

The 1st Tiny Object Detection Challenge: Methods and Results

1 code implementation 16 Sep 2020 Xuehui Yu, Zhenjun Han, Yuqi Gong, Nan Jiang, Jian Zhao, Qixiang Ye, Jie Chen, Yuan Feng, Bin Zhang, Xiaodi Wang, Ying Xin, Jingwei Liu, Mingyuan Mao, Sheng Xu, Baochang Zhang, Shumin Han, Cheng Gao, Wei Tang, Lizuo Jin, Mingbo Hong, Yuchao Yang, Shuiwang Li, Huan Luo, Qijun Zhao, Humphrey Shi

The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images which have wide views, with a current focus on tiny person detection.

Human Detection Object +2

Analysis of Random Access in NB-IoT Networks with Three Coverage Enhancement Groups: A Stochastic Geometry Approach

no code implementations 14 Sep 2020 Yan Liu, Yansha Deng, Nan Jiang, Maged Elkashlan, Arumugam Nallanathan

NarrowBand-Internet of Things (NB-IoT) is a new 3GPP radio access technology designed to provide better coverage for Low Power Wide Area (LPWA) networks.

Batch Value-function Approximation with Only Realizability

1 code implementation 11 Aug 2020 Tengyang Xie, Nan Jiang

We make progress in a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class.

Model Selection Reinforcement Learning (RL)

A Question Type Driven and Copy Loss Enhanced Framework for Answer-Agnostic Neural Question Generation

no code implementations WS 2020 Xiuyu Wu, Nan Jiang, Yunfang Wu

The answer-agnostic question generation is a significant and challenging task, which aims to automatically generate questions for a given sentence but without an answer.

Question Generation Question-Generation +2

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison

no code implementations 9 Mar 2020 Tengyang Xie, Nan Jiang

We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning.

reinforcement-learning Reinforcement Learning +1

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

no code implementations 8 Feb 2020 Nan Jiang, Sheng Jin, Zhiyao Duan, Chang-Shui Zhang

We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state).

Deep Reinforcement Learning Music Generation +2

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

no code implementations NeurIPS 2020 Nan Jiang, Jiawei Huang

By slightly altering the derivation of previous methods (one from each style; Uehara et al., 2020), we unify them into a single value interval that comes with a special type of double robustness: when either the value-function or the importance-weight class is well specified, the interval is valid and its length quantifies the misspecification of the other class.

Efficient Exploration Off-policy evaluation +1

Scale Match for Tiny Person Detection

2 code implementations 23 Dec 2019 Xuehui Yu, Yuqi Gong, Nan Jiang, Qixiang Ye, Zhenjun Han

In this paper, we introduce a new benchmark, referred to as TinyPerson, opening up a promising direction for tiny object detection at long distances and with massive backgrounds.

Human Detection Object +2

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

3 code implementations 15 Nov 2019 Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications.

Benchmarking Diversity +4

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

no code implementations ICML 2020 Masatoshi Uehara, Jiawei Huang, Nan Jiang

We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions.

Off-policy evaluation Reinforcement Learning

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles

no code implementations 23 Oct 2019 Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh

As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set.

Model Selection reinforcement-learning +2

From Importance Sampling to Doubly Robust Policy Gradient

1 code implementation ICML 2020 Jiawei Huang, Nan Jiang

We show that on-policy policy gradient (PG) and its variance reduction variants can be derived by taking finite difference of function evaluations supplied by estimators from the importance sampling (IS) family for off-policy evaluation (OPE).
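
For context, the step-wise importance sampling estimator this family starts from can be sketched as follows (illustrative only; `pi_e` and `pi_b` are assumed to be given as action-probability functions):

```python
def step_wise_is_estimate(trajectory, pi_e, pi_b, gamma=0.99):
    """Step-wise IS estimate of one trajectory's value under the evaluation
    policy: V ~= sum_t gamma^t * rho_{1:t} * r_t, where rho_{1:t} is the
    cumulative importance ratio pi_e/pi_b up to step t.

    trajectory : list of (state, action, reward) tuples from the behavior policy
    pi_e, pi_b : functions (state, action) -> action probability
    """
    value, rho = 0.0, 1.0
    for t, (s, a, r) in enumerate(trajectory):
        rho *= pi_e(s, a) / pi_b(s, a)  # cumulative ratio rho_{1:t}
        value += (gamma ** t) * rho * r
    return value
```

Averaging this quantity over many behavior-policy trajectories gives the IS value estimate; the paper's observation is that finite differences of such estimators recover policy gradient methods.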

Off-policy evaluation

Quantum Communication between Multiplexed Atomic Quantum Memories

no code implementations 5 Sep 2019 Chang Li, Nan Jiang, Yukai Wu, Wei Chang, Yunfei Pu, Sheng Zhang, Lu-Ming Duan

The use of multiplexed atomic quantum memories (MAQM) can significantly enhance the efficiency to establish entanglement in a quantum network.

Quantum Physics

On Value Functions and the Agent-Environment Boundary

no code implementations 30 May 2019 Nan Jiang

When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step as a part of the environment or as part of the agent.

Imitation Learning reinforcement-learning +2

Provably Efficient Q-Learning with Low Switching Cost

no code implementations NeurIPS 2019 Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization.

Q-Learning

Information-Theoretic Considerations in Batch Reinforcement Learning

no code implementations 1 May 2019 Jinglin Chen, Nan Jiang

Value-function approximation methods that operate in batch mode have foundational importance to reinforcement learning (RL).

reinforcement-learning Reinforcement Learning +1

Provably efficient RL with Rich Observations via Latent State Decoding

1 code implementation 25 Jan 2019 Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.

Clustering Q-Learning +1

Completing State Representations using Spectral Learning

no code implementations NeurIPS 2018 Nan Jiang, Alex Kulesza, Satinder Singh

A central problem in dynamical system modeling is state discovery—that is, finding a compact summary of the past that captures the information needed to predict the future.

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches

no code implementations 21 Nov 2018 Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.

model Model-based Reinforcement Learning +1

LSDSCC: a Large Scale Domain-Specific Conversational Corpus for Response Generation with Diversity Oriented Evaluation Metrics

no code implementations NAACL 2018 Zhen Xu, Nan Jiang, Bingquan Liu, Wenge Rong, Bowen Wu, Baoxun Wang, Zhuoran Wang, Xiaolong Wang

The experimental results have shown that our proposed corpus can be taken as a new benchmark dataset for the NRG task, and the presented metrics are promising to guide the optimization of NRG models by quantifying the diversity of the generated responses reasonably.

Diversity Machine Translation +1

Image Classification Based on Quantum KNN Algorithm

no code implementations 16 May 2018 Yijie Dang, Nan Jiang, Hao Hu, Zhuoxiao Ji, Wenyin Zhang

However, the commonly used classification method, the K-Nearest-Neighbor algorithm, has high complexity, because its two main processes, similarity computing and searching, are time-consuming.
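
To make the complexity claim concrete, a brute-force classical KNN query looks as follows (an illustrative sketch): similarity computing costs O(N*d) distance evaluations, and searching costs O(N log N) with a full sort (O(N) with the partial selection used here).

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=5):
    """Brute-force KNN over N training points in d dimensions."""
    dists = np.linalg.norm(X_train - x, axis=1)  # similarity computing: O(N*d)
    nearest = np.argpartition(dists, k)[:k]      # searching: k smallest, O(N)
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```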

Classification General Classification +1

Markov Decision Processes with Continuous Side Information

no code implementations 15 Nov 2017 Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari

Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs.

PAC learning Reinforcement Learning +1

Repeated Inverse Reinforcement Learning

no code implementations NeurIPS 2017 Kareem Amin, Nan Jiang, Satinder Singh

We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks that it surprises the human by acting suboptimally with respect to how the human would have acted.

Imitation Learning reinforcement-learning +2

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

no code implementations ICML 2017 Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.

Efficient Exploration reinforcement-learning +2

Neural Network Architecture Optimization through Submodularity and Supermodularity

no code implementations 1 Sep 2016 Junqi Jin, Ziang Yan, Kun fu, Nan Jiang, Chang-Shui Zhang

Deep learning models' architectures, including depth and width, are key factors influencing models' performance, such as test accuracy and computation time.

Optimizing Recurrent Neural Networks Architectures under Time Constraints

no code implementations 29 Aug 2016 Junqi Jin, Ziang Yan, Kun fu, Nan Jiang, Chang-Shui Zhang

A greedy algorithm with bounds is suggested to solve the transformed problem.

Word Embedding based Correlation Model for Question/Answer Matching

no code implementations 15 Nov 2015 Yikang Shen, Wenge Rong, Nan Jiang, Baolin Peng, Jie Tang, Zhang Xiong

With the development of community based question answering (Q&A) services, large-scale Q&A archives have accumulated and become an important information and knowledge resource on the web.

Question Answering Translation

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

2 code implementations 11 Nov 2015 Nan Jiang, Lihong Li

We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy.
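
The step-wise doubly robust estimator studied in this line of work admits a compact backward recursion; the sketch below assumes pre-fitted value estimates `Q_hat` and `V_hat` and known action probabilities (names are illustrative):

```python
def doubly_robust_estimate(trajectory, pi_e, pi_b, Q_hat, V_hat, gamma=0.99):
    """Step-wise doubly robust OPE over one trajectory, computed backward:
        V_DR(t) = V_hat(s_t) + rho_t * (r_t + gamma * V_DR(t+1) - Q_hat(s_t, a_t))
    with rho_t = pi_e(s_t, a_t) / pi_b(s_t, a_t). The model-based terms
    (Q_hat, V_hat) act as a control variate for the importance-sampling part.

    trajectory : list of (state, action, reward) tuples
    pi_e, pi_b : functions (state, action) -> action probability
    Q_hat      : function (state, action) -> estimated action value
    V_hat      : function (state) -> estimated state value
    """
    v_dr = 0.0
    for s, a, r in reversed(trajectory):
        rho = pi_e(s, a) / pi_b(s, a)
        v_dr = V_hat(s) + rho * (r + gamma * v_dr - Q_hat(s, a))
    return v_dr
```

If either the value model or the importance ratios are accurate, the estimate remains consistent, which is the double robustness the title refers to.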

Decision Making reinforcement-learning +3

Unifying Spatial and Attribute Selection for Distracter-Resilient Tracking

no code implementations CVPR 2014 Nan Jiang, Ying Wu

This paper presents a novel method to jointly determine the best spatial location and the optimal metric.

Attribute
