2 code implementations • 11 Nov 2015 • Nan Jiang, Lihong Li
We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy.
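The baseline estimator this line of work builds on can be illustrated with ordinary importance sampling: reweight each logged trajectory's return by the cumulative ratio of target-to-behavior action probabilities. This is a generic sketch, not the paper's (doubly robust) estimator; the policy interfaces `pi_e`/`pi_b` are assumed callables for illustration.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampling (IS) estimate of the value of the
    evaluation policy pi_e from trajectories logged under pi_b.

    trajectories: list of [(state, action, reward), ...]
    pi_e, pi_b:   callables giving the probability of action a in state s
    """
    returns = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative density ratio
            ret += (gamma ** t) * r            # discounted return
        returns.append(weight * ret)           # reweighted return
    return float(np.mean(returns))
```

The estimate is unbiased whenever the behavior policy has support wherever the target policy does, but its variance grows quickly with the horizon, which is what motivates doubly robust corrections.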
2 code implementations • 23 Dec 2019 • Xuehui Yu, Yuqi Gong, Nan Jiang, Qixiang Ye, Zhenjun Han
In this paper, we introduce a new benchmark, referred to as TinyPerson, opening up a promising direction for tiny object detection in a long distance and with massive backgrounds.
1 code implementation • 16 Sep 2020 • Xuehui Yu, Zhenjun Han, Yuqi Gong, Nan Jiang, Jian Zhao, Qixiang Ye, Jie Chen, Yuan Feng, Bin Zhang, Xiaodi Wang, Ying Xin, Jingwei Liu, Mingyuan Mao, Sheng Xu, Baochang Zhang, Shumin Han, Cheng Gao, Wei Tang, Lizuo Jin, Mingbo Hong, Yuchao Yang, Shuiwang Li, Huan Luo, Qijun Zhao, Humphrey Shi
The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images with wide views, with a current focus on tiny person detection.
1 code implementation • 21 Jan 2021 • Nan Jiang, Kuiran Wang, Xiaoke Peng, Xuehui Yu, Qiang Wang, Junliang Xing, Guorong Li, Jian Zhao, Guodong Guo, Zhenjun Han
The release of such a large-scale dataset could be a useful initial step in research on tracking UAVs.
3 code implementations • 5 Feb 2022 • Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.
1 code implementation • 1 Jan 2021 • Jiawei Xue, Nan Jiang, Senwei Liang, Qiyuan Pang, Takahiro Yabe, Satish V. Ukkusuri, Jianzhu Ma
We apply the method to 11,790 urban road networks across 30 cities worldwide to measure the spatial homogeneity of road networks within each city and across different cities.
1 code implementation • ICCV 2023 • Nan Jiang, Tengyu Liu, Zhexuan Cao, Jieming Cui, Zhiyuan Zhang, Yixin Chen, He Wang, Yixin Zhu, Siyuan Huang
By learning the geometrical relationships in HOI, we devise the first model that leverages human pose estimation to tackle the estimation of articulated object poses and shapes during whole-body interactions.
3 code implementations • 15 Nov 2019 • Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue
We offer an experimental benchmark and empirical study for off-policy policy evaluation (OPE) in reinforcement learning, which is a key problem in many safety critical applications.
1 code implementation • 18 Mar 2024 • Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay
To bridge this gap, we present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies.
1 code implementation • 26 Feb 2021 • Nan Jiang, Thibaud Lutellier, Lin Tan
Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes.
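Subword tokenization of this kind is typically learned with byte-pair encoding (BPE): repeatedly merge the most frequent adjacent symbol pair so that rare identifiers decompose into frequent subwords, keeping the output vocabulary (and hence the patch search space) small. A minimal illustrative sketch, not CURE's actual tokenizer:

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn byte-pair-encoding merges from a corpus of code tokens.
    Rare identifiers get split into frequent subwords, so the model's
    vocabulary stays small."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for i in range(len(word) - 1):
                pairs[word[i], word[i + 1]] += freq  # count adjacent pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)             # most frequent pair
        merges.append(best)
        merged = best[0] + best[1]
        new_vocab = Counter()
        for word, freq in vocab.items():             # apply the merge
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges
```

On a corpus like `["getValue", "getName", "setValue"]`, early merges recover shared subwords such as `get` and `Value`, so unseen identifiers built from those pieces need no new vocabulary entries.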
1 code implementation • 25 Jan 2019 • Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford
We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.
1 code implementation • 3 Feb 2023 • Nan Jiang, Thibaud Lutellier, Yiling Lou, Lin Tan, Dan Goldwasser, Xiangyu Zhang
KNOD has two major novelties, including (1) a novel three-stage tree decoder, which directly generates Abstract Syntax Trees of patched code according to the inherent tree structure, and (2) a novel domain-rule distillation, which leverages syntactic and semantic rules and teacher-student distributions to explicitly inject the domain knowledge into the decoding procedure during both the training and inference phases.
1 code implementation • 29 May 2023 • Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, Sameena Shah
The results call for innovations to enhance automated Java vulnerability repair such as creating larger vulnerability repair training data, tuning LLMs with such data, and applying code simplification transformation to facilitate vulnerability repair.
1 code implementation • NeurIPS 2020 • Nan Jiang, Sheng Jin, Zhiyao Duan, ChangShui Zhang
An interaction reward model is trained on the duets formed from outer parts of Bach chorales to model counterpoint interaction, while a style reward model is trained on monophonic melodies of Chinese folk songs to model melodic patterns.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Maosen Zhang, Nan Jiang, Lei LI, Yexiang Xue
Generating natural language under complex constraints is a principled formulation towards controllable text generation.
1 code implementation • 11 Aug 2020 • Tengyang Xie, Nan Jiang
We make progress in a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class.
1 code implementation • NeurIPS 2021 • Siyuan Zhang, Nan Jiang
How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL) -- which is crucial for hyperparameter tuning -- is an important open question.
1 code implementation • 8 Mar 2016 • Byeongkeun Kang, Kar-Han Tan, Nan Jiang, Hung-Shuo Tai, Daniel Tretter, Truong Q. Nguyen
Thus, we propose a hand segmentation method for hand-object interaction that uses only a depth map.
1 code implementation • NeurIPS 2023 • Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun
Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.
1 code implementation • 12 Nov 2021 • Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang
In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy's value and the observed data distribution.
1 code implementation • ICML 2020 • Jiawei Huang, Nan Jiang
We show that on-policy policy gradient (PG) and its variance reduction variants can be derived by taking finite difference of function evaluations supplied by estimators from the importance sampling (IS) family for off-policy evaluation (OPE).
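The finite-difference connection can be checked numerically on a toy problem: estimate the target policy's value with importance sampling, then differentiate that estimate by central finite differences. This is an illustrative sketch under simplified assumptions (one state, two actions, softmax parameterization), not the paper's derivation.

```python
import numpy as np

def softmax_policy(theta):
    """Two-arm softmax policy: returns action probabilities."""
    e = np.exp(theta - np.max(theta))
    return e / e.sum()

def is_value(theta, data, behavior_probs):
    """Importance-sampling estimate of the expected reward of the softmax
    policy, from (action, reward) pairs logged under a behavior policy."""
    p = softmax_policy(theta)
    return np.mean([p[a] / behavior_probs[a] * r for a, r in data])

def fd_policy_gradient(theta, data, behavior_probs, eps=1e-5):
    """Policy gradient via central finite differences of the IS estimate."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        up, dn = theta.copy(), theta.copy()
        up[i] += eps
        dn[i] -= eps
        grad[i] = (is_value(up, data, behavior_probs)
                   - is_value(dn, data, behavior_probs)) / (2 * eps)
    return grad
```

On uniform behavior data where only arm 0 pays off, the finite-difference gradient matches the analytic REINFORCE gradient of the softmax bandit up to discretization error.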
1 code implementation • 1 Dec 2022 • Nan Jiang, Yi Gu, Yexiang Xue
Contrastive divergence is then applied to separate these samples from those in the training set.
1 code implementation • 9 Jul 2023 • Yuanheng Zhang, Nan Jiang, Zhaoheng Xie, Junying Cao, Yueyang Teng
Accurately annotated ultrasonic images are vital components of a high-quality medical report.
1 code implementation • 12 Oct 2023 • Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick
We also measure each architecture's predisposition towards in-context learning when presented with the option to memorize rather than leverage in-context examples.
1 code implementation • 25 May 2022 • Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu
We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance for exploration risks and should be treated separately.
1 code implementation • 6 May 2023 • Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian
To do so, we encode trajectories in offline training data individually as well as collectively (encoding a set of trajectories).
no code implementations • ICML 2018 • Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III
We study how to effectively leverage expert feedback to learn sequential decision-making policies.
no code implementations • NeurIPS 2018 • Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
We study the computational tractability of PAC reinforcement learning with rich observations.
no code implementations • 16 May 2018 • Yijie Dang, Nan Jiang, Hao Hu, Zhuoxiao Ji, Wenyin Zhang
However, the commonly used classification method, the K-Nearest-Neighbor algorithm, has high complexity because its two main processes, similarity computation and search, are time-consuming.
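The complexity claim can be made concrete with a classical brute-force k-NN sketch (NumPy; illustrative only, not the paper's quantum algorithm): the distance computation costs O(N·d) per query, and the neighbor search is another pass over all N points.

```python
import numpy as np

def knn_classify(query, X, y, k=3):
    """Brute-force k-nearest-neighbor classification.
    X: (N, d) array of points; y: (N,) array of labels."""
    dists = np.linalg.norm(X - query, axis=1)    # similarity computation: O(N * d)
    nearest = np.argpartition(dists, k - 1)[:k]  # search for the k smallest distances
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # majority vote among neighbors
```

Both costly steps scale linearly in the dataset size per query, which is the bottleneck that quantum and index-based approaches aim to reduce.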
no code implementations • 29 Aug 2016 • Junqi Jin, Ziang Yan, Kun fu, Nan Jiang, Chang-Shui Zhang
A greedy algorithm with bounds is suggested to solve the transformed problem.
no code implementations • 1 Sep 2016 • Junqi Jin, Ziang Yan, Kun fu, Nan Jiang, Chang-Shui Zhang
Deep learning models' architectures, including depth and width, are key factors influencing models' performance, such as test accuracy and computation time.
no code implementations • 15 Nov 2017 • Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari
Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs.
no code implementations • NeurIPS 2017 • Kareem Amin, Nan Jiang, Satinder Singh
We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks that it surprises the human by acting suboptimally with respect to how the human would have acted.
no code implementations • ICML 2017 • Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.
no code implementations • 15 Nov 2015 • Yikang Shen, Wenge Rong, Nan Jiang, Baolin Peng, Jie Tang, Zhang Xiong
With the development of community-based question answering (Q&A) services, large-scale Q&A archives have accumulated and become an important information and knowledge resource on the web.
no code implementations • ICLR 2019 • Bowen Wu, Nan Jiang, Zhifeng Gao, Mengyuan Li, Zongsheng Wang, Suke Li, Qihang Feng, Wenge Rong, Baoxun Wang
Recent advances in sequence-to-sequence learning reveal a purely data-driven approach to the response generation task.
no code implementations • 21 Nov 2018 • Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.
no code implementations • NAACL 2018 • Zhen Xu, Nan Jiang, Bingquan Liu, Wenge Rong, Bowen Wu, Baoxun Wang, Zhuoran Wang, Xiaolong Wang
The experimental results have shown that our proposed corpus can be taken as a new benchmark dataset for the NRG task, and the presented metrics are promising to guide the optimization of NRG models by quantifying the diversity of the generated responses reasonably.
no code implementations • NeurIPS 2018 • Nan Jiang, Alex Kulesza, Satinder Singh
A central problem in dynamical system modeling is state discovery—that is, finding a compact summary of the past that captures the information needed to predict the future.
no code implementations • CVPR 2014 • Nan Jiang, Ying Wu
This paper presents a novel method to jointly determine the best spatial location and the optimal metric.
no code implementations • 1 May 2019 • Jinglin Chen, Nan Jiang
Value-function approximation methods that operate in batch mode have foundational importance to reinforcement learning (RL).
no code implementations • NeurIPS 2019 • Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization.
no code implementations • 30 May 2019 • Nan Jiang
When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step as a part of the environment or as part of the agent.
no code implementations • 23 Oct 2019 • Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh
As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set.
no code implementations • ICML 2020 • Masatoshi Uehara, Jiawei Huang, Nan Jiang
We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions.
no code implementations • 5 Sep 2019 • Chang Li, Nan Jiang, Yukai Wu, Wei Chang, Yunfei Pu, Sheng Zhang, Lu-Ming Duan
The use of multiplexed atomic quantum memories (MAQM) can significantly enhance the efficiency to establish entanglement in a quantum network.
no code implementations • NeurIPS 2020 • Nan Jiang, Jiawei Huang
By slightly altering the derivation of previous methods (one from each style; Uehara et al., 2020), we unify them into a single value interval that comes with a special type of double robustness: when either the value-function or the importance-weight class is well specified, the interval is valid and its length quantifies the misspecification of the other class.
no code implementations • 8 Feb 2020 • Nan Jiang, Sheng Jin, Zhiyao Duan, Chang-Shui Zhang
We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state).
no code implementations • 9 Mar 2020 • Tengyang Xie, Nan Jiang
We prove performance guarantees of two algorithms for approximating $Q^\star$ in batch reinforcement learning.
no code implementations • WS 2020 • Xiuyu Wu, Nan Jiang, Yunfang Wu
The answer-agnostic question generation is a significant and challenging task, which aims to automatically generate questions for a given sentence but without an answer.
no code implementations • 23 Oct 2020 • Priyank Agrawal, Jinglin Chen, Nan Jiang
This paper studies regret minimization with randomized value functions in reinforcement learning.
no code implementations • 2 Nov 2020 • Philip Amortila, Nan Jiang, Tengyang Xie
Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good feature coverage in the finite-horizon case.
no code implementations • 14 Sep 2020 • Yan Liu, Yansha Deng, Nan Jiang, Maged Elkashlan, Arumugam Nallanathan
NarrowBand-Internet of Things (NB-IoT) is a new 3GPP radio access technology designed to provide better coverage for Low Power Wide Area (LPWA) networks.
no code implementations • 21 Jan 2021 • Yunfei Pu, Sheng Zhang, Yukai Wu, Nan Jiang, Wei Chang, Chang Li, Luming Duan
The experimental realization of entanglement connection of two quantum repeater segments with an efficient memory-enhanced scaling demonstrates a key advantage of the quantum repeater protocol and lays a cornerstone for future large-scale quantum networks.
no code implementations • 3 Feb 2021 • Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári
We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.
no code implementations • 5 Feb 2021 • Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods.
no code implementations • 6 Feb 2021 • Nan Jiang, Xuehui Yu, Xiaoke Peng, Yuqi Gong, Zhenjun Han
Detecting tiny objects (e.g., less than 20 x 20 pixels) in large-scale images is an important yet open problem.
no code implementations • 14 Feb 2021 • Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
In this work, we present the first model-free representation learning algorithms for low rank MDPs.
no code implementations • 2 Mar 2021 • Cameron Voloshin, Nan Jiang, Yisong Yue
We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning.
no code implementations • 2 Jun 2021 • Jiawei Huang, Nan Jiang
In this paper, we study the convergence properties of off-policy policy improvement algorithms with state-action density ratio correction under function approximation setting, where the objective function is formulated as a max-max-min optimization problem.
no code implementations • NeurIPS 2021 • Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai
This offline result is the first that matches the sample complexity lower bound in this setting, and resolves a recent open question in offline RL.
no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
The use of pessimism, when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.
no code implementations • 22 Sep 2021 • Yash Nair, Nan Jiang
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes, where the evaluation policy depends only on observable variables but the behavior policy depends on latent states (Tennenholtz et al. (2020a)).
no code implementations • 6 Oct 2021 • Nan Jiang, Chen Luo, Vihan Lakshman, Yesh Dattatreya, Yexiang Xue
In addition, FLAN does not require any annotated data or supervised learning.
no code implementations • 9 Feb 2022 • Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).
no code implementations • ICLR 2022 • Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL).
no code implementations • 16 Feb 2022 • Zhu Wang, Honglong Chen, Zhe Li, Kai Lin, Nan Jiang, Feng Xia
Fortunately, context-aware recommender systems can alleviate the sparsity problem by making use of some auxiliary information, such as the information of both the users and items.
no code implementations • 25 Mar 2022 • Jinglin Chen, Nan Jiang
We consider a challenging theoretical problem in offline reinforcement learning (RL): obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under only realizability-type assumptions for the function approximators.
no code implementations • 19 Apr 2022 • Chuanhong Liu, Caili Guo, Yang Yang, Nan Jiang
To solve the problem, both the compression ratio and the resource allocation are optimized for the task-oriented communication system to maximize the success probability of tasks.
no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford
Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.
no code implementations • 21 Jun 2022 • Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions.
no code implementations • 18 Jul 2022 • Philip Amortila, Nan Jiang, Dhruv Madeka, Dean P. Foster
Towards establishing the minimal amount of expert queries needed, we show that, in the same setting, any learner whose exploration budget is polynomially-bounded (in terms of $d, H,$ and $|\mathcal{A}|$) will require at least $\tilde\Omega(\sqrt{d})$ oracle calls to recover a policy competing with the expert's value function.
no code implementations • 11 Aug 2022 • Nan Jiang, Dhivya Eswaran, Choon Hui Teo, Yexiang Xue, Yesh Dattatreya, Sujay Sanghavi, Vishy Vishwanathan
We consider text retrieval within dense representational space in real-world settings such as e-commerce search where (a) document popularity and (b) diversity of queries associated with a document have a skewed distribution.
no code implementations • 9 Oct 2022 • Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade
Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.
no code implementations • 27 Oct 2022 • Audrey Huang, Nan Jiang
Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios).
no code implementations • 8 Nov 2022 • Tengyang Xie, Mohak Bhardwaj, Nan Jiang, Ching-An Cheng
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage.
no code implementations • 7 Dec 2022 • Adnan Aijaz, Nan Jiang, Aftab Khan
The paper articulates some of the key system design aspects of multi-service edge-intelligence.
no code implementations • 19 Dec 2022 • Meiyi Zhu, Chunyan Feng, Caili Guo, Nan Jiang, Osvaldo Simeone
Type-based multiple access (TBMA) is a semantics-aware multiple access protocol for remote inference.
no code implementations • 6 Feb 2023 • Yuheng Zhang, Yu Bai, Nan Jiang
We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as Nash equilibrium and (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game.
no code implementations • 4 Feb 2023 • Audrey Huang, Jinglin Chen, Nan Jiang
As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage.
no code implementations • NeurIPS 2023 • Mohak Bhardwaj, Tengyang Xie, Byron Boots, Nan Jiang, Ching-An Cheng
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage.
no code implementations • 22 May 2023 • Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji
As pre-training and fine-tuning are costly and might negatively impact model performance, it is desired to efficiently adapt an existing model to different conditions such as styles, sentiments or narratives, when facing different audiences or scenarios.
1 code implementation • 25 May 2023 • Nan Jiang, Yexiang Xue
CVGP starts by fitting simple expressions involving a small set of independent variables using genetic programming, under controlled experiments where other variables are held as constants.
no code implementations • 25 Jul 2023 • Philip Amortila, Nan Jiang, Csaba Szepesvári
Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation.
no code implementations • 4 Sep 2023 • Pulkit Katdare, Nan Jiang, Katherine Driggs-Campbell
This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world.
no code implementations • 12 Sep 2023 • Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan YAO, Tong Zhang
Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different reward-tax trade-offs, we propose Adaptive Model Averaging (AMA) to adaptively find various combination ratios of model layers.
1 code implementation • 13 Sep 2023 • Nan Jiang, Yexiang Xue
A selection scheme similar to that used in selecting good symbolic equations in the genetic programming process is implemented to ensure that promising experiment schedules eventually win over the average ones.
no code implementations • 16 Sep 2023 • Jinzhao Li, Nan Jiang, Yexiang Xue
Solving SMC is challenging because of its highly intractable nature ($\text{NP}^{\text{PP}}$-complete), as it incorporates statistical inference and symbolic reasoning.
no code implementations • 1 Nov 2023 • Yixin Chen, Junfeng Ni, Nan Jiang, Yaowei Zhang, Yixin Zhu, Siyuan Huang
Reconstructing detailed 3D scenes from single-view images remains a challenging task due to limitations in existing approaches, which primarily focus on geometric shape recovery, overlooking object appearances and fine shape details.
no code implementations • 22 Nov 2023 • Nan Jiang, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Lin Tan, Xiangyu Zhang
We build Nova$^+$ to further boost Nova using two new pre-training tasks, i.e., optimization generation and optimization level prediction, which are designed to learn binary optimization and align equivalent binaries.
no code implementations • 18 Dec 2023 • Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
This includes an iterative version of the Direct Preference Optimization (DPO) algorithm for online settings, and a multi-step rejection sampling strategy for offline scenarios.
no code implementations • 19 Dec 2023 • Nan Jiang, Md Nasim, Yexiang Xue
The first few steps in vertical discovery are significantly cheaper than the horizontal path, as their search is in reduced hypothesis spaces involving a small set of variables.
no code implementations • 18 Jan 2024 • Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie
The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of the possibility for a unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other.
1 code implementation • 1 Feb 2024 • Nan Jiang, Md Nasim, Yexiang Xue
We propose Vertical Symbolic Regression using Deep Policy Gradient (VSR-DPG) and demonstrate that VSR-DPG can recover ground-truth equations involving multiple input variables, significantly beyond both deep reinforcement learning-based approaches and previous VSR variants.
no code implementations • 11 Feb 2024 • Chenlu Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang
In this work, we provide theoretical insights for a recently proposed learning paradigm, Nash learning from human feedback (NLHF), which considered a general preference model and formulated the alignment process as a game between two competitive LLMs.
no code implementations • 22 Feb 2024 • Yuheng Zhang, Nan Jiang
We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon.
no code implementations • 13 Mar 2024 • Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang
Confronting the challenges of data scarcity and advanced motion synthesis in human-scene interaction modeling, we introduce the TRUMANS dataset alongside a novel HSI motion synthesis method.
no code implementations • 8 Apr 2024 • Zhengyang Zhao, Haitao Yuan, Nan Jiang, Minxiao Chen, Ning Liu, Zengxiang Li
Accurate traffic prediction is a challenging task in intelligent transportation due to the spatial-temporal characteristics of road networks.
no code implementations • 22 Mar 2024 • Nan Jiang, Haitao Yuan, Jianing Si, Minxiao Chen, Shangguang Wang
The next point-of-interest (POI) prediction is a significant task in location-based services, yet its complexity arises from the consolidation of spatial and semantic intent.
no code implementations • 15 Apr 2024 • Nan Jiang
This note clarifies some confusions (and perhaps throws out more) around model-based reinforcement learning and their theoretical understanding in the context of deep RL.