Search Results for author: Puyang Xu

Found 6 papers, 2 papers with code

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

1 code implementation · 22 May 2025 · Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, Hyokun Yun, Lihong Li

While reinforcement learning (RL) has demonstrated remarkable success in enhancing large language models (LLMs), it has primarily focused on single-turn tasks such as solving math problems.

Math, Reinforcement Learning (RL)

Improving Tool Retrieval by Leveraging Large Language Models for Query Generation

no code implementations · 17 Nov 2024 · Mohammad Kachuee, Sarthak Ahuja, Vaibhav Kumar, Puyang Xu, Xiaohu Liu

Through extensive experiments on a dataset covering complex and multi-tool scenarios, we show that leveraging LLMs for query generation improves retrieval in both in-domain (seen tools) and out-of-domain (unseen tools) settings.

Common Sense Reasoning, In-Context Learning, +2

Open World Classification with Adaptive Negative Samples

no code implementations · 9 Mar 2023 · Ke Bai, Guoyin Wang, Jiwei Li, Sunghyun Park, Sungjin Lee, Puyang Xu, Ricardo Henao, Lawrence Carin

Open world classification is a task in natural language processing with key practical relevance and impact.

Classification

Ranking-Enhanced Unsupervised Sentence Representation Learning

1 code implementation · 9 Sep 2022 · Yeon Seonwoo, Guoyin Wang, Changmin Seo, Sajal Choudhary, Jiwei Li, Xiang Li, Puyang Xu, Sunghyun Park, Alice Oh

In this work, we show that the semantic meaning of a sentence is also determined by nearest-neighbor sentences that are similar to the input sentence.

Contrastive Learning, Data Augmentation, +5

An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking

no code implementations · ACL 2018 · Puyang Xu, Qi Hu

We highlight a practical yet rarely discussed problem in dialogue state tracking (DST), namely handling unknown slot values.

Dialogue State Tracking, Spoken Language Understanding

A Model for Temporal Dependencies in Event Streams

no code implementations · NeurIPS 2011 · Asela Gunawardana, Christopher Meek, Puyang Xu

We introduce the Piecewise-Constant Conditional Intensity Model, a model for learning temporal dependencies in event streams.

model
