Search Results for author: Austin Xu

Found 8 papers, 2 papers with code

Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

1 code implementation • 21 Apr 2025 • Yilun Zhou, Austin Xu, Peifeng Wang, Caiming Xiong, Shafiq Joty

Scaling test-time computation, or affording a generator large language model (LLM) extra compute during inference, typically employs the help of external non-generative evaluators (i.e., reward models).

Code Generation Instruction Following +2

Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings

1 code implementation • 19 Mar 2025 • Austin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, Shafiq Joty

While judge models -- LLMs finetuned to specialize in assessing and critiquing model outputs -- have been touted as general purpose evaluators, they are typically evaluated only on non-contextual scenarios, such as instruction following.

Instruction Following RAG

Direct Judgement Preference Optimization

no code implementations • 23 Sep 2024 • Peifeng Wang, Austin Xu, Yilun Zhou, Caiming Xiong, Shafiq Joty

Auto-evaluation is crucial for assessing response quality and offering feedback for model development.

SFR-RAG: Towards Contextually Faithful LLMs

no code implementations • 16 Sep 2024 • Xuan-Phi Nguyen, Shrey Pandit, Senthil Purushwalkam, Austin Xu, Hailin Chen, Yifei Ming, Zixuan Ke, Silvio Savarese, Caiming Xiong, Shafiq Joty

Retrieval Augmented Generation (RAG), a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance, has emerged as a pivotal area in generative AI.

counterfactual Hallucination +3

Large Language Model Augmented Exercise Retrieval for Personalized Language Learning

no code implementations • 8 Feb 2024 • Austin Xu, Will Monroe, Klinton Bicknell

We study the problem of zero-shot exercise retrieval in the context of online language learning, to give learners the ability to explicitly request personalized exercises via natural language.

Information Retrieval Language Modeling +5

Active metric learning and classification using similarity queries

no code implementations • 4 Feb 2022 • Namrata Nadagouda, Austin Xu, Mark A. Davenport

Motivated by this, we propose a novel unified query framework that can be applied to any problem in which a key component is learning a representation of the data that reflects similarity.

Active Learning Classification +3

Simultaneous Preference and Metric Learning from Paired Comparisons

no code implementations • NeurIPS 2020 • Austin Xu, Mark A. Davenport

The underlying assumption in this model is that a smaller distance between $\mathbf{u}$ and an item $\mathbf{x}_j$ indicates a stronger preference for $\mathbf{x}_j$.

Metric Learning Recommendation Systems
