1 code implementation • 17 Apr 2025 • Negar Arabzadeh, Charles L. A. Clarke
In addition to a traditional comparison based on system rankings using Kendall correlations, we also examine how well LLM judgments align with human preferences, as inferred from relevance grades.
1 code implementation • 16 Apr 2025 • Negar Arabzadeh, Charles L. A. Clarke
In addition to investigating the impact of prompt variations on agreement with human labels, we compare human- and LLM-generated prompts and analyze differences among different LLMs as judges.
1 code implementation • 11 Feb 2025 • Sajad Ebrahimi, Sara Salamat, Negar Arabzadeh, Mahdi Bashari, Ebrahim Bagheri
The peer review process is crucial for ensuring the quality and reliability of scholarly work, yet assigning suitable reviewers remains a significant challenge.
1 code implementation • 20 Dec 2024 • Amin Bigdeli, Negar Arabzadeh, Ebrahim Bagheri, Charles L. A. Clarke
Adversarial attacks seek to manipulate the ranking of documents, with the intention of exposing users to targeted content.
1 code implementation • 22 Oct 2024 • Negar Arabzadeh, Fernando Diaz, Junfeng He
Our proposed offline evaluation metrics for TTI not only capture how relevant generated images are with respect to the user's ideation need but also take into consideration the diversity and arrangement of the set of generated images.
1 code implementation • 12 Jul 2024 • Shrestha Mohanty, Negar Arabzadeh, Andrea Tupini, Yuxuan Sun, Alexey Skrynnik, Artem Zholus, Marc-Alexandre Côté, Julia Kiseleva
Seamless interaction between AI agents and humans using natural language remains a key goal in AI research.
no code implementations • 3 May 2024 • Negar Arabzadeh, Siqing Huo, Nikhil Mehta, Qingyun Wu, Chi Wang, Ahmed Awadallah, Charles L. A. Clarke, Julia Kiseleva
The rapid development of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents, assisting humans in their daily tasks.
1 code implementation • 28 Apr 2024 • Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke
RLT is crucial for re-ranking as it can improve re-ranking efficiency by sending variable-length candidate lists to a re-ranker on a per-query basis.
no code implementations • 11 Apr 2024 • Marwah Alaofi, Negar Arabzadeh, Charles L. A. Clarke, Mark Sanderson
We resolve this apparent circularity in two ways: 1) by viewing LLM-based assessment as a form of "slow search", where a slower IR system is used for evaluation and training of a faster production IR system; and 2) by recognizing a continuing need to ground evaluation in human assessment, even if the characteristics of that human assessment must change.
1 code implementation • 5 Apr 2024 • Negar Arabzadeh, Charles L. A. Clarke
Given that Gen-IR systems do not generate responses from a fixed set, we assume that methods for Gen-IR evaluation must largely depend on LLM-generated labels.
1 code implementation • 1 Apr 2024 • Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke
To solve the challenges, we devise an approximation strategy to predict an IR measure considering recall and propose to fine-tune open-source LLMs using human-labeled relevance judgments.
no code implementations • 14 Feb 2024 • Negar Arabzadeh, Julia Kiseleva, Qingyun Wu, Chi Wang, Ahmed Awadallah, Victor Dibia, Adam Fourney, Charles Clarke
The rapid development in the field of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents to assist humans in their daily tasks.
no code implementations • 31 Jan 2024 • Negar Arabzadeh, Charles L. A. Clarke
The rapid advancement of natural language processing, information retrieval (IR), computer vision, and other technologies has presented significant challenges in evaluating the performance of these systems.
no code implementations • 9 Jan 2024 • Negar Arabzadeh, Amin Bigdeli, Charles L. A. Clarke
In the second, we compare generated answers to the top results retrieved by a diverse set of retrieval models, ranging from traditional approaches to advanced methods, allowing us to measure improvements without human judgments.
no code implementations • 20 Sep 2023 • Siqing Huo, Negar Arabzadeh, Charles L. A. Clarke
After presenting a question to an LLM and receiving a generated answer, we query the corpus with the combination of the question + generated answer.
Generative Question Answering
Open-Domain Question Answering
no code implementations • 23 Jun 2023 • Siqing Huo, Negar Arabzadeh, Charles L. A. Clarke
After presenting a question to an LLM and receiving a generated answer, we query the corpus with the combination of the question + generated answer.
1 code implementation • 18 May 2023 • Chuan Meng, Negar Arabzadeh, Mohammad Aliannejadi, Maarten de Rijke
The QPP task is to predict the retrieval quality of a search system for a query without relevance judgments.
1 code implementation • 18 May 2023 • Shrestha Mohanty, Negar Arabzadeh, Julia Kiseleva, Artem Zholus, Milagro Teruel, Ahmed Awadallah, Yuxuan Sun, Kavya Srinet, Arthur Szlam
Human intelligence's adaptability is remarkable, allowing us to adjust to new tasks and multi-modal environments swiftly.
2 code implementations • 12 Nov 2022 • Shrestha Mohanty, Negar Arabzadeh, Milagro Teruel, Yuxuan Sun, Artem Zholus, Alexey Skrynnik, Mikhail Burtsev, Kavya Srinet, Aleksandr Panov, Arthur Szlam, Marc-Alexandre Côté, Julia Kiseleva
Human intelligence can adapt remarkably quickly to new tasks and environments.
1 code implementation • 1 Nov 2022 • Alexey Skrynnik, Zoya Volovikova, Marc-Alexandre Côté, Anton Voronov, Artem Zholus, Negar Arabzadeh, Shrestha Mohanty, Milagro Teruel, Ahmed Awadallah, Aleksandr Panov, Mikhail Burtsev, Julia Kiseleva
The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy.
no code implementations • 9 Aug 2022 • Dahlia Shehata, Negar Arabzadeh, Charles L. A. Clarke
In this work, we propose boosting the performance of sparse retrievers by expanding both the queries and the documents with linked entities in two formats for the entity names: 1) explicit and 2) hashed.
no code implementations • 9 Aug 2022 • Negar Arabzadeh, Mahsa Seifikar, Charles L. A. Clarke
While the research community has paid substantial attention to the problem of predicting query ambiguity in traditional search contexts, researchers have paid relatively little attention to predicting when this ambiguity is sufficient to warrant clarification in the context of conversational systems.
1 code implementation • 27 May 2022 • Julia Kiseleva, Alexey Skrynnik, Artem Zholus, Shrestha Mohanty, Negar Arabzadeh, Marc-Alexandre Côté, Mohammad Aliannejadi, Milagro Teruel, Ziming Li, Mikhail Burtsev, Maartje ter Hoeve, Zoya Volovikova, Aleksandr Panov, Yuxuan Sun, Kavya Srinet, Arthur Szlam, Ahmed Awadallah
Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions.
1 code implementation • 5 May 2022 • Negar Arabzadeh, Ali Ahmadvand, Julia Kiseleva, Yang Liu, Ahmed Hassan Awadallah, Ming Zhong, Milad Shokouhi
The recent increase in the volume of online meetings necessitates automated tools for managing and organizing the material, especially when an attendee has missed the discussion and needs assistance in quickly exploring it.
no code implementations • 2 May 2022 • Ali Ahmadvand, Negar Arabzadeh, Julia Kiseleva, Patricio Figueroa Sanz, Xin Deng, Sujay Jauhar, Michael Gamon, Eugene Agichtein, Ned Friend, Aniruddha
Current interactive systems with natural language interfaces lack the ability to understand a complex information-seeking request that expresses several implicit constraints at once and for which there is no prior information about user preferences, e.g., "find hiking trails around San Francisco which are accessible with toddlers and have beautiful scenery in summer", where the output is a list of possible suggestions for users to start their exploration.
no code implementations • 22 Sep 2021 • Negar Arabzadeh, Xinyi Yan, Charles L. A. Clarke
These hybrid retrievers leverage low-cost, exact-matching based sparse retrievers along with dense retrievers to bridge the semantic gaps between query and documents.
2 code implementations • 31 Aug 2021 • Negar Arabzadeh, Alexandra Vtyurina, Xinyi Yan, Charles L. A. Clarke
To test this observation, we employed crowdsourced workers to make preference judgments between the top item returned by a modern neural ranking stack and a judged relevant item.