no code implementations • ACL (dialdoc) 2021 • Xi Chen, Faner Lin, Yeju Zhou, Kaixin Ma, Jonathan Francis, Eric Nyberg, Alessandro Oltramari
In this paper, we describe our systems for solving the two Doc2Dial shared tasks: knowledge identification and response generation.
1 code implementation • 25 Oct 2024 • Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Hongming Zhang, Tianqing Fang, Zhenzhong Lan, Dong Yu
In this paper, we introduce an open-source framework designed to facilitate the development of multimodal web agents that can autonomously conduct real-world exploration and improve themselves.
1 code implementation • 3 Oct 2024 • Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, Dong Yu
Unlike traditional function-level or file-level coding tasks, AI software engineering requires not only basic coding proficiency but also advanced skills in managing and interacting with code repositories.
1 code implementation • 3 Oct 2024 • Zhaowei Wang, Hongming Zhang, Tianqing Fang, Ye Tian, Yue Yang, Kaixin Ma, Xiaoman Pan, Yangqiu Song, Dong Yu
In this paper, we study a new task of navigating to diverse target objects in a large number of scene types.
1 code implementation • 2 Oct 2024 • Mengzhao Jia, Wenhao Yu, Kaixin Ma, Tianqing Fang, Zhihan Zhang, Siru Ouyang, Hongming Zhang, Meng Jiang, Dong Yu
Tasks involving multiple text-rich images are especially challenging, as they require not only understanding the content of individual images but also reasoning about inter-relationships and logical flows across multiple visual inputs.
1 code implementation • 16 Sep 2024 • Hongming Zhang, Xiaoman Pan, Hongwei Wang, Kaixin Ma, Wenhao Yu, Dong Yu
Cognitive Kernel adopts a model-centric design.
1 code implementation • 12 Sep 2024 • Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu
To bridge this gap, we introduce DSBench, a comprehensive benchmark designed to evaluate data science agents with realistic tasks.
1 code implementation • 6 Sep 2024 • Koen Kraaijveld, Yifan Jiang, Kaixin Ma, Filip Ilievski
Then, we develop COLUMBUS, a synthetic benchmark that applies the task pipeline to create QA sets with text and icon rebus puzzles based on publicly available collections of compounds and common phrases.
1 code implementation • 15 Jul 2024 • Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu
In this paper, we introduce DocBench, a new benchmark designed to evaluate LLM-based document reading systems.
no code implementations • 22 Apr 2024 • Yifan Jiang, Filip Ilievski, Kaixin Ma
In this paper, we split the original benchmark to also support the fine-tuning setting and present SemEval Task 9: BRAIN-TEASER(S), the first task at this competition designed to test systems' reasoning and lateral thinking abilities.
1 code implementation • 21 Apr 2024 • Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara
Further analysis of perception questions reveals that MLLMs struggle to comprehend the visual features (near-random performance) and even count the panels in the puzzle (<45%), hindering their ability for abstract reasoning.
2 code implementations • 25 Jan 2024 • Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Yong Dai, Hongming Zhang, Zhenzhong Lan, Dong Yu
The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents.
3 code implementations • 11 Dec 2023 • Tong Chen, Hongwei Wang, Sihao Chen, Wenhao Yu, Kaixin Ma, Xinran Zhao, Hongming Zhang, Dong Yu
We discover that the retrieval unit choice significantly impacts the performance of both retrieval and downstream tasks.
no code implementations • 15 Nov 2023 • Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu
In response to these challenges, we introduce Chain-of-Noting (CoN), a novel approach aimed at improving the robustness of RALMs in facing noisy, irrelevant documents and in handling unknown scenarios.
2 code implementations • 8 Oct 2023 • Yifan Jiang, Filip Ilievski, Kaixin Ma, Zhivar Sourati
The success of language models has inspired the NLP community to attend to tasks that require implicit and complex reasoning, relying on human-like commonsense mechanisms.
1 code implementation • 15 Sep 2023 • Kaixin Ma, Hongming Zhang, Hongwei Wang, Xiaoman Pan, Wenhao Yu, Dong Yu
We evaluate our proposed LLM Agent with State-Space ExploRation (LASER) on both the WebShop task and amazon.com.
1 code implementation • 5 Jun 2023 • Jiarui Zhang, Filip Ilievski, Kaixin Ma, Aravinda Kollaa, Jonathan Francis, Alessandro Oltramari
Intelligent Traffic Monitoring (ITMo) technologies hold the potential for improving road safety/security and for enabling smart city infrastructure.
no code implementations • 8 May 2023 • Prateek Chhikara, Jiarui Zhang, Filip Ilievski, Jonathan Francis, Kaixin Ma
We experiment with four models on the 10 tasks in the ScienceWorld text-based game environment, to illustrate the impact of knowledge injection on various model configurations and challenging task settings.
1 code implementation • 4 May 2023 • Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and achieves state-of-the-art fine-tuned retrieval performance on NQ, HotpotQA and OTT-QA.
Ranked #4 on Question Answering on HotpotQA
1 code implementation • 26 Apr 2023 • Yifan Jiang, Filip Ilievski, Kaixin Ma
Stories about everyday situations are an essential part of human communication, motivating the need to develop AI agents that can reliably understand these stories.
1 code implementation • 4 Dec 2022 • Jiarui Zhang, Filip Ilievski, Aravinda Kollaa, Jonathan Francis, Kaixin Ma, Alessandro Oltramari
Understanding novel situations in the traffic domain requires an intricate combination of domain-specific and causal commonsense knowledge.
2 code implementations • 22 Oct 2022 • Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources.
1 code implementation • COLING 2022 • Kaixin Ma, Filip Ilievski, Jonathan Francis, Eric Nyberg, Alessandro Oltramari
In this paper, we propose Coalescing Global and Local Information (CGLI), a new model that builds entity- and timestep-aware input representations (local input) considering the whole context (global input), and we jointly model the entity states with a structured prediction objective (global output).
no code implementations • 21 May 2022 • Jiarui Zhang, Filip Ilievski, Kaixin Ma, Jonathan Francis, Alessandro Oltramari
In this paper, we study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
no code implementations • 17 Jan 2022 • Alessandro Oltramari, Jonathan Francis, Filip Ilievski, Kaixin Ma, Roshanak Mirzaee
This chapter illustrates how suitable neuro-symbolic models for language understanding can enable domain generalizability and robustness in downstream tasks.
1 code implementation • ACL 2022 • Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
The retriever-reader framework is popular for open-domain question answering (ODQA) due to its ability to use explicit knowledge.
1 code implementation • EMNLP 2021 • Kaixin Ma, Filip Ilievski, Jonathan Francis, Satoru Ozaki, Eric Nyberg, Alessandro Oltramari
In this paper, we investigate what models learn from commonsense reasoning datasets.
no code implementations • 12 Jan 2021 • Filip Ilievski, Alessandro Oltramari, Kaixin Ma, Bin Zhang, Deborah L. McGuinness, Pedro Szekely
Recently, the focus has been on large text-based sources, which facilitate easier integration with neural (language) models and application to textual tasks, typically at the expense of the semantics of the sources and their harmonization.
no code implementations • 19 Dec 2020 • Yikang Li, Pulkit Goel, Varsha Kuppur Rajendra, Har Simrat Singh, Jonathan Francis, Kaixin Ma, Eric Nyberg, Alessandro Oltramari
Conditional text generation has been a challenging task that is yet to see human-level performance from state-of-the-art models.
1 code implementation • 15 Nov 2020 • Juncheng B Li, Kaixin Ma, Shuhui Qu, Po-Yao Huang, Florian Metze
This work aims to study several key questions related to multimodal learning through the lens of adversarial noises: 1) How does the trade-off between early/middle/late fusion affect robustness and accuracy? 2) How do different frequency/time-domain features contribute to robustness?
1 code implementation • 7 Nov 2020 • Kaixin Ma, Filip Ilievski, Jonathan Francis, Yonatan Bisk, Eric Nyberg, Alessandro Oltramari
Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models.
no code implementations • 9 Mar 2020 • Alessandro Oltramari, Jonathan Francis, Cory Henson, Kaixin Ma, Ruwan Wickramarachchi
Computational context understanding refers to an agent's ability to fuse disparate sources of information for decision-making and is therefore generally regarded as a prerequisite for sophisticated machine reasoning capabilities, such as those pursued in artificial intelligence (AI).
no code implementations • WS 2019 • Hemant Pugaliya, James Route, Kaixin Ma, Yixuan Geng, Eric Nyberg
The field of question answering (QA) has seen rapid growth in new tasks and modeling approaches in recent years.
no code implementations • WS 2019 • Kaixin Ma, Jonathan Francis, Quanyang Lu, Eric Nyberg, Alessandro Oltramari
Non-extractive commonsense QA remains a challenging AI task, as it requires systems to reason about, synthesize, and gather disparate pieces of information, in order to generate responses to queries.
Ranked #16 on Common Sense Reasoning on CommonsenseQA
no code implementations • NAACL 2018 • Kaixin Ma, Tomasz Jurczyk, Jinho D. Choi
This paper presents a new corpus and a robust deep learning architecture for a task in reading comprehension, passage completion, on multiparty dialog.