Research on quantitative reasoning in the financial domain requires realistic tasks and data, primarily because of the significant impact of decisions made in business and finance.
We conduct an in-depth evaluation of open-source and commercial LLMs, illustrating that BizBench is a challenging benchmark for quantitative reasoning in the finance and business domain.
Recent studies show that sentence-level extractive QA, i.e., based on Answer Sentence Selection (AS2), is outperformed by Generation-based QA (GenQA) models, which generate answers using the top-k answer sentences ranked by AS2 models (in a retrieval-augmented generation style).
In this paper, we propose to train a GenQA model by transferring knowledge from a trained AS2 model, to overcome the aforementioned issue.
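A minimal sketch of this AS2-then-generate pipeline, under stated assumptions: the function names (rank_sentences, build_genqa_input) and the toy lexical-overlap scorer are illustrative stand-ins for trained AS2 and GenQA models, not the paper's actual interfaces.

```python
# Sketch of the AS2 -> GenQA pipeline: rank candidate sentences against
# the question, keep the top-k, and feed them to a generator as context.
# The scorer below is a placeholder for a trained AS2 model.
from typing import Callable, List

def rank_sentences(question: str,
                   candidates: List[str],
                   score_candidate: Callable[[str, str], float],
                   k: int = 5) -> List[str]:
    """AS2 step: score each candidate sentence against the question
    and keep the top-k."""
    ranked = sorted(candidates,
                    key=lambda s: score_candidate(question, s),
                    reverse=True)
    return ranked[:k]

def build_genqa_input(question: str, top_k: List[str]) -> str:
    """GenQA step: concatenate the question with the top-k ranked
    sentences, retrieval-augmented-generation style, as generator input."""
    context = " ".join(top_k)
    return f"question: {question} context: {context}"

if __name__ == "__main__":
    # Toy lexical-overlap scorer in place of a trained AS2 model.
    toy_scorer = lambda q, s: len(set(q.lower().split()) & set(s.lower().split()))
    cands = ["The Nile is the longest river in Africa.",
             "Rivers carry sediment downstream.",
             "The Nile flows through eleven countries."]
    q = "Which river is the longest in Africa?"
    print(build_genqa_input(q, rank_sentences(q, cands, toy_scorer, k=2)))
```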
Our cross-lingual generative system outperforms answer sentence selection baselines for all five languages and monolingual generative pipelines for three of the five languages studied.
Personal attributes represent structured information about a person, such as their hobbies, pets, family, likes and dislikes.
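As a concrete illustration, such attributes could be encoded as a simple structured record; the field names below are assumptions for illustration, not a schema from the work itself.

```python
# One possible structured encoding of personal attributes; fields are
# illustrative, not taken from the cited work.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PersonalAttributes:
    hobbies: List[str] = field(default_factory=list)
    pets: List[str] = field(default_factory=list)
    family: List[str] = field(default_factory=list)
    likes: List[str] = field(default_factory=list)
    dislikes: List[str] = field(default_factory=list)

speaker = PersonalAttributes(hobbies=["hiking"], pets=["two cats"], likes=["jazz"])
```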
To capture the broad range of real machine errors that laypeople can recognize, the ten error categories of Scarecrow (such as redundancy, commonsense errors, and incoherence) were derived through several rounds of crowd annotation experiments conducted without a predefined ontology.
Knowledge graphs capture entities and relations from long documents and can facilitate reasoning in many downstream applications.
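A minimal sketch of such a graph as a set of (head, relation, tail) triples, with the one-hop lookup that downstream reasoning typically builds on; the class and example entities are purely illustrative.

```python
# Minimal triple-store sketch: entities and relations extracted from a
# document, stored as (head, relation, tail) triples.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        self.triples = set()
        self.outgoing = defaultdict(set)

    def add(self, head: str, relation: str, tail: str) -> None:
        self.triples.add((head, relation, tail))
        self.outgoing[head].add((relation, tail))

    def neighbors(self, entity: str):
        """One-hop lookup: the basic operation downstream reasoning composes."""
        return sorted(self.outgoing[entity])

kg = KnowledgeGraph()
kg.add("Marie Curie", "field", "physics")
kg.add("Marie Curie", "award", "Nobel Prize")
print(kg.neighbors("Marie Curie"))
```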
Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process, often resulting in uninteresting responses.
We address the task of explaining relationships between two scientific documents using natural language text.
For sequence models with large vocabularies, a majority of network parameters lie in the input and output layers.
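A back-of-envelope calculation makes the point concrete. The sizes below (a 50,000-word vocabulary, hidden size 1024, a two-layer LSTM) are assumptions for illustration, not figures from any particular paper:

```python
# Parameter counts for an assumed LSTM language model: the embedding and
# softmax layers dominate the recurrent core.
V, d, layers = 50_000, 1024, 2

embed = V * d                            # input embedding table
softmax = V * d                          # output projection (untied)
lstm_per_layer = 4 * ((d + d) * d + d)   # four gates, each with input
                                         # weights, recurrent weights, bias
recurrent = layers * lstm_per_layer

total = embed + softmax + recurrent
print(f"embedding+softmax: {embed + softmax:,} ({(embed + softmax) / total:.0%})")
print(f"recurrent core:    {recurrent:,} ({recurrent / total:.0%})")
# Tying the input and output embeddings removes V*d parameters outright.
```

With these assumed sizes the script reports that the input and output layers hold roughly 86% of all parameters, which is why techniques such as embedding tying target them first.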
Systems were evaluated based on the percentage of correctly answered questions.
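That metric is plain accuracy over the question set; for concreteness:

```python
# Accuracy: fraction of questions answered correctly.
def accuracy(predictions, gold):
    assert len(predictions) == len(gold)
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

print(accuracy(["42", "7", "3.5"], ["42", "8", "3.5"]))  # 0.666...
```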
We introduce a new representation language to model precise operation programs corresponding to each math problem, aiming to improve both the performance and the interpretability of the learned models.
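The paper's exact language is not reproduced here, but as a hedged sketch, an operation program can be pictured as an explicit, interpretable sequence of operations over named intermediate results; the operation set and step format below are assumptions for illustration.

```python
# Illustrative "operation program" interpreter: each step applies a named
# operation, and later steps may reference earlier results as '#0', '#1', ...
# This format is an assumption, not the paper's representation language.
import operator

OPS = {"add": operator.add, "subtract": operator.sub,
       "multiply": operator.mul, "divide": operator.truediv}

def run_program(program):
    """Execute steps of the form (op, arg1, arg2)."""
    results = []
    for op, a, b in program:
        resolve = lambda x: (results[int(x[1:])]
                             if isinstance(x, str) and x.startswith("#") else x)
        results.append(OPS[op](resolve(a), resolve(b)))
    return results[-1]

# "A shirt costs $20 and is discounted 25%; what is the final price?"
program = [("multiply", 20, 0.25),   # #0: discount amount
           ("subtract", 20, "#0")]   # #1: final price
print(run_program(program))  # 15.0
```

Because every intermediate step is explicit, the program itself serves as a human-readable trace of the model's reasoning.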
Generating texts which express complex ideas spanning multiple sentences requires a structured representation of their content (document plan), but these representations are prohibitively expensive to manually produce.
We introduce the Pyramidal Recurrent Unit (PRU), which enables learning representations in high-dimensional space with more generalization power and fewer parameters.
We explore contemporary, data-driven techniques for solving math word problems over recent large-scale datasets.
Texts present coherent stories that have a particular theme or overall setting, for example science fiction or westerns.
This paper formalizes the problem of solving multi-sentence algebraic word problems as that of generating and scoring equation trees.
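To make the formulation concrete, here is an illustrative sketch in which candidate equation trees over the problem's quantities are enumerated, evaluated, and scored. The depth-one enumeration and the answer-matching scorer are simplifications: in the paper's setting the scorer would be learned and the search would recurse over subtrees.

```python
# Sketch of the equation-tree formulation: enumerate candidate trees over
# the problem's quantities, evaluate each, and pick the best-scoring one.
from dataclasses import dataclass
from typing import Union

@dataclass
class Node:
    op: str                      # '+', '-', '*', '/'
    left: Union["Node", float]
    right: Union["Node", float]

def evaluate(node):
    if isinstance(node, (int, float)):
        return node
    l, r = evaluate(node.left), evaluate(node.right)
    return {"+": l + r, "-": l - r, "*": l * r, "/": l / r}[node.op]

def enumerate_trees(quantities):
    """All single-operator trees over ordered pairs of quantities
    (depth one only, to keep the sketch short)."""
    for i, a in enumerate(quantities):
        for j, b in enumerate(quantities):
            if i != j:
                for op in "+-*/":
                    if op == "/" and b == 0:
                        continue
                    yield Node(op, a, b)

# Stub scorer: prefer trees whose value matches a known answer.
quantities, answer = [12.0, 3.0], 4.0
best = max(enumerate_trees(quantities),
           key=lambda t: -abs(evaluate(t) - answer))
print(best.op, evaluate(best))  # '/' 4.0
```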