LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation

21 Mar 2017  ·  Dylan Hutchison, Bill Howe, Dan Suciu

Analytics tasks manipulate structured data with variants of relational algebra (RA) and quantitative data with variants of linear algebra (LA). The two computational models have overlapping expressiveness, motivating a common programming model that affords unified reasoning and algorithm design. At the logical level we propose Lara, a lean algebra of three operators that expresses RA and LA as well as relevant optimization rules. We show a series of proofs that position Lara at just the right level of expressiveness for a middleware algebra: more explicit than MapReduce but more general than RA or LA. At the physical level we find that the Lara operators afford efficient implementations using a single primitive that is available in a variety of backend engines: range scans over partitioned sorted maps. To evaluate these ideas, we implemented the Lara operators as range iterators in Apache Accumulo, a popular implementation of Google's BigTable. First, we show how Lara expresses a sensor quality control task, and we measure the performance impact of optimizations Lara admits on this task. Second, we show that the LaraDB implementation outperforms Accumulo's native MapReduce integration on a core task involving join and aggregation in the form of matrix multiply, especially at smaller scales that are typically a poor fit for scale-out approaches. We find that LaraDB offers a conceptually lean framework for optimizing mixed-abstraction analytics tasks, without giving up fast record-level updates and scans.
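
To illustrate what "join and aggregation in the form of matrix multiply" can mean, here is a minimal Python sketch. It is not the Lara operator set or the LaraDB implementation; the function name `matmul_join_aggregate` and the dict-of-tuples representation are assumptions made for illustration. Each sparse matrix is stored as a map from a key pair to a value, the two maps are joined on the shared index k, and the products are aggregated by summation over k.

```python
from collections import defaultdict


def matmul_join_aggregate(a, b):
    """Multiply sparse matrices stored as key-value maps (hypothetical helper).

    a maps (i, k) -> value and b maps (k, j) -> value.
    The result maps (i, j) -> sum over k of a[i, k] * b[k, j].
    """
    # Group B's entries by their row key k so the join on k is a lookup.
    b_by_k = defaultdict(list)
    for (k, j), b_val in b.items():
        b_by_k[k].append((j, b_val))

    out = defaultdict(float)
    # Visit A in key order, loosely mimicking a scan over a sorted map.
    for (i, k), a_val in sorted(a.items()):
        # Join: pair each A entry with the B entries that share k.
        for j, b_val in b_by_k.get(k, ()):
            # Aggregate: sum the products that land on the same (i, j).
            out[(i, j)] += a_val * b_val
    return dict(out)


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]], storing nonzero entries only.
A = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}
B = {(0, 0): 4.0, (1, 0): 5.0, (1, 1): 6.0}
C = matmul_join_aggregate(A, B)
```

The same shape of computation (match on a shared key, then combine values that collide on the output key) also describes a relational join followed by a group-by, which is the overlap between RA and LA that the abstract refers to.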
