Avatara: OLAP for Web-scale Analytics Products

The highlight of this system is a clear separation of the cube computation engine and the query serving engine of this OLAP system. Another newness is the use of a key value store, Voldemort, to fetch the results of queries. Contrary to conventional designs where the cube computation engine and the query serving infra are coupled tightly, Avatara chooses to separate these two aspects.

The offline batch engine is based on Hadoop. A set of MapReduce jobs transform the data into cubes. These cubes are then bulk loaded into the key value store for fast online access. The batch processing pipeline consists of three phases: preprocessing, projections and joins and cubification. Each of these phases is a collection of MapReduce jobs and carried out executed sequentially.

After data is preprocessed in the first phase a star schema is modeled in the second phase. The engine projects and joins the fields from the fact and dimension tables and writes the output to a temp location.

The third phase is when the cubification happens. This phase generates a number of small cubes as MOLAP blobs (multidimensional array) containing dimensions and measures. The storage format is optimized such that the query engine can retrieve this data in a single disk fetch.

When clients issue queries it is intercepted by the online query engine component. The engine reads data from the key value store. The query engine models a SQL like syntax and supports operations such as select, where and group by along with some math operations. Since the system is built to keep query response times very low it disallows joins from being performed online.

The discussion section (3.4) of the paper best summarizes the design trade offs. Worth inlining verbatim –

Avatara provides flexibility in terms of where cube materialization
can happen: operations such as sum, average, order, or limit can be
performed offline or online. With more offline aggregation, online
queries will be faster, but naturally less flexible. For example, after
a developer specifies the granularity of a time dimension to be at
the “week” level in the offline phase, future online aggregations can
only happen at equal or coarser levels such as “weeks” or “months”.
For WVMP, partial materialization happens offline to roll up profile
visits into a weekly aggregation level; the remaining materialization
happens online because the cubes are small. This also enables
us to introduce new query patterns by selecting a different set of
dimensions without recomputing any cubes.

Link to the paper.