With both the F1 and Spanner papers out, it's now possible to understand their interplay a bit more holistically. So let's…
MillWheel: Fault-Tolerant Stream Processing at Internet Scale
MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation…
imGraph: A distributed in-memory graph database
Abstract: Diverse applications including cyber security, social networks, protein networks, recommendation systems or citation networks work with inherently graph-structured data.…
Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams
Abstract: A geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low…
Phoenix, an implementation of MapReduce for shared-memory systems
Abstract: Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system.…
Avatara: OLAP for Web-scale Analytics Products
The highlight of this system is a clear separation of the cube computation engine and the query serving engine of…
Ceph: A Scalable, High-Performance Distributed File System
Abstract: We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the…
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and…
An Efficient Multi-Tier Tablet Server Storage Architecture
This work presents a new, highly scalable, and efficient TSSL architecture called the General Tablet Server Storage Layer or GTSSL.…
Iterative Map Reduce – Prior Art
There have been several attempts in the recent past at extending Hadoop to support efficient iterative data processing on clusters.…