A key feature of all distributed storage systems is their ability to replicate data not just across machines within a …
Author: Hari
Apache Hadoop YARN: Yet Another Resource Negotiator
Detailed post to follow.
Split Query Processing in Polybase
Abstract: This paper presents Polybase, a feature of SQL Server PDW V2 that allows users to manage and query data …
F1: A Distributed SQL Database That Scales
With both the F1 and Spanner papers out, it's now possible to understand their interplay a bit more holistically. So let's …
MillWheel: Fault-Tolerant Stream Processing at Internet Scale
MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation …
imGraph: A distributed in-memory graph database
Abstract: Diverse applications including cyber security, social networks, protein networks, recommendation systems or citation networks work with inherently graph-structured data. …
Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams
Abstract: A geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low …