Haystack was the primary storage system designed initially for Facebook's Photos application. It has been around for almost 7 years now. During this period it has served well thanks to several optimizations, such as reducing the number of disk seeks needed to read a BLOB to almost always one, and providing fault tolerance through replication (a replication factor of 3) across geographies. While Haystack has served well thus far, Facebook has also evolved during this period. As of Feb 2014 it stored about 400 billion photos. Correspondingly, the workload on the BLOB store has changed. Some of the key changes include:
- Types of BLOBs have increased. They now include videos, documents, heap dumps, traces and source code
- There is greater diversity in BLOB size and in the frequency of creates, reads and deletes
- Among reads, there is now a long tail of files that are read infrequently. Keeping a fixed replication factor across files with varying access patterns results in over-provisioning
Hence the need to revisit the storage solution. The overarching design strategy for the new system is to classify BLOBs by access pattern as either Hot or Warm. Having classified the BLOBs this way, they’ve developed "f4", a storage system that exclusively stores Warm BLOBs at a lower effective replication factor and still offers scalability and fault tolerance.
The main highlight of this system is its use of distributed erasure coding to achieve both storage efficiency and fault tolerance.
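To make the trade-off concrete, here is a minimal sketch of how an erasure code can survive a lost block while storing far fewer bytes than full replication. It uses single-parity XOR, the simplest erasure code, purely as an illustrative stand-in and not as the scheme f4 actually uses. The function names (`encode`, `recover`, `effective_replication`) and the 4-data-block stripe are assumptions made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one XOR parity block (hypothetical helper)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the single block at lost_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def effective_replication(data_blocks, redundant_blocks):
    """Bytes stored per byte of user data."""
    return (data_blocks + redundant_blocks) / data_blocks


# Illustrative stripe: 4 data blocks of equal length plus 1 parity block.
data = [b"warm", b"blob", b"data", b"here"]
stripe = encode(data)

# Losing any one block is recoverable from the other four.
assert recover(stripe, 2) == b"data"

print(effective_replication(4, 1))  # 1.25 with one parity block per 4 data blocks
print(effective_replication(1, 2))  # 3.0 with three full copies
```

Single parity tolerates only one lost block per stripe. Codes that add more parity blocks tolerate more failures at a somewhat higher overhead, which is still well below that of keeping three full copies.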
Link to paper