Although frameworks like Hadoop and Dryad have made it simpler to run distributed programs on large clusters, efficiently managing the resources these executions use is still a challenge. This paper, from Microsoft Research, starts with the observation that about 50% of the files in their experimental 240-node cluster had not been accessed at all in the last 250 days. Similarly, across 25 clusters they estimated about 7000 hours of repeated computation that could be eliminated.
The solution proposed by Nectar centers on caching the results of computations that happen in the cluster and then transforming user programs to use these cached results. It relies on the services provided by Dryad/DryadLINQ and TidyFS, a distributed file system.
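The caching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nectar's implementation: the `ResultCache` class, its `fingerprint` helper, and the choice of SHA-256 over the program text and input names are all hypothetical stand-ins, whereas the real system works on DryadLINQ programs and TidyFS files.

```python
import hashlib


class ResultCache:
    """Toy cache of computation results (hypothetical, for illustration only)."""

    def __init__(self):
        self._store = {}   # fingerprint -> cached result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def fingerprint(program_text, input_ids):
        # Assumption: a result is identified by the program text plus the
        # names of its inputs, in the order the program receives them.
        h = hashlib.sha256()
        h.update(program_text.encode("utf-8"))
        for input_id in input_ids:
            h.update(b"\0")
            h.update(input_id.encode("utf-8"))
        return h.hexdigest()

    def run(self, program_text, input_ids, compute):
        """Return the cached result if present, otherwise compute and store it."""
        key = self.fingerprint(program_text, input_ids)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute()
        self._store[key] = result
        return result


# Usage: the second, identical job reuses the first job's result.
executions = []

def count_words():
    executions.append("ran")
    return 42

cache = ResultCache()
first = cache.run("count words", ["logs/day1"], count_words)
second = cache.run("count words", ["logs/day1"], count_words)
```

In this sketch, "transforming the user program" reduces to routing every job through `run`, so a repeated job turns into a lookup instead of a second execution.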
Mesos is another system that attempts to improve resource utilization in a cluster, in its case one that runs heterogeneous programs built on different execution engines.
Nectar as a system is interesting not only from a manageability point of view; it is also a good example for folks interested in program transformation.
Managing data and computation is at the heart of data center computing. Manual management of data can lead to data loss, wasteful consumption of storage, and laborious bookkeeping. Lack of proper management of computation can result in lost opportunities to share common computations across multiple jobs or to compute results incrementally.
Nectar is a system designed to address all the aforementioned problems. Nectar uses a novel approach that automates and unifies the management of data and computation in a data center. With Nectar, the results of a computation, called derived datasets, are uniquely identified by the program that computes it, and together with the program are automatically managed by a data center wide caching service. All computations and uses of derived datasets are controlled by the system. The system automatically regenerates a derived dataset from its program if it is determined missing. Nectar greatly improves data center management and resource utilization: obsolete or infrequently used derived datasets are automatically garbage collected, and shared common computations are computed only once and reused by others.
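As a rough illustration of the regeneration and garbage collection behaviour described above, here is a toy Python sketch. It is not the paper's algorithm: the `DerivedDatasetStore` class, the logical clock, and the `max_idle` eviction rule are assumptions made up for this example, and the dataset names are hypothetical.

```python
class DerivedDatasetStore:
    """Toy store that keeps each derived dataset together with its program."""

    def __init__(self):
        self._programs = {}    # name -> function that (re)computes the dataset
        self._data = {}        # name -> materialized dataset
        self._last_used = {}   # name -> logical time of the last access
        self._clock = 0        # advances by one on every access

    def register(self, name, program):
        self._programs[name] = program

    def get(self, name):
        """Return the dataset, regenerating it from its program if missing."""
        self._clock += 1
        if name not in self._data:
            self._data[name] = self._programs[name]()
        self._last_used[name] = self._clock
        return self._data[name]

    def collect_garbage(self, max_idle):
        """Drop datasets idle for more than max_idle accesses; keep programs."""
        stale = [name for name in self._data
                 if self._clock - self._last_used[name] > max_idle]
        for name in stale:
            del self._data[name]
        return stale


# Usage: an infrequently used dataset is collected, then rebuilt on demand.
runs = []
store = DerivedDatasetStore()
store.register("daily_counts", lambda: runs.append("daily_counts") or [1, 2, 3])
store.register("weekly_counts", lambda: runs.append("weekly_counts") or [6])

store.get("daily_counts")
for _ in range(3):
    store.get("weekly_counts")

evicted = store.collect_garbage(max_idle=2)
regenerated = store.get("daily_counts")
```

Because the program is kept even after its dataset is deleted, eviction in this sketch never loses information: it only trades storage for a later recomputation.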
This paper describes the design and implementation of Nectar, and reports our evaluation of the system using both analysis of actual logs from a number of production clusters and an actual deployment on a 240-node cluster.
Previewing from http://research.microsoft.com/pubs/131525/nectar-tr.pdf