This paper sets the research agenda for the 4D project: Decision, Dissemination, Discovery, and Data.
Zaharia et al. (2010) Spark: Cluster Computing with Working Sets (HotCloud)
This paper proposes a new programming model that extends MapReduce. Its key feature is the resilient distributed dataset (RDD), a read-only collection of objects partitioned across the cluster that can be cached in memory. By caching it, the authors claim a 10x speedup in some use cases (e.g. logistic regression) compared to MapReduce,...
Isard et al. (2007) Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (EuroSys)
Dryad is an attempt to make parallel programming easier by providing lower-layer building blocks. It works as an “OS” layer over the “hardware” layer of a compute cluster, so applications can be built on top of it.
Chen et al. (2010) Generic and Automatic Address Configuration for Data Center Networks (SIGCOMM)
A data center with a large number of nodes needs a large number of addresses configured. To avoid configuring these addresses manually, this paper investigates how to derive the correct MAC-to-IP address mapping.
The Elements of Programming Style
Brian W. Kernighan and P. J. Plauger / 1974
As the preface says, this book is modeled on Strunk and White. In line with Strunk and White, it advocates making your program as short as possible so that it is clear.