COMP 6611B: Topics on Cloud Computing and Data Analytics Systems [Fall 2016]

This is a tentative reading list, subject to change over the coming weeks.

General Guidelines

Paper Reading

S. Keshav, ‘‘How to Read a Paper,’’ ACM SIGCOMM Comput. Commun. Rev., 2007.
M.J. Hanson, D.J. McNamee, ‘‘Efficient Reading of Papers in Science and Technology.’’

Giving a Talk

F. Kschischang, ‘‘Giving a Talk – Guidelines for the Preparation and Presentation of Technical Seminars.’’
B. Li, ‘‘The Art of Presentations.’’
J.L. Doumont, ‘‘Creating Effective Slides: Design, Construction, and Use in Science.’’

Overview of Cloud Computing and Datacenter Architecture

M. Armbrust et al., ‘‘Above the Clouds: A Berkeley View of Cloud Computing,’’ Tech. Rep. UCB/EECS-2009-28, Feb. 10, 2009.
L.A. Barroso, U. Hölzle, ‘‘The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines,’’ Synthesis Lectures on Computer Architecture, 2009. (Only Chapters 1 and 2)

Data Analytics Frameworks

J. Dean, S. Ghemawat, ‘‘MapReduce: Simplified Data Processing on Large Clusters,’’ USENIX OSDI 2004.
M. Zaharia et al., ‘‘Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing,’’ USENIX NSDI 2012.
M. Armbrust et al., ‘‘Spark SQL: Relational Data Processing in Spark,’’ ACM SIGMOD 2015.
B. Saha et al., ‘‘Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications,’’ ACM SIGMOD 2015.
M. Zaharia et al., ‘‘Discretized Streams: Fault-Tolerant Streaming Computation at Scale,’’ ACM SOSP 2013.
T. Akidau et al., ‘‘The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing,’’ Proc. VLDB Endowment, 8(12):1792–1803, 2015.
G. Malewicz et al., ‘‘Pregel: A System for Large-Scale Graph Processing,’’ ACM SIGMOD 2010.
J.E. Gonzalez et al., ‘‘GraphX: Graph Processing in a Distributed Dataflow Framework,’’ USENIX OSDI 2014.

Storage Systems

K. Shvachko et al., ‘‘The Hadoop Distributed File System,’’ IEEE MSST 2010.
A. Lakshman, P. Malik, ‘‘Cassandra: A Decentralized Structured Storage System,’’ ACM SIGOPS Operating Systems Review, 2010.
H. Li et al., ‘‘Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks,’’ ACM SoCC 2014.

Workload Characteristics

A.K. Mishra et al., ‘‘Towards Characterizing Cloud Backend Workloads: Insights from Google Compute Clusters,’’ ACM SIGMETRICS Performance Evaluation Review, 2010.
B. Sharma et al., ‘‘Modeling and Synthesizing Task Placement Constraints in Google Compute Clusters,’’ ACM SoCC 2011.
C. Reiss et al., ‘‘Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis,’’ ACM SoCC 2012.

Cluster Management Systems

B. Hindman et al., ‘‘Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center,’’ USENIX NSDI 2011.
V.K. Vavilapalli et al., ‘‘Apache Hadoop YARN: Yet Another Resource Negotiator,’’ ACM SoCC 2013.
A. Verma et al., ‘‘Large-Scale Cluster Management at Google with Borg,’’ ACM EuroSys 2015.
B. Burns et al., ‘‘Borg, Omega, and Kubernetes,’’ ACM Queue, vol. 14, 2016, pp. 70-93.

Resource Management Policies

A. Ghodsi et al., ‘‘Dominant Resource Fairness: Fair Allocation of Multiple Resource Types,’’ USENIX NSDI 2011.
W. Wang et al., ‘‘Multi-Resource Fair Sharing for Datacenter Jobs with Placement Constraints,’’ IEEE/ACM SC 2016.
R. Grandl et al., ‘‘Multi-Resource Packing for Cluster Schedulers,’’ ACM SIGCOMM 2014.
A. Tumanov et al., ‘‘TetriSched: Global Rescheduling with Adaptive Plan-ahead in Dynamic Heterogeneous Clusters,’’ ACM EuroSys 2016.
R. Grandl et al., ‘‘Altruistic Scheduling in Multi-Resource Clusters,’’ USENIX OSDI 2016.
G. Ananthanarayanan et al., ‘‘PACMan: Coordinated Memory Caching for Parallel Jobs,’’ USENIX NSDI 2012.
Q. Pu et al., ‘‘FairRide: Near-Optimal, Fair Cache Sharing,’’ USENIX NSDI 2016.

Cluster Scheduler Design

K. Ousterhout et al., ‘‘Sparrow: Distributed, Low Latency Scheduling,’’ ACM SOSP 2013.
M. Schwarzkopf et al., ‘‘Omega: Flexible, Scalable Schedulers for Large Compute Clusters,’’ ACM EuroSys 2013.
J. Rasley et al., ‘‘Efficient Queue Management for Cluster Scheduling,’’ ACM EuroSys 2016.
G. Ananthanarayanan et al., ‘‘Effective Straggler Mitigation: Attack of the Clones,’’ USENIX NSDI 2013.
X. Ren et al., ‘‘Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale,’’ ACM SIGCOMM 2015.

Datacenter Networking

A. Singh et al., ‘‘Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network,’’ ACM SIGCOMM 2015.
L. Popa et al., ‘‘FairCloud: Sharing the Network in Cloud Computing,’’ ACM SIGCOMM 2012.
M. Chowdhury et al., ‘‘Efficient Coflow Scheduling with Varys,’’ ACM SIGCOMM 2014.
M. Chowdhury et al., ‘‘HUG: Multi-Resource Fairness for Correlated and Elastic Demands,’’ USENIX NSDI 2016.