=== Old Editions of the course ===

** Teaching rooms **:\\
Room L1, Polo Fibonacci, first floor.\\
Room C1, Polo Fibonacci, first floor.\\
Room C40, ISTI-CNR, Door 19/20, first floor\\

===== Lectures =====

22/09. Introduction to the course, Logistics. Large-scale problems: status and perspectives.\\
23/09. Data storage architectures: read-only (search engines), read-write (datastores). Grid computing: elements, definition, architecture, protocols and hourglass model. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez1.pdf|slides]])\\
29/09. Grid computing: security, resource management, information management, data management. Architectural models, real-world Grids, Globus Toolkit. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez2.pdf|slides]])\\
30/09. Cloud computing: definitions, properties, characteristics. Elasticity, dynamic provisioning and autonomic control. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez3.1.pdf|slides]])\\
03/10. Cloud computing: user benefits, provider benefits, economies of scale. Service models: IAAS, PAAS, SAAS. Deployment models. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez3.1.pdf|slides]])\\
07/10. Cloud computing: comparison with Grids. Cloud programming: fault tolerance, service-oriented architectures, decoupling, autoscaling. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez3.1.pdf|slides]])\\
14/10. Cloud computing: design patterns for Cloud applications. Fault tolerance, decoupling, elasticity implementation, data storage, security. Amazon Web Services overview. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez3.2.pdf|slides]])\\
17/10. 
Map Reduce: design principles, functional programming, basic programming model, exercises. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez4.1.pdf|slides]])\\
21/10. Map Reduce: Hadoop installation, configuration.\\
24/10. Map Reduce: exercises.\\
28/10. Map Reduce: combiners, partitioners, scheduling, fault tolerance. HDFS and I/O APIs. Exercises. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez4.2.pdf|slides]])\\
11/11. Map Reduce: exercises.\\
14/11. Map Reduce: algorithmic patterns. State management, matrices, database operations, graphs. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez4.3.pdf|slides]])\\
18/11 a. Map Reduce: exercises.\\
18/11 b. Map Reduce: exercises.\\
21/11. Data models, representation and storage: relational, document, graph models. OLTP vs OLAP. LSM Trees and B-Trees. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez5.pdf|slides]])\\
25/11. Data encoding: data flows, backward and forward compatibility, Thrift, Protocol Buffers and Avro encodings. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez5.pdf|slides]])\\
28/11. Data management: scalability, performance and fault tolerance of distributed systems. General replicated architecture. Consistency models: strict, linearizable and sequential consistency. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez6.pdf|slides]])\\
02/12. Data replication: passive replication, replication log, active partitioning, quorum systems, write conflicts. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez6.pdf|slides]])\\
05/12. Teaching suspended (prot. 57140 of 18/11/2016, Università di Pisa).\\
09/12. Data replication: client-centric consistency models, FLP and CAP theorems. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez7.pdf|slides]])\\
12/12. Time: physical clocks and logical clocks. Happens-before relation and systems of logical clocks. Scalar and vector logical clock systems. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez8.pdf|slides]])\\
16/12. Data partitioning: consistent hashing and virtual nodes. ([[http://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformaticanetworking/cpa/aa1617/lez9.pdf|slides]])\\
16/12. Projects discussion.\\

===== Bibliography =====

**Grids**

**//Study Notes//**
  - I. Foster, C. Kesselman, “The Grid 2: Blueprint for a New Computing Infrastructure”, Morgan Kaufmann Publishers Inc., 2003. Chapters 4 and 21.
  - K. Hwang, G. C. Fox, J. Dongarra, “Distributed and Cloud Computing”, Morgan Kaufmann Publishers Inc., 2012. Chapter 7.

**//Reading Assignments//**
  - IBM Redbooks, Introduction to Grid Computing, 2003. [[http://www.redbooks.ibm.com/redbooks/pdfs/sg246778.pdf]]

**Clouds**

**//Study Notes//**
  - [[http://www.nist.gov/customcf/get_pdf.cfm?pub_id=909505|NIST Cloud Computing Reference Architecture]]
  - [[http://arxiv.org/pdf/0901.0131.pdf|Cloud Computing and Grid Computing 360-Degree Compared]]

**//Reading Assignments//**
  - [[http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf|Above the Clouds: A Berkeley View of Cloud Computing]]
  - [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.8397&rep=rep1&type=pdf|Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility]]

**Map Reduce**

**//Study Notes//**
  - [[http://infolab.stanford.edu/~ullman/mmds/ch2.pdf|Map-Reduce and the New Software Stack]]
  - [[http://www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf|Data-Intensive Text Processing with Map Reduce]]

**//Reading Assignments//**
  - [[http://research.google.com/archive/mapreduce-osdi04.pdf|MapReduce: Simplified Data Processing on Large Clusters]]
  - [[http://research.google.com/archive/gfs-sosp2003.pdf|The Google File System]]
  - [[https://www.usenix.org/system/files/login/articles/105470-Shvachko.pdf|Apache Hadoop: The Scalability Update]]
  - [[https://www.usenix.org/legacy/publications/login/2010-04/openpdfs/shvachko.pdf|HDFS Scalability: The Limits to Growth]]

**Virtualization**
  - [[http://www.utdallas.edu/~muratk/courses/cloud11f_files/smith-vm-overview.pdf|An Overview of Virtual Machine Architectures]]
  - [[http://www.vmware.com/pdf/asplos235_adams.pdf|A Comparison of Software and Hardware Techniques for x86 Virtualization]]
  - [[https://www.usenix.org/legacy/event/usenix05/tech/freenix/full_papers/bellard/bellard.pdf|QEMU, a fast and portable dynamic translator]]

===== External Links =====

  - [[http://www.globus.org/|The Globus Toolkit Homepage]]
  - [[http://hadoop.apache.org/|The Hadoop Homepage]]
  - [[http://developer.yahoo.com/hadoop/tutorial/index.html|The Yahoo! Hadoop Tutorial Homepage]]