Franco Piccinno: Large Graph Processing
The main topic of this seminar is large graph processing. We will try to understand what are the problems that arise when dealing with large datasets and which are the most used solutions today. Specifically we will introduce the Bulk Synchronous Parallel (BSP) model that allows us to process extremely large graphs by distributing the computation to cluster of machines. We will then move our attention to graph compression. We will survey two interesting graph compression techniques, namely K2-trees and WebGraph. The former approach exploits the sparsity of adjacency matrices to achieve compression through a recursive spatial decomposition. The latter technique exploits, similarly to LZ77 compression family, exploits repetitions in the adjacency list achieving a good compression ration and fast decompression. At the end we will briefly discuss how the WebGraph compression algorithm coupled with HyperLogLog counters was used to obtain the famous “4 degree of separation” in the Facebook graph.