Indice

Year 2014-2015

Prof. Roberto Grossi

Announcements

Overview

We will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. This course deepens and extends the algorithmic notions of students.

The syllabus is structured to highlight the applicative scenarios in which the studied algorithms and data structures can be successfully applied. The level of detail with which each argument will be dealt with can change year-by-year, and will be decided according to requests coming from other courses and/or specific issues arising in, possibly novel, applicative scenarios.

The advanced nature of this course focuses on developing algorithmic design skills. The students are exposed to complex problems that require a significant effort in problem solving. One “brainstorming” lecture per week is devoted to a full immersion in this activity, and it is highly recommend to attend it. The purpose is not that of teaching/learning further N algorithms, but to refine students' skills. The final written exam will be based on the topics discussed during the “brainstorming” lectures.

We like students that make mistakes in a constructive way, since this is the starting point of our brainstorming to drive their reasoning paths, thus learning by mistakes. We are interested in the route that leads to the algorithmic solution, rather than the solution itself. Regarding mistakes and intelligence, Alan Turing wrote in 1947 that “… if a machine is expected to be infallible, it cannot also be intelligent”. This is a motto for this class, with a kind invitation to express your own ideas even if they could be apparently wrong.

Topics

Please note that several topics are the outcome of recent advancement in algorithms and data structures, and thus most of the course material consists in research papers or book chapters.

Date Topics References and notes
Sept. 23 Introduction to the course. Problem solving: ideas and examples. -

Algorithms for massive data sets and hierarchical memories

Massive data sets are one of the key elements in our so-called big data society. In this scenario the design and analysis of algorithms and data structures adopt the external and cache-oblivious models, which take into account the memory hierarchy that provides performance to modern computers. We study basic problems on sorting and searching in these models, comparing the results with the classical RAM model of computation.

Date Topics References and notes
Sept. 24 External memory model (EMM). Case study for sequential access versus random access. Description of the model. CPU complexity and I/O complexity. K-way mergesort. Introduction, chapt.2 and 3 (set D=1 disks) notes
Sept. 25Cost of scanning, sorting and permuting. Scan-and-sort paradigm. Lower bounds for sorting in the external memory model. Introduction, chapt.2 and 3 (set D=1 disks), chapt.5 (set D=1 disks, no randomization, till Sect.5.1.1 included)NotesNotes
Sept. 30 Problem solving. External memory K-way merge and mergesort. External memory permuting. list of problems
Oct. 1 Scan-and-sort paradigm: map-reduce. Lower and upper bounds for static searching in external-memory. paper slides slides sect.11.1
Oct. 2 Selection of splitters for the distribution sort. Notes wikipedia
Oct. 7 Problem solving. Implicit static search in external memory. External memory distribution sort. list of problems
Oct. 8 Dynamic searching in external memory: (a,b)-trees and B-trees. Definition, search and insert operations. sect.1 and 2
Oct. 9 Dynamic searching in external memory: (a,b)-trees and B-trees. Deletion, construction, and I/O analysis. sect.2
Oct. 14 Problem solving. Number of split/fuse operations for $(a,b)$-trees. 1-D range queries. list of problems
Oct. 15 Suffix sorting and suffix array. Relation with standard sorting. DC-3 algorithm for RAM model. DC3 algorithm (up to Section3)
Oct. 16 Suffix sorting and suffix array. DC-3 algorithm for RAM model (continued). DC3 algorithm (up to Section3)
Oct. 21 Problem solving. Filling the details of the DC3 suffix sorting. Suffix array searching. list of problems
Oct. 22 Cache-oblivious (CO) model. van Emde Boas (vEB) layout of complete binary (search) trees. Sect.4.1 Notes
Oct. 23 Text indexing and searching: suffix trees. wikipedia
Oct. 28 Problem solving. Computing the LCP and suffix array construction. Navigating in the implicit vEB layout. list of problems

Randomization, hashing and data streaming

Randomization is now a common powerful tool to solve large-scale problems. After introducing the concept of randomized algorithms and hashing, we apply them to solve data streaming problems, a field emerged in the last decade. Here data flow as a stream and one-pass algorithms with limited memory can process it. We focus on the count-min sketch paradigm and its applications.

Date Topics References and notes
Oct. 29 Randomness and Kolmogorov complexity. Randomized quicksort. Notes (first 2 pages) par. 7.3
Oct. 30 Las Vegas and Montecarlo algorithms. Analysis of randomised quick sort. Randomized algorithm for min-cut. par.5.1 par. 7.3
Nov. 11 Problem solving. list of problems
Nov. 12 Data Streaming Model. Motivations and examples. sect.1 Site Notes
Nov. 13 Count-Min Sketches and applications - part 1 sect.2,3 Site
Nov. 18 Problem solving. list of problems
Nov. 19 Count-Min Sketches and applications - part 2 sect.2,3 Site
Nov. 20 Count-Min Sketches and applications - part 3 sect.2,3 Site
Nov. 25 Problem solving. list of problems

Enumeration, hardness and approximation of some combinatorial problems

NP-hard problems are important but difficult to solve and no deterministic polynomial algorithms are currently known for them. Even basic problems, such as finding a simple path of vertices in a graph, are NP-hard. We discuss how to attack these problems (a) by counting and listing their solutions where the cost is proportional to the output (it can be exponential in the worst case but we only pay for what we get) and (b) by approximating their solutions with polynomial-time algorithms when the problem requires to minimize a cost or maximize a benefit. Most of the addressed problems are on graphs, which are popular representations for modern networked data.

Date Topics References and notes
Nov. 26 Complexity classes: P, NP, NPC. Polynomial reductions (prof. Pagli) Turing Machines pp.255-260, 264-276
Nov. 27 Example of reductions: SAT, 3-SAT, NE-3-SAT. Notes pp.282-283
Dec. 2 Problem solving list of problems
Dec. 3 NP-completeness of MAX-CUT. Notes
Dec. 4 Coloring problems. Reduction from 3-coloring of planar maps to 3-SAT. Notes
Dec. 4 Problem solving list of problems
Dec. 9 Notion of r-approximation algorithm. TSP: NP-hardness and difficulty of approximation. 2-approximation for metric TSP. Except Sect.35.1
Dec. 9 Problem solving (4 hours) list of problems
Dec. 10 2-approximation for MAX-CUT: local search, greedy. 2-approximation for minimum vertex cover. Notes Sect.35.1
Dec. 11 Approximation for bin packing and knapsack. chapt.2: par. 2.1.1, par. 2.2.2

Tutoring (prof. Giuseppe Prencipe)

Date Topics References and notes
Nov. 17 Review (part I) of topics from Chapt. 1, Chapt. 2, Sect. 3.1, Sect. 4.5. Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms. MIT Press.
Nov. 26 Review (part II) of topics from Chapt. 1, Chapt. 2, Sect. 3.1, Sect. 4.5. Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms. MIT Press.
Dec. 3 Review of topics from Chapt. 6, Sect. 7.1, 7.2, Sect. 8.1, Chapt.9. Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms. MIT Press.
Dec. 10 Review of topics from Chapt. 10, Chapt. 11 (except Sect.11.5), Chapt. 12 (execpt Sect. 12.4), Sect. 15.3 and 15.4. Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms. MIT Press.
Excercises discussed in class

This is the final list of problems. Some of the screen snapshots shown in the classroom.

Official class documents
Examination outcomes (in Italian)