Prof. Roberto Grossi
We will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. This course deepens and extends the algorithmic notions of students.
The syllabus is structured to highlight the applicative scenarios in which the studied algorithms and data structures can be successfully applied. The level of detail with which each argument will be dealt with can change year-by-year, and will be decided according to requests coming from other courses and/or specific issues arising in, possibly novel, applicative scenarios.
The advanced nature of this course focuses on developing algorithmic design skills. The students are exposed to complex problems that require a significant effort in problem solving. One “brainstorming” lecture per week is devoted to a full immersion in this activity, and it is highly recommend to attend it. The purpose is not that of teaching/learning further N algorithms, but to refine students' skills. The final written exam will be based on the topics discussed during the “brainstorming” lectures.
We like students that make mistakes in a constructive way, since this is the starting point of our brainstorming to drive their reasoning paths, thus learning by mistakes. We are interested in the route that leads to the algorithmic solution, rather than the solution itself. Regarding mistakes and intelligence, Alan Turing wrote in 1947 that “… if a machine is expected to be infallible, it cannot also be intelligent”. This is a motto for this class, and an invitation to express your own ideas even if they could be apparently wrong.
Please note that several topics are the outcome of recent advancement in algorithms and data structures, and thus most of the course material consists in research papers or book chapters.
Date | Topics | References and notes |
---|---|---|
Sept. 24 | Introduction to the course. Problem solving: ideas and examples. | - |
NP-hard problems are important but difficult to solve and no deterministic polynomial algorithms are currently known for them. Even basic problems, such as finding a simple path of vertices in a graph, are NP-hard. We discuss how to attack these problems (a) by counting and listing their solutions where the cost is proportional to the output (it can be exponential in the worst case but we only pay for what we get) and (b) by approximating their solutions with polynomial-time algorithms when the problem requires to minimize a cost or maximize a benefit. Most of the addressed problems are on graphs, which are popular representations for modern networked data.
Date | Topics | References and notes |
---|---|---|
Sept. 25 | Counting and listing all triangles in a graph. | Section 10.6 |
Sept. 26 | Listing all maximal cliques in a graph. | wikipedia paper,sect.3 |
Oct. 1 | All-pairs shortest paths. Listing all paths. | chap.25, par.25.1, 25.3 |
Oct. 2 | Problem solving. Listing all complete graphs K_4 and s-t paths. | list of problems |
Oct. 3 | Feedback vertex/edge set: connection with spanning trees, vertex covers and independent sets; listing all the minimal feedback vertex sets in directed graphs. | sect. 1, 2 |
Oct. 8-10 | Class canceled. Homework: Review notions of polynomial certificate, polynomial reduction/transformation and NP-completeness. | chapt.9: par. 9.1-9.2 (Italian) |
Oct. 15 | Reviewed notions of NP-hardness, Eulerian vs Hamiltonian paths in graphs. Approximation algorithms. Difficulty of approximation for the general Travelling salesman Problem (TSP). | chapt.9: par. 9.3-9.4 (Italian) |
Oct. 16 | Problem solving. Listing all vertex-to-vertex shortest paths. Proving that the metagraph of all the the minimal feedback vertex sets is strongly connected. | list of problems |
Oct. 17 | Polinomial algorithms for the 2-approximation of the metric TSP and of the Minimum Vertex Cover. | chapt.8: par. 8.10-8.11 (Italian) |
Oct. 22 | Knapsack problem: bad example for greedy and its refinement for 2-approximation. Approximation for Min Bin Packing using Next Fit and First Fit Decreasing strategies. | chapt.2: par. 2.1.1, par. 2.2.2 |
Oct. 23 | Problem solving. Linear-time algorithm for finding an Eulerian cycle. 2-approximation for MAX-SAT. | list of problems |
Oct. 24 | Density-approximation for Maximum Independent Set (MIS). | chapt.2: par.2.1.2 |
Oct. 29 | Review: polynomial reductions among problems, NP vs #P. Proof of Cook-Levin theorem that SAT is NP-complete. | wikipedia Th.7.2 |
Oct. 30 | Polynomial-time reductions: SAT to Clique; Clique to Vertex Cover; Partition to Knapsack (easy). Class by prof. Pagli). | SAT2Clique Clique2VC |
Oct. 31 | Problem solving. Polynomial-time 2-approximation for MAX-CUT e MAX-SAT. | list of problems |
Randomization is now a common powerful tool to solve large-scale problems. After introducing the concept of randomized algorithms and hashing, we apply them to solve data streaming problems, a field emerged in the last decade. Here data flow as a stream and one-pass algorithms with limited memory can process it. We focus on the count-min sketch paradigm and its applications.
Date | Topics | References and notes |
---|---|---|
Nov. 12 | Randomized quicksort and two analyses for its expected cost. | one-hour class, par.5.1par. 7.3 |
Nov. 13 | Basic randomized techniques: Indicator random variables. Markov's inequality. K-wise indepent hash functions. | Notes |
Nov. 14 | Data Streaming Model. Motivations and examples. Count-Min Sketches. | sect.1 Site Notes |
Nov. 19 | Class canceled (teacher abroad) | |
Nov. 20 | Count-Min Sketches (continued). | sect.2,3 Site |
Nov. 21 | Chernoff bounds, indicator variables, and applications of count-min sketches. | par.4.1,Th.4.1 (proof is optional) sect.4 Notes |
Massive data sets are one of the key elements in our so-called big data society. In this scenario the design and analysis of algorithms and data structures adopt the external and cache-oblivious models, which take into account the memory hierarchy that provides performance to modern computers. We study basic problems on sorting and searching in these models, comparing the results with the classical RAM model of computation.
Date | Topics | References and notes |
---|---|---|
Nov. 26 | External memory model (EMM). Case study for sequential access versus random access. Description of the model. CPU complexity and I/O complexity. Cost of scanning and sorting. External binary search and binary mergesort. | Introduction, chapt.2 and 3 (set D=1 disks) |
Nov. 27 | Problem solving. Pairwise independent hash functions. | list of problems |
Nov. 28 | K-way distribution sort. Scan-and-sort paradigm and mapReduce/Hadoop. | Introduction, chapt.2 and 3 (set D=1 disks), chapt.5 (set D=1 disks, no randomization, till Sect.5.1.1 included) Slides Slides (thanks to M. Idini) |
Dec. 3 | Lower bounds for sorting and permuting in the external memory model. | Introduction, chapt.2 and 3 (set D=1 disks), chapt.5 (set D=1 disks, no randomization, till Sect.5.1.1 included NotesNotes |
Dec. 4 | Problem solving. K-way merging, permuting. | list of problems |
Dec. 5 | Suffix sorting. DC-3 algorithm for RAM model and EMM. | DC3 algorithm (up to Section3) |
Dec. 10 | Cache-oblivious (CO) model. Ideal cache. Motivations and simple examples: multiple scanning, inverting an array. van Emde Boas (vEB) layout of complete binary (search) trees. | Sect.4.1 Notes |
Dec. 11 | Problem solving. Filling the details of the DC3 suffix sorting. Suffix array searching. Implicit van Emde Boas layout of binary complete search trees. | list of problems |
Dec. 12 | Text indexing and searching: suffix trees and inverted lists. | wikipedia wikipedia pp.124-128,131-135,139-143 |
Dec. 17 | Text compression: inverted lists and Lempel-Ziv algorithms. | sect.3.2-3.3 (up to p.119) sect.2.6 (up to p.81) |
This is the final list of problems.