Indice

Algorithm Engineering A.A. 2023-2024

Thanks to P. Sanders Teacher: Paolo Ferragina

CFU: 9 (first semester).

Course ID: 531AA.

Language: English.

Question time: Monday: 15-17, or by appointment (also in video-conferencing through the virtual room of the course)

News: about this course will be distributed via a Telegram channel.

Official lectures schedule: The schedule and content of the lectures are available below and with the official registro.

Goals

In this course, we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees, and graphs. The design and analysis will involve several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs and the availability of Big Data upon which those algorithms could work on. We will add to such theoretical analysis several other engineering considerations spurring from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed also at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

Week Schedule of the Lectures

Week Schedule
Day Time Slot Room
Monday 9:00 - 11:00 Room Fib C
Tuesday 9:00 - 11:00 Room Fib C
Wednesday 11:00 - 13:00 Room Fib C

Exam

The exam will consist of a written test including two parts: exercises and “oral” questions. The exam is passed if in both parts the student gets a sufficient score (expressed in 30), which is then averaged.

The first (exercises) and the second (theory questions) parts of the exam can be split into different exam dates, even of different exam sessions. The exam dates are the ones indicated in the calendar on ESAMI. In the case that the second part is not passed or the student abandons the exam, (s)he can keep the rank of the first exam, but this may occur just once. The second time this happens, the rank of the first part is dropped, and the student has to do both parts again.

Dates Room Text Notes
06/11/2023, start at 14:00 room E Mid term: text, solution, results
12/12/2023, start at 09:30 room C Final term: text, solution, results
15/01/2024, start at 9:00 room C1 Exam: text, solution, results
09/02/2024, start at 00:00 room 00 Exam: text, solution, results Correction and view of the written exam on Wednesday, 14 February, hr: 17:30 (MS Teams).
27/05/2024, start at 11:00 room C text
21/06/2024, start at 11:00 room C text
11/07/2024, start at 11:00 room C text
04/09/2024, start at 09:00 room C text

Background and Notes of the Course

I strongly suggest refreshing your knowledge about basic Algorithms and Data Structures by looking at the well-known book Introduction to Algorithms, Cormen-Leiserson-Rivest-Stein (third edition). Specifically, I suggest you look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10, and 15-17.

The content of the course will be mainly covered by the book published by Cambridge University Press as [https://www.cambridge.org/core/books/pearls-of-algorithm-engineering/95061352D7263CCCBD4F243018236EB2|Pearls of Algorithm Engineering].

Lectures

Video-lectures of last year are available at the link and they are linked just for reference, if you wish to re-check something you listened in class. This year, lectures are in presence and the program of the course could be different.

The lectures below include also some EXTRA material, which is suggested to be read for students aiming to high rankings.

Date Lecture Biblio
18/09/2023 Introduction to the course. The definition of Algorithm and time/space complexity. Models of computation: RAM, 2-level memory. An example of algorithm analysis (time, space and I/Os): the sum of n numbers. Chap. 01 of the book.
Read note 1 and note 2 on B-trees.
Students are warmly invited to read Chapter 2 of the book to refresh their notions about time complexity evaluation
19/09/2023 Another example of I/Os-analysis: binary search. The B-tree (or B+-tree) data structure: searching and updating a big set of sorted keys. The role of the Virtual Memory system. RAM algorithm for Permuting. The question: sorting vs permuting, comments on the time and I/O bounds. Chap. 05 of the book.
20/09/2023 Permuting algorithm via sorting and scanning, and its I/O-bound. Binary merge-sort, and its I/O-bounds. Snow Plow, with complexity proof and an example. Chap. 05 of the book
Students are warmly invited to refresh their know-how about: Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations; and Binary Search Trees Lecture 2, 9 and 10 of Demaine-Leiserson's course at MIT
25/09/2023 Multi-way mergesort: algorithm and I/O-analysis. Lower bound for sorting: comparisons and I/Os. The case of D>1 disks: non-optimality of multi-way MergeSort, the disk-striping technique.
EXTRA: Lower bound for permuting (sect 5.2.2).
Chap. 05 of the book
26/09/2023 Quicksort: recap on best-case, worst-case. Quicksort: Average-case with analysis. Selection of kth ranked item in linear average time (with proof). 3-way partition for better in-memory quicksort. Bounded Quicksort. Multi-way Quicksort: definition, design, and I/O-complexity. Chap. 05 of the book
27/09/2023 Selection of k-1 “good pivot” via Oversampling. Proof of the expected time complexity of Multi-way Quicksort. Random sampling: disk model, known length (algorithms and proofs). Random sampling on the streaming model: known length. Chap. 05 and Chap. 03 of the book
02/10/2023 Reservoir sampling: Algorithm and proofs. Exercises on Random Sampling. Randomized data structures: Treaps with their query (search a key, 3-sided range query). Chap. 03 of the book. Notes on Treaps.
03/10/2023 More on Treaps: update (insert, delete, split, merge) operations (with time complexity and proofs). Study Theorems and Lemmas in Treaps' notes.
04/10/2023 Randomized data structures: Skip lists (with proofs and comments on time-space complexity and I/Os). String sorting: comments on the difficulty of the problem on disk, lower bound. See Demaine's lecture num. 12 on skip lists. String sorting in Chap. 07 of the book.
09/10/2023 LSD-radix sort with proof of time complexity and correctness. MSD-radix sort and the trie data structure. Chap. 07 of the book.
10/10/2023 Multi-key Quicksort: algorithm and analysis. Ternary search tree. Exercises.
11/10/2023 Interpolation Search. Fast set intersection, various solutions: scan, sorted merge, binary search, mutual partition. A lower bound on the number of comparisons. Chap. 06 of the book. See sect. 9.3 for interpolation search.
Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining. Lectures 7 of Demaine-Leiserson's course at MIT
16/10/2023 Fast set intersection: Exponential jumps, and two-level scan. Hashing and dictionary problem: direct addressing, simple hash functions, hashing with chaining. Global rebuilding technique. Chap. 08 of the book.
17/10/2023 Universal hashing (definition and properties). Three examples of Universal Hash functions, one with correctness proof. The “power of two choices” result. All theorems with proof, except Theo 8.3 without proof (only the statement).
18/10/2023 Cuckoo hashing (with all proofs). Don't read Perfect Hash (hence no sect. 8.4).
23/10/2023 Minimal ordered perfect hashing: definition, properties, construction, space, and time complexity. Exercises.
24/10/2023 Bloom Filter: properties, construction, query and insertion operations, error estimation (with proofs). Application of Bloom Filters to DBs and P2P.
EXTRA: Lower bound on BF space
No compressed BF.
25/10/2023 Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing based on compacted tries. Analysis of space, I/Os, and time of all described solutions. Chap. 09 (sect. 9.1 and 9.4) of the book. No Locality Preserving front coding.
30/10/2023 Exercises
31/10/2023 Exercises. Two-level indexing based on Patricia Trees, with exercises.
06/11/2023 Exercises
06/11/2023 MIDTERM exam in room E, 14-16 Please register here by Nov 1st 2023.
07/11/2023 Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n. Suffix Array construction via qsort and its asymptotic analysis.
EXTRA: Use of the LCP array to achieve p + log n in the substring search (after Lemma 10.1, up to Lemma 10.2)
Chap. 10 of the book: lemma 10.1 and sect. 10.2.3 (but no “The skew algorithm”, no “The Scan-based algorithm”).
08/11/2023 Suffix trees: properties, construction from Suffix Arrays and LCP array. Search and mining algorithms over suffix arrays and suffix trees. Exercises Sect. 10.4.4 (text mining)
13/11/2023 Review of the Midterm exam
14/11/2023 Prefix-free codes, the notion of entropy. Integer coding: the problem and some considerations. Shannon Theorem and optimal codes. The codes Gamma and Delta, space/time performance, and consideration on optimal distributions. The codes Rice, PForDelta. Chap. 11 of the book.
15/11/2023 Elias-Fano code: definition, properties, and time/space complexity. Definition of Rank/Select operations and comments. Access and NextGEQ operations. Examples on Elias-Fano.
20/11/2023 Variable-byte code and (s,c)-dense code, with examples. Interpolative code. Various exercises on all integer coders.
21/11/2023 Rank and Select: definition and succinct solution for Rank. Two theorems: one for a compact solution and one for a compressed solution based on Elias-Fano.
EXTRA: Solution for Select.
Chap. 15 of the book
22/11/2023 Succinctly encoding of binary trees, and labeled binary trees, with examples.
27/11/2023 Data Compression. Static, semistatic, and dynamic models for frequency estimation. Huffman, with optimality (proof) and relation to Entropy. Canonical Huffman: construction, properties. Huffman decompression. Chap. 12 of the book. No PPM.
27/11/2023 Arithmetic coding: properties, the converter tool. Compression, decompression, and entropy bounds.
28/11/2023 Exercises on Huffman and Arithmetic coding, and on succinct binary trees.
04/12/2023 Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS, LZ78. Chap. 13 of the book. No LZW. Slides.
05/12/2023 Compressor bzip: MTF, RLE0, Wheeler-code, Burrows-Wheeler Transform. How to construct the BWT via qsort. How to invert the BWT. Chap. 14 of the book, and slides.
06/12/2023 Exercises
11/12/2023 Exercises
12/12/2023 FINAL TERM exam in room C, 9:30-11 Please register in the ESAMI platform, on the exam date of January.