Please register at http://piazza.com/unipi.it/fall2017/642aa for electronic discussion and other activities.
The advanced nature of this course focuses on developing algorithmic design skills, exposing the students to complex problems that cannot be directly handled by standard libraries (being aware that several basic algorithms and data structures are already covered by the libraries of modern programming languages), thus requiring a significant effort in problem solving. These problems involve all basic data types, such as integers, strings, (geometric) points, trees and graphs as a starting point. The syllabus is structured to highlight the applicative situations in which the corresponding algorithms can be successfully employed, making references to software applications and libraries. The level of detail in each argument can change year-by-year, and will be decided according to requests coming from other courses in the curriculum and/or specific issues arising in, possibly novel, applicative scenarios.
Written exam:
Suggested reading: some useful tips for scientific writing in English (first two sections) by J.S. Vitter.
Example of interaction: student and instructor discussing the report's content and structure.
Oral exam: topics discussed in class, please read the references in the notes.
Caveat: Several topics are the outcomes of recent advancements in the field, and thus the course material mostly consists in research papers or book chapters.
Date | Topics | References and notes |
---|---|---|
28.02.2019 | Introduction on the course and glimpse on the topics | none |
Randomization is a powerful tool to solve large-scale problems. After introducing the concept of randomized algorithms and hashing, we consider some applications, such as data streaming algorithms, a field emerged in the last decade. Here data flow as a stream and one-pass algorithms with limited memory can process it. We focus on the count-min sketch paradigm and its applications. [Note: to refresh the basic notions on counting and probability, please refer to Appendix C in Cormen-Leiserson-Rivest-Stein's book “Introduction to Algorithms”, 3rd ed., MIT Press.]
Date | Topics | References and notes |
---|---|---|
05.03.2019 | Playing with probability. Random indicator variables: secretary problem and random permuting (suggested reading: birthday paradox). Randomized quick sort. | [CLRS 5.1-5.3 (optional 5.4.1), par. 7.3] code |
06.03.2019 | Virus scan and stream analysis with Karp-Rabin fingerprints: randomized checking and pattern matching. Montecarlo and Las Vegas algorithms. | [RM par.7.4-7.6] code |
08.03.2019 | Dictionary of keywords. Quick review of classical hashing. Universal hashing. Markov's inequality. Perfect hashing. | [CLRS 11.2, 11.3.3, CLRS 11.5 ] code |
12.03.2019 | Data Streaming algorithms. Motivations and examples. Count-Min Sketches | sects.1-3, 4.1 Site Notes code |
13.03.2019 | Case study on hashing: rsync and file synchronization using hash functions. | slides |
15.03.2019 | Queries with Count-Min Sketches: implementation and analysis. | sects.3-4 Notes |
19.03.2019 | Case study on hashing: document tagging and perfect hashing. | paper code |
20.03.2019 | Document resemblance with MinHash, k-sketches and the Jaccard similarity index. Azuma-Hoeffding bound. | paper paper Azuma-Hoeffding code |
22.03.2019 | Proxy caches and dictionaries with errors: Bloom filters. | Survey: except par.2.5-2.6 (optional: par.2.2) |
26.03.2019 | Bloom filters (continued). Cuckoo hashing. | Notes Notes code |
27.03.2019 | Case study on data streams (I): probabilistic counting. | slides code |
29.03.2019 | Cuckoo hashing (continued). Distributed server and load balancing through hashing. | blog Sect.7 and 8.1 |
09.04.2019 | Networked data and randomized min-cut algorithm for graphs. | par.1.1 |
10.04.2019 | Case study on data streams (II): set membership and heavy hitters. | slides code |
12.04.2019 | NP-hard problems: download file manager and the knapsack problem. Reduction from Partition to Knapsack (restriction). Dynamic programming algorithms for Knapsack: Case 1: integer weights, complexity O(nW). Case 2: integer values, complexity O(n2vmax). Examples. | PDF code |
17.04.2019 | Case study on approximation for graphs (max cut): single individual haplotypes reconstruction problem (HapCUT) | hapcut.pdf bio.pdf |
30.04.2019 | NP-hard problems: heuristics based on dynamic programming; approximation algorithms. Case study: knapsack problem. | chapt.2: par. 2.1.1 code |
03.05.2019 | NP-hard problems: counting version (#P) based on dynamic programming, uniform random sampling of the feasible solutions. Case study: #knapsack problem. | notes code |
06.05.2019 | Case study on approximation for metric k-center: Clustering and video summarization. | clustering.pdf chapter2.pdf |
07.05.2019 | NP-hard problems: fully polynomial-time randomized approximation schemes (FPRASs). Case study: #knapsack problem. | notes code |
10.05.2019 | General inapproximability results. Case study: travel salesman problem (TSP). 2-approximation algorithms for metric TSP, Local search. Greedy. Case study: max cut for graphs. Non-existence of PTAS. | [CLRS 35.2] Notes |
13.05.2019 | Randomized approximation and derandomization: universal hash functions; conditional expectations. Case study: max-cut for graphs. | sect. 3-4 sect. 1.1 |
14.05.2019 | Case study on bottom-k sketches: approximate similarity searching | 06691730.pdf only Sect.1 bio.pdf |
17.05.2019 | Fixed-parameter tractable (FPT) algorithms. Kernelization. Bounded search tree. Case study: min-vertex cover in graphs. | sect. 2.2.1, 3.1 |
21.05.2019 | Randomized FPT algorithms: color coding and randomized separation. Case study: longest path in graphs and subgraph isomorphism. | sect. 5.2, 5.3 |
22.05.2019 | Case study on graphs: community detection is social networks | sna.pdf nature03607.pdf nature03607-s1.pdf |
24.05.2019 | Fine-grained algorithms. SETH conjecture and conditional lower bounds. Guaranteed heuristics. Case study: diameter in undirected unweighted graphs. | notes sect. 2.3, 2.4, 3, 4 |
28.05.2019 | Approximation in fine-grained algorithms and limitations. | notes |