digitalhealth:ad2da2024:start
Differenze
Queste sono le differenze tra la revisione selezionata e la versione attuale della pagina.
Prossima revisione | Revisione precedente | ||
digitalhealth:ad2da2024:start [09/07/2024 alle 12:52 (8 mesi fa)] – creata Paolo Ferragina | digitalhealth:ad2da2024:start [11/09/2024 alle 13:06 (6 mesi fa)] (versione attuale) – [Lectures (TO BE CHANGED SIGNIFICANTLY)] Paolo Ferragina | ||
---|---|---|---|
Linea 1: | Linea 1: | ||
+ | ====== Algorithm and Data Structures for Data-intensive Applications A.A. 2024-2025 ====== | ||
**Teachers**: | **Teachers**: | ||
Linea 8: | Linea 9: | ||
**Language: | **Language: | ||
- | **Question time:** Monday: 15-17, or by appointment (also in video-conferencing through the [[|virtual room]] of the course) | + | **Question time:** Monday: 15-17, or by appointment (also in video-conferencing through the [[https:// |
- | **News:** about this course will be distributed via a [[ | Telegram channel]]. | + | **News:** about this course will be distributed via a [[https:// |
**Official lectures schedule:** The schedule and content of the lectures are available below and with the [[https:// | **Official lectures schedule:** The schedule and content of the lectures are available below and with the [[https:// | ||
Linea 25: | Linea 26: | ||
- | ====== | + | ===== Exam ===== |
+ | |||
+ | The exam will consist of a written test including two parts: **exercises and " | ||
+ | |||
+ | The first (exercises) and the second (theory questions) parts of the exam can be split into different exam dates, even of different exam sessions. The exam dates are the ones indicated in the calendar on ESAMI. In the case that the second part is not passed or the student abandons the exam, (s)he can keep the rank of the first exam, but this may occur just once. The second time this happens, the rank of the first part is dropped, and the student has to do both parts again. | ||
+ | |||
+ | |||
+ | ^ Dates ^ Room ^ Text ^ Notes ^ | ||
+ | | 00/00/2024, start at 00:00 | room 00 | text, results | | | ||
+ | |||
+ | |||
+ | |||
+ | ===== Goals ===== | ||
The goal of this course is to enrich and strengthen the knowledge about basic algorithms and data structures, learned in undergraduate courses, with further models, methodologies, | The goal of this course is to enrich and strengthen the knowledge about basic algorithms and data structures, learned in undergraduate courses, with further models, methodologies, | ||
Linea 34: | Linea 47: | ||
- | ====== Syllabus | + | ===== Syllabus ===== |
* Key issues about I/Os, streaming, and hierarchical memory in managing massive datasets. | * Key issues about I/Os, streaming, and hierarchical memory in managing massive datasets. | ||
* Algorithmic and data structure issues concerning atomic versus variable-length items. | * Algorithmic and data structure issues concerning atomic versus variable-length items. | ||
Linea 48: | Linea 61: | ||
- | ====== Background and Notes of the Course | + | ===== Background and Notes of the Course ===== |
I strongly suggest refreshing your knowledge about basic Algorithms and Data Structures by looking at the well-known book [[https:// | I strongly suggest refreshing your knowledge about basic Algorithms and Data Structures by looking at the well-known book [[https:// | ||
- | The content of the course will be mainly | + | The content of the course will be covered |
- | ====== Exam ====== | + | \\ |
- | The exam will consist of a written test including two parts: **exercises and " | + | ===== Lectures |
- | The first (exercises) and the second (theory questions) parts of the exam can be split into different exam dates, even of different exam sessions. The exam dates are the ones indicated in the calendar on ESAMI. In the case that the second part is not passed or the student abandons the exam, (s)he can keep the rank of the first exam, but this may occur just once. The second time this happens, the rank of the first part is dropped, and the student has to do both parts again. | ||
- | ^ Dates ^ Room ^ Text ^ Notes ^ | ||
- | | 00/00/2024, start at 00:00 | room 00 | text, results | | | ||
+ | ^ Date ^ Lecture ^ Biblio ^ | ||
+ | | | Key issues about I/Os, streaming, and hierarchical memory in managing massive datasets. | | | ||
+ | | | Algorithmic and data structure issues concerning atomic versus variable-length items. | | | ||
+ | | | Two fundamental tools: sorting and permuting. | | | ||
+ | | | Key-value stores: From basic to advanced hash tables and tries. | | | ||
+ | | | Textual search engines: compressing and accessing posting lists (integer sequences) and text collections. Deduplicating similar/ | ||
+ | | | Bio-Informatics engines: compressing and indexing arbitrary text for substring search, exact or approximate. | | | ||
+ | | | Vector DBs: nearest neighbor search, hamming, or Euclidean distance. | | | ||
+ | | | Advanced storage: Streaming and random access to compressed raw files, time series, and (labeled) graphs. | | | ||
- | ====== Lectures (TO BE CHANGED SIGNIFICANTLY) ====== | ||
- | |||
- | The lectures below include also some **EXTRA** material, which is suggested to be read for students aiming to high rankings. | ||
- | |||
- | ^ Date ^ Lecture ^ Biblio ^ | ||
- | | | Introduction to the course. The definition of Algorithm and time/space complexity. Models of computation: | ||
- | | | //Students are warmly invited to read Chapter 2 of the book to refresh their notions about time complexity evaluation// | ||
- | | | Another example of I/ | ||
- | | | Permuting algorithm via sorting and scanning, and its I/O-bound. Binary merge-sort, and its I/O-bounds. Snow Plow, with complexity proof and an example. | Chap. 05 of the book | | ||
- | | | //Students are warmly invited to refresh their know-how about:// | ||
- | | | Multi-way mergesort: algorithm and I/ | ||
- | | | Quicksort: recap on best-case, worst-case. Quicksort: Average-case with analysis. Selection of kth ranked item in linear average time (with proof). 3-way partition for better in-memory quicksort. Bounded Quicksort. Multi-way Quicksort: definition, design, and I/ | ||
- | | | Selection of k-1 "good pivot" via Oversampling. Proof of the expected time complexity of Multi-way Quicksort. Random sampling: disk model, known length (algorithms and proofs). Random sampling on the streaming model: known length. | Chap. 05 and Chap. 03 of the book | | ||
- | | | Reservoir sampling: Algorithm and proofs. Exercises on Random Sampling. Randomized data structures: Treaps with their query (search a key, 3-sided range query). | Chap. 03 of the book. [[http:// | ||
- | | | More on Treaps: update (insert, delete, split, merge) operations | ||
- | | | Randomized data structures: Skip lists (with proofs and comments on time-space complexity and I/ | ||
- | | | LSD-radix sort with proof of time complexity and correctness. MSD-radix sort and the trie data structure. | ||
- | | | Multi-key Quicksort: algorithm and analysis. Ternary search tree. Exercises. | | | ||
- | | | Interpolation Search. Fast set intersection, | ||
- | | | //Students are warmly invited to refresh their know-how about:// hash functions and their properties; hashing with chaining. | Lectures 7 of [[http:// | ||
- | | | Fast set intersection: | ||
- | | | Universal hashing (definition and properties). Three examples of Universal Hash functions, one with correctness proof. The "power of two choices" | ||
- | | | Cuckoo hashing (with all proofs). | Don't read Perfect Hash (hence no sect. 8.4). | | ||
- | | | Minimal ordered perfect hashing: definition, properties, construction, | ||
- | | | Bloom Filter: properties, construction, | ||
- | | | Prefix search: definition of the problem, solution based on arrays, Front-coding, | ||
- | | | Exercises | | | ||
- | | | Exercises. Two-level indexing based on Patricia Trees, with exercises. | | | ||
- | | | Exercises | | | ||
- | | | **MIDTERM exam in room ** | .| | ||
- | | | Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n. Suffix Array construction via qsort and its asymptotic analysis.\\ **EXTRA:** Use of the LCP array to achieve p + log n in the substring search (after Lemma 10.1, up to Lemma 10.2) | Chap. 10 of the book: lemma 10.1 and sect. 10.2.3 (but no "The skew algorithm", | ||
- | | | Suffix trees: properties, construction from Suffix Arrays and LCP array. Search and mining algorithms over suffix arrays and suffix trees. Exercises | Sect. 10.4.4 (text mining) | | ||
- | | | Review of the Midterm exam | | | ||
- | | | Prefix-free codes, the notion of entropy. Integer coding: the problem and some considerations. Shannon Theorem and optimal codes. The codes Gamma and Delta, space/time performance, | ||
- | | | Elias-Fano code: definition, properties, and time/space complexity. Definition of Rank/Select operations and comments. Access and NextGEQ operations. Examples on Elias-Fano. | ||
- | | | Variable-byte code and (s,c)-dense code, with examples. Interpolative code. Various exercises on all integer coders. | | | ||
- | | | Rank and Select: definition and succinct solution for Rank. Two theorems: one for a compact solution and one for a compressed solution based on Elias-Fano. \\ **EXTRA:** Solution for Select.| Chap. 15 of the book | | ||
- | | | Succinctly encoding of binary trees, and labeled binary trees, with examples. | | | ||
- | | | Data Compression. Static, semistatic, and dynamic models for frequency estimation. Huffman, with optimality (proof) and relation to Entropy. Canonical Huffman: construction, | ||
- | | | Arithmetic coding: properties, the converter tool. Compression, | ||
- | | | Exercises on Huffman and Arithmetic coding, and on succinct binary trees. | | | ||
- | | | Dictionary-based compressors: | ||
- | | | Compressor bzip: MTF, RLE0, Wheeler-code, | ||
- | | | Exercises | | | ||
- | | | **FINAL TERM exam in room ** | .| |
digitalhealth/ad2da2024/start.1720529564.txt.gz · Ultima modifica: 09/07/2024 alle 12:52 (8 mesi fa) da Paolo Ferragina