Differenze

Queste sono le differenze tra la revisione selezionata e la versione attuale della pagina.

--- digitalhealth:0002a [09/07/2024 alle 12:42 (9 mesi fa)] – Paolo Ferragina
+++ digitalhealth:0002a [09/07/2024 alle 12:47 (9 mesi fa)] (versione attuale) – [Current year] Paolo Ferragina
@@ Linea 13: / Linea 13: @@
 ====== Goals ======
-TBD
+The goal of this course is to enrich and strengthen the knowledge about basic algorithms and data structures, learned in undergraduate courses, with further models, methodologies, tools, and techniques to design, analyze, experiment, and improve solutions for managing massive datasets of multimodal types: integer, string, vector, text, time series and (labeled) graph.
+These knowledge and skills constitute the building blocks upon which Big Data platforms and applications are built (e.g., key-value stores, graph DBs, vector DBs, Transformers, …), so this theoretical part will provide students with the “methodological toolbox” for designing and evaluating their big data solutions, and for choosing the right library to adopt for their implementation.
+The theoretical lectures will be complemented by some hands-on experience in coding with relevant programming libraries that offer the above-mentioned building blocks. The course provides most of the algorithmic prerequisites for the other courses of this master’s degree.
+====== Syllabus ======
+  * Key issues about I/Os, streaming, and hierarchical memory in managing massive datasets.
+  * Algorithmic and data structure issues concerning atomic versus variable-length items.
+  * Two fundamental tools: sorting and permuting.
+  * Key-value stores: From basic to advanced hash tables and tries.
+  * Textual search engines: compressing and accessing posting lists (integer sequences) and text collections. Deduplicating similar/equal texts, clustering text collections.
+  * Bio-Informatics engines: compressing and indexing arbitrary text for substring search, exact or approximate.
+  * Vector DBs: nearest neighbor search, hamming, or Euclidean distance.
+  * Advanced storage: Streaming and random access to compressed raw files, time series, and (labeled) graphs.
+Lectures will include in-depth discussions on the practical efficiency of the proposed algorithms and data structures, plus hands-on experience with coding solutions for some topics.
@@ Linea 24: / Linea 42: @@
 ====== Current year ======
-  * [[.ae2024:|Academic Year 2024-2025]]
+  * [[.ad2da2024:|Academic Year 2024-2025]]
 ====== Previous years ======