digitalhealth:0002a

Differences

These are the differences between the selected revision and the current version of the page.

digitalhealth:0002a [09/07/2024 at 12:42 (9 months ago)] Paolo Ferragina
digitalhealth:0002a [09/07/2024 at 12:47 (9 months ago)] (current version) – [Current year] Paolo Ferragina
Line 13: Line 13:
 ====== Goals ======  ====== Goals ====== 
  
-TBD
+The goal of this course is to enrich and strengthen the knowledge of basic algorithms and data structures acquired in undergraduate courses with further models, methodologies, tools, and techniques for designing, analyzing, experimenting with, and improving solutions for managing massive datasets of multimodal types: integers, strings, vectors, texts, time series, and (labeled) graphs.
 + 
 +This knowledge and these skills constitute the building blocks upon which Big Data platforms and applications are built (e.g., key-value stores, graph DBs, vector DBs, Transformers, …), so this theoretical part will provide students with the “methodological toolbox” for designing and evaluating their Big Data solutions, and for choosing the right library for their implementation.
 + 
 +The theoretical lectures will be complemented by some hands-on experience in coding with relevant programming libraries that offer the above-mentioned building blocks. The course provides most of the algorithmic prerequisites for the other courses of this master’s degree.  
 + 
 + 
 +====== Syllabus ======  
 +  * Key issues about I/Os, streaming, and hierarchical memory in managing massive datasets. 
 +  * Algorithmic and data structure issues concerning atomic versus variable-length items. 
 +  * Two fundamental tools: sorting and permuting. 
 +  * Key-value stores: from basic to advanced hash tables and tries.
 +  * Textual search engines: compressing and accessing posting lists (integer sequences) and text collections. Deduplicating similar/equal texts, clustering text collections.
 +  * Bioinformatics engines: compressing and indexing arbitrary text for exact or approximate substring search.
 +  * Vector DBs: nearest neighbor search under Hamming or Euclidean distance.
 +  * Advanced storage: streaming and random access to compressed raw files, time series, and (labeled) graphs.
 + 
 + 
 +Lectures will include in-depth discussions on the practical efficiency of the proposed algorithms and data structures, plus hands-on experience with coding solutions for some topics.
  
  
Line 24: Line 42:
  
 ====== Current year ====== ====== Current year ======
-  * [[.ae2024:|Academic Year 2024-2025]]
+  * [[.ad2da2024:|Academic Year 2024-2025]]
  
  
 ====== Previous years ====== ====== Previous years ======
  
digitalhealth/0002a.1720528930.txt.gz · Last modified: 09/07/2024 at 12:42 (9 months ago) by Paolo Ferragina
