dm:dm.2020-21

Differences

These are the differences between the selected revision and the current version of the page.

dm:dm.2020-21 [06/09/2021 at 09:45 (4 years ago)] - created by Anna Monreale
dm:dm.2020-21 [04/11/2022 at 12:14 (2 years ago)] (current version) Salvatore Ruggieri
Line 1: Line 1:
-<html> +====== Data Mining A.A. 2020/21 ======
-<!-- Google Analytics --> +
-<script type="text/javascript" charset="utf-8"> +
-(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ +
-(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), +
-m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) +
-})(window,document,'script','//www.google-analytics.com/analytics.js','ga'); +
  
-ga('create', 'UA-34685760-1', 'auto', 'personalTracker', {'allowLinker': true}); +===== DM1 - Data Mining: Foundations (6 CFU) =====
-ga('personalTracker.require', 'linker'); +
-ga('personalTracker.linker:autoLink', ['pages.di.unipi.it', 'enforce.di.unipi.it', 'didawiki.di.unipi.it'] ); +
-   +
-ga('personalTracker.require', 'displayfeatures'); +
-ga('personalTracker.send', 'pageview', 'ruggieri/teaching/dm/'); +
-setTimeout("ga('send','event','adjusted bounce rate','30 seconds')",30000);  +
-</script> +
-<!-- End Google Analytics --> +
-<!-- Global site tag (gtag.js) - Google Analytics --> +
-<script async src="https://www.googletagmanager.com/gtag/js?id=G-LPWY0VLB5W"></script> +
-<script> +
-  window.dataLayer = window.dataLayer || []; +
-  function gtag(){dataLayer.push(arguments);} +
-  gtag('js', new Date()); +
  
-  gtag('config', 'G-LPWY0VLB5W'); +Instructors: 
-</script> +  * **Dino Pedreschi** 
-<!-- Global site tag (gtag.js) - Google Analytics --> +    * KDDLab, Università di Pisa 
-<script async src="https://www.googletagmanager.com/gtag/js?id=G-LPWY0VLB5W"></script> +    * [[http://www-kdd.isti.cnr.it]] 
-<script> +    * [[dino.pedreschi@unipi.it]]  
-  window.dataLayer = window.dataLayer || []; +
-  function gtag(){dataLayer.push(arguments);} +
-  gtag('js', new Date()); +
  
-  gtag('config', 'G-LPWY0VLB5W'); +  * **Mirco Nanni** 
-</script> +    * KDDLab, ISTI - CNR, Pisa 
-<!-- Capture clicks --> +    * [[http://www-kdd.isti.cnr.it]] 
-<script> +    * [[mirco.nanni@isti.cnr.it]]  
-jQuery(document).ready(function(){ +
-  jQuery('a[href$=".pdf"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'PDFs', fname); +
-  }); +
-  jQuery('a[href$=".r"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Rs', fname); +
-  }); +
-  jQuery('a[href$=".zip"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'ZIPs', fname); +
-  }); +
-  jQuery('a[href$=".mp4"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname); +
-  }); +
-  jQuery('a[href$=".flv"]').click(function() { +
-    var fname = this.href.split('/').pop(); +
-    ga('personalTracker.send', 'event',  'DM', 'Videos', fname); +
-  }); +
-}); +
-</script> +
-</html> +
-====== Data Mining (309AA) - 9 CFU ======+
  
-**Instructor:** +Teaching Assistant: 
-  * **Anna Monreale** +  * **Salvatore Citraro**
     * KDDLab, Università di Pisa     * KDDLab, Università di Pisa
-    * [[anna.monreale@unipi.it]]    +    * [[http://www-kdd.isti.cnr.it]] 
-**Teaching Assistant:** +    * [[salvatore.citraro@phd.unipi.it]]   
-  * **Francesca Naretto** +===== DM2 - Data Mining: Advanced Topics and Applications (6 CFU) ===== 
-    * KDDLab, SNS, Pisa + 
-    * [[francesca.naretto@sns.it]]  +Instructors:
 +  * **Riccardo Guidotti** 
 +    * KDDLab, Università di Pisa 
 +    * [[https://kdd.isti.cnr.it/people/guidotti-riccardo]]    
 +    * [[riccardo.guidotti@di.unipi.it]] 
  
 ====== News ====== ====== News ======
-    * [01.10.2020] ** The lecture on 9.10.2020 will be suppressed. ** +    * **[30.05.2021] Agenda link for DM2 is [[https://agende.unipi.it/nhb-ang-wvp|here]].** 
-    * [09.09.2020] The course will be held online, please use this link to join the class: https://teams.microsoft.com/l/team/19%3a8f6779bab74f4368ba7ce1c2b092346d%40thread.tacv2/conversations?groupId=8da15095-b6e5-41c1-a894-d418aed3983e&tenantId=c7456b31-a220-47f5-be52-473828670aa1    * +    * **[20.05.2021] The project must be delivered to [[riccardo.guidotti@unipi.it]] AND [[[email protected]]] with subject "[DM2 Project] Draft 2" or "[DM2 Project] Final"** 
 +    * [20.05.2021] CAT4 answers are available {{ :dm:cat4_2021_solutions.pdf | here}}. 
 +    * [13.05.2021] CAT4 is available {{ :dm:cat4_2021.pdf | here}}. 
 +    * [05.05.2021] CAT3 answers are available {{ :dm:cat3_2021_solutions.pdf | here}}. 
 +    * [28.04.2021] CAT3 is available {{ :dm:cat3_2021.pdf | here}}. 
 +    * [14.04.2021] CAT2 answers are available {{ :dm:cat2_2021_solutions.pdf | here}}. 
 +    * [08.04.2021] CAT2 is available {{ :dm:cat2_2021.pdf | here}}. 
 +    * [06.04.2021] The project must be delivered to [[riccardo.[email protected]]] AND [[[email protected]]] with subject "[DM2 Project] Draft 1" 
 +    * [08.03.2021] CAT1 answers are available {{ :dm:cat1_2021_answ.pdf | here}}. 
 +    * [01.03.2021] CAT1 is available {{ :dm:cat1_2021.pdf | here}}. 
 +    * [15.02.2021] Groups should be registered [[https://docs.google.com/spreadsheets/d/1RaAocJ2bCjCOYj4R068Rg6OLNNVmIG8YXu9puGIC1OU/edit?usp=sharing|here]] 
 +    * [11.02.2021] The course will be held online on  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]].  
 +    * [11.02.2021] The first lesson will be held on 15/02/2021.
 ====== Learning Goals ====== ====== Learning Goals ======
 +  * DM1
      * Fundamental concepts of data knowledge and discovery.      * Fundamental concepts of data knowledge and discovery.
      * Data understanding      * Data understanding
      * Data preparation      * Data preparation
      * Clustering      * Clustering
-     * Classification & Regression+     * Classification
      * Pattern Mining and Association Rules      * Pattern Mining and Association Rules
 +     * Clustering
 +
 +  * DM2
      * Outlier Detection      * Outlier Detection
 +     * Regression and Forecasting
 +     * Advanced Classification
      * Time Series Analysis      * Time Series Analysis
      * Sequential Pattern Mining      * Sequential Pattern Mining
 +     * Advanced Clustering
 +     * Transactional Clustering
      * Ethical Issues      * Ethical Issues
  
 ====== Hours and Rooms ====== ====== Hours and Rooms ======
 +
 +===== DM1 =====
  
 **Classes** **Classes**
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Wednesday |  09:00 - 10:45  |  Online  |  +|  Monday  |  14:00 - 16:00  |  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  
-|  Thursday  |  09:00 - 10:45  |  Online  |  +|  Wednesday  |  16:00 - 18:00  |  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  
-|  Friday    |  11:00 - 12:45  |  Online  |  +
  
 +**Office hours - Ricevimento:**
  
 +  * Prof. Pedreschi: Monday 16:00 - 18:00, Online
 +  * Prof. Nanni: appointment by email, Online
  
-**Office hours - Ricevimento:** +   
-Anna Monreale: Wednesday, 11:00-13:00 online using Teams (Appointment by email) +===== DM2 ===== 
-Francesca Naretto: Monday, 15:00-18:00 online using Teams (Appointment by email) + 
 + 
 +**Classes** 
 + 
 +^  Day of Week  ^  Hour  ^  Room  ^  
 +|  Monday  |  14:00 - 16:00  |  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |  
 +|  Wednesday  |  16:00 - 18:00  |  [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]]  |   
 + 
 +**Office Hours - Ricevimento:** 
 + 
 +  * Room 268 Dept. of Computer Science 
 +  * Tuesday: 15-17, Room: MS Teams 
 +  * Appointment by email
  
-  
 ====== Learning Material -- Materiale didattico ====== ====== Learning Material -- Materiale didattico ======
  
Line 108: Line 100:
   * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006   * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006
     * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]     * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]]
-    * Chapters 4,6 and 8 are also available at the publisher's Web site.+    * I capitoli 4, 6, 8 sono disponibili sul sito del publisher. -- Chapters 4,6 and 8 are also available at the publisher's Web site.
   * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7   * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7
   * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition.   * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition.
Line 125: Line 117:
   * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]   * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]]
   * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]   * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]]
 +  * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http://www.knime.org/download-desktop| Download page ]] 
 +  * [[http://www.cs.waikato.ac.nz/ml/weka/ | WEKA ]] Data Mining Software in JAVA. University of Waikato, New Zealand [[http://www.cs.waikato.ac.nz/ml/weka/ | Download page ]] 
 +  * Didactic Data Mining [[http://matlaspisa.isti.cnr.it:5055/| DDM]]
    
 ====== Class Calendar (2020/2021) ====== ====== Class Calendar (2020/2021) ======
  
-===== First Semester  =====+===== First Semester (DM1 - Data Mining: Foundations) =====
  
-^ ^ Day ^ Topic ^ Learning material ^ References ^  +^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor ^ 
-|1.|  16.09  09:00-10:45 | Overview. Introduction to KDD | {{ :magistraleinformatica:dmi:1-overview.pdf |}} {{ :magistraleinformatica:dmi:1-intro-dm.pdf |}} | Chap. 1 Kumar Book |  +|1.|  16.09.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Introduction | {{ :dm:1.dm-overview-corso.pdf | Course Overview}} {{ :dm:2.introduction-short.pdf | Introduction DM}} | Pedreschi | 
-|2.|  17.09  09:00-10:45 | Data Understanding | {{ :magistraleinformatica:dmi:2-data_understanding.pdf | Slides DU}} | Chap. 2 Kumar Book and additional resource of Kumar Book: [[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Exploring Data]]. If you have the first ed. of KUMAR this is Chap. 3 |  +|2.|  23.09.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Data Understanding | {{ :dm:3.dataunderstanding-2019.pdf |Slides DU}} {{ :dm:2-statistica_descrittiva.pdf |Slides on Descriptive Statistics}} | Pedreschi | 
-|3.|  18.09  09:00-10:45 | Data Preparation | {{ :magistraleinformatica:dmi:3-data_preparation.pdf |}} | Chap. 2 Kumar Book |  +|3.|  28.09.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Data Understanding |  | Pedreschi | 
-|4.|  23.09  09:00-10:45 | Data Preparation: Transformations + PCA | {{ :magistraleinformatica:dmi:3-data_preparation.pdf |}} | Chap. 2 Kumar Book, Appendix B Dimensionality Reduction (only PCA) |  +|4.|  30.09.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Data Preparation | {{ :dm:3.dm_ml_data_preparation.pdf | Slides DP}} | Pedreschi | 
-|5.|  24.09  09:00-10:45 | Data Similarities. Introduction to Clustering. | {{ :magistraleinformatica:dmi:4-data_similarity.pdf |}} {{ :magistraleinformatica:dmi:5-basic_cluster_analysis-intro.pdf |}} | Data Similarity is in Chap. 2 while Clustering is in Chap. 7 |  +|5.|  05.10.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Lab: Introduction to Python and Knime | {{ :dm:python_basics.ipynb.zip |Python Introduction}} {{ :dm:00_start_with_knime.zip | Knime simple workflow}} [[https://web.microsoftstream.com/video/97b6fb6f-8909-417c-bc41-2f1e5a9ab00e|Lecture 5 part 1]], [[https://web.microsoftstream.com/video/2179e7ad-e1b7-48f6-a7cf-dc903a97f8dc|Lecture 5 part 2]] | Guidotti, Citraro | 
-|6.|  25.09  11:00-12:45 | LAB: Data Understanding in Python | {{ :magistraleinformatica:dmi:python_basics.ipynb.zip | Very basic notions on Python}} {{ :magistraleinformatica:dmi:tips_data_understanding.ipynb.zip |Notebook on Data Understanding}} {{ :magistraleinformatica:dmi:tipsdata.zip |}} |  |  +|6.|  07.10.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Lab: Data Understanding & Preparation | Dataset: {{ :dm:iris.csv.zip |Iris}}, {{ :dm:titanic.csv.zip |Titanic}}, Knime: {{ :dm:01_data_understanding.zip |}} Python: {{ :dm:titanic_data_understanding2.ipynb.zip |}} [[https://web.microsoftstream.com/video/c328a4e7-40d3-4378-a0c7-d5c24400d59a|Lecture 6 part 1]], [[https://web.microsoftstream.com/video/0a95d217-9cfb-46d6-aab6-e4a7fc65eac6|Lecture 6 part 2]] | Guidotti, Citraro | 
-|7.|  30.09  09:00-10:45 | Center-based clustering: kmeans | {{ :magistraleinformatica:dmi:6-basic_cluster_analysis-kmeans-variants.pdf |}} | Chap. 7 Kumar Book |  +|7.|  12.10.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Clustering: Intro & K-means | {{ :dm:basic_cluster_analysis-intro-kmeans_2020.pdf |Slides clustering 1}} | Nanni | 
-|8.|  01.10  09:00-10:45 | Center-based clustering: Bisecting K-means, Xmeans, EM | Same slides as the previous lecture | Chap. 7 Kumar Book, {{ :magistraleinformatica:dmi:clusteringmixturemodels.pdf |Clustering & Mixture Models}} {{ :magistraleinformatica:dmi:xmeans.pdf |}} |  +|8.|  14.10.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Clustering: Hierarchical methods | {{ :dm:basic_cluster_analysis-hierarchical_2020.pdf |Slides clustering 2}} | Nanni | 
-|9.|  02.10  11:00-12:45 | Hierarchical clustering | {{ :magistraleinformatica:dmi:7.basic_cluster_analysis-hierarchical.pdf |}} {{ :magistraleinformatica:dmi:ex._hierarchical-clustering.pdf |}} | Chap. 7 Kumar Book |  +|9.|  19.10.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Clustering: Density-based methods and exercises | {{ :dm:6.basic_cluster_analysis-dbscan.pdf |Slides clustering 3}} {{ :dm:ex._clustering_2020.pdf |Clustering exercises}} | Nanni | 
-|10.|  07.10  09:00-10:45 | Density based clustering | {{ :magistraleinformatica:dmi:8.basic_cluster_analysis-dbscan-validity.pdf |}} | Chap. 7 Kumar Book |  +|10.|  21.10.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Clustering: Validation methods and exercises | {{ :dm:6.basic_cluster_analysis-validity_2020.pdf |Slides clustering 4}} | Nanni | 
-|11.|  08.10  09:00-10:45 | Lab: clustering + Project Assignment | {{ :magistraleinformatica:dmi:py-clustering.zip |}} |  |  +|11.|  26.10.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Lab: Clustering | {{ :dm:knime_clustering.zip | Knime }} {{ :dm:python_clustering-iris.zip |Python Iris}} {{ :dm:titanic_clustering.ipynb.zip | Python Titanic}} | Citraro | 
-| |  09.10  11:00-12:45 | Lecture canceled |  |  |  +|12.|  28.10.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Classification: Intro and Decision Trees | {{ :dm:7.chap3_basic_classification-2019.pdf |Slides classification}} | Nanni | 
-|12.|  14.10  09:00-10:45 | Classification Problem + Decision trees | {{ :magistraleinformatica:dmi:9.chap3_basic_classification-2020.pdf |}} | Chap. 3 Kumar Book |  +| |  02.11.2020  14:00-16:00 |  | No Lecture. Project Week. |  |  | 
-|13.|  15.10  09:00-10:45 | Only 30 minutes of discussion on the project due to connection problems |  | Chap. 3 Kumar Book |  +| |  04.11.2020  16:00-18:00 |  | No Lecture. Project Week. |  |  | 
-|14.|  16.10  11:00-12:45 | Decision Tree + Classifier Evaluation |  | Chap. 3 Kumar Book |  +|13.|  09.11.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Classification: Decision Trees/2 |  | Nanni | 
-|15.|  21.10  09:00-10:45 | Evaluation Methods for Classification Models | {{ :magistraleinformatica:dmi:9.chap3_basic_classification-2020.pdf |}} | Chap. 3 Kumar Book + Chap. 4 Kumar Book |  +|14.|  11.11.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Classification: Decision Trees/3 |  | Nanni | 
-|16.|  22.10  09:00-10:45 | Statistical tool for model evaluation + Rule based classification | {{ :magistraleinformatica:dmi:10-rule-based-clussifiers.pdf |}} | Chap. 3 Kumar Book + Chap. 4 Kumar Book |  +|15.|  16.11.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Classification: Decision Trees/4 | {{ :dm:ex-classification.pdf |Sample exercise}} | Nanni | 
-|17.|  23.10  11:00-12:45 | Rule based classification + Instance-based Classification | {{ :magistraleinformatica:dmi:11-knn.pptx |}} | Chap. 4 Kumar Book |  +|16.|  18.11.2020  16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Classification: Decision Trees/5 + Exercises | {{ :dm:classificazione_1.pdf |Exercises 1}} {{ :dm:classificazione_2.pdf |Exercises 2}} | Nanni | 
-|18.|  28.10  09:00-10:45 | Naive Bayesian Classifier + Ensemble Classifiers | {{ :magistraleinformatica:dmi:12-naive_bayes.pdf |}} {{ :magistraleinformatica:dmi:13_ensemble_2020.pdf |}} | Chap. 4 Kumar Book |  +|17.|  23.11.2020  14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Classification: KNN | {{ :dm:classification_knn.pdf |Slides}}, {{ :dm:ex_knn_dm2_exam.2017.10.30.pdf |Exercise 1 (KNN only)}}, {{ :dm:ex_knn_2020.pdf |Exercise 2}} | Nanni | 
-|19.|  29.10  09:00-10:45 | SVM + NN | {{ :magistraleinformatica:dmi:14_svm_2020.pdf |}} {{ :magistraleinformatica:dmi:15_neural_networks_2020.pdf |}} | Chap. 4 Kumar Book |  +|18.|  25.11.2020  16:00-18:00 | [[https://web.microsoftstream.com/video/15574ad9-650b-413a-818f-d76dea123f80|MS Teams]] | Lab: Clustering | {{ :dm:knime_classification.zip |knime_classification}} {{ :dm:python_classification.zip |python_classification}} {{ :dm:python_classification.rar |python_classification2}} | Citraro | 
-|20.|  30.10  11:00-12:45 | MLNN & Lab on Classification | {{ :magistraleinformatica:dmi:classification.zip |Notebook Python for classification}} | Chap. 4 Kumar Book |  +|19.|  02.12.2020  16:00-18:00 | [[ https://web.microsoftstream.com/video/8798715f-6eee-4754-b207-ec382ec08f21 |MS Teams]] | Pattern & Association Rule Mining - Apriori algorithm for frequent itemset mining | {{ :dm:2-dm2-restructured_assoc-2020.pdf |}} | Pedreschi | 
-|21.|  04.11  09:00-10:45 | Regression + Association Rule Mining | {{ :magistraleinformatica:dmi:16_linear_regression.pdf |}} {{ :magistraleinformatica:dmi:17_association_analysis.pdf |}} | Regression: Appendix D in Kumar Book + Chap. 5 Association Rules: Kumar Book |  +|20.|  07.12.2020  14:00-16:00 | [[ https://web.microsoftstream.com/video/f043ae04-0d5d-4f18-889e-7b7a84375481 |MS Teams]] | Pattern & Association Rule Mining - Rule mining and evaluation, Closed and maximal itemsets, Multi-dimensional, Quantitative and Multi-level association rules |  | Pedreschi | 
-|22.|  05.11  09:00-10:45 | Association Rule Mining |  | Chap. 5 Association Rules: Kumar Book |  +|21.|  14.12.2020  14:00-16:00 |  | Lab: Pattern Mining | {{ :dm:pattern_knime.zip |knime_pattern}} {{ :dm:pattern_python.zip |python_pattern}} https://anaconda.org/conda-forge/pyfim, http://www.borgelt.net/pyfim.html {{ :dm:ex-frequentpatterns-ar.pdf |}} | Citraro | 
-|23.|  06.11  11:00-12:45 | Sequential Pattern Mining | {{ :magistraleinformatica:dmi:18_sequential_patterns_2020.pdf |}} | Chap. 6 Kumar Book |  +===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) =====
-|24.|  11.11  09:00-10:45 | Ethics in AI & Privacy | {{ :magistraleinformatica:dmi:19_ethics_privacy.pdf |}} | [[https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai|Report on Trustworthy AI]] |  +
-|25.|  12.11  09:00-10:45 | Ethics in AI & Privacy |  | {{ :dm:allegato1_chapter.pdf | Overview on Privacy}} {{ :magistraleinformatica:dmi:allegato11-cpdp13.pdf |}} {{ :dm:capprivacy.pdf | Privacy by design}} |  +
-|26.|  13.11  11:00-12:45 | Ethics in AI & Privacy, Explainability | {{ :magistraleinformatica:dmi:20_explainability_2020.pdf |}} |  |  +
-|27.|  18.11  09:00-10:45 | Explainability | {{ :magistraleinformatica:dmi:20_explainability_2020.pdf |}} | Material: [[https://arxiv.org/pdf/1805.10820.pdf|LORE]] [[https://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf| LIME]] [[http://delivery.acm.org/10.1145/3240000/3236009/a93-guidotti.pdf?ip=94.38.73.6&id=3236009&acc=OA&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2ED544636226B69D47&__acm__=1576196869_06b3353aae4fe3bd8ea30d9c9c5356eb|Survey]] {{ :magistraleinformatica:dmi:pkdd_2019_abele_cr.pdf |ABELE}} |  +
-|28.|  19.11  09:00-10:45 | Anomaly Detection | {{ :magistraleinformatica:dmi:21_anomaly_detection_2020.pdf |}} | Chap. 9 of Kumar Book |  +
-|29.|  20.11  11:00-12:45 | Anomaly Detection | {{ :magistraleinformatica:dmi:anomalydetection.ipynb.zip |}} | Chap. 9 of Kumar Book |  +
-|30.|  25.11  09:00-10:45 | Time Series Similarity | {{ :magistraleinformatica:dmi:22_time_series_similarity.pdf |}} | [[https://cs.gmu.edu/~jessica/BookChapterTSMining.pdf|Overview on DM for time series]], [[https://pdfs.semanticscholar.org/18f3/55d7ef4aa9f82bf5c00f84e46714efa5fd77.pdf|DTW paper by Sakoe and Chiba, 1978]] |  +
-|31.|  26.11  09:00-10:45 | Time Series Clustering | {{ :magistraleinformatica:dmi:22_time_series_similarity.pdf |}} |  |  +
-|32.|  27.11  11:00-12:45 | Lab on Association Rules and Sequential Pattern Mining | {{ :magistraleinformatica:dmi:patterns.zip |}} |  |  +
-|33.|  02.12  09:00-10:45 | Time Series: Motif Discovery | {{ :magistraleinformatica:dmi:23_time_series_motif_shapelets.pdf |}} | {{ :magistraleinformatica:dmi:randomproj.pdf |}} {{ :magistraleinformatica:dmi:matrixprofile.pdf |}} |  +
-|34.|  03.12  09:00-10:45 | Time Series: Shapelets Discovery + Ex. DTW + Subsequences + Thesis available | {{ :magistraleinformatica:dmi:23_time_series_motif_shapelets.pdf |}} {{ :magistraleinformatica:dmi:ex-dtw-sequences.pdf |}} {{ :magistraleinformatica:dmi:research_topics.pdf |Thesis Proposals}} | {{ :magistraleinformatica:dmi:shaplet.pdf |}} |  +
-| |  04.12  11:00-12:45 | Lecture Canceled |  |  |  +
-|35.|  09.12  09:00-10:45 | Paper Presentation |  |  |  +
-|36.|  10.12  09:00-10:45 | Paper Presentation |  |  |  +
-|37.|  11.12  11:00-12:45 | Paper Presentation |  |  |  +
  
 +^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor ^ Recordings ^
 +|1.| 15.02.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Introduction, CRISP, KNN | {{ :dm:00_dm2_intro_2021.pdf | Intro}}, {{ :dm:01_dm2_crispdm_2021.pdf | CRISP}}, {{ :dm:02_dm2_knn_2021.pdf | KNN}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione-20210215_141005-Registrazione%20della%20riunione.mp4?web=1| 1stPart]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione-20210215_151106-Registrazione%20della%20riunione.mp4?web=1| 2ndPart]] |
 +|2.| 17.02.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Performance Evaluation | {{ :dm:03_dm2_performance_evaluation_2021.pdf | Eval}}, {{ :dm:occupancy_data.zip | occupancy_data}}, {{ :dm:01_knn_eval.ipynb.zip | KNN_Eval_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%202-20210217_160202-Registrazione%20della%20riunione.mp4?web=1| Dataset]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%202-20210217_161131-Registrazione%20della%20riunione.mp4?web=1| Lecture]] |
 +|3.| 22.02.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Imbalanced Learning | {{ :dm:04_dm2_imbalanced_learning_2021.pdf | ImbLearn}}, {{ :dm:02_dimensionality_reduction.ipynb.zip | DimRed_notebook}}, {{ :dm:03_perfeval_imbalance.ipynb.zip |ImbLearn_notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Lecture%203-20210222_141416-Registrazione%20della%20riunione.mp4?web=1| 1stPart]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Lecture%203-20210222_151806-Registrazione%20della%20riunione.mp4?web=1| 2ndPart]] |
 +|4.| 24.02.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Anomaly Detection | {{ :dm:05_dm2_maximum_likelihood_estimation_2021.pdf | MLE}}, {{ :dm:06_dm2_anomaly_detection_2021.pdf | Anomaly Detection}}, {{ :dm:04_outlier_detection.ipynb.zip | Anomaly_notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Lecture%204%20-%20DM2-20210224_160801-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Lecture%204%20-%20DM2-20210224_170545-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|5.| 01.03.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Anomaly Detection | {{ :dm:06_dm2_anomaly_detection_2021.pdf | Anomaly Detection}}, {{ :dm:04_outlier_detection.ipynb.zip | Anomaly_notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%205-20210301_140023-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%205-20210301_144434-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|6.| 03.03.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Anomaly Detection | {{ :dm:06_dm2_anomaly_detection_2021.pdf | Anomaly Detection}}, {{ :dm:04_outlier_detection.ipynb.zip | Anomaly_notebook}}, Extended Isolation Forest [[https://github.com/sahandha/eif|link]] | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%206-20210303_161521-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%206-20210303_171402-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|7.| 08.03.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Naive Bayes Classifier | {{ :dm:07_dm2_naive_bayes_2021.pdf | NBC}}, {{ :dm:05_naive_bayes.ipynb.zip | NBC_notebook}}, {{:dm:nbc_ex1_miro.png?linkonly| Ex1_Miro}}, {{:dm:nbc_ex2_miro.png?linkonly| Ex2_Miro}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%207-20210308_141327-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%207-20210308_151426-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +| | 10.02.2021 16:00-18:00 | | Lecture by Prof. Anna Grassellino: “Da Pisa al Fermilab di Chicago: Viaggio verso un rivoluzionario computer quantistico” (“From Pisa to Fermilab in Chicago: a journey towards a revolutionary quantum computer”) | [[https://www.youtube.com/watch?v=NIJ9ko9fAoE|Link]] | Guidotti | |
 +|8.| 15.03.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Linear and Logistic Regression, Rule-based Classifiers | {{ :dm:08_dm2_linear_logistic_regression_2021.pdf | Regression}}, {{ :dm:09_dm2_rule_based_classifier_2021.pdf | RuleBased}}, {{ :dm:06_linear_logistic_regression.ipynb.zip | Regression_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%208-20210315_141301-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%208-20210315_152032-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|9.| 17.03.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Rule-based Classifiers, Support Vector Machines | {{ :dm:09_dm2_rule_based_classifier_2021.pdf | RuleBased}}, {{ :dm:07_rule_based_classifiers.ipynb.zip | RuleBased_Notebook}}, {{ :dm:10_dm2_svm_2021.pdf | SVM}}, {{ :dm:08_support_vector_machines.ipynb.zip | SVM_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%209-20210317_161210-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%209-20210317_171648-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|10.| 22.03.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | (Nonlinear) Support Vector Machines, Linear Perceptron | {{ :dm:10_dm2_svm_2021.pdf | SVM}}, {{ :dm:08_support_vector_machines.ipynb.zip | SVM_Notebook}}, {{ :dm:11_dm2_perceptron_2021.pdf | Linear Perceptron}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2010-20210322_141555-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2010-20210322_150937-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|11.| 24.03.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Neural Networks, Deep Neural Networks | {{ :dm:12_dm2_neural_network_2021.pdf | Neural Network}}, {{ :dm:09_neural_networks.ipynb.zip | NN_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2011-20210324_161454-Registrazione%20della%20riunione.mp4?web=1| 1st Part]],[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2011-20210324_170537-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]]  |
 +|- | 25.03.2021 15:00-17:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Neural Networks Forward and Backpropagation Example, Case Study Music | {{ :dm:09_neural_network_implementation.ipynb.zip | NN_Implementation}}, {{ :dm:sanremoanalysis_unipi.pdf | Case Study}}| Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Office%20Hours-20210325_151702-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Office%20Hours-20210325_160016-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|12.| 29.03.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Neural Networks (Training Tricks), Ensemble Classifiers | {{ :dm:13_dm2_ensemble_2021.pdf | Ensemble Classifiers}}  | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione%20in%20_Generale_-20210329_141249-Registrazione%20della%20riunione.mp4?web=1 | 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione%20in%20_Generale_-20210329_151828-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|13.| 31.03.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Ensemble Classifiers | {{ :dm:13_dm2_ensemble_2021.pdf | Ensemble Classifiers}}, {{ :dm:10_ensemble.ipynb.zip | Ensemble_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2013-20210331_160739-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20-%20Lecture%2013-20210331_170034-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|14.| 12.04.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Time Series Similarity | {{ :dm:14_dm2_time_series_similarity_2021.pdf | Time Series Similarity}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione%20in%20_Generale_-20210412_141848-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Riunione%20in%20_Generale_-20210412_152505-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|15.| 14.04.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Time Series Similarity, Approximation and Clustering | {{ :dm:14_dm2_time_series_similarity_2021.pdf | Time Series Similarity}}, {{ :dm:15_dm2_time_series_clustering_approximation_2021.pdf | Time Series Approximation and Clustering}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2015-20210414_161711-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2015-20210414_170525-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|16.| 19.04.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Time Series Motifs | {{ :dm:dm2_ts_dtw_and_similarity.zip | TS_Similarity_Notebook}}, {{ :dm:16_dm2_time_series_matrix_profile_2021.pdf | Time Series Motifs}}, {{ :dm:ts_datasets.zip | TS Datasets}}, {{ :dm:keras_custom_accuracy_metrics.ipynb.zip | Keras Accuracy}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2016-20210419_141448-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2016-20210419_150019-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|17.| 21.04.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Time Series Classification | {{ :dm:17_dm2_time_series_classification_2021.pdf | Time Series Classification}}, {{ :dm:time_series_transpose_to_plot.zip | TS_Plot}}, {{ :dm:11_time_series_sim_appr_clus.ipynb.zip | TS_Similarity_Notebook (updated)}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2017-20210421_162232-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2017-20210421_170000-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/Office%20Hours-20210422_151322-Registrazione%20della%20riunione.mp4?web=1| Office Hours]] |
 +|18.| 26.04.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Time Series Classification  | {{ :dm:17_dm2_time_series_classification_2021.pdf | Time Series Classification}}, {{ :dm:12_time_series_shapelet_motif.ipynb.zip | TS_Shapelet_Motif_Notebook}}, {{ :dm:13_time_series_classification.ipynb.zip | TS_classification_Notebook}}, {{ :dm:extracting-ts_dataset-from-mp3_data.ipynb.zip | TS_from_MP3_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2018-20210426_141328-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2018-20210426_151443-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2018-20210426_154214-Registrazione%20della%20riunione.mp4?web=1|Tutorial MP3]] |
 +|19.| 28.04.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Sequential Pattern Mining | {{ :dm:18_dm2_sequential_pattern_mining_2021.pdf |  Sequential Pattern Mining}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2019-20210428_161945-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2019-20210428_171750-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]]|
 +|20.| 03.05.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Sequential Pattern Mining (Timing Constraints)   | {{ :dm:18_dm2_sequential_pattern_mining_2021.pdf |  Sequential Pattern Mining}}, {{ :dm:14_sequential_pattern_mining.ipynb.zip | SPM_Notebook}}, {{ :dm:feature_extraction_ts.ipynb.zip | TS_extraction_RMS}}, [[https://drive.google.com/file/d/1SXYQnszPXKmyHRbxE8naOfQeUi2oL0qQ/view?usp=sharing|RMSE_TS Dataset]] | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2020-20210503_141330-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2020-20210503_151538-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2020-20210503_152810-Registrazione%20della%20riunione.mp4?web=1|Tutorial RMSE]] |
 +|21.| 05.05.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Advanced Clustering Methods | {{ :dm:19_dm2_advanced_clustering_algorithms_2021.pdf | Advanced Clustering Methods}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2021-20210505_161617-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2021-20210505_172157-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]]|
 +|22.| 10.05.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Transactional Clustering Methods | {{ :dm:20_dm2_transactional_clustering_2021.pdf | Transactional Clustering Methods}}, {{ :dm:15_advanced_clustering.ipynb.zip | ACM_notebooks}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2022-20210510_141348-Registrazione%20della%20riunione.mp4?web=1|Hint Clus TS]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2022-20210510_141616-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2022-20210510_151602-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
 +|23.| 12.05.2021 16:00-18:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Explainable Artificial Intelligence | {{ :dm:21_dm2_explainability_2021.pdf | XAI}}, {{ :dm:15_advanced_clustering.ipynb.zip | ACM_Notebook}}  | Guidotti | [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2023-20210512_161529-Registrazione%20della%20riunione.mp4?web=1|ACM_Notebook]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2023-20210512_163117-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2023-20210512_171248-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]]|
 +|24.| 17.05.2021 14:00-16:00 | [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Explainable Artificial Intelligence  | {{ :dm:21_dm2_explainability_2021.pdf | XAI}}, {{ :dm:16_explainable_artificial_intelligence.ipynb.zip | XAI_Notebook}} | Guidotti |[[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2024-20210517_141734-Registrazione%20della%20riunione.mp4?web=1| 1st Part]], [[https://unipiit.sharepoint.com/sites/td48028/Shared%20Documents/General/Recordings/DM2%20Lecture%2024-20210517_151349-Registrazione%20della%20riunione.mp4?web=1| 2nd Part]] |
  
  
 +====== Exams ======
  
 +===== Exam DM1 =====
  
 +The exam is composed of two parts:
  
 +  * An **oral exam**, which includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises.
  
 +  * A **project**, consisting of exercises that require the use of data mining tools for data analysis. The exercises cover data understanding, clustering analysis, frequent pattern mining, and classification (see the guidelines for more details). The project must be carried out by a group of 3 to 4 people, using Knime, Python, or a combination of the two. The results of the different tasks must be reported in a single paper of at most 20 pages of text, including figures. The paper must be emailed to [[[email protected]]]. Please use “[DM1 2020-2021] Project” as the subject.
 +Tasks of the project:
 +      - ** Data Understanding: ** Explore the dataset with the analytical tools studied and write a concise “data understanding” report describing data semantics and assessing data quality, the distribution of the variables, and the pairwise correlations (see Guidelines for details).
 +      - ** Clustering analysis: ** Explore the dataset using various clustering techniques. Carefully describe your decisions for each algorithm and the advantages provided by the different approaches (see Guidelines for details).
 +      - ** Classification: ** Explore the dataset using classification trees. Use them to predict the target variable (see Guidelines for details).
 +      -  ** Association Rules: ** Explore the dataset using frequent pattern mining and association rule extraction. Then use the rules to predict a variable, either to replace missing values or to predict the target variable (see Guidelines for details).
  
 +  * Project 1
 +      - Dataset: **IBM-HR**
 +      - Assigned: 16/09/2020
 +      - Midterm Deadline: 21/11/2020 (half project required, i.e., data understanding and at least two clustering algorithms)
 +      - Final Deadline: <del>07/01/2021</del> 14/01/2021 (complete project required)
 +      - Data: {{ :dm:datasetproject1.zip | here}}
 +      - Description: [[https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset|IBM-HR]]
 +      - (please download the data from {{ :dm:datasetproject1.zip | here}} and not from the link with the description as we are using a different version of the data)
  
 +  * Project 2
 +      - Dataset: **Bank Loan Status**
 +      - Assigned: 15/01/2021
 +      - Deadline: 4 days before the oral exam
 +      - This dataset must be used for all tasks. For the classification task, split the dataset into a training set and a test set; the class to predict is the variable "Loan Status".
 +      - This dataset is valid for all the exam sessions until September.
 +      - Download the dataset {{:dm:credit_2020.zip|Bank Loan Status dataset}} (in CSV format, zipped)
  
 + **Guidelines for the project are [[:dm:start:guidelines|here]].**
 + 
 +===== Exam DM part II (DMA) =====
  
 +** Exam Rules**
 +  * Rules for DM2 exam available {{ :dm:dm2_exam_rules.pdf | here}}.
  
-====== Exams ======
 +**Exam Booking Periods**
-**Mid-term Project **
 +  * 3rd Appello: 04/05/2021 00:00 - 29/05/2021 23:59
 +  * 4th Appello: 25/05/2021 00:00 - 19/06/2021 23:59
 +  * 5th Appello: 15/06/2021 00:00 - 10/07/2021 23:59
  
-A project consists in data analyses based on the use of data mining tools.
 +**Exam Booking Agenda**
-The project has to be performed by a team of 2/3 students. It has to be performed by using Python. The guidelines require to address specific tasks. Results must be reported in a unique paper. The total length of this paper must be max 20 pages of text including figures. The students must deliver both: paper (single column) and  well commented Python Notebooks.
 +  * Agenda Link: [[https://agende.unipi.it/nhb-ang-wvp|here]]
 +  * 3rd Appello: starts 03/06/2021
 +  * 4th Appello: starts 24/06/2021
 +  * 5th Appello: starts 15/07/2021
 +  * Important! If you book an agenda slot on a day between 03/06/2021 and 23/06/2021, you MUST be registered for the 3rd appello; if you book a slot on a day between 24/06/2021 and 14/07/2021, you must be registered for the 4th appello; if you book a slot on or after 15/07/2021, you must be registered for the 5th appello.
  
-  * First part of the project consists in the **assignments** described here: {{ :magistraleinformatica:dmi:dm-projectdescriptionpart1.pdf | Project Description}}
 +The link to the agenda for booking a slot for the exam is displayed at the end of the registration.
-     * **Dataset:** {{ :magistraleinformatica:dmi:customer_supermarket.csv.zip |}}
 +During the exam the camera must remain on and you must be able to share your screen. The exam may require the use of the Miro platform (https://miro.com/app/dashboard/).
-     * **Deadline**: the first part has to be delivered within  <del>November, 5th 2020.</del> ** November, 12 2020. **
-  * Second part of the project consists in the **assignment Task 3** described here: {{ :magistraleinformatica:dmi:project_description.pdf |Updated Project Description}}
-     * **Deadline**: the second part has to be delivered within  ** January, 4th 2021 **
-  * Third part of the project consists in the **assignment Task 4** described here: {{ :magistraleinformatica:dmi:dm-project_description.pdf | Final Project Description}}
-     * **Deadline**:  ** January, 4th 2021 (strict)** Prepare a single zip folder containing also the material of the previous submitted task (even if they are already submitted). Note that, in the file of the project description I reported all the detailed instructions for the delivery of all the tasks for the final submission.
  
 +The exam is composed of two parts:
  
-** Project to be delivered during the exam sessions **
 +  * A **project**, which consists in applying the methods and algorithms presented during the classes to solve exercises on a given dataset. The project must be carried out by a group of at most 3 people. The results of the different tasks must be reported in a single paper of at most 30 pages (25 suggested) of text including figures, plus 1 cover page (minimum font size 11, minimum line spacing 1). The project must be delivered at least 7 days before the oral exam, by email to [[[email protected]]] AND [[[email protected]]] with subject "[DM2 Project]".
  
-Students who did not deliver the above project within 4 Jan 2021 need to ask by email a new project to the teacher.
 +  * An **oral exam**, which includes: (1) discussing topics presented during the classes, including the theory of the parts already covered by the written exam; (2) solving simple exercises using the Miro platform; (3) discussing the project report with a group presentation.
  
-** Paper Presentation (OPTIONAL)**
 +  * **Dataset**: the data is about Music Analysis and can be downloaded here: [[https://github.com/mdeff/fma| github]] (or here [[https://archive.ics.uci.edu/ml/datasets/FMA%3A+A+Dataset+For+Music+Analysis|uci]])
 +     * Data can be downloaded here [[https://os.unil.cloud.switch.ch/fma/fma_metadata.zip|fma_metadata.zip]]
 +     * Submission Draft 1: 19/04/2021 23:59 Italian Time (we expect Module 1 and Module 2)
 +     * Submission Draft 2: 22/05/2021 23:59 Italian Time (we expect Module 3)
 +     * Final Submission: one week before the oral exam.
  
-Students need to present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: Students that decide to do the paper presentation can avoid the oral exam with open questions. They only need to present the project (see next point).
 +** Project Guidelines **
  
-**Oral Exam**
 +  * **Module 1 - Introduction, Imbalanced Learning and Anomaly Detection**
-  * **Project presentation** (with slides), 10 minutes: mandatory for all the students
 +      - Explore and prepare the dataset. You are allowed to take inspiration from the associated GitHub repository and define your own research perspective (from choosing a subset of variables to the class to predict…). You are welcome to create new variables and perform all the pre-processing steps the dataset needs.
-  * ** Open questions ** on the entire program: optional only for students opting for paper presentation
 +      - Define one or more (simple) classification tasks and solve them with Decision Tree and KNN. You decide the target variable.
 +      - Identify the top 1% outliers: adopt at least three different methods from different families (e.g., density-based, angle-based...) and compare the results. Deal with the outliers by removing them from the dataset or by treating the anomalous variables as missing values and employing replacement techniques. In this second case, you should check that the treated records are no longer outliers. Justify your choices in every step.
 +      - Analyze the value distribution of the class to predict with respect to point 2; if it is unbalanced leave it as it is, otherwise turn the dataset into an imbalanced version (e.g., 96% - 4%, for binary classification). Then solve the classification task using the Decision Tree or the KNN by adopting various techniques of imbalanced learning.
-====== Exam Dates ======
 +      - Draw your conclusions about the techniques adopted in this analysis.
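
For point 3 of Module 1, one distance-based way to flag the top 1% of outliers can be sketched as follows. This is only an illustration written with the Python standard library, not the method prescribed by the course: the function names `knn_outlier_scores` and `top_fraction_outliers`, the default `k=5`, and the use of the distance to the k-th nearest neighbour as anomaly score are all assumptions made for the example. The guidelines ask for at least three methods from different families, so a sketch like this would cover only one of them.

```python
import math


def knn_outlier_scores(points, k=5):
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores mean the point lies in a sparser region.
    (Hypothetical helper: brute force, O(n^2), fine for small samples.)
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_fraction_outliers(points, fraction=0.01, k=5):
    """Return the sorted indices of the highest-scoring `fraction` of points.

    At least one index is always returned (an assumption of this sketch).
    """
    scores = knn_outlier_scores(points, k)
    n_out = max(1, round(fraction * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_out])


if __name__ == "__main__":
    # Toy data: a regular 10x10 grid plus one isolated point.
    grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
    data = grid + [(100.0, 100.0)]
    print(top_fraction_outliers(data, fraction=0.01, k=5))
```

Comparing the index sets returned by methods from different families (for instance via their overlap) is one possible way to approach the comparison the guidelines ask for.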
  
-TBD
 +  * **Module 2 - Advanced Classification Methods**
 +      - Solve the classification task defined in Module 1 (or define new ones) with the other classification methods analyzed during the course (Naive Bayes Classifier, Logistic Regression, Rule-based Classifiers, Support Vector Machines, Neural Networks, Ensemble Methods), and evaluate each classifier with the techniques presented in Module 1 (accuracy, precision, recall, F1-score, ROC curve). Perform hyper-parameter tuning and justify your choices.
 +      - Besides the numerical evaluation, draw your conclusions about the various classifiers, e.g. for Neural Networks: which parameter settings or convergence criteria avoid overfitting? For Ensemble classifiers: how does the number of base models affect the classification performance? For any classifier: what is the minimum amount of data required to guarantee an acceptable level of performance? Is this level the same for every classifier? What does the feature importance of Random Forests reveal?
 +      - Select two continuous attributes, define a regression problem and try to solve it using different techniques, reporting various evaluation measures. Plot the two-dimensional dataset. Then generalize to multiple linear regression and observe how the performance varies.
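
The evaluation measures named in Module 2 (accuracy, precision, recall, F1-score) can be computed directly from the confusion-matrix counts. The block below is a minimal sketch for the binary case; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions of this example, not something stated in the guidelines.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task.

    `positive` is the label treated as the positive class.
    Undefined ratios (zero denominator) are reported as 0.0,
    which is a convention chosen for this sketch.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

On an imbalanced dataset accuracy alone can look good while recall on the minority class is poor, which is why reporting all four values side by side is useful.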
 + 
 +  * **Module 3 - Time Series Analysis** 
 +      - Select the feature(s) you prefer and use it (them) as a time series. You can use the temporal information provided by the authors’ datasets, but you are also welcome to explore the .mp3 files to build your own dataset of time series according to your purposes. You should prepare a dataset on which you can run time series clustering, motif/anomaly discovery, and classification.
 +      - On the dataset created, compute clustering based on Euclidean/Manhattan and DTW distances and compare the results. To perform the clustering you can choose among different distance functions and clustering algorithms. Remember that you can reduce the dimensionality through approximation. Analyze the clusters and highlight similarities and differences. 
 +      - Analyze the dataset for finding motifs and/or anomalies. Visualize and discuss them and their relationship with other features. 
 +      - Solve the classification task on the time series dataset(s) and evaluate each result. In particular, you should use shapelet-based classifiers. Analyze the shapelets retrieved and discuss if there are any similarities/differences with motifs and/or shapelets.  
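
Point 2 of Module 3 contrasts Euclidean and DTW distances. The difference can be seen on a toy pair of series where one is a shifted copy of the other. The sketch below is an unconstrained dynamic-programming DTW in plain Python; the function names, the squared point-wise cost, and the final square root are choices made for this example (other conventions exist), and no warping window is applied.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic Time Warping distance (no window constraint).

    cost[i][j] holds the cheapest cumulative squared cost of aligning
    the first i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


if __name__ == "__main__":
    a = [0, 0, 1, 2, 1, 0, 0]
    b = [0, 1, 2, 1, 0, 0, 0]  # same bump, one step earlier
    print("euclidean:", euclidean(a, b))
    print("dtw:", dtw(a, b))
```

Because DTW may stretch the time axis, the shifted bump aligns perfectly, while the Euclidean distance penalizes the shift. This is the kind of difference the clustering comparison is meant to surface.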
 + 
 +  * **Module 4 - Sequential Patterns and Advanced Clustering**  
 +      - Sequential Pattern Mining: Convert the time series into a discrete format (e.g., by using SAX) and extract the most frequent sequential patterns (of at least length 3/4) using different values of support, then discuss the most interesting sequences. 
 +      - Advanced Clustering: On a dataset already prepared for one of the previous tasks in Module 1 or Module 2, run at least one clustering algorithm presented in the advanced clustering lectures (e.g., X-Means, Bisecting K-Means, OPTICS). Discuss the results by analyzing the clusters and reporting internal validation measures (e.g., SSE, silhouette).
 +      - Transactional Clustering: By using categorical features, or by turning a dataset with continuous variables into a dataset with categorical variables (e.g., by using binning), run at least one clustering algorithm presented in the transactional clustering lectures (e.g., K-Modes, ROCK). Discuss the results by analyzing the clusters and reporting internal validation measures (e.g., SSE, silhouette).
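
The SAX discretization mentioned in the Sequential Pattern Mining point can be sketched in three steps: z-normalize the series, average it over equal-length segments (PAA), and map each average to a symbol using breakpoints that split the standard normal distribution into equiprobable regions. The block below is a minimal standard-library illustration for a 4-symbol alphabet; the function name `sax`, the fixed alphabet `"abcd"`, and the requirement that the series length be a multiple of the number of segments are assumptions of this example.

```python
import statistics
from bisect import bisect_right

# Approximate quartiles of the standard normal distribution,
# i.e. the SAX breakpoints for an alphabet of size 4.
BREAKPOINTS_4 = [-0.6745, 0.0, 0.6745]
ALPHABET_4 = "abcd"


def sax(series, n_segments):
    """Convert a numeric series into a SAX word of length `n_segments`."""
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("len(series) must be a positive multiple of n_segments")

    # 1. z-normalization (a constant series maps to all zeros)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each segment
    seg_len = len(z) // n_segments
    paa = [statistics.fmean(z[i * seg_len:(i + 1) * seg_len])
           for i in range(n_segments)]

    # 3. Map each segment mean to the symbol of its breakpoint interval
    return "".join(ALPHABET_4[bisect_right(BREAKPOINTS_4, v)] for v in paa)


if __name__ == "__main__":
    print(sax([0, 0, 1, 1, 2, 2, 3, 3], n_segments=4))
```

The resulting words (one per time series, or one per sliding window) can then be treated as sequences of symbols for sequential pattern mining.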
 + 
 +  * **Module 5 - Explainability (optional)**  
 +      - Try to use one or more explanation methods (e.g., LIME, LORE, SHAP, etc.) to illustrate the reasons for the classification in one of the steps of the previous tasks. 
 + 
 + 
 + 
 + 
 +N.B. When "solving the classification task", remember (i) to test, when needed, different criteria for the parameter estimation of the algorithms, and (ii) to evaluate the classifiers (e.g., Accuracy, F1, Lift Chart) in order to compare the results obtained with an imbalanced learning technique against those obtained using the "original" dataset.
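
To make the comparison in the note above concrete, the sketch below first turns a roughly balanced binary dataset into an imbalanced version (about 96% - 4%, as in Module 1) and then applies random undersampling, one of the simplest imbalanced learning techniques. It uses only the Python standard library; the function names `make_imbalanced` and `random_undersample`, the fixed seeds, and the choice of undersampling are assumptions of this example, and other techniques (oversampling, class weights, and so on) could be used instead.

```python
import random
from collections import Counter


def make_imbalanced(X, y, minority_label, minority_share=0.04, seed=0):
    """Subsample the minority class so it makes up ~`minority_share` of the data.

    All records of the other class(es) are kept.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    target = round(minority_share * len(majority) / (1.0 - minority_share))
    target = min(target, len(minority))
    keep = sorted(majority + rng.sample(minority, target))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_undersample(X, y, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    smallest = min(len(idx) for idx in by_class.values())
    keep = sorted(i for idx in by_class.values()
                  for i in rng.sample(idx, smallest))
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    X = list(range(146))          # placeholder feature vectors
    y = [0] * 96 + [1] * 50       # toy labels
    X_imb, y_imb = make_imbalanced(X, y, minority_label=1)
    X_bal, y_bal = random_undersample(X_imb, y_imb)
    print(Counter(y_imb), Counter(y_bal))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the imbalanced distribution and the two sets of results remain comparable.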
 + 
 + 
 + 
 +====== Exam Dates ======
  
 ===== Exam Sessions =====
-TBD
 +^ Session ^ Date            ^ Time        ^ Room   ^ Notes ^ Marks ^
 +|1.|16.01.2019| 14:00 - 18:00| [[https://teams.microsoft.com/l/team/19%3aeebd8a88148d433582ca36bc54d6e441%40thread.tacv2/conversations?groupId=adba5ac4-f242-40be-b8aa-e375da1d4f2c&tenantId=c7456b31-a220-47f5-be52-473828670aa1|MS Teams]] | Please, use the system for registration: https://esami.unipi.it/ | |
  
 +===== Past Exams =====
 +  * Past exam texts can be found in the old pages of the course. Please do not consider these exercises as the only way of testing your knowledge. Exercises can be changed and updated every year and will be published together with the slides of the lectures.
  
 ===== Reading About the "Data Scientist" Job =====
Line 236: Line 307:
  
 ====== Previous years ===== ====== Previous years =====
-[[http://didawiki.cli.di.unipi.it/doku.php/dm/dm.2019-20|DM-2019/20]]
 +  * [[dm.2019-20]]
 +   * [[dm.2018-19]] 
 +   * [[dm.2017-18]] 
 +  * [[dm.2016-17]] 
 +  * [[dm.2015-16]] 
 +  * [[dm.2014-15]] 
 +  * [[dm.2013-14]] 
 +  * [[dm.2012-13]] 
 +  * [[dm.2011-12]] 
 +  * [[dm.2010-11]] 
 +  * [[dm.2009-10]] 
 +  * [[dm.2008-09]] 
 +  * [[dm.2007-08]] 
 +  * [[dm.2006-07]] 
 +  * [[PhDWorkshop2011]] 
 +  * [[SNA.Ingegneria2011]] 
 +  * [[SNA.IMT.2011]] 
 +  * [[MAINS.SANTANNA.2011-12]] 
 +  * [[MAINS.SANTANNA.DM4CRM.2012]] 
 +  * [[MAINS.SANTANNA.DM4CRM.2016]] 
 +  * [[MAINS.SANTANNA.DM4CRM.2017 Data Mining for Customer Relationship Management 2017]] 
 +  * [[MAINS.SANTANNA.DM4CRM.2018]] 
 +  * [[MAINS.SANTANNA.DM4CRM.2019]] 
 +  * [[SDM2018 | Instructions for camera ready and copyright transfer]] 
 +  * [[DM-SAM | Storie dell'Altro Mondo]] 
 +  * [[DM-I40 | Master Industry 4.0]]
  
dm/dm.2020-21.1630921540.txt.gz · Last modified: 06/09/2021 at 09:45 (4 years ago) by Anna Monreale
