Instructors:
… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.
The wide availability of data from relational databases, the web, and other sources motivates the study of data analysis techniques that enable a better understanding and easier use of the results in decision-making processes. The course aims to provide an introduction to the basic concepts of the knowledge extraction process, to the main data mining techniques, and to the related algorithms. Particular emphasis is given to methodological aspects, presented through paradigmatic classes of applications such as market basket analysis, market segmentation, and fraud detection. Finally, the course introduces the privacy and ethical aspects of applying inference techniques to data, of which the analyst must be aware. The course consists of the following parts:
Classes
Day of Week | Hour | Room |
---|---|---|
Monday | 14:00 - 16:00 | Room E1 |
Wednesday | 16:00 - 18:00 | Room A1 |
Friday | 11:00 - 13:00 | Room C1 |
Office hours:
Classes
Day of Week | Hour | Room |
---|---|---|
Monday | 09:00 - 11:00 | C |
Wednesday | 16:00 - 18:00 | C1 |
Office hours:
* Exercises on Clustering: ex._clustering.pdf
* Texts of some past exams for DM1 (6CFU):
* Solutions of some past exams containing exercises on KNN and Naive Bayes classifiers, DM1 (9CFU):
* Some exercises (partially with solutions) on sequential patterns and time series can be found in the following exam texts from recent years:
No. | Day | Topic | Learning material | Instructor |
---|---|---|---|---|
1. | 16.09 14:00-16:00 | Overview. Introduction to KDD | Course Overview Introduction DM | Pedreschi |
- | 18.09 16:00-18:00 | Lecture canceled (event at Scuola S. Anna; see the News section of this page) | | Pedreschi |
2. | 20.09 11:00-13:00 | Introduction to KDD: technologies, applications and data | | Pedreschi |
3. | 23.09 14:00-16:00 | Data Understanding (from the Berthold book) | Slides DU. Slides on Descriptive Statistics, useful for clarifying some statistical notions; unfortunately this material is available only in Italian. | Monreale |
4. | 25.09 16:00-18:00 | Data Preparation | Slides DP | Monreale |
- | 27.09 11:00-13:00 | Climate Strike | | |
5. | 30.09 14:00-16:00 | Introduction to Python. | Python Introduction | Monreale |
6. | 02.10 16:00-18:00 | Clustering: Introduction + Centroid-based clustering, K-means | Clustering: Intro and K-means | Pedreschi |
7. | 04.10 11:00-13:00 | Lab: Data Understanding & Preparation in Knime | Knime: 01_data_understanding.zip Data: Titanic File | Monreale |
8. | 07.10 14:00-16:00 | Lab: DU Python + Project presentation | Python: titanic_data_understanding2.ipynb.zip | Monreale |
9. | 09.10 16:00-18:00 | Clustering: K-means + Hierarchical | 5.basic_cluster_analysis-hierarchical.pdf | Monreale |
10. | 11.10 11:00-13:00 | Canceled for the Internet Festival | | Pedreschi |
11. | 14.10 14:00-16:00 | Clustering: DBSCAN & VALIDITY | 6.basic_cluster_analysis-dbscan-validity.pdf | Pedreschi |
12. | 16.10 16:00-18:00 | Exercises on Clustering | Tool for DM exercises: Didactic Data Mining. Ex. Clustering PDF, Ex. Clustering PPTX | Monreale |
13. | 18.10 11:00-13:00 | Lab: Clustering | clustering_knime clustering_python | Monreale |
14. | 21.10 14:00-16:00 | Classification | 7.chap3_basic_classification-2019.pdf, A visual intro to machine learning | Pedreschi |
15. | 23.10 16:00-18:00 | Classification | | Pedreschi |
16. | 25.10 11:00-13:00 | Classification | | Pedreschi |
17. | 28.10 14:00-16:00 | Lab: Classification | knime_classification python_classification | Monreale |
18. | 30.10 16:00-18:00 | Exercises Classification + Discussion Clustering | ex-classification.pdf | Monreale |
19. | 04.11 14:00-15:00 | Pattern Mining | Note: the lecture will end at 15:00 to allow participation in the Informatica50 event (see news). Slides | Pedreschi |
20. | 06.11 16:00-18:00 | Pattern Mining | | Pedreschi |
- | 08-14.11 | Project work | | |
21. | 15.11 11:00-13:00 | Exercises and Lab on Pattern Mining | knime_pattern python_pattern https://anaconda.org/conda-forge/pyfim, http://www.borgelt.net/pyfim.html ex-frequentpatterns-ar.pdf | Monreale |
- | 18.11 14:00-16:00 | Canceled due to weather conditions | | |
- | 20.11 16:00-18:00 | Canceled | | |
22. | 22.11 11:00-13:00 | Exercises on Classification | | Monreale |
The following classes are dedicated to DM (9 CFU) | | | | |
23. | 25.11 14:00-16:00 | Alternative methods for classification/1 | K-Nearest Neighbors & Naive Bayes | Pedreschi |
24. | 27.11 16:00-18:00 | Alternative methods for classification/2 | Wisdom of the crowd & Ensemble methods: Bagging, Random Forest & Boosting Galton's "Vox Populi" 1907 Nature paper | Pedreschi |
25. | 29.11 11:00-13:00 | Alternative methods for classification/3 | Recap Ensemble methods & Hints to Rule-based classification | Pedreschi |
26. | 02.12 14:00-16:00 | Alternative Methods for Pattern Mining + Ex on KNN and NB | fp-growth.pdf KNN & NB | Monreale |
27. | 04.12 16:00-18:00 | Alternative Methods for Clustering | 1-alternative-clustering-2019.pdf 2-transactionalclustering-2019.pdf | Monreale |
28. | 06.12 11:00-13:00 | Sequential Pattern Mining | Sequential patterns | Pedreschi |
29. | 09.12 14:00-16:00 | Exercises on sequential pattern mining & ROCK | exsequentialpatternmining.pdf ex-clustering-rock.pdf | Monreale |
30. | 11.12 16:00-18:00 | Black Box Explanations | 2019-dm_xai.pdf Material: LORE LIME Survey ABELE | Monreale |
31. | 13.12 11:00-13:00 | Exercises on written exam - all students | 9_cfu_ex.pdf ex_clustering_fpm_dt.pdf hierarchical_max_sim.pdf | Monreale |
32. | 16.12 13:30-16:00 | Mid-term Test (Rooms A, E1, C1) | | Monreale |
33. | 18.12 16:00-18:00 | Privacy in DM. Project. | privacydt.pdf Overview on Privacy Privacy by design | Monreale |
No. | Day | Room | Topic | Learning material | Instructor (Guidotti) |
---|---|---|---|---|---|
1. | 17.02.2020 09:00-11:00 | C | Introduction, Instance-based and Bayesian Classifiers | Intro, Libraries, Instance-Based and Bayesian Classifiers | |
2. | 19.02.2020 16:00-18:00 | C1 | Linear and Logistic Regression, Dimensionality Reduction, Exercises KNN and Naive Bayes | Regression, Dimensionality Reduction, Ex_KNN_NB_Lift, Appendix | |
3. | 24.02.2020 09:00-11:00 | C | Imbalanced Learning, Performance Evaluation and Rule-based Classifiers | Imbalanced Learning, Rule-based Classifiers | |
4. | 26.02.2020 16:00-18:00 | C1 | Exercises Lift, ROC, KNN and Naive Bayes. Lab KNN and Naive Bayes. | Ex_KNN_NB_Lift, Lab_KNN_NB, Data Preparation, Churn Dataset, Iris Dataset | |
5. | 02.03.2020 09:00-11:00 | C | Lab Regression, Dimensionality Reduction, Imbalanced Learning + CAT1 | Regression, Dimensionality Reduction, Imbalanced Learning, Airquality Dataset | |
6. | 04.03.2020 16:00-18:00 | C1 | CRISP-DM, SVM, Intro NN | CRISP-DM, SVM, NN | |
7. | 09.03.2020 09:00-11:00 | online | Neural Network, Exercises NN | NN, Ex_NN_Ensemble | |
8. | 11.03.2020 16:00-18:00 | online | Neural Network, Exercises NN, Deep Neural Network, Intro Ensemble, Exercises Ensemble | NN, DNN, Ex_NN_Ensemble | |
9. | 16.03.2020 09:00-11:00 | online | Ensemble Classifiers, Exercises Ensemble | Ensemble, Ex_NN_Ensemble | |
10. | 18.03.2020 16:00-18:00 | online | Lab SVM, Neural Network, Ensemble | Lab_SVM_NN_RF | |
11. | 23.03.2020 09:00-11:00 | online | Time Series Similarity, Ex DTW | Time Series Similarity, Ex_DTW | |
12. | 25.03.2020 16:00-18:00 | online | Time Series Motif/Shapelet, Ex Matrix Profile | Time Series Motif/Shapelet, Ex_MP | |
13. | 30.03.2020 09:00-11:00 | online | Time Series Stationarity and Forecasting | Time Series Forecasting | |
14. | 01.04.2020 16:00-18:00 | online | Lab Time Series | Lab_TS | |
15. | 06.04.2020 09:00-11:00 | online | Time Series Classification, Lab Time Series | Time Series Classification, Lab_TS, Data Partitioning | |
- | 08.04.2020 | | Reading/Project Week | | |
- | 15.04.2020 | | Reading/Project Week | | |
16. | 20.04.2020 09:00-11:00 | online | Sequential Pattern Mining | SPM | |
17. | 22.04.2020 16:00-18:00 | online | SPM Time Constraints, Exercises, Lab | Ex_SPM, Lab_SPM | |
18. | 27.04.2020 09:00-11:00 | online | Advanced Clustering, Ex, SPM, Lab EM, X-Means | Advanced Clustering, Lab_AC | |
19. | 29.04.2020 16:00-18:00 | online | Transactional Clustering, Ex TC, Lab K-Mode | Ex_SPM_TC | |
20. | 04.05.2020 09:00-11:00 | online | Anomaly Detection, Ex AD | Anomaly Detection, Ex_AD | |
21. | 06.05.2020 16:00-18:00 | online | Anomaly Detection, Ex AD, Lab AD | Anomaly Detection, Ex_AD, Lab_AD | |
22. | 11.05.2020 09:00-11:00 | online | Ethics: Privacy | Privacy | |
23. | 13.05.2020 16:00-18:00 | online | Ethics: Explainability | Explainability | |
24. | 18.05.2020 09:00-11:00 | online | Ethics: Local Explainability, Inspection, Transparent Methods, Lab | Explainability, Lab_XAI | |
- | 20.05.2020 | | Reading/Project Week | | |
- | 25.05.2020 | | Reading/Project Week | | |
- | 27.05.2020 | | Reading/Project Week | | |
RULES FOR EXAMS for COMPUTER SCIENCE - 9CFU: EXAM RULES Summer Session - 9 CFU
RULES FOR EXAMS for DATA SCIENCE & BI and DIGITAL HUMANITIES - DM1(6CFU): EXAM RULES Summer Session - DM1(6CFU)
The exam is composed of two parts:
Tasks of the project:
Guidelines for the project are here.
The exam is composed of three parts:
Exam | Date | Hour | Place | Notes | Marks |
---|---|---|---|---|---|
DM1: First Mid-term 2018 | 16.12.2019 | 13:30-16:00 | Room E1, C1, A | Please register through the system: https://esami.unipi.it/ | |
Session | Date | Time | Room | Notes | Marks |
---|---|---|---|---|---|
1. | 16.01.2019 | 14:00 - 18:00 | Room E | ||
2. | 06.02.2019 | 14:00 - 18:00 | Room E | ||
3. | 19.06.2019 | 09:00 - 13:00 | Room A1 | Oral exam on DM1 by 15 July. If you cannot take it by that date, you can take the oral exam in September. | Results |
4. | 10.07.2019 | 09:00 - 13:00 | Room A1 | Oral exam on DM1 by 15 July. If you cannot take it by that date, you can take the oral exam in September. | Results |
5. | 08.06.2020 | 09:00 - 18:00 | Microsoft Teams | From 08/06 to 25/06. Please register (here) and select your slot here. Remember to submit the project one week before the exam. It would be helpful if you submitted the project by 01/06. | |
6. | 26.06.2020 | 09:00 - 18:00 | Microsoft Teams | From 26/06 to 16/07. Please register (here) and select your slot here. Remember to submit the project one week before the exam. It would be helpful if you submitted the project by 21/06. | |
7. | 17.07.2020 | 09:00 - 18:00 | Microsoft Teams | From 17/07 to 29/07. Please register (here) and select your slot at the agenda link, which will be available from 12/07 only to those registered for the exam. Remember to submit the project one week before the exam. It would be helpful if you submitted the project by 10/07. It is mandatory to submit the project before 15/07. | |