Instructors:
Teaching Assistant
Instructors:
Classes
Office hours:
Classes
Office hours:
No. | Day | Room | Topic | Learning material | Instructor |
---|---|---|---|---|---|
1. | 16.09.2020 14:00-16:00 | MS Teams | Introduction. | Course Overview Introduction DM | Pedreschi |
2. | 23.09.2020 16:00-18:00 | MS Teams | Data Understanding | Slides DU Slides on Descriptive Statistics | Pedreschi |
3. | 28.09.2020 14:00-16:00 | MS Teams | Data Understanding | | Pedreschi |
4. | 30.09.2020 16:00-18:00 | MS Teams | Data Preparation | Slides DP | Pedreschi |
5. | 05.10.2020 14:00-16:00 | MS Teams | Lab: Introduction to Python and Knime | Python Introduction, Knime simple workflow Lecture 5 part 1, Lecture 5 part 2 | Guidotti, Citraro |
6. | 07.10.2020 16:00-18:00 | MS Teams | Lab: Data Understanding & Preparation | Dataset: Iris, Titanic, Knime: 01_data_understanding.zip Python: titanic_data_understanding2.ipynb.zip Lecture 6 part 1, Lecture 6 part 2 | Guidotti, Citraro |
7. | 12.10.2020 14:00-16:00 | MS Teams | Clustering: Intro & K-means | Slides clustering 1 | Nanni |
8. | 14.10.2020 16:00-18:00 | MS Teams | Clustering: Hierarchical methods | Slides clustering 2 | Nanni |
9. | 19.10.2020 14:00-16:00 | MS Teams | Clustering: Density-based methods and exercises | Slides clustering 3, Clustering exercises | Nanni |
10. | 21.10.2020 16:00-18:00 | MS Teams | Clustering: Validation methods and exercises | Slides clustering 4 | Nanni |
11. | 26.10.2020 14:00-16:00 | MS Teams | Lab: Clustering | Knime, Python Iris, Python Titanic | Citraro |
12. | 28.10.2020 16:00-18:00 | MS Teams | Classification: Intro and Decision Trees | Slides classification | Nanni |
- | 02.11.2020 14:00-16:00 | | No Lecture. Project Week. | | |
- | 04.11.2020 16:00-18:00 | | No Lecture. Project Week. | | |
13. | 09.11.2020 14:00-16:00 | MS Teams | Classification: Decision Trees/2 | | Nanni |
14. | 11.11.2020 16:00-18:00 | MS Teams | Classification: Decision Trees/3 | | Nanni |
15. | 16.11.2020 14:00-16:00 | MS Teams | Classification: Decision Trees/4 | Sample exercise | Nanni |
16. | 18.11.2020 16:00-18:00 | MS Teams | Classification: Decision Trees/5 + Exercises | Exercises 1, Exercises 2 | Nanni |
17. | 23.11.2020 14:00-16:00 | MS Teams | Classification: KNN | Slides, Exercise 1 (KNN only), Exercise 2 | Nanni |
18. | 25.11.2020 16:00-18:00 | MS Teams | Lab: Classification | knime_classification python_classification python_classification2 | Citraro |
19. | 02.12.2020 16:00-18:00 | MS Teams | Pattern & Association Rule Mining - Apriori algorithm for frequent itemset mining | 2-dm2-restructured_assoc-2020.pdf | Pedreschi |
20. | 07.12.2020 14:00-16:00 | MS Teams | Pattern & Association Rule Mining - Rule mining and evaluation, Closed and maximal itemsets, Multi-dimensional, Quantitative and Multi-level association rules | | Pedreschi |
21. | 14.12.2020 14:00-16:00 | | Lab: Pattern Mining | knime_pattern python_pattern https://anaconda.org/conda-forge/pyfim, http://www.borgelt.net/pyfim.html ex-frequentpatterns-ar.pdf | Citraro |
No. | Day | Room | Topic | Learning material | Instructor | Recordings |
---|---|---|---|---|---|---|
1. | 15.02.2021 14:00-16:00 | MS Teams | Introduction, CRISP, KNN | Intro, CRISP, KNN | Guidotti | 1stPart, 2ndPart |
2. | 17.02.2021 16:00-18:00 | MS Teams | Performance Evaluation | Eval, occupancy_data, KNN_Eval_Notebook | Guidotti | Dataset, Lecture |
3. | 22.02.2021 14:00-16:00 | MS Teams | Imbalanced Learning | ImbLearn, DimRed_notebook, ImbLearn_notebook | Guidotti | 1stPart, 2ndPart |
4. | 23.02.2021 16:00-18:00 | MS Teams | Anomaly Detection | MLE, Anomaly Detection, Anomaly_notebook | Guidotti | 1st Part, 2nd Part |
5. | 01.03.2021 14:00-16:00 | MS Teams | Anomaly Detection | Anomaly Detection, Anomaly_notebook | Guidotti | 1st Part, 2nd Part |
6. | 03.03.2021 16:00-18:00 | MS Teams | Anomaly Detection | Anomaly Detection, Anomaly_notebook, Extended Isolation Forest link | Guidotti | 1st Part, 2nd Part |
7. | 08.03.2021 14:00-16:00 | MS Teams | Naive Bayes Classifier | NBC, NBC_notebook, Ex1_Miro, Ex2_Miro | Guidotti | 1st Part, 2nd Part |
- | 10.02.2021 16:00-18:00 | | Lecture: “From Pisa to Fermilab in Chicago: A journey toward a revolutionary quantum computer”, by Prof. Anna Grassellino | Link | Guidotti | |
8. | 15.03.2021 14:00-16:00 | MS Teams | Linear and Logistic Regression, Rule-based Classifiers | Regression, RuleBased, Regression_Notebook | Guidotti | 1stPart, 2ndPart |
9. | 17.03.2021 16:00-18:00 | MS Teams | Rule-based Classifiers, Support Vector Machines | RuleBased, RuleBased_Notebook, SVM, SVM_Notebook | Guidotti | 1st Part, 2nd Part |
10. | 22.03.2021 14:00-16:00 | MS Teams | (Nonlinear) Support Vector Machines, Linear Perceptron | SVM, SVM_Notebook, Linear Perceptron | Guidotti | 1st Part, 2nd Part |
11. | 24.03.2021 16:00-18:00 | MS Teams | Neural Networks, Deep Neural Networks | Neural Network, NN_Notebook | Guidotti | 1st Part, 2nd Part |
- | 25.03.2021 15:00-17:00 | MS Teams | Neural Networks Forward and Backpropagation Example, Case Study Music | NN_Implementation, Case Study | Guidotti | 1st Part, 2nd Part |
12. | 29.03.2021 14:00-16:00 | MS Teams | Neural Networks (Training Tricks), Ensemble Classifiers | Ensemble Classifiers | Guidotti | 1st Part, 2nd Part |
13. | 31.03.2021 16:00-18:00 | MS Teams | Ensemble Classifiers | Ensemble Classifiers, Ensemble_Notebook | Guidotti | 1st Part, 2nd Part |
14. | 12.04.2021 14:00-16:00 | MS Teams | Time Series Similarity | Time Series Similarity | Guidotti | 1st Part, 2nd Part |
15. | 14.04.2021 16:00-18:00 | MS Teams | Time Series Similarity, Approximation and Clustering | Time Series Similarity, Time Series Approximation and Clustering | Guidotti | 1st Part, 2nd Part |
16. | 19.04.2021 14:00-16:00 | MS Teams | Time Series Motifs | TS_Similarty_Notebook, Time Series Motifs, TS Datasets, Keras Accuracy | Guidotti | 1st Part, 2nd Part |
17. | 21.04.2021 16:00-18:00 | MS Teams | Time Series Classification | Time Series Classification, TS_Plot, TS_Similarty_Notebook (updated) | Guidotti | 1st Part, 2nd Part, Office Hours |
18. | 26.04.2021 14:00-16:00 | MS Teams | Time Series Classification | Time Series Classification, TS_Shapelet_Motif_Notebook, TS_classification_Notebook, TS_from_MP3_Notebook | Guidotti | 1st Part, 2nd Part, Tutorial MP3 |
19. | 28.04.2021 16:00-18:00 | MS Teams | Sequential Pattern Mining | Sequential Pattern Mining | Guidotti | 1st Part, 2nd Part |
20. | 03.05.2021 14:00-16:00 | MS Teams | Sequential Pattern Mining (Timing Constraints) | Sequential Pattern Mining, SPM_Notebook, TS_extraction_RMS, RMSE_TS Dataset | Guidotti | 1st Part, 2nd Part, Tutorial RMSE |
21. | 05.05.2021 16:00-18:00 | MS Teams | Advanced Clustering Methods | Advanced Clustering Methods | Guidotti | 1st Part, 2nd Part |
22. | 10.05.2021 14:00-16:00 | MS Teams | Transactional Clustering Methods | Transactional Clustering Methods, ACM_notebooks | Guidotti | Hint Clus TS 1st Part, 2nd Part |
23. | 12.05.2021 16:00-18:00 | MS Teams | Explainable Artificial Intelligence | XAI, ACM_Notebook | Guidotti | ACM_Notebook 1st Part, 2nd Part |
24. | 17.05.2021 14:00-16:00 | MS Teams | Explainable Artificial Intelligence | XAI, XAI_Notebook | Guidotti | 1st Part, 2nd Part |
The exam is composed of two parts:
Tasks of the project:
Guidelines for the project are here.
Exam Rules
Exam Booking Periods
Exam Booking Agenda
The link to the agenda for booking an exam slot is displayed at the end of the registration. During the exam, your camera must remain on and you must be able to share your screen. The exam may require the use of the Miro platform (https://miro.com/app/dashboard/).
The exam is composed of two parts:
Project Guidelines
N.B. When “solving the classification task”, remember (i) to test, where needed, different criteria for estimating the parameters of the algorithms, and (ii) to evaluate the classifiers (e.g., Accuracy, F1, Lift Chart) so that the results obtained with an imbalanced-learning technique can be compared against those obtained on the “original” dataset.
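As a rough illustration of the note above, the comparison could be organised along these lines. This is only a minimal sketch using the Python standard library: the toy one-dimensional data, the helper names (`accuracy`, `f1_score`, `random_oversample`, `knn_predict`) and the choice of random oversampling with a KNN classifier are all assumptions made for the example, not requirements of the project. In practice you would likely rely on the libraries used in the course notebooks.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """F1 of the positive class (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def random_oversample(X, y, positive=1, seed=0):
    """Duplicate random minority records until both classes have equal size.
    Assumes the positive class is the (non-empty) minority class."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == positive]
    majority = [(x, t) for x, t in zip(X, y) if t != positive]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    data = majority + minority + extra
    return [x for x, _ in data], [t for _, t in data]

def knn_predict(X_train, y_train, X_test, k=3):
    """Binary (0/1) KNN on one-dimensional data with majority voting."""
    preds = []
    for x in X_test:
        neighbours = sorted(zip(X_train, y_train),
                            key=lambda pair: abs(pair[0] - x))[:k]
        votes = sum(t for _, t in neighbours)
        preds.append(1 if 2 * votes > k else 0)
    return preds

# Hypothetical imbalanced training set: 8 negatives, 2 positives.
X_train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.75, 0.9]
y_train = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# The test set is left untouched: only the training data is rebalanced.
X_test = [0.15, 0.45, 0.65, 0.78, 0.85, 0.95]
y_test = [0, 0, 0, 1, 1, 1]

settings = {
    "original": (X_train, y_train),
    "oversampled": random_oversample(X_train, y_train),
}
for name, (X, y) in settings.items():
    for k in (1, 3, 5):  # (i) try different parameter values
        preds = knn_predict(X, y, X_test, k=k)
        # (ii) report the same metrics for both training sets
        print(f"{name:12s} k={k} "
              f"accuracy={accuracy(y_test, preds):.2f} "
              f"F1={f1_score(y_test, preds):.2f}")
```

The point of the layout is that every parameter setting is evaluated with the same metrics on the same untouched test set, so the “original” and rebalanced results can be compared row by row.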
Session | Date | Time | Room | Notes | Marks |
---|---|---|---|---|---|
1. | 16.01.2019 | 14:00 - 18:00 | MS Teams | Please use the registration system: https://esami.unipi.it/ |
… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.