Instructors - Docenti:
Teaching assistant - Assistente:
Instructors:
… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.
The wide availability of data from relational databases, the web, and other sources motivates the study of data analysis techniques that support a better understanding and easier use of results in decision-making processes. The aim of the course is to provide an introduction to the basic concepts of the knowledge extraction process, to the main data mining techniques, and to the related algorithms. Particular emphasis is given to methodological aspects, presented through some classes of paradigmatic applications such as market basket analysis, market segmentation, and fraud detection. Finally, the course introduces the privacy and ethical aspects inherent in the use of inference techniques on data, of which the analyst must be aware. The course consists of the following parts:
Classes - Lezioni
Day of Week | Hour | Room |
---|---|---|
Monday | 11:00 - 13:00 | Room C |
Friday | 14:00 - 16:00 | Room A1 |
Office hours - Ricevimento:
Classes - Lezioni
Day of week | Hour | Room |
---|---|---|
Tuesday | 16:00 - 18:00 | B |
Friday | 16:00 - 18:00 | B |
Office hours - Ricevimento:
The slides used during the course will be added to the calendar at the end of each lecture. Most of them are taken from those provided by the authors of the textbook: Slides for "Introduction to Data Mining"
No. | Date and time | Room | Topic | Learning material | Instructor |
---|---|---|---|---|---|
1. | 19.09.2016 11:00-13:00 | C | Canceled | - | |
2. | 23.09.2016 14:00-16:00 | A1 | Introduction | Course Overview, DM Introduction | Monreale |
3. | 26.09.2016 11:00-13:00 | C | Data Understanding | 3.dataunderstanding.pdf 3.data-understanting-appendix.pdf | Monreale |
4. | 30.09.2016 14:00-16:00 | A1 | Data Preparation | 4.data_preparation.pdf | Monreale |
5. | 03.10.2016 11:00-13:00 | C | Introduction to Python, Knime | python_tutorial.zip | Monreale/Guidotti |
6. | 07.10.2016 14:00-16:00 | A1 | Exercises on Data Understanding. | exercises-dm1.pdf | Monreale/Guidotti |
7. | 10.10.2016 11:00-13:00 | C | Centroid-based methods. | dm2014_clustering_intro.pdf dm2014_clustering_kmeans.pdf | Monreale |
8. | 14.10.2016 14:00-16:00 | A1 | Hierarchical methods. Density-based clustering | dm2014_clustering_hierarchical.pdf knime_slides_mains.pdf | Monreale |
9. | 17.10.2016 11:00-13:00 | C | Knime - Python: Data Understanding | python_data_understanding.zip knime_data_manipulation_iris.zip knime_data_manipulation_adult.zip | Monreale/Guidotti |
10. | 21.10.2016 14:00-16:00 | A1 | Clustering Validation | dm2014_clustering_validation.pdf | Monreale |
11. | 24.10.2016 11:00-13:00 | C | Knime - Python: Clustering | HC with Group Average exercises-clustering.pdf knime_clustering_iris.zip titanic_clustering.ipynb.zip | Monreale/Guidotti |
12. | 28.10.2016 14:00-16:00 | A1 | Exercises on Clustering | HC with Group Average exercises-clustering.pdf | Monreale/Guidotti |
- | 04.11.2016 9:00-11:00 | A | First Mid-term test | | Monreale/Guidotti |
13. | 07.11.2016 11:00-13:00 | C | Frequent Patterns & Association Rules | 4-5tdm-restructured_assoc.pdf | Monreale |
14. | 11.11.2016 14:00-16:00 | A1 | Event on Big Data: Aula Magna | ||
15. | 14.11.2016 11:00-13:00 | C | Frequent Patterns & Association Rules | ||
16. | 18.11.2016 14:00-16:00 | A1 | Knime - Python: Frequent Pattern & Association Rules | knime_pattern.zip knime_pattern_titanic2.zip titanic_frequent_patterns.ipynb.zip (http://www.borgelt.net/apriori.html) | |
17. | 21.11.2016 11:00-13:00 | C | Classification | chap4_basic_classification.pdf | |
18. | 25.11.2016 14:00-16:00 | A1 | Classification | ||
19. | 28.11.2016 11:00-13:00 | C | Classification | ||
20. | 02.12.2016 14:00-16:00 | A1 | Exercises on Patterns & Classification | ||
21. | 05.12.2016 11:00-13:00 | C | Canceled | ||
22. | 09.12.2016 14:00-16:00 | A1 | Canceled | ||
23. | 12.12.2016 11:00-13:00 | C | Exercises on Patterns & Classification | knime_classification_iris.zip titanic_classification.ipynb.zip | Guidotti / Pedreschi |
24. | 16.12.2016-18.12.2015 | A1 | Knime - Python: Classification | | Guidotti / Pedreschi |
- | 21.12.2016 9:00-11:00 | A | Second Mid-term test | | Monreale/Guidotti |
No. | Date and time | Room | Topic | Learning material | Instructor (default: Nanni) |
---|---|---|---|---|---|
1. | 21.02.2017 16:00-18:00 | B | Introduction + Sequential patterns/1 | Introduction, Sequential patterns | Nanni + Pedreschi |
2. | 24.02.2017 16:00-18:00 | B | Sequential patterns/2 | ||
3. | 28.02.2017 16:00-18:00 | B | Sequential patterns/3 | Link to SPMF, a tool for seq. patterns and sample dataset. Exercises: Text 1 and Text 2 | |
- | | | Cancelled | | |
4. | 07.03.2017 16:00-18:00 | B | Time series/1 | Time series | |
5. | 10.03.2017 16:00-18:00 | B | Time series/2 | Python examples, Knime examples, link to sounds dataset (source: speech recognition example) | |
6. | 14.03.2017 16:00-18:00 | B | Time series/3 | Python examples/2 | |
7. | 17.03.2017 16:00-18:00 | B | Time series/4 | Python examples/3, Knime example | |
8. | 21.03.2017 16:00-18:00 | B | DM Process/1 | Example AMRP (also described in this report, in Italian), CRISP-DM, Link to the CRISP-DM 1.0 guide (by SPSS) | |
9. | 24.03.2017 16:00-18:00 | B | DM Process/2 | Intro_CRM Churn | |
10. | 28.03.2017 16:00-18:00 | B | DM Process/3 | Collective churn analysis, Promotions, Sophistication. Sample reports made by students and (loosely) following CRISP-DM: Report 1 (Italian), Report 2 (English), Report 3 (Italian). Exercise on CRISP-DM: understanding churn | |
- | | B | Cancelled | | |
11. | 04.04.2017 16:00-18:00 | B | Exercises | Exercise on Understanding churn (with a solution). See also exercises in section Past Exams | |
- | 07.04.2017 11:00-13:00 | A1 + C1 | Mid-term exams | | |
12. | 21.04.2017 16:00-18:00 | B | Classification: alternative methods/1 | slides on K-nearest neighbours and Naive Bayes | Pedreschi |
13. | 28.04.2017 16:00-18:00 | B | Classification: alternative methods/2 | slides on Artificial Neural Networks and Support Vector Machines | Pedreschi |
14. | 02.05.2017 16:00-18:00 | B | Classification: alternative methods/3 | slides on ensemble methods, slides on the wisdom of the crowds, original 1907 Nature paper by Francis Galton "Vox populi" | Pedreschi |
15. | | | Lecture canceled | | |
16. | 09.05.2017 16:00-18:00 | B | Classification: validation methods/1 | Slides from P. Adamopoulos, Slides from J.F. Ehmke | |
17. | 12.05.2017 16:00-18:00 | B | Classification: validation methods/2 | Imbalanced data & evaluation, Knime sample classification & evaluation, Python sample classification & evaluation | |
18. | 16.05.2017 16:00-18:00 | B | Classification: validation methods/3 | ||
19. | 19.05.2017 16:00-18:00 | B | Exercises | Ex. from past exams 1, Ex. from past exams 2, Mixed Exercises, Lift chart | |
20. | 23.05.2017 16:00-18:00 | B | Outlier Detection/1 | Slides from SDM2010 tutorial | |
21. | 26.05.2017 16:00-18:00 | B | Outlier Detection/2 | Python examples, Knime examples, link to ELKI framework, test dataset for ELKI | |
22. | 30.05.2017 16:00-18:00 | B | Exercises | Exercises on outliers detection, Exercises on ensembles and ROC/Lift chart | |
- | 06.06.2017 11:00-13:00 | A + B | Mid-term exams | | |
The exam is composed of three parts:
Guidelines for the project are here.
The exam is composed of three parts:
Exam | Date | Hour | Place | Notes | Marks |
---|---|---|---|---|---|
First Mid-term 2016 | 4.11.2016 | 9:00 - 11:00 | Room A | ||
Second Mid-term 2016 | 21.12.2016 | 9:00 - 11:00 | Room A |
Exam | Date | Hour | Place | Notes | Marks |
---|---|---|---|---|---|
1st Mid-term 2017 | 7.4.2017 | 11:00 - 13:00 | Rooms A1 + C1 | Solutions | Results 7.4.2017 |
2nd Mid-term 2017 | 6.6.2017 | 11:00 - 13:00 | Rooms A + B | Solutions | Results 6.6.2017 |
Session | Date | Time | Room | Notes | Solutions | Marks |
---|---|---|---|---|---|---|
1. | 19 Jan 2017 | 09:00 | C | On the same date we will set the dates for the oral exam. | ||
2. | 08 Feb 2017 | 14:00 | C | On the same date we will set the dates for the oral exam. | ||
3. | 08 June 2017 | 14:00 | A1 | (1) Oral exam of DM1 for students who already have a mark for the written exam of DM1. (2) Oral exam of DM2 for students who already have a mark for the written exam of DM2. Please register through the system: https://esami.unipi.it/ | ||
4. | 09 June 2017 | 10:00 | A1 | (1) Oral exam of DM1 for students who already have a mark for the written exam of DM1. (2) Oral exam of DM2 for students who already have a mark for the written exam of DM2. Please register through the system: https://esami.unipi.it/ | ||
5. | 13 June 2017 | 11:00 | A1 | Written exam of DM1/DM2. On the same date we will hold the oral exam for students who already have a mark for the written exam, and we will set the dates for the oral exam. Please register through the system: https://esami.unipi.it/ | Solutions | Results DM2 13.6.2017 |
6. | 04 July 2017 | 09:00 | A1 | Written exam of DM1/DM2. On the same date we will hold the oral exam for students who already have a mark for the written exam, and we will set the dates for the oral exam. Please register through the system: https://esami.unipi.it/ | Solutions | Results DM2 4.7.2017 |
7. | 06 September 2017 | 09:00 | A1 | Written exam of DM1/DM2. On the same date we will hold the oral exam for students who already have a mark for the written exam, and we will set the dates for the oral exam. Please register through the system: https://esami.unipi.it/ | Solutions | Results DM2 6.9.2017 |
Date | Time | Room | Notes | Results |
---|---|---|---|---|