Instructors:
Teaching assistants:
… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.
The wide availability of data from relational databases, the web, and other sources motivates the study of data analysis techniques that enable a better understanding of the results and make them easier to use in decision-making processes. The goal of the course is to provide an introduction to the basic concepts of the knowledge discovery process, to the main data mining techniques, and to the related algorithms. Particular emphasis is placed on methodological aspects, presented through several classes of paradigmatic applications such as market basket analysis, market segmentation, and fraud detection. Finally, the course introduces the privacy and ethical issues inherent in the use of inference techniques on data, of which the analyst must be aware. The course consists of the following parts:
Classes:
Day | Time | Room |
---|---|---|
Monday | 9:00 - 11:00 | Aula N1 |
Wednesday | 9:00 - 11:00 | Aula L1 |
Office hours:
Dropbox folder of audio/video lectures captured by students and instructors.
# | Date and time | Room | Topic | Learning material | Instructor |
---|---|---|---|---|---|
1. | 28.09.2012 14:00-16:00 | N1 | Intro: data mining & knowledge discovery process | Textbook, Chapt. 1 dm_intro-2011.pdf | Pedreschi |
2. | 04.10.2012 14:00-16:00 | N1 | Overview of data mining techniques and applications | | Nanni |
3. | 05.10.2012 14:00-16:00 | N1 | Overview of data mining techniques and applications | | Nanni |
4. | 11.10.2012 14:00-16:00 | N1 | Data: types and basic measures | Textbook, Chapt. 2 chap2_data_new.pdf | Pedreschi |
5. | 12.10.2012 09:00-11:00 | N1 | Data: types and basic measures | | Pedreschi |
6. | 18.10.2012 14:00-16:00 | N1 | Exploratory data analysis and data understanding | Textbook, Chapt. 3 chap3_data_exploration.pdf | Nanni |
7. | 19.10.2012 14:00-16:00 | N1 | Exploratory data analysis and data understanding | | Pedreschi |
8. | 25.10.2012 14:00-16:00 | N1 | Exploratory data analysis and data understanding. Weka Lab | Weka | Pedreschi |
9. | 26.10.2012 14:00-16:00 | N1 | Clustering analysis. Centroid-based methods | Textbook, Chapt. 8 chap8_basic_cluster_analysis.pdf | Pedreschi |
10. | 08.11.2012 14:00-16:00 | N1 | Clustering analysis. Hierarchical methods | | Pedreschi |
11. | 09.11.2012 14:00-16:00 | N1 | Clustering analysis. Density-based methods | | Pedreschi |
12. | 15.11.2012 14:00-16:00 | N1 | Clustering analysis. Validation and Weka Lab | | Pedreschi |
13. | 16.11.2012 14:00-16:00 | N1 | Classification and predictive methods | Textbook, Chapt. 4 chap4_basic_classification.pdf | Pedreschi |
14. | 22.11.2012 14:00-16:00 | N1 | Classification. Decision trees | | Pedreschi |
15. | 23.11.2012 14:00-16:00 | N1 | Classification. Decision trees | | Pedreschi |
16. | 29.11.2012 14:00-16:00 | N1 | Classification. Rule-based and Bayesian methods | | Pedreschi |
17. | 30.11.2012 14:00-16:00 | N1 | Classification. Validation and Weka Lab | | Pedreschi |
18. | 06.12.2012 14:00-16:00 | N1 | canceled | | Pedreschi |
19. | 07.12.2012 14:00-14:00 | N1 | Classification. Validation and Weka Lab | | Pedreschi |
20. | 13.12.2012 14:00-16:00 | N1 | canceled | | Pedreschi |
21. | 14.12.2012 14:00-14:00 | N1 | Wrap-up. Presentation of Second Semester syllabus | | Pedreschi, Giannotti, Nanni |
# | Date and time | Room | Topic | Learning material | Instructor |
---|---|---|---|---|---|
1. | 18.02.2013 9:00-11:00 | N1 | Introduction | | Giannotti |
2. | 27.02.2013 9:00-11:00 | N1 | Frequent patterns and association rules / 1 | Association Rules -- Slides | Giannotti |
3. | 04.03.2013 9:00-11:00 | N1 | Frequent patterns and association rules / 2 | | Giannotti |
4. | 06.03.2013 9:00-11:00 | N1 | Frequent patterns and association rules / 3 | | Giannotti |
5. | 11.03.2013 9:00-11:00 | N1 | Introduction to CRM and Churn analysis | 1.dm2_crm_customersegmentation-airmiles_2013.pdf 3.dm2012_st_events.pdf 4.dm2_churn_coop_2013.pdf 4.dm2_churn_intro_2013.pdf | Giannotti |
6. | 13.03.2013 9:00-11:00 | N1 | Association rules on DM tools | en_tanagra_assoc_rules_comparison.pdf http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes | Giannotti |
7. | 18.03.2013 9:00-11:00 | N1 | Sequential patterns / 1 | Textbook, Sect. 7.4 Sequential Patterns - Slides [1-12] | Nanni |
8. | 20.03.2013 9:00-11:00 | N1 | Sequential patterns / 2 | Sequential Patterns - Slides [13-24] | Nanni |
9. | 25.03.2013 9:00-11:00 | N1 | Time series / 1 + Data exploration: assignments | Time Series - Slides [1-34] | Nanni |
10. | 27.03.2013 9:00-11:00 | L1 | Time series / 2 | Time Series - Slides [35-84] | Nanni |
11. | 08.04.2013 9:00-11:00 | N1 | Classification: evaluation methods + Case study: Fraud detection | fraud_detection.pdf dm2-fraudedetection1.ppt.pdf | Giannotti |
12. | 10.04.2013 9:00-11:00 | L1 | Network diffusion and Viral Marketing | 7.mains_crm_innovatori.pdf | Giannotti |
13. | 15.04.2013 9:00-11:00 | N1 | Mobility Data Mining / 1 | Mobility DM - Slides [1-33] + Reference book chapter (ask the instructor) | Nanni |
14. | 17.04.2013 9:00-11:00 | L1 | Mobility Data Mining / 2 | | Nanni |
15. | 22.04.2013 9:00-11:00 | N1 | Case study: Mobility Data Mining | MDM case study GSM for transport planning | Nanni |
16. | 24.04.2013 9:00-11:00 | L1 | Case study: Mobility Data Mining / 2 | | Giannotti, Nanni |
17. | 06.05.2013 9:00-11:00 | N1 | Data exploration: results of assignments + Presentation of projects | Project 1 sample solution | Nanni |
18. | 08.05.2013 9:00-11:00 | L1 | Data Mining and Privacy / 1 | Privacy Mobility Data & Privacy | Giannotti |
19. | 13.05.2013 9:00-11:00 | N1 | Case study: Mining official data and health data | Mining Official Data | Nanni |
20. | 15.05.2013 9:00-11:00 | L1 | Data Mining and Privacy / 2 | | Giannotti |
The exam consists of a written test and an oral test:
The exam has two parts:
Date | Time | Place | Notes | Grades | |
---|---|---|---|---|---|
Exercise I and Exercise II |
Date | Time | Place | Notes | Grades |
---|---|---|---|---|
Tuesday 22 January 2013 | 9:00 | Aula A | | |
Tuesday 12 February 2013 | 9:00 | Aula C | | |
Monday 28 January 2013 | 10:00 | Pedreschi's Office | oral exam | |
Friday 1 February 2013 | 15:00 | Pedreschi's Office | oral exam | |
Wednesday 6 February 2013 | 15:00 | Pedreschi's Office | oral exam | |
Monday 3 June 2013 | 9:00 | Aula N1 | | |
Monday 1 July 2013 | 9:00 | Aula N1 | | |
Wednesday 24 July 2013 | 9:00 | Aula L1 | | |