
Data Mining, Academic Year 2014/15

DM 1: Foundations of Data Mining

Instructors:

Teaching assistant:

DM 2: Advanced topics on Data Mining and case studies

Instructors:

News

Learning goals

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.

The wide availability of data from relational databases, the web and other sources motivates the study of data analysis techniques. These techniques support a better understanding of the data and make it easier to use the results in decision-making processes. The goal of the course is to introduce the basic concepts of the knowledge discovery process, the main data mining techniques and their algorithms. Particular emphasis is placed on methodology, presented through classes of paradigmatic applications such as market basket analysis, market segmentation and fraud detection. Finally, the course introduces the privacy and ethical issues that the analyst must be aware of when applying data inference techniques. The course consists of the following parts:

  1. the basic concepts of the knowledge discovery process: data understanding and preparation, data types, data measures and similarity;
  2. the main data mining techniques (association rules, classification and clustering), studied in both their formal and implementation aspects;
  3. case studies in marketing and customer relationship management, fraud detection and epidemiological studies;
  4. the privacy and ethical issues raised by data inference techniques, which are covered in the final part of the course.

Reading about the "data scientist" job

Timetable and rooms

DM 1

Classes

Day Hours Room
Monday 16:00 - 18:00 Room C
Friday 14:00 - 16:00 Room A1

Office hours:

DM 2

Classes

Day Hours Room
Monday 9:00 - 11:00 Room N1
Thursday 9:00 - 11:00 Room A1

Office hours:

Learning material

Textbook

Class slides

Past exam papers

Data mining software

Class calendar (2014-2015)

First part of course, first semester (DMF - Data mining: foundations)

No. Date Time Room Topic Learning material Instructor
1. 25.09.2014 14:00-16:00 B Intro: data mining & knowledge discovery process Textbook, Chapt. 1 dm_intro-2011.pdf Pedreschi
2. 26.09.2014 16:00 CNR BRIGHT event at CNR Pisa: Big Data Tales Pedreschi
3. 02.10.2014 14:00-16:00 B Intro: data mining & knowledge discovery process Textbook, Chapt. 1 dm_intro-2011.pdf Pedreschi
4. 03.10.2014 14:00-16:00 A1 Intro: data mining & knowledge discovery process Textbook, Chapt. 1 dm_intro-2011.pdf Pedreschi
5. 09.10.2014 14:00-16:00 B Data: types and basic measures Textbook, Chapt. 2 chap2_data_new.pdf Pedreschi
6. 10.10.2014 14:00-16:00 A1 Data: types and basic measures Textbook, Chapt. 2 chap2_data_new.pdf Pedreschi
7. 13.10.2014 14:00-16:00 B Data: types and basic measures Textbook, Chapt. 2 chap2_data_new.pdf Pedreschi
8. 17.10.2014 14:00-16:00 A1 Cancelled Pedreschi
9. 20.10.2014 14:00-16:00 B Exploratory data analysis and data understanding. Textbook, Chapt. 3 chap3_data_exploration.pdf Pedreschi
10. 24.10.2014 14:00-16:00 A1 Clustering analysis. Centroid-based methods Textbook, Chapt. 8 dm2014_clustering_intro.pdf dm2014_clustering_kmeans.pdf Pedreschi
11. 27.10.2014 14:00-16:00 B Clustering analysis. Hierarchical methods Textbook, Chapt. 8 dm2014_clustering_hierarchical.pdf Pedreschi
12. 31.10.2014 14:00-16:00 A1 Tutorial on KNIME Slides: knime_slides_dm.pdf Workflows: data-manipulation_iris.zip data-manipulation_adult.zip clustering_iris.zip Pedreschi
13. 10.11.2014 14:00-16:00 B Clustering analysis. Density-based methods Textbook, Chapt. 8 dm2014_clustering_dbscan.pdf Pedreschi
14. 14.11.2014 14:00-16:00 A1 Classification and predictive methods Textbook, Chapt. 4 chap4_basic_classification.pdf Pedreschi
15. 17.11.2014 14:00-16:00 B Classification. Decision trees Textbook, Chapt. 4 chap4_basic_classification.pdf Pedreschi
16. 21.11.2014 14:00-16:00 A1 Classification. Decision trees Textbook, Chapt. 4 chap4_basic_classification.pdf Pedreschi
17. 24.11.2014 14:00-16:00 B Classification. Validation and Weka & KNIME Lab Workflows: decisiontreeiris.zip decisiontreeadult.zip decisiontreeadultoverfitting.zip Milli
18. 28.11.2014 14:00-16:00 A1 Classification. Rule-based and Bayesian methods Textbook, Chapt. 4 chap4_basic_classification.pdf Pedreschi
19. 01.12.2014 14:00-16:00 B Frequent Pattern Mining. Textbook, Chapt. 6 2-3tdm-restructured_assoc_2013.pdf Pedreschi
20. 05.12.2014 14:00-16:00 A1 Association Rule Mining Textbook, Chapt. 6 2-3tdm-restructured_assoc_2013.pdf Pedreschi
21. 12.12.2014 14:00-16:00 A1 Cancelled due to a strike Pedreschi
22. 15.12.2014 14:00-16:00 B Association Rule Mining and KNIME Workflow: FP and AR Monreale

Second part of course, second semester (DMA - Data mining: advanced topics and case studies)

No. Date Time Room Topic Learning material Instructor
1. 23.02.2015 09:00-11:00 N1 Introduction + Sequential patterns / 1 Sequential Patterns - Slides Nanni
2. 26.02.2015 09:00-11:00 A1 Sequential patterns / 2 Link to Tool for seq. patterns Nanni
3. 02.03.2015 09:00-11:00 N1 Graph mining Slides Nanni
05.03.2015 09:00-11:00 A1 No class
4. 09.03.2015 09:00-11:00 N1 Advanced Classification Methods / 1 Slides Monreale
5. 12.03.2015 09:00-11:00 A1 Advanced Classification Methods / 2 Monreale
6. 16.03.2015 09:00-11:00 N1 Advanced Classification Methods / 3 Exercises on Classification Monreale
7. 19.03.2015 09:00-11:00 A1 Time series / 1 Slides Nanni
8. 23.03.2015 09:00-11:00 N1 Time series / 2 Example of DTW in R Nanni
9. 26.03.2015 09:00-11:00 A1 Exercises Exercises from past exams Nanni
10. 30.03.2015 09:00-11:00 N1 Exercises Monreale
11. 02.04.2015 09:00-11:00 A1 Exercises Monreale
03-07.04.2015 Easter holidays
13.04.2015 09:00-11:00 C1 Midterm test
12. 16.04.2015 09:00-11:00 A1 Case study: CRM - Customer Segmentation + CRISP-DM AMRP & Stulong CRISP-DM Nanni
13. 23.04.2015 09:00-11:00 A1 Case study: CRM - Churn Analysis Intro CRM Churn ST-Churn Nanni
14. 27.04.2015 09:00-11:00 N1 Case study: CRM - Promotions and Sophistication Promotions Sophistication Nanni
15. 30.04.2015 09:00-11:00 A1 Spatiotemporal analysis / 1 ST Analysis REF: Survey paper Nanni
16. 04.05.2015 09:00-11:00 N1 Spatiotemporal analysis / 2 Nanni
17. 07.05.2015 09:00-11:00 A1 Case study: Spatiotemporal analysis / 1 + Project presentation Case study 1 Projects Nanni
18. 11.05.2015 09:00-11:00 N1 Case study: Spatiotemporal analysis / 2 Case study 2 Nanni
19. 14.05.2015 09:00-11:00 A1 Spatiotemporal analysis / 3 ST Classification Nanni
20. 18.05.2015 09:00-11:00 N1 Outlier detection Slides from SDM2010 tutorial Nanni
21. 21.05.2015 09:00-11:00 A1 Ethical Issues in Data Analytics Slides Monreale
22. 25.05.2015 09:00-11:00 N1 Ethical Issues in Data Analytics / Fraud Detection Case Study Monreale

Exams

Exam DM part I (DMF)

The exam consists of a written test and an oral test:

Exam DM part II (DMA)

The exam is composed of three parts:

Exercises 2014-2015

Exercises, DM Part I

Guidelines for the homework are here.

Exam sessions

Mid-term exams

Date Hour Place Notes Marks
Mid-term 2015 Monday 13.04.2015 9.00 Room C1

Regular exam sessions

Session Date Time Room Notes Results
1. Monday 19 January 2015 9.00 C Results of written exam
1. Wednesday 21 January 2015 9.00 Pedreschi's office Oral exam; send an email to register
1. Thursday 29 January 2015 14.00 Pedreschi's office Oral exam; send an email to register
2. Monday 16 February 2015 9.00 C Results of written exam
2. Monday 23 February 2015 11.00 Pedreschi's office Oral exam; send an email to register
2. Monday 2 March 2015 11.00 Pedreschi's office Oral exam; send an email to register
3. Friday 05 June 2015 14.00 C Results of written exam
Session Date Time Room Notes Results
1. Monday 19 January 2015 9.00 C
2. Monday 16 February 2015 9.00 C
3. Friday 5 June 2015 14.00 C
4. Friday 26 June 2015 14.00 C
5. Friday 17 July 2015 9.00 C
6. Wednesday 9 September 2015 9.00 C

Extra sessions, Academic Year 2013/14

Date Time Room Notes Results
7 November 2014 9:00-11:00 C1

Previous editions