
Data Mining, Academic Year 2016/17

DM 1: Foundations of Data Mining

Instructors:

Teaching assistant:

DM 2: Advanced topics on Data Mining and case studies

Instructors:

News

Learning goals

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.

The wide availability of data from relational databases, the web, and other sources motivates the study of data analysis techniques that lead to a better understanding of the results and make them easier to use in decision-making processes. The goal of the course is to provide an introduction to the basic concepts of the knowledge discovery process, the main data mining techniques, and the related algorithms. Particular emphasis is placed on methodological aspects, presented through several classes of paradigmatic applications such as Market Basket Analysis, market segmentation, and fraud detection. Finally, the course introduces the privacy and ethical issues inherent in the use of inference techniques on data, of which the analyst must be aware. The course consists of the following parts:

  1. the basic concepts of the knowledge discovery process: data understanding and preparation, data types, and data measures and similarity;
  2. the main data mining techniques (association rules, classification, and clustering), studied in both their formal and implementation aspects;
  3. several case studies in marketing and customer management support, fraud detection, and epidemiological studies;
  4. the privacy and ethical issues inherent in the use of inference techniques on data, of which the analyst must be aware.

Reading about the "data scientist" job

Timetable and rooms

DM 1

Classes

Day of week | Hour | Room
Monday | 11:00 - 13:00 | Room C
Friday | 14:00 - 16:00 | Room A1

Office hours:

DM 2

Classes

Day of week | Hour | Room
Tuesday | 16:00 - 18:00 | B
Friday | 16:00 - 18:00 | B

Office hours:

Learning material

Textbook

Slides of the classes

The slides used during the course will be added to the calendar at the end of each lesson. Most of them are taken from those provided by the authors of the textbook: Slides for "Introduction to Data Mining"

Past Exams

Data mining software

Class calendar (2016-2017)

First part of course, first semester (DMF - Data mining: foundations)

# | Day | Room | Topic | Learning material | Instructor
1. | 19.09.2016 11:00-13:00 | C | Canceled | - |
2. | 23.09.2016 14:00-16:00 | A1 | Introduction | Course Overview, DM Introduction | Monreale
3. | 26.09.2016 11:00-13:00 | C | Data Understanding | 3.dataunderstanding.pdf, 3.data-understanting-appendix.pdf | Monreale
4. | 30.09.2016 14:00-16:00 | A1 | Data Preparation | 4.data_preparation.pdf | Monreale
5. | 03.10.2016 11:00-13:00 | C | Introduction to Python, Knime | python_tutorial.zip | Monreale/Guidotti
6. | 07.10.2016 14:00-16:00 | A1 | Exercises on Data Understanding | exercises-dm1.pdf | Monreale/Guidotti
7. | 10.10.2016 11:00-13:00 | C | Centroid-based methods | dm2014_clustering_intro.pdf, dm2014_clustering_kmeans.pdf | Monreale
8. | 14.10.2016 14:00-16:00 | A1 | Hierarchical methods. Density Based Clustering | dm2014_clustering_hierarchical.pdf, knime_slides_mains.pdf | Monreale
9. | 17.10.2016 11:00-13:00 | C | Knime - Python: Data Understanding | python_data_understanding.zip, knime_data_manipulation_iris.zip, knime_data_manipulation_adult.zip | Monreale/Guidotti
10. | 21.10.2016 14:00-16:00 | A1 | Clustering Validation | dm2014_clustering_validation.pdf | Monreale
11. | 24.10.2016 11:00-13:00 | C | Knime - Python: Clustering | HC with Group Average, exercises-clustering.pdf, knime_clustering_iris.zip, titanic_clustering.ipynb.zip | Monreale/Guidotti
12. | 28.10.2016 14:00-16:00 | A1 | Exercises on Clustering | HC with Group Average, exercises-clustering.pdf | Monreale/Guidotti
 | 04.11.2016 9:00-11:00 | A | First Mid-term test | | Monreale/Guidotti
13. | 07.11.2016 11:00-13:00 | C | Frequent Patterns & Association Rules | 4-5tdm-restructured_assoc.pdf | Monreale
14. | 11.11.2016 14:00-16:00 | A1 | Event on Big Data: Aula Magna | |
15. | 14.11.2016 11:00-13:00 | C | Frequent Patterns & Association Rules | |
16. | 18.11.2016 14:00-16:00 | A1 | Knime - Python: Frequent Pattern & Association Rules | knime_pattern.zip, knime_pattern_titanic2.zip, titanic_frequent_patterns.ipynb.zip (http://www.borgelt.net/apriori.html) |
17. | 21.11.2016 11:00-13:00 | C | Classification | chap4_basic_classification.pdf |
18. | 25.11.2016 14:00-16:00 | A1 | Classification | |
19. | 28.11.2016 11:00-13:00 | C | Classification | |
20. | 02.12.2016 14:00-16:00 | A1 | Exercises on Patterns & Classification | |
21. | 05.12.2016 11:00-13:00 | C | Canceled | |
22. | 09.12.2016 14:00-16:00 | A1 | Canceled | |
23. | 12.12.2016 11:00-13:00 | C | Exercises on Patterns & Classification | knime_classification_iris.zip, titanic_classification.ipynb.zip | Guidotti / Pedreschi
24. | 16.12.2016-18.12.2015 | A1 | Knime - Python: Classification | | Guidotti / Pedreschi
 | 21.12.2016 9:00-11:00 | A | Second Mid-term test | | Monreale/Guidotti

Second part of course, second semester (DMA - Data mining: advanced topics and case studies)

# | Day | Room | Topic | Learning material | Instructor (default: Nanni)
1. | 21.02.2017 16:00-18:00 | B | Introduction + Sequential patterns/1 | Introduction, Sequential patterns | Nanni + Pedreschi
2. | 24.02.2017 16:00-18:00 | B | Sequential patterns/2 | |
3. | 28.02.2017 16:00-18:00 | B | Sequential patterns/3 | Link to SPMF, a tool for seq. patterns and sample dataset. Exercises: Text 1 and Text 2 |
 | 03.03.2017 16:00-18:00 | B | Canceled | |
4. | 07.03.2017 16:00-18:00 | B | Time series/1 | Time series |
5. | 10.03.2017 16:00-18:00 | B | Time series/2 | Python examples, Knime examples, link to sounds dataset (source: speech recognition example) |
6. | 14.03.2017 16:00-18:00 | B | Time series/3 | Python examples/2 |
7. | 17.03.2017 16:00-18:00 | B | Time series/4 | Python examples/3, Knime example |
8. | 21.03.2017 16:00-18:00 | B | DM Process/1 | Example AMRP (also described in this report, in Italian), CRISP-DM, Link to the CRISP-DM 1.0 guide (by SPSS) |
9. | 24.03.2017 16:00-18:00 | B | DM Process/2 | Intro_CRM Churn |
10. | 28.03.2017 16:00-18:00 | B | DM Process/3 | Collective churn analysis, Promotions, Sophistication. Sample reports made by students and (loosely) following CRISP-DM: Report 1 (Italian), Report 2 (English), Report 3 (Italian). Exercise on CRISP-DM: understanding churn |
 | 31.03.2017 16:00-18:00 | B | Canceled | |
11. | 04.04.2017 16:00-18:00 | B | Exercises | Exercise on Understanding churn (with a solution). See also exercises in section Past Exams |
 | 07.04.2017 11:00-13:00 | A1 + C1 | Mid-term exams | |
12. | 21.04.2017 16:00-18:00 | B | Classification: alternative methods/1 | slides on K-nearest neighbours and Naive Bayes | Pedreschi
13. | 28.04.2017 16:00-18:00 | B | Classification: alternative methods/2 | slides on Artificial Neural Networks and Support Vector Machines | Pedreschi
14. | 02.05.2017 16:00-18:00 | B | Classification: alternative methods/3 | slides on ensemble methods and slides on the wisdom of the crowds; original 1907 Nature paper by Francis Galton, "Vox populi" | Pedreschi
15. | 05.05.2017 16:00-18:00 | | Lecture canceled | |
16. | 09.05.2017 16:00-18:00 | B | Classification: validation methods/1 | Slides from P. Adamopoulos, Slides from J.F. Ehmke |
17. | 12.05.2017 16:00-18:00 | B | Classification: validation methods/2 | Imbalanced data & evaluation, Knime sample classification & evaluation, Python sample classification & evaluation |
18. | 16.05.2017 16:00-18:00 | B | Classification: validation methods/3 | |
19. | 19.05.2017 16:00-18:00 | B | Exercises | Ex. from past exams 1, Ex. from past exams 2, Mixed Exercises, Lift chart |
20. | 23.05.2017 16:00-18:00 | B | Outlier Detection/1 | Slides from SDM2010 tutorial |
21. | 26.05.2017 16:00-18:00 | B | Outlier Detection/2 | Python examples, Knime examples, link to ELKI framework, test dataset for ELKI |
22. | 30.05.2017 16:00-18:00 | B | Exercises | Exercises on outlier detection, Exercises on ensembles and ROC/Lift chart |
 | 06.06.2017 11:00-13:00 | A + B | Mid-term exams | |

Exams

Exam DM part I (DMF)

The exam is composed of three parts:

Guidelines for the project are here.

Exam DM part II (DMA)

The exam is composed of three parts:

Exam sessions

Mid-term exams

Exam | Date | Hour | Place | Notes | Marks
First Mid-term 2016 | 4.11.2016 | 9:00 - 11:00 | Room A | |
Second Mid-term 2016 | 21.12.2016 | 9:00 - 11:00 | Room A | |
1st Mid-term 2017 | 7.4.2017 | 11:00 - 13:00 | Rooms A1 + C1 | Solutions | Results 7.4.2017
2nd Mid-term 2017 | 6.6.2017 | 11:00 - 13:00 | Rooms A + B | Solutions | Results 6.6.2017

Regular exam sessions

Session | Date | Time | Room | Notes | Solutions | Marks
1. | 19 Jan 2017 | 09:00 | C | On the same date we will set the dates for the oral exam. | |
2. | 08 Feb 2017 | 14:00 | C | On the same date we will set the dates for the oral exam. | |
3. | 08 June 2017 | 14:00 | A1 | (1) Oral exam of DM1 for students who already have a mark for the written exam of DM1. (2) Oral exam of DM2 for students who already have a mark for the written exam of DM2. Please register through the system: https://esami.unipi.it/ | |
4. | 09 June 2017 | 10:00 | A1 | (1) Oral exam of DM1 for students who already have a mark for the written exam of DM1. (2) Oral exam of DM2 for students who already have a mark for the written exam of DM2. Please register through the system: https://esami.unipi.it/ | |
5. | 13 June 2017 | 11:00 | A1 | Written exam of DM1/DM2. On the same date we will hold the oral exam for students who already have a mark for the written exam, and we will set the dates for the oral exam. Please register through the system: https://esami.unipi.it/ | Solutions | Results DM2 13.6.2017
6. | 04 July 2017 | 09:00 | A1 | Written exam of DM1/DM2. On the same date we will hold the oral exam for students who already have a mark for the written exam, and we will set the dates for the oral exam. Please register through the system: https://esami.unipi.it/ | Solutions | Results DM2 4.7.2017
7. | 06 September 2017 | 09:00 | A1 | Written exam of DM1/DM2. On the same date we will hold the oral exam for students who already have a mark for the written exam, and we will set the dates for the oral exam. Please register through the system: https://esami.unipi.it/ | Solutions | Results DM2 6.9.2017

Extra sessions, Academic Year 2015/16

Date Time Room Notes Results

Previous years' editions