Instructors:
Teaching assistant:
Instructors:
Instructors:
Teaching assistant:
… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.
Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.
The wide availability of data from relational databases, the web, and other sources motivates the study of data analysis techniques that allow a better understanding of the data and make the results easier to use in decision-making processes. The aim of the course is to provide an introduction to the basic concepts of the knowledge extraction process, to the main data mining techniques, and to the related algorithms. Particular emphasis is placed on methodological aspects, presented through a few paradigmatic classes of applications such as market basket analysis, market segmentation, and fraud detection. Finally, the course introduces the privacy and ethical issues involved in applying inference techniques to data, which the analyst must be aware of. The course consists of the following parts:
Classes
Day of Week | Hour | Room |
---|---|---|
Wednesday | 14:00 - 16:00 | Room C1 |
Thursday | 16:00 - 18:00 | Room C1 |
Friday | 11:00 - 13:00 | Room A1 |
Office hours:
Classes
Day of week | Hour | Room |
---|---|---|
Thursday | 14 - 16 | A1 |
Friday | 16 - 18 | C1 |
Office hours:
The slides used during the course will be added to the calendar at the end of each lecture. They are largely drawn from those provided by the authors of the textbook: Slides for "Introduction to Data Mining"
* Some texts of past exams for DM1 (6 CFU):
* Some solutions of past exams containing exercises on KNN and Naive Bayes classifiers, DM1 (9 CFU):
* Some exercises (partially with solutions) on sequential patterns and time series can be found in the following exam texts from recent years:
# | Day | Room | Topic | Learning material | Instructor |
---|---|---|---|---|---|
1. | 20.09.2017 14:00-16:00 | C1 | Introduction | | Pedreschi |
2. | 21.09.2017 16:00-18:00 | C1 | Introduction | Introduction | Pedreschi |
3. | 22.09.2017 11:00-13:00 | A1 | Lecture canceled | | Pedreschi |
4. | 27.09.2017 14:00-16:00 | C1 | Data Understanding | Data Understanding For this topic we suggest: “Guide to Intelligent Data Analysis” | Monreale |
5. | 28.09.2017 16:00-18:00 | C1 | Introduction to Python, Knime | python_tutorial knime_tutorial | Monreale/Guidotti |
6. | 29.09.2017 11:00-13:00 | A1 | Data Understanding | | Pedreschi |
7. | 04.10.2017 14:00-16:00 | C1 | Data Preparation | 4.data_preparation.pdf | Pedreschi |
8. | 05.10.2017 16:00-18:00 | C1 | Data Preparation | | Pedreschi |
9. | 06.10.2017 11:00-13:00 | A1 | Canceled | | |
10. | 11.10.2017 14:00-16:00 | C1 | Knime - Python: Data Understanding | Pandas knime_data_understanding python_data_understanding | Pedreschi/Guidotti |
11. | 12.10.2017 16:00-18:00 | C1 | Clustering analysis: Centroid-based methods. | dm2014_clustering_intro.pdf dm2014_clustering_kmeans.pdf | Pedreschi |
12. | 13.10.2017 11:00-13:00 | A1 | Hierarchical methods. | dm2014_clustering_hierarchical.pdf | Pedreschi |
13. | 18.10.2017 14:00-16:00 | C1 | Clustering analysis: Density-based methods. Exercises on Data Understanding | dm2014_clustering_dbscan.pdf exercises-dm1.pdf | Monreale/Guidotti |
14. | 19.10.2017 16:00-18:00 | C1 | Exercises on Clustering | Online Didactic Data Mining | Monreale/Guidotti |
15. | 20.10.2017 11:00-13:00 | A1 | Knime - Python: Clustering | knime_clustering python_clustering | Monreale/Guidotti |
16. | 25.10.2017 14:00-16:00 | C1 | Clustering Validation | dm2014_clustering_validation.pdf | Monreale |
17. | 26.10.2017 16:00-18:00 | C1 | Exercises on Clustering | 2016-01-18-dm1-prima.pdf dm-clustering.pdf | Monreale |
18. | 27.10.2017 11:00-13:00 | A1 | Canceled | | |
| | 30.10.2017 14:00-18:00 | A1,C1 | First Mid-term test | | |
19. | 08.11.2017 14:00-16:00 | C1 | Frequent Pattern & Association Rules | restructured_assoc.pdf Chapter 6 of textbook (avoid sections 6.4.2, 6.5, 6.6, 6.7.2, 6.7.2, 6.8) | Pedreschi |
20. | 09.11.2017 16:00-18:00 | C1 | Frequent Pattern & Association Rules | | Pedreschi |
21. | 10.11.2017 11:00-13:00 | A1 | Knime - Frequent Patterns & Association Rules | knime_pattern_mining python_pattern_mining Borgelt Web Page | Guidotti / Pedreschi |
22. | 15.11.2017 14:00-16:00 | C1 | Classification/1 | 11.chap4_basic_classification.pdf | Pedreschi |
23. | 16.11.2017 16:00-18:00 | C1 | Classification/2 | | Monreale |
24. | 17.11.2017 11:00-13:00 | A1 | Knime - Python: Classification | knime_classification python_classification | Guidotti/Pedreschi |
25. | 22.11.2017 14:00-16:00 | C1 | Classification/3 | | Pedreschi |
26. | 23.11.2017 16:00-18:00 | C1 | Exercises on Classification & Frequent Patterns | exercises-c-ar.pdf | Guidotti/Pedreschi |
| | 24.11.2017 11:00-13:00 | A1 | Canceled – The next lectures are dedicated to the DM of 9 credits | | |
27. | 29.11.2017 14:00-16:00 | C1 | Alternative methods for clustering | 1-alternative-clustering.pdf | Monreale |
28. | 30.11.2017 16:00-18:00 | C1 | Transactional Clustering | 2-transactionalclustering.pdf exercises-clustering-rock.pdf Papers on Clustering | Monreale |
29. | 01.12.2017 11:00-13:00 | A1 | Alternative methods for classification/1 | K-NN & Naive Bayes | Pedreschi |
30. | 06.12.2017 14:00-16:00 | C1 | Alternative methods for classification/2 | Ensemble methods Wisdom of the crowd & Ensemble methods Galton's Vox Populi | Pedreschi |
31. | 07.12.2017 16:00-18:00 | C1 | Exercises on clustering and classification | exercises-clope.pptx exercises_classification_3cfu.pdf | Monreale |
32. | 13.12.2017 14:00-16:00 | C1 | Alternative method for frequent patterns and AR | fp-growth.pdf | Monreale |
33. | 14.12.2017 16:00-18:00 | C1 | Alternative methods for classification/2 | Rule-based classification | Pedreschi |
34. | 15.12.2017 11:00-13:00 | A1 | Exercises on the second part of the course | esercitazione20171215 | Guidotti/Pedreschi |
35. | 20.12.2017 14:00-17:00 | A1,C1 | Second Mid-term test: See Mid-term section for details |
# | Day | Room | Topic | Learning material | Instructor (default: Nanni) |
---|---|---|---|---|---|
1. | 22.02.2018 14:00-16:00 | A1 | Introduction + Sequential patterns/1 | Introduction Sequential patterns | |
2. | 23.02.2018 16:00-18:00 | C1 | Sequential patterns/2 | ||
| A1 | Cancelled | |||
3. | 02.03.2018 16:00-18:00 | C1 | Sequential patterns/3 | Exercises from past exams: dm2_exam.2017.10.30.pdf dm2_mid-term_exam.2017.04.07.pdf | |
4. | 08.03.2018 14:00-16:00 | A1 | Sequential patterns/4 + Time series/1 | Sequential pattern tools: Link to SPMF + sample dataset, Python educational implementation (source), Knime example. Slides: Time Series (updated) | |
| C1 | Cancelled | |||
5. | 15.03.2018 14:00-16:00 | A1 | Time series/2 | Python preprocessing, Python DTW | |
6. | 16.03.2018 16:00-18:00 | C1 | Time series/3 | Book chapter about DTW (from Meinard Müller's book) | |
7. | 22.03.2018 14:00-16:00 | A1 | Time series/4 | Python structural distances. Exercises from past exams: dm2_exam.2017.10.30.pdf dm2_mid-term_exam.2017.04.07.pdf | |
8. | 23.03.2018 16:00-18:00 | C1 | Exercises | Exercises from past exams | |
| | 10.04.2018 16:00-18:00 | E | Mid-term exam | | |
9. | 12.04.2018 14:00-16:00 | A1 | Classification: alternative methods/1 | Slides on the wisdom of the crowds, Original 1907 Nature paper by Francis Galton "Vox populi" | Pedreschi |
10. | 13.04.2018 16:00-18:00 | C1 | Classification: alternative methods/2 | Slides on K-nearest neighbours and Naive Bayes | |
11. | 19.04.2018 14:00-16:00 | A1 | Classification: alternative methods/3 | Slides on ANNs and Support Vector Machines | |
12. | 20.04.2018 16:00-18:00 | C1 | Classification: exercises | Exercises from past exams | |
13. | 26.04.2018 14:00-16:00 | A1 | Classification: evaluation/1 | Model performances, Unbalanced classes and Scoring models | |
14. | 27.04.2018 16:00-18:00 | C1 | Classification: evaluation/2 | Classification weights, Lift chart examples. Homeworks! | |
15. | 03.05.2018 14:00-16:00 | A1 | DM process/1 | Python sample classification & evaluation, Example AMRP (also described in this report, in Italian) | |
16. | 04.05.2018 16:00-18:00 | C1 | DM process/2 | CRISP-DM, Sample project with CRISP-DM, Link to "First Aid for Data Scientist" web site (pwd: datamining_2018 – Contains slides and Quiz to pass, see exam instructions). | |
17. | 10.05.2018 14:00-16:00 | A1 | Outlier detection/1 | Slides | |
18. | 11.05.2018 16:00-18:00 | C1 | Outlier detection/2 | ||
19. | 17.05.2018 14:00-16:00 | A1 | Outlier detection/3 | Python examples, Knime examples, link to ELKI framework, test dataset for ELKI | |
20. | 18.05.2018 16:00-18:00 | C1 | Exercises | Ex on outlier detection, Ex on classification | |
21. | 25.05.2018 16:00-18:00 | C1 | Exercises | exercises_25.05.2018.zip | |
| | 01.06.2018 16:00-18:00 | E | 2nd Mid-term exam | | |
| | 08.06.2018 14:00-17:00 | C | Oral exams | Reserved for students who passed the mid-term written exams | |
The exam is composed of three parts:
Guidelines for the project are here.
The exam is composed of three parts:
Exam | Date | Hour | Place | Notes | Marks |
---|---|---|---|---|---|
First Mid-term 2017 | 30.10.2017 | 14:00 - 17:00 | Room A1, C1 | Please, use the system for registration: https://esami.unipi.it/ | |
Second Mid-term 2017 | 20.12.2017 | 14:00 - 17:00 | Room A1, C1 | Please, use the system for registration: https://esami.unipi.it/ | |
DM2: first mid-term 2018 | 10.04.2018 | 16:00 - 18:00 | Room E | Please, use the system for registration: https://esami.unipi.it/ | Results |
DM2: second mid-term 2018 | 01.06.2018 | 16:00 - 18:00 | Room E | Please, use the system for registration: https://esami.unipi.it/ | Results |
DM2: oral exam 2018 | 08.06.2018 | 14:00 - 17:00 | Room C | For students who passed the mid-term written exams. Please, use the system for registration: https://esami.unipi.it/ | |
Session | Date | Time | Room | Notes | Marks |
---|---|---|---|---|---|
1. | 10 Jan 2018 | 09:00 | C1 | Oral exam for students who passed the mid-term exam and delivered the project work. https://esami.unipi.it/ | |
2. | 17 Jan 2018 | 09:00 | A1 | Written exam. On the same date we will set the dates for the next oral exams. https://esami.unipi.it/ | |
3. | 06 Feb 2018 | 09:00 | C | Written exam. On the same date we will set the dates for the next oral exams. https://esami.unipi.it/ | |
4. | 12 June 2018 | 09:00 | A1 | Written exam. On the same date we will set the dates for the next oral exams. https://esami.unipi.it/ | DM2 Results |
5. | 3 July 2018 | 09:00 | A1 | Written exam. On the same date we will set the dates for the next oral exams. https://esami.unipi.it/ | |
6. | 13 September 2018 | 09:00 | A1 | Written exam. On the same date we will set the dates for the next oral exams. https://esami.unipi.it/ | |
Date | Time | Room | Notes | Results |
---|---|---|---|---|
30.10.2017 | 14:00 - 18:00 | Room A1, C1 | ||
20.12.2017 | | Room A1, C1 | | |