Data Mining (309AA) - 9 CFU A.Y. 2024/2025

Instructors:

Teaching Assistant:

News

Learning Goals

Schedule

Classes

Day of Week | Hour | Room
Tuesday | 11:00 - 13:00 | Room C1
Thursday | 14:00 - 16:00 | Room A1
Friday | 09:00 - 11:00 | Room C1

Office hours:

A Teams channel will be used ONLY to post news, Q&A, and other course-related material. Lectures are held in person only and will NOT be live-streamed, but recordings of this year's lectures or of previous years' lectures will be made available here for non-attending students.

Teaching Material

Books

Title | Authors | Edition
Introduction to Data Mining | Pang-Ning Tan, Michael Steinbach, Vipin Kumar | 2nd
Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications | Laura Igual, Santi Seguí | 2nd
Python Data Science Handbook: Essential Tools for Working with Data | Jake VanderPlas | 1st
Deep Learning | Ian Goodfellow, Yoshua Bengio, Aaron Courville |
Introduction to Linear Algebra | Gilbert Strang | 5th

Online tutorials

Slides

The slides used in the course will be added to the calendar after each class. Some are taken from the slides provided by the textbook's authors: Slides for "Introduction to Data Mining".

Software

Software material is available in the GitHub repository.

Class Calendar (2024/2025)

First Semester

Day | Topic | Teaching material | References | Video Lectures | Teacher
17.09 | Canceled
1. 19.09 | Overview. Introduction to KDD | 1-overview-2024.pdf 1-intro-dm-2024.pdf | Chap. 1 Kumar Book | Part1 Part2 | Monreale
2. 20.09 | Data Understanding + Data Preparation (Aggr., Sampling, Dim. Reduction, Feature Selection, Feature Creation) | 2-data_understanding-2024.pdf 3-data_preparation-2024.pdf | Chap. 2 Kumar Book and additional resource of Kumar Book: Data Exploration Chap. If you have the first ed. of the Kumar Book, this is Chap. 3 | Lecture Recording | Monreale
3. 24.09 | Data representation | Slides: data_representation.pdf | References: Introduction to Linear Algebra (Sections 1, 3.1, 4.2, 6.1, 6.4, 6.5, 7.3), t-SNE paper, UMAP paper (Section 3) | Lecture Recording | Setzu
4. 26.09 | Data Cleaning + Transformations. Python Lab: Data Understanding and Preparation | Data Cleaning and Transformations | | Part1: Data Cleaning_Trasformations Python Lab | Monreale, Mannocci
5. 27.09 | Python Lab: Data Understanding and Preparation + Similarities | GitHub repository 6-data_similarity.pdf | | PythonLab Similarity | Monreale, Mannocci
6. 01.10 | Introduction to Clustering and Centroid-based clustering | Introduction to Clustering Analysis K-means | Chap. 7 Kumar Book | Lecture Recording | Monreale
7. 03.10 | Hierarchical Clustering | 9-basic_cluster_analysis-hierarchical.pdf | Chap. 7 Kumar Book | Lecture Recording | Monreale
8. 04.10 | Density Based Clustering & Variants of K-means | 10-basic_cluster_analysis-dbscan.pdf 11-basic_cluster_analysis-kmeans-variants.pdf | Chap. 7 Kumar Book | Lecture Recording | Monreale
9. 08.10 | Clustering Validation + Python Lab | 12-basic_cluster_analysis-validity.pdf See GitHub for the Python notebook on clustering | Chap. 7 Kumar Book | | Monreale, Mannocci
10.10 | Lecture canceled due to UNIPI Orienta
11.10 | Lecture canceled due to UNIPI Orienta
10. 15.10 | Outlier detection | Outlier Detection | Sections 1.3.1-4, 2.2 Kumar book | | Setzu
11. 17.10 | Outlier detection | Outlier Detection | Sections 3.2-3, 4.2-5 2.2 Kumar book | | Setzu
12. 18.10 | Python Lab: Outlier detection | See GitHub | | | Setzu, Mannocci
13. 22.10 | Association Rule Mining: Apriori | 17_association_analysis.pdf | Chap. 5 Kumar Book | | Monreale
14. 24.10 | Association Rule Mining: FP-Growth | 17_2023-fp-growth.pdf | Chap. 5 Kumar Book | | Monreale
15. 25.10 | Sequential Pattern Mining | 18_sequential_patterns_2024.pdf | Chap. 6 Kumar Book | | Monreale
16. 29.10 | Sequential Pattern Mining with Time Constraints | same slides as above | Chap. 6 Kumar Book | | Monreale
17. 05.11 | Python Lab: FPM + SPM. Intro to classification | same slides as above | Chap. 6 Kumar Book | | Monreale
18. 07.11 | Decision Trees & Classifier Evaluation | dt_classification.pdf Classification Model Evaluation | Chap. 3 Kumar Book | | Monreale
19. 08.11 | Classifier Evaluation + Introduction to Rule-based Classifiers | Rule-based Classification | Chap. 4 Kumar Book | | Monreale
20. 12.11 | Rule-based Classifiers + KNN | 10-knn.pdf | Chap. 4 Kumar Book | | Monreale
21. 14.11 | Supervised Learning | supervised_tasks.pdf | | | Setzu
22. 15.11 | Neural networks | networks.pdf | | | Setzu
23. 19.11 | Notebooks on Supervised Learning | | | | Setzu, Mannocci
24. 21.11 | Notebooks on Supervised Learning, Time Series | time_series.pdf | | | Setzu, Mannocci
25. 22.11 | Time series | time_series.pdf | | | Setzu
26. 26.11 | | | | | Setzu
27. 28.11 | | | | | Monreale
29. 29.11 | | | | | Monreale
30. 03.12 | | | | | Monreale
31. 06.12 | | | | | Monreale
32. 10.12 | Paper Presentations
33. 12.12 | Paper Presentations
34. 13.12 | Paper Presentations
35. 17.12 | Paper Presentations
36. 18.12 | Paper Presentations

Exams

Project:

The project consists of data analyses carried out with data mining tools. It must be carried out in Python by a team of 3 students. The guidelines require specific tasks to be addressed. Results must be reported in a single paper of at most 25 pages of text, including figures. Students must deliver both the paper (single column) and well-commented Python notebooks.

  1. Dataset: Dataset
  2. Deadline: the first part has to be delivered by November 24th, 2024 (originally November 19th, 2024). Instructions for delivery are on Teams (channel project).

Students who do not deliver the above project by Dec 29, 2024 must email the teachers to request a new project. The newly assigned project will require about 20 days of work and, once delivered, will be discussed during the oral exam.

Paper Presentation (OPTIONAL)

Students may present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: students who give the paper presentation can skip the oral exam with open questions on the entire program. They only need to present the project (see next point) and answer open questions on the topics not covered by the project. The paper presentation can be given by the group or by a single person.

Oral Exam

How to book the exam colloquium?

At https://esami.unipi.it/ you can find the exam dates: one in January and one in February. Each student must register for one of the 2 dates. These are not the dates of the colloquium or of the project delivery; we will use the list of registered students to organize the exam dates. After the registration deadline we will share a calendar for the oral exam.

Previous years

Data Mining (309AA) - 9 CFU A.Y. 2023/2024

DM-INF 2022-2023

Data Mining (309AA) - 9 CFU A.Y. 2021/2022

Data Mining (309AA) - 9 CFU A.Y. 2020/2021

DM-2019/20