Instructors:
Teaching Assistant:
Classes
Day of Week | Hour | Room |
---|---|---|
Tuesday | 11:00 - 13:00 | Room C1 |
Thursday | 14:00 - 16:00 | Room A1 |
Friday | 09:00 - 11:00 | Room C1 |
Office hours:
A Teams channel will be used ONLY to post news, Q&A, and other course-related material. Lectures are held in person only and will NOT be live-streamed, but recordings of the lectures, or of previous years' lectures, will be made available here for non-attending students.
Books
Title | Authors | Edition |
---|---|---|
Introduction to Data Mining | Pang-Ning Tan, Michael Steinbach, Vipin Kumar | 2nd |
Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications | Laura Igual, Santi Seguí | 2nd |
Python Data Science Handbook: Essential Tools for Working with Data | Jake VanderPlas | 1st |
Deep Learning | Ian Goodfellow, Yoshua Bengio, Aaron Courville | |
Introduction to Linear Algebra | Gilbert Strang | 5th |
Online tutorials
Title | Authors |
---|---|
Digital Signals Theory | Brian McFee |
An introduction to Dynamic Time Warping | Romain Tavenard |
Introduction to Python | Mattia Setzu |
Slides
The slides used in the course will be added to the calendar after each class. Some are taken from the slides provided by the textbook's authors: Slides for "Introduction to Data Mining".
Software
Software material is available in the GitHub repository.
# | Day | Topic | Teaching material | References | Video Lectures | Teacher |
---|---|---|---|---|---|---|
| | 17.09 | Canceled | | | | |
1. | 19.09 | Overview. Introduction to KDD | 1-overview-2024.pdf 1-intro-dm-2024.pdf | Chap. 1 Kumar Book | Part1 Part2 | Monreale |
2. | 20.09 | Data Understanding + Data Preparation (Aggr., Sampling, Dim. Reduction, Feature Selection, Feature Creation). | 2-data_understanding-2024.pdf 3-data_preparation-2024.pdf | Chap. 2 Kumar Book and the additional resource of the Kumar Book: Data Exploration chapter. In the first edition of the Kumar Book this is Chap. 3 | Lecture Recording | Monreale |
3. | 24.09 | Data representation | data_representation.pdf | Introduction to Linear Algebra (Sections 1, 3.1, 4.2, 6.1, 6.4, 6.5, 7.3), t-SNE paper, UMAP paper (Section 3) | Lecture Recording | Setzu |
4. | 26.09 | Data Cleaning + Transformations. Python Lab: Data Understanding and Preparation | Data Cleaning and Transformations | | Part1: Data Cleaning_Trasformations Python Lab | Monreale, Mannocci |
5. | 27.09 | Python Lab: Data Understanding and Preparation + Similarities | GitHub repository 6-data_similarity.pdf | | PythonLab Similarity | Monreale, Mannocci |
6. | 01.10 | Introduction to Clustering and Centroid-based clustering | Introduction to Clustering Analysis K-means | Chap. 7 Kumar Book | Lecture Recording | Monreale |
7. | 03.10 | Hierarchical Clustering | 9-basic_cluster_analysis-hierarchical.pdf | Chap. 7 Kumar Book | Lecture Recording | Monreale |
8. | 04.10 | Density Based Clustering & Variants of K-means | 10-basic_cluster_analysis-dbscan.pdf 11-basic_cluster_analysis-kmeans-variants.pdf | Chap. 7 Kumar Book | Lecture Recording | Monreale |
9. | 08.10 | Clustering Validation + Python Lab | 12-basic_cluster_analysis-validity.pdf See GitHub for the Python notebook on clustering | Chap. 7 Kumar Book | | Monreale, Mannocci |
| | 10.10 | Lecture canceled due to UNIPI Orienta | | | | |
| | 11.10 | Lecture canceled due to UNIPI Orienta | | | | |
10. | 15.10 | Outlier detection | Outlier Detection | Sections 1.3.1-4, 2.2 Kumar book | | Setzu |
11. | 17.10 | Outlier detection | Outlier Detection | Sections 3.2-3, 4.2-5 2.2 Kumar book | | Setzu |
12. | 18.10 | Python Lab: Outlier detection | See GitHub | | | Setzu, Mannocci |
13. | 22.10 | Association Rule Mining: Apriori | 17_association_analysis.pdf | Chap. 5 Kumar Book | | Monreale |
14. | 24.10 | Association Rule Mining: FP-Growth | 17_2023-fp-growth.pdf | Chap. 5 Kumar Book | | Monreale |
15. | 25.10 | Sequential Pattern Mining | 18_sequential_patterns_2024.pdf | Chap. 6 Kumar Book | | Monreale |
16. | 29.10 | Sequential Pattern Mining with Time Constraints | same slides as above | Chap. 6 Kumar Book | | Monreale |
17. | 05.11 | Python Lab: FPM + SPM. Intro to classification | same slides as above | Chap. 6 Kumar Book | | Monreale |
18. | 07.11 | Decision Trees & Classifier Evaluation | dt_classification.pdf Classification Model Evaluation | Chap. 3 Kumar Book | | Monreale |
19. | 08.11 | Classifier Evaluation + Introduction to Rule-based Classifiers | Rule-based Classification | Chap. 4 Kumar Book | | Monreale |
20. | 12.11 | Rule-based Classifiers + KNN | 10-knn.pdf | Chap. 4 Kumar Book | | Monreale |
21. | 14.11 | Supervised Learning | supervised_tasks.pdf | | | Setzu |
22. | 15.11 | Neural networks | networks.pdf | | | Setzu |
23. | 19.11 | Notebooks on Supervised Learning | | | | Setzu, Mannocci |
24. | 21.11 | Notebooks on Supervised Learning, Time Series | time_series.pdf | | | Setzu, Mannocci |
25. | 22.11 | Time series | time_series.pdf | | | Setzu |
26. | 26.11 | | | | | Setzu |
27. | 28.11 | | | | | Monreale |
29. | 29.11 | | | | | Monreale |
30. | 03.12 | | | | | Monreale |
31. | 06.12 | | | | | Monreale |
32. | 10.12 | Paper Presentations | | | | |
33. | 12.12 | Paper Presentations | | | | |
34. | 13.12 | Paper Presentations | | | | |
35. | 17.12 | Paper Presentations | | | | |
36. | 18.12 | Paper Presentations | | | | |
Project:
The project consists of data analyses carried out with data mining tools. It must be carried out by a team of 3 students, using Python. The guidelines specify the tasks to address. Results must be reported in a single paper of at most 25 pages of text, including figures. Students must deliver both the paper (single column) and well-commented Python notebooks.
Students who do not deliver the above project by Dec 29, 2024 must email the teachers to request a new project. The newly assigned project will require about 20 days of work and, once delivered, will be discussed during the oral exam.
Paper Presentation (OPTIONAL)
Students may present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: students who give the paper presentation can skip the oral exam with open questions on the entire program. They only need to present the project (see next point) and answer open questions on the topics not covered by the project. The paper presentation can be given by the group or by a single person.
Oral Exam
How do I book the exam colloquium?
At https://esami.unipi.it/ you can find the exam dates: one in January and one in February. Each student must register for one of the two dates. These are not the dates of the colloquium or of the project delivery; we use the list of registered students to organize the exam dates. After the registration deadline we will share a calendar for the oral exam.