
Data Mining A.A. 2019/20

DM 1: Foundations of Data Mining (6 CFU)

Instructors - Docenti:

DM 2: Advanced Topics on Data Mining and Applications (6 CFU)

Instructors:

DM: Data Mining (9 CFU)

Instructors:

News

  • [04.07.2020] Third DM2 exam session from 17/07 to 29/07. Please register (here) before 12/07 and select your slot at the agenda link, which will be available from 12/07. Remember to submit the project one week before the exam; it is mandatory to submit the project before 15/07. Doodle will not be used for this session. Every slot can accept up to 4 students (in this case you have to register individually). Slots run from 17/07 to 29/07 inclusive.
  • [21.06.2020] To help us correct the projects and organize the oral exams, everyone has to submit the project on the occupancy detection dataset before midnight on 15 July 2020. Another dataset will be published after this deadline, and submissions after 15 July must use the new dataset. The rule that the project must be submitted at least ONE WEEK before the oral exam remains valid.
  • [12.06.2020] New Doodle is available for booking the DM2 exam here.
  • [22.05.2020] In the section “Exam DM part I (DMF)” of this page you can find the new rules for the exams of DM (9 CFU) for Computer Science and DM1 (6 CFU) for Data Science & BI and Digital Humanities.
  • [14.04.2020] CAT4 for self-evaluation is available here (it will not be considered for the final evaluation). Report your final mark here. We recommend doing it before 18 May 2020. Solutions are available here.
  • [06.05.2020] Submission Draft 2 deadline: 25/05/2020. We expect Tasks 2 and 3 to be completed; any progress on Tasks 4 and 5 is welcome. We do not care about form and layout: what matters now is the content and proof that you have continued analysing the data as required.
  • [01.05.2020] Keras Accuracy here.
  • [30.04.2020] DM2 Exam Rules here.
  • [14.04.2020] CAT3 for self-evaluation is available here (it will not be considered for the final evaluation). Report your final mark here. We recommend doing it before 9 April 2020. Solutions are available here.
  • [08.04.2020] Submission Draft 1 deadline: 16/04/2020. We expect Task 1 to be completed and Task 2 to be at a good stage (say 60-70%); any progress on Task 3 is welcome. We do not care about form and layout: what matters now is the content and proof that you have started analysing the data as required.
  • [06.04.2020] Reading material available here.
  • [18.03.2020] CAT2 for self-evaluation is available here (it will not be considered for the final evaluation). Report your final mark here. We recommend doing it before Sunday 22 March 2020. Solutions are available here.
  • [05.03.2020] From Monday 9 March, lectures will be held online using Microsoft Teams. Instructions to join the course are available here (ita) and here (eng). The code for joining the 420AA DATA MINING Team is rc6b0ko. The Team will replace frontal lectures and office hours. The material will be uploaded as usual on the DidaWiki web page.
  • [04.03.2020] Frontal lectures and office hours are suspended.
  • [02.03.2020] CAT1 Results: CAT1
  • [24.02.2020] Project Dataset Change: Occupancy Detection
  • [17.01.2020] Declare project groups (max 3 people) by next Monday, 24 February, by adding your information at link
  • [12.01.2020] Project evaluation and Proposed final Mark Results
  • [28.12.2019] Results of the Mid-term Test of December 2019: Results. Students who did not pass the mid-term test can take the written exam during the winter sessions. Once the project mark is available, we will compute the average of the written and project marks.
  • [06.12.2019] Exercises on Clustering: ex._clustering.pdf
  • [04.11.2019] The lecture of Monday, November 4, ends at 15:00 to allow participation in the Informatica 50 event “Ora che comanda lui, quando tutto è basato sul software” (in Italian), at 15:30 in the Aula Magna Storica of Palazzo della Sapienza, UNIPI. Full information: event website
  • [03.10.2019] Please fill in the spreadsheet with the name of the group (Group1, Group2, …) and the list of students composing the group.
  • [26.09.2019] Global Climate Strike: the teachers of the DM course will join the Global Climate Strike tomorrow, Friday September 27, so tomorrow's lecture is cancelled.
  • [18.09.2019] Event: “Privacy: limite o opportunità? Gli esempi delle Nuove Tecnologie e dei Dati Sanitari” Information here.

Learning goals -- Obiettivi del corso

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.

The wide availability of data from relational databases, the web and other sources motivates the study of data analysis techniques that allow a better understanding and an easier use of the results in decision-making processes. The goal of the course is to provide an introduction to the basic concepts of the knowledge extraction process, to the main data mining techniques and to the related algorithms. Particular emphasis is placed on methodological aspects, presented through some paradigmatic classes of applications such as market basket analysis, market segmentation and fraud detection. Finally, the course introduces the privacy and ethical aspects inherent in the use of inference techniques on data, of which the analyst must be aware. The course consists of the following parts:

  1. the basic concepts of the knowledge extraction process: data understanding and preparation, forms of data, measures and similarity of data;
  2. the main data mining techniques (association rules, classification and clustering), of which the formal and implementation aspects will be studied;
  3. some case studies in marketing and customer management support, fraud detection and epidemiological studies;
  4. the last part of the course introduces the privacy and ethical aspects inherent in the use of inference techniques on data, of which the analyst must be aware.

Reading about the "data scientist" job

  • Data, data everywhere. The Economist, Feb. 2010 download
  • Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 link
  • Welcome to the yotta world. The Economist, Sept. 2011 download
  • Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 link
  • Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 link
  • Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics download
  • Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: YouTube video
  • Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. download

Hours - Orario e Aule

DM1 & DM

Classes - Lezioni

Day of Week | Hour | Room
Lunedì/Monday | 14:00 - 16:00 | Aula E1
Mercoledì/Wednesday | 16:00 - 18:00 | Aula A1
Venerdì/Friday | 11:00 - 13:00 | Aula C1

Office hours - Ricevimento:

  • Prof. Pedreschi: Lunedì/Monday h 14:00 - 17:00, Dipartimento di Informatica
  • Prof. Monreale: Lunedì/Monday h 09:00 - 11:00, Dipartimento di Informatica

DM 2

Classes - Lezioni

Day of Week | Hour | Room
Monday | 09:00 - 11:00 | C
Wednesday | 16:00 - 18:00 | C1

Office hours - Ricevimento:

  • Room 268 Dept. of Computer Science
  • Thursday: 15-17, Room: 286
  • Appointment by email

Learning Material -- Materiale didattico

Textbook -- Libro di Testo

  • Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Addison Wesley, ISBN 0-321-32136-7, 2006
  • Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. Guide to Intelligent Data Analysis. Springer Verlag, 1st Edition, 2010. ISBN 978-1-84882-259-7
  • Laura Igual et al. Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications. 1st ed. 2017 Edition.

Slides of the classes -- Slides del corso

Past Exams

* Exercises on Clustering: ex._clustering.pdf

* Some text of past exams on DM1 (6CFU):

* Some solutions of past exams containing exercises on KNN and Naive Bayes classifiers DM1 (9CFU):

* Some exercises (partially with solutions) on sequential patterns and time series can be found in the following texts of exams from the last years:

Data mining software

  • KNIME The Konstanz Information Miner. Download page
  • Python - Anaconda (version 3.7): Anaconda is the leading open data science platform powered by Python. Download page (the following libraries are already included)
  • Scikit-learn: Python library with tools for data mining and data analysis. Documentation page
  • Pandas: an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Documentation page
  • WEKA: Data Mining Software in Java. University of Waikato, New Zealand. Download page

Class calendar - Calendario delle lezioni (2019/2020)

First part of course, first semester (DM1 - Data mining: foundations & DM - Data Mining)

Day Topic Learning material Instructor
1. 16.09 14:00-16:00 Overview. Introduction to KDD Course Overview Introduction DM Pedreschi
18.09 16:00-18:00 Lecture cancelled (event at Scuola S. Anna; information in the News section of this page) Pedreschi
2. 20.09 11:00-13:00 Introduction to KDD: technologies, Application and Data Pedreschi
3. 23.09 14:00-16:00 Data Understanding (from Berthold book!) Slides DU Slides on Descriptive Statistics, useful for clarifying some statistical notions (unfortunately this material is only in Italian). Monreale
4. 25.09 16:00-18:00 Data Preparation Slides DP Monreale
27.09 11:00-13:00 Climate Strike
5. 30.09 14:00-16:00 Introduction to Python. Python Introduction Monreale
6. 02.10 16:00-18:00 Clustering: Introduction + Centroid-based clustering, K-means Clustering: Intro and K-means Pedreschi
7. 04.10 11:00-13:00 Lab: Data Understanding & Preparation in Knime Knime: 01_data_understanding.zip Data: Titanic File Monreale
8. 07.10 14:00-16:00 Lab: DU Python + Project presentation Python: titanic_data_understanding2.ipynb.zip Monreale
9. 09.10 16:00-18:00 Clustering: K-means + Hierarchical 5.basic_cluster_analysis-hierarchical.pdf Monreale
10. 11.10 11:00-13:00 Suppressed for Internet festival Pedreschi
11. 14.10 14:00-16:00 Clustering: DBSCAN & VALIDITY 6.basic_cluster_analysis-dbscan-validity.pdf Pedreschi
12. 16.10 16:00-18:00 Exercises on Clustering Tool for Dm ex: Didactic Data Mining Ex. Clustering PDF Ex. Clustering PPTX Monreale
13. 18.10 11:00-13:00 Lab: Clustering clustering_knime clustering_python Monreale
14. 21.10 14:00-16:00 Classification 7.chap3_basic_classification-2019.pdf A visual intro to machine learning Pedreschi
15. 23.10 16:00-18:00 Classification Pedreschi
16. 25.10 11:00-13:00 Classification Pedreschi
17. 28.10 14:00-16:00 Lab: Classification knime_classification python_classification Monreale
18. 30.10 16:00-18:00 Exercises Classification + Discussion Clustering ex-classification.pdf Monreale
19. 04.11 14:00-15:00 Pattern Mining Note: the lecture ends at 15:00 to allow participation in the Informatica50 event (see News) slides Pedreschi
20. 06.11 16:00-18:00 Pattern Mining Pedreschi
08-14.11 Project work
21. 15.11 11:00-13:00 Exercises and Lab on Pattern Mining knime_pattern python_pattern https://anaconda.org/conda-forge/pyfim, http://www.borgelt.net/pyfim.html ex-frequentpatterns-ar.pdf Monreale
18.11 14:00-16:00 Suppressed for weather conditions
20.11 16:00-18:00 Suppressed
22. 22.11 11:00-13:00 Exercises Classification Monreale
Next Classes are dedicated to DM of 9 CFU
23. 25.11 14:00-16:00 Alternative methods for classification/1 K-Nearest Neighbors & Naive Bayes Pedreschi
24. 27.11 16:00-18:00 Alternative methods for classification/2 Wisdom of the crowd & Ensemble methods: Bagging, Random Forest & Boosting Galton's "Vox Populi" 1907 Nature paper Pedreschi
25. 29.11 11:00-13:00 Alternative methods for classification/3 Recap Ensemble methods & Hints to Rule-based classification Pedreschi
26. 02.12 14:00-16:00 Alternative Methods for Pattern Mining + Ex on KNN and NB fp-growth.pdf KNN & NB Monreale
27. 04.12 16:00-18:00 Alternative Methods for Clustering 1-alternative-clustering-2019.pdf 2-transactionalclustering-2019.pdf Monreale
28. 06.12 11:00-13:00 Sequential Pattern Mining Sequential patterns Pedreschi
29. 09.12 14:00-16:00 Exercises on sequential pattern mining & ROCK exsequentialpatternmining.pdf ex-clustering-rock.pdf Monreale
30. 11.12 16:00-18:00 Black Box Explanations 2019-dm_xai.pdf Material: LORE LIME Survey ABELE Monreale
31. 13.12 11:00-13:00 Exercises on written exam - all students 9_cfu_ex.pdf ex_clustering_fpm_dt.pdf hierarchical_max_sim.pdf Monreale
32. 16.12 13:30-16:00 Mid-term Test (Rooms A, E1, C1) Monreale
33. 18.12 16:00-18:00 Privacy in DM. Project. privacydt.pdf Overview on Privacy Privacy by design Monreale

Second part of course, second semester (DM2 - Advanced Topics on Data Mining and Applications)

Day Room (Aula) Topic Learning material Instructor (Guidotti)
1. 17.02.2020 09:00-11:00 C Introduction, Instance-based and Bayesian Classifiers Intro, Libraries, Instance-Based and Bayesian Classifiers
2. 19.02.2020 16:00-18:00 C1 Linear and Logistic Regression, Dimensionality Reduction, Exercises KNN and Naive Bayes Regression, Dimensionality Reduction, Ex_KNN_NB_Lift, Appendix
3. 24.02.2020 09:00-11:00 C Imbalanced Learning, Performance Evaluation and Rule-based Classifiers Imbalanced Learning Rule-based Classifiers
4. 26.02.2020 16:00-18:00 C1 Exercises Lift, ROC, KNN and Naive Bayes. Lab KNN and Naive Bayes. Ex_KNN_NB_Lift, Lab_KNN_NB, Data Preparation, Churn Dataset, Iris Dataset
5. 02.03.2020 09:00-11:00 C Lab Regression, Dimensionality Reduction, Imbalanced Learning + CAT1 Regression, Dimensionality Reduction, Imbalanced Learning Airquality Dataset
6. 04.03.2020 16:00-18:00 C1 CRISP-DM, SVM, Intro NN CRISP-DM, SVM, NN
7. 09.03.2020 09:00-11:00 online Neural Network, Exercises NN NN , Ex_NN_Ensemble
8. 11.03.2020 16:00-18:00 online Neural Network, Exercises NN, Deep Neural Network, Intro Ensemble, Exercises Ensemble NN , DNN Ex_NN_Ensemble
9. 16.03.2020 09:00-11:00 online Ensemble Classifiers, Exercises Ensemble Ensemble, Ex_NN_Ensemble
10. 18.03.2020 16:00-18:00 online Lab SVM, Neural Network, Ensemble Lab_SVM_NN_RF
11. 23.03.2020 09:00-11:00 online Time Series Similarity, Ex DTW Time Series Similarity, Ex_DTW
12. 25.03.2020 16:00-18:00 online Time Series Motif/Shapelet, Ex Matrix Profile Time Series Motif/Shapelet, Ex_MP
13. 30.03.2020 09:00-11:00 online Time Series Stationarity and Forecasting Time Series Forecasting
14. 01.04.2020 16:00-18:00 online Lab Time Series Lab_TS
15. 06.04.2020 09:00-11:00 online Time Series Classification, Lab Time Series Time Series Classification, Lab_TS, Data Partitioning
- 08.04.2020 Reading/Project Week
- 15.04.2020 Reading/Project Week
16. 20.04.2020 09:00-11:00 online Sequential Pattern Mining SPM
17. 22.04.2020 16:00-18:00 online SPM Time Constraints, Exercises, Lab Ex_SPM, Lab_SPM
18. 27.04.2020 09:00-11:00 online Advanced Clustering, Ex, SPM, Lab EM, X-Means Advanced Clustering , Lab_AC
19. 29.04.2020 16:00-18:00 online Transactional Clustering, Ex TC, Lab K-Mode Ex_SPM_TC
20. 04.05.2020 09:00-11:00 online Anomaly Detection, Ex AD Anomaly Detection , Ex_AD
21. 06.05.2020 16:00-18:00 online Anomaly Detection, Ex AD, Lab AD Anomaly Detection , Ex_AD, Lab_AD
22. 11.05.2020 09:00-11:00 online Ethics: Privacy Privacy
23. 13.05.2020 16:00-18:00 online Ethics: Explainability Explainability
24. 18.05.2020 09:00-11:00 online Ethics: Local Explainability, Inspection, Transparent Methods, Lab Explainability, Lab_XAI
- 20.05.2020 Reading/Project Week
- 25.05.2020 Reading/Project Week
- 27.05.2020 Reading/Project Week

Exams

Exam DM part I (DMF)

RULES FOR EXAMS for COMPUTER SCIENCE - 9CFU: EXAM RULES Summer Session - 9 CFU

RULES FOR EXAMS for DATA SCIENCE & BI and DIGITAL HUMANITIES - DM1(6CFU): EXAM RULES Summer Session - DM1(6CFU)

The exam is composed of two parts:

  • An oral exam, which includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory of the practical parts. It is optional only for students who passed the written part through the mid-term test.
  • A project, consisting of exercises that require the use of data mining tools for the analysis of data. Exercises include: data understanding, clustering analysis, frequent pattern mining, and classification. The project has to be carried out by a group of min 3, max 4 people, using Knime, Python or a combination of them. The results of the different tasks must be reported in a single paper of at most 20 pages of text, including figures. The paper must be emailed to datamining [dot] unipi [at] gmail [dot] com. Please use “[DM 2019-2020] Project” in the subject.

Tasks of the project:

  1. Data Understanding: Explore the dataset with the analytical tools studied and write a concise “data understanding” report describing data semantics, assessing data quality, the distribution of the variables and the pairwise correlations. (see Guidelines for details)
  2. Clustering analysis: Explore the dataset using various clustering techniques. Carefully describe your decisions for each algorithm and the advantages provided by the different approaches. (see Guidelines for details)
  3. Classification: Explore the dataset using classification trees. Use them to predict the target variable. (see Guidelines for details)
  4. Association Rules: Explore the dataset using frequent pattern mining and association rule extraction. Then use the rules to predict a variable, either to replace missing values or to predict the target variable. (see Guidelines for details)
  5. ADDITIONAL TASK for DM 9 CFU (OPTIONAL): Computer Science students (DM 9 CFU) can decide to deliver an additional task for the project, selected among the following, for a bonus of 3 points:
    1. Classification: Compare the results of classification by decision tree with KNN and Naive Bayes, also analysing the runtime of the training and test phases.
    2. Clustering: Is it possible to apply EM clustering? Does the quality of the clustering result improve?
  • Project 2
    1. Dataset: Bank Loan Status
    2. Assigned: 09/01/2020
    3. Deadline: 4 days before the oral exam
    4. This dataset will be used for all tasks. For the classification task, you have to split the dataset into training and test sets; the class to predict is the variable “Loan Status”.
    5. This dataset will be valid for all the exam sessions until September.
    6. Download the Bank Loan Status dataset (CSV format, zipped)

Guidelines for the project are here.
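
As a rough illustration of the measures behind Task 4 (Association Rules), the sketch below computes support and confidence on a handful of toy transactions using only the Python standard library. The item names, the `min_support` value of 0.6 and the helper functions `support`, `confidence` and `frequent_itemsets` are assumptions made for this example; they are not taken from the Bank Loan Status dataset or from the course material, and the brute-force enumeration is not a substitute for Apriori or FP-Growth.

```python
from itertools import combinations

# Toy transactions (hypothetical items, not the project dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force enumeration of itemsets with support >= min_support.

    Fine for toy data only: the number of candidates grows combinatorially.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


frequent = frequent_itemsets(transactions, min_support=0.6)
rule_conf = confidence({"diapers"}, {"beer"}, transactions)

for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
print("confidence(diapers -> beer) =", round(rule_conf, 2))
```

On this toy data the rule {diapers} → {beer} has support 0.6 and confidence 0.75. For the project, the same measures would be computed on a discretized version of the dataset, where each attribute-value pair plays the role of an item.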

Exam DM part II (DMA)

The exam is composed of three parts:

  • A written exam, with exercises and questions about methods and algorithms presented during the classes. It can be substituted with ongoing tests held during the course.
  • A project, which consists of applying the methods and algorithms presented during the classes to solve exercises on a given dataset. The project has to be carried out by max 3 people, using Knime, Python, other software or a combination of them. The results of the different tasks must be reported in a single paper. The total length of this paper must be max 30 pages (suggested 25) of text including figures + 1 cover page (minimum font size 11, minimum line spacing 1). The project must be delivered at least 2 days before the oral exam.
  • An oral exam, which includes: (1) discussing topics presented during the classes, including the theory of the parts already covered by the written exam; (2) discussing the project report with a group presentation.
  • Dataset: the data is about Occupancy Detection and can be downloaded here: dataset.
    • Submission Draft 1: 16/04/2020 23:59 Italian Time
    • Submission Draft 2: 25/05/2020 23:59 Italian Time
    • Final Submission: one week before the oral exam.
  • Dataset 2: the data is about Air Quality and can be downloaded here: dataset. The dataset has no target variable for classification. Thus, define a target variable, for instance “is weekend”, set to “true” for weekend days and “false” for the others.
  • Final Submission: one week before the oral exam or by 30/11/2020.
  • Project Task 1 - Basic Classifiers and Evaluation
    1. Prepare the dataset in order to build several basic classifiers able to predict room occupancy from the available variables. You are welcome to create new variables.
    2. Solve the classification task with k-NN (testing values of k), Naive Bayes, Logistic Regression, Decision Tree using cross-validation and/or random/grid search for parameter estimation.
    3. Evaluate each classifier using Accuracy, Precision, Recall, F1, ROC, AUC and Lift Chart.
    4. Try to reduce the dimensionality of the dataset using the methods studied (or new ones). Test PCA and try to solve the classification task in two dimensions. Plot the dataset in the two new dimensions and observe the actual class boundary and the decision boundaries of the trained algorithms.
    5. Analyze the value distribution of the class to predict and turn the dataset into an imbalanced version reaching a strong majority-minority distribution (e.g. 96%-4%). Then solve again the classification task adopting the various techniques studied (or new ones).
    6. Select two continuous attributes, define a regression problem and try to solve it using different techniques reporting various evaluation measures. Plot the two-dimensional dataset. Then generalize to multiple linear regression and observe how the performance varies.
    7. Draw your conclusions about the basic classifiers and techniques adopted in this analysis.
  • Project Task 2 - Advanced Classifiers and Evaluation
    1. Using the dataset for classification prepared for Task 1 build several advanced classifiers able to predict room occupancy from the available variables. In particular, you are required to use SVM (linear and non-linear), NN (Single and Multilayer Perceptron), DNN (design at least two different architectures), Ensemble Classifier (RandomForest, AdaBoost and a Bagging technique in which you can select a base classifier of your choice with a justification).
    2. Evaluate each classifier using Accuracy, Precision, Recall, F1, ROC, etc. Draw your conclusions about the classifiers.
    3. Highlight in the report the aspects typical of each classifier. For instance, for SVM: is a linear model the best way to shape the decision boundary? For NN: which parameter sets or convergence criteria suggest that you are avoiding overfitting? How many iterations/base classifiers are needed for a good estimation with an ensemble method? What is the feature importance for the Random Forest?
    4. You are NOT required to repeat the experiments in the imbalanced case, but doing so is not considered a mistake.
  • Project Task 3 - Time Series Analysis and Forecasting/Classification
    1. Exploit the temporal information of the dataset by preparing it for a univariate framework of analysis, i.e. select a feature and use it as your time series. You are welcome to use more than one reliable temporal split to obtain several time series of the same feature. You are also welcome to create more than one dataset using more than one feature, and to report the results on the feature you prefer or on several of them. Analyze these datasets to find motifs and/or anomalies and shapelets. Visualize and discuss them and their relationship with the class of the time series.
    2. On the dataset(s) created, compute clustering based on Euclidean/Manhattan and DTW distances and compare the results. To perform the clustering you can choose among different similarity methods, i.e., shape-based, feature-based, approximation-based, compression-based, etc. Finally, analyze the clusters and the clustering and highlight similarities and differences.
    3. Apply forecasting methods on the dataset(s) created. Make sure to preprocess the time series adequately for the method used (e.g., exponential smoothing or autoregression), checking stationarity, for instance with the help of a statistical test, and reducing trends and seasonality.
    4. Solve the classification task on the univariate dataset created using different approaches, i.e., traditional classification, shapelet-based, feature-based, etc.
    5. Solve the classification task considering the whole dataset as a multivariate dataset. Develop the classification process you prefer (e.g. exploiting shapelets, traditional classifiers, CNN, or RNN) to maximize accuracy and F1-score.
  • Project Task 4 - Sequential Pattern Mining
    1. Convert the time series into a discrete format (e.g., SAX) in order to prepare the data for the task.
    2. Using different values of support, extract the most frequent sequential patterns (of at least length 3/4), then discuss the most interesting sequences.
  • Project Task 5 - Outlier Detection and Explainability
    1. From the original dataset (i.e. not the time series built in Task 3 or the sequences of Task 4, nor the preprocessed dataset used in Tasks 1 and 2), identify the top 1% outliers.
    2. Adopt at least three different methods belonging to different families (i.e. statistical/depth-based, distance-based, density-based, angle-based, …) and compare the results.
    3. (Optional) Try to use an explanation method to illustrate the reasons for the classification in one of the steps of the previous Tasks (if you want to try LORE, please ask for the code at [email protected]).
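
To make the "top 1% outliers" step of Task 5 concrete, here is a minimal sketch of one distance-based approach: each point is scored by the distance to its k-th nearest neighbour, and the highest-scoring 1% are flagged. The synthetic data, the choice `k=5` and the helper names `knn_outlier_scores` and `top_fraction` are assumptions made for illustration; this is only one of the method families the task asks you to compare, and it does not use the Occupancy Detection dataset.

```python
import math
import random


def knn_outlier_scores(points, k=5):
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores mean more isolated points. The cost is quadratic in the
    number of points, so this is only suitable for small datasets.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_fraction(scores, fraction=0.01):
    """Indices of the highest-scoring points (at least one is returned)."""
    n_top = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_top]


random.seed(0)
# Synthetic data: 198 points around the origin plus two planted anomalies.
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(198)]
points += [(10.0, 10.0), (-12.0, 9.0)]

scores = knn_outlier_scores(points, k=5)
outliers = top_fraction(scores, fraction=0.01)
print("flagged indices:", sorted(outliers))
```

With 200 points, the top 1% corresponds to two points, and here the two planted anomalies (indices 198 and 199) are the ones flagged. On real data the features should be scaled first, since a raw Euclidean distance is dominated by the attributes with the largest ranges.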

Exam dates

Mid-term exams

Exam | Date | Hour | Place | Notes | Marks
DM1: First Mid-term 2018 | 16.12.2019 | 13:30-16:00 | Room E1, C1, A | Please use the system for registration: https://esami.unipi.it/ |

Appelli regolari / Exam sessions

Session | Date | Time | Room | Notes | Marks
1 | 16.01.2019 | 14:00 - 18:00 | Room E | |
2 | 06.02.2019 | 14:00 - 18:00 | Room E | |
3 | 19.06.2019 | 09:00 - 13:00 | Room A1 | Oral exam on DM1 by 15 July. If you cannot make that date, you can take the oral exam in September. | Results
4 | 10.07.2019 | 09:00 - 13:00 | Room A1 | Oral exam on DM1 by 15 July. If you cannot make that date, you can take the oral exam in September. | Results
5 | 08.06.2020 | 09:00 - 18:00 | Microsoft Teams | From 08/06 to 25/06. Please register (here) and select your slot here. Remember to submit the project one week before the exam. It would be helpful if you submitted the project by 01/06. |
6 | 26.06.2020 | 09:00 - 18:00 | Microsoft Teams | From 26/06 to 16/07. Please register (here) and select your slot here. Remember to submit the project one week before the exam. It would be helpful if you submitted the project by 21/06. |
7 | 17.07.2020 | 09:00 - 18:00 | Microsoft Teams | From 17/07 to 29/07. Please register (here) and select your slot at the agenda link, which will be available from 12/07 only to those registered for the exam. Remember to submit the project one week before the exam. It would be helpful if you submitted the project by 10/07. It is mandatory to submit the project before 15/07. |

Previous years

dm/dm.2019-20.txt · Last modified: 04/11/2022 at 12:14 by Salvatore Ruggieri
