This is an old version of the document!


Guidelines for the task on data understanding

  • Data semantics (4 points)
  • Distribution of the variables and statistics (7 points)
  • Assessing data quality (missing values + outliers) (7 points)
  • Pairwise correlations (7 points)
  • Presentation and profiling (5 points)
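
As an illustration of the pairwise correlations item, the sketch below computes Pearson's coefficient for every pair of numeric columns using only the Python standard library. The column names and values are hypothetical toy data, and Pearson is only one possible measure; the guidelines do not prescribe a specific one.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # A constant column has no defined correlation.
        return float("nan")
    return cov / (spread_x * spread_y)


def pairwise_correlations(columns):
    """Map each unordered pair of column names to its Pearson coefficient."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical toy dataset: one list of values per variable.
columns = {
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
    "age": [4.0, 3.0, 2.0, 1.0],
}
correlations = pairwise_correlations(columns)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Pairs with a coefficient close to +1 or -1 are natural candidates for discussion in the report, for example as redundant variables.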

Guidelines for the task on clustering

  • Clustering analysis by K-means (15 points)
    • Identification of the best value of k
    • Characterization of the obtained clusters, using both an analysis of the k centroids and a comparison of the variable distributions within each cluster against those in the whole dataset
  • Analysis by density-based clustering (10 points)
    • Study of the clustering parameters
    • Characterization and interpretation of the obtained clusters
  • Analysis by hierarchical clustering (5 points)
    • Analysis to be performed on a sample of the data for scalability reasons (if necessary)
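
One common way to approach the identification of the best value of k is to run K-means for several values of k and compare the sum of squared errors (SSE), looking for the point where further increases in k stop paying off. The sketch below is a minimal pure-Python version of Lloyd's algorithm written for illustration only. The sample points, the random initialization, and the `seed` and `iters` parameters are assumptions, and the guidelines do not mandate this particular heuristic.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Run Lloyd's algorithm and return (centroids, SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centroids
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical toy data: three loosely separated groups in the plane.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 9.0), (1.5, 8.5), (2.0, 9.5),
]
sse_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

Because the result depends on the initial centroids, a real analysis would typically repeat each run with several seeds and keep the lowest SSE for each k before comparing values.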

Guidelines for the task on association rule mining

  • Frequent pattern extraction with analysis of different values of support (12 points)
  • Association rule extraction with analysis of different values of support and confidence (12 points)
  • Discussion on the interesting rules extracted (6 points)
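
To make the role of the support and confidence thresholds concrete, the sketch below enumerates frequent itemsets by brute force and computes the confidence of a rule. It is not Apriori or any other optimized miner, the transactions are hypothetical toy baskets, and the function names are illustrative rather than taken from a library.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size either.
            break
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Hypothetical toy transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
for min_support in (0.25, 0.5, 0.75):
    found = frequent_itemsets(transactions, min_support)
    print(min_support, len(found))

print(confidence(frozenset({"butter"}), frozenset({"bread"}), transactions))
```

Raising the minimum support shrinks the set of frequent patterns, which is the kind of trade-off the analysis of different thresholds is meant to expose.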

Guidelines for the task on classification

  • Learning of different decision trees (12 points)
  • Decision tree validation and interpretation (12 points)
  • Discussion on the best decision tree (6 points)
dm/start/guidelines.1450038248.txt.gz · Last modified: 13/12/2015 at 20:24 (9 years ago) by Anna Monreale
