Differences

These are the differences between the selected revision and the current version of the page.

--- dm:mains.santanna.dm4crm.2012 [13/05/2015 at 09:59 (10 years ago)] – [Calendar] Dino Pedreschi
+++ dm:mains.santanna.dm4crm.2012 [18/04/2016 at 19:45 (9 years ago)] (current version) – [Calendar] Anna Monreale
@@ Line 10: / Line 10: @@
   * **Before Wednesday 13 May 2015: install KNIME (http://www.knime.org).**
-  * **Before Tuesday 19 June 2015: install Cytoscape (http://www.cytoscape.org/download.html).**
 ====== Goals ======
@@ Line 44: / Line 43: @@
 ^ ^ Date ^ Topic ^ Learning material ^Instructor ^
-|01.   | 13.05.2015 - 09:00-13:00  | Introduction to data mining and big data analytics | {{:dm:1.dm_ml_introduction.pdf| slides: intro}}
+|01.   | 11.05.2016 - 09:00-13:00  | Introduction to data mining and big data analytics | {{:dm:1.dm_ml_introduction.pdf| slides: intro}} {{:dm:2.dm_ml-casestudies.ppt.pdf| slides: case studies}} | Giannotti |
-{{:dm:2.dm_ml-casestudies.ppt.pdf| slides: case studies}} | Giannotti |
+|02.   | 11.05.2016 - 14:00-18:00  | Data understanding; data preparation; Knime tutorial | {{:dm:4.dm_ml_data_preparation.pdf| slides}} {{:dm:04_dataunderstanding.pdf| slides data understanding}} {{:dm:knime_slides_mains.pdf| Tutorial Knime}}{{:dm:du-iris.zip|Knime on Iris}} | Pedreschi, Monreale |
-|02.   | 13.05.2015 - 14:00-18:00  | Data understanding with Knime; data preparation | {{:dm:4.dm_ml_data_preparation.pdf| slides}}
+|03.   | 12.05.2016 - 09:00-13:00  | Pattern and association rule mining & market basket analysis | | Giannotti |
-{{:dm:04_dataunderstanding.pdf| slides data understanding}} | Pedreschi, Monreale |
+|04.   | 12.05.2016 - 14:00-18:00  | Pattern and association rule mining: exercises with Knime | | Giannotti, Monreale|
-|03.   | 14.05.2015 - 09:00-13:00  | Pattern and association rule mining & market basket analysis | | Giannotti |
+|05.   | 13.05.2016 - 09:00-13:00  | Clustering analysis & customer segmentation | {{:dm:dm.pedreschi.clustering.2015.pdf| slides clustering}} {{:dm:customersegmentation.pdf| slides customer segmentation}} | Pedreschi |
-|04.   | 14.05.2015 - 14:00-18:00  | Clustering analysis & customer segmentation |  | Pedreschi |
+|06.   | 13.05.2016 - 14:00-18:00  | Clustering analysis: exercises with Knime | | Pedreschi, Monreale |
-|05.   | 15.05.2015 - 09:00-13:00  | Pattern and association rule mining: exercises with Knime | | Giannotti, Monreale |
+|07.   | 16.05.2016 - 09:00-13:00  | Classification & prediction | {{:dm:dm.giannotti.pedreschi.classification.2015.pdf| slides classification}} | Pedreschi |
-|06.   | 15.05.2015 - 14:00-18:00  | Clustering analysis: exercises with Knime | | Pedreschi, Monreale |
+|08.   | 16.05.2016 - 14:00-18:00  | Prediction models for promotion performance and churn analysis | {{:dm:5.dml-ml-crm-redemption-churn-promozioni-profili-innovatori.pptx.pdf| slides}} | Giannotti |
-|07.   | 18.05.2015 - 09:00-13:00  | Classification & prediction | | Pedreschi |
+|09.   | 18.05.2016 - 09:00-13:00  | Classification & prediction: exercises with Knime | | Pedreschi, Monreale |
-|08.   | 18.05.2015 - 14:00-18:00  | Prediction models for promotion performance and churn analysis | | Giannotti |
+|10.   | 18.05.2016 - 14:00-18:00  | Social network analysis: fundamentals | {{:dm:pedreschi_sna_crash_course_mains.pptx.pdf| slides}} | Pedreschi |
-|09.   | 19.05.2015 - 09:00-13:00  | Classification & prediction: exercises with Knime | | Pedreschi, Monreale |
+|11.   | 20.05.2016 - 09:00-13:00  | Mobility data mining & big data analytics | | Giannotti |
-|10.   | 19.05.2015 - 14:00-18:00  | Social network analysis: fundamentals | | Pedreschi |
+|12.   | 20.05.2016 - 14:00-18:00  | Big Data Analytics: Privacy awareness | | Giannotti, Monreale |
-|11.   | 20.05.2015 - 09:00-13:00  | Mobility data mining & big data analytics | | Giannotti |
-|12.   | 20.05.2015 - 14:00-18:00  | Big Data Analytics: Privacy awareness | | Giannotti, Monreale |
 ===== Datasets =====
@@ Line 69: / Line 65: @@
 ===== Exercises =====
-**1. Market Basket Analysis. ** Problem: given a database of transactions of customers of a supermarket, find the sets of items that are frequently purchased together and analyse the association rules that can be derived from the frequent patterns.  Provide a short document (max three pages in pdf, excluding figures/plots) which illustrates the input dataset, the adopted frequent pattern algorithm and the association rule analysis.
+** DSB-Churn Dataset: ** The dataset consists of 20,000 examples (lines, rows) over 12 variables (fields, columns) describing features of customers of a mobile phone provider, including the class variable LEAVE, which represents whether a customer decided to leave the company or not. The class variable, LEAVE, is the last variable on each line, and its legal values are LEAVE and STAY.  The header of churn.arff describes the legal values of each variable.  Informally, their meanings are as follows:
+COLLEGE : Is the customer college educated?
+INCOME: Annual income
+OVERAGE: Average overcharges per month
+LEFTOVER: Average % leftover minutes per month
+HOUSE: Value of dwelling (from census tract)
+HANDSET_PRICE: Cost of phone
+OVER_15MINS_CALLS_PER_MONTH: Average number of long (>15 mins) calls per month
+AVERAGE_CALL_DURATION: Average call duration
+REPORTED_SATISFACTION: Reported level of satisfaction
+REPORTED_USAGE_LEVEL: Self-reported usage level
+CONSIDERING_CHANGE_OF_PLAN: Was the customer considering changing his/her plan?
+LEAVE : Class variable: whether the customer left or stayed
-**Guidelines for the report:** The report has to illustrate the input dataset, the adopted frequent pattern algorithm and the association rule analysis, discussing your findings related to the most interesting rules by using the different measures introduced in the course.
-**2. Customer segmentation with k-means.** Problem: given the dataset of RFM (Recency, Frequency and Monetary value) measurements of a set of customers of a supermarket, find a high-quality clustering using K-means and discuss the profile of each cluster found (in terms of the purchasing behavior of the customers of each cluster). Provide a short document (max three pages in pdf, excluding figures/plots) which illustrates the input dataset, the adopted clustering methodology and the cluster interpretation. Dataset legend: for each customer, the dataset contains the recency, frequency and monetary value variables (relative to all purchases, to purchases of fresh food articles, to canned food articles and to non-food articles; the variables are present both with original and normalized values):
+**The dataset is available {{:dm:churn.arff.zip|here}}.**
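A first look at the data can be sketched in a few lines of Python. The reader below is a minimal sketch that assumes `churn.arff` is a plain dense ARFF file (no quoted commas, no sparse rows); the three-row sample and the `{zero,one}` value set are hypothetical stand-ins, not taken from the real file.

```python
import io
from collections import Counter

def read_arff(lines):
    """Parse a simple dense ARFF stream into (attribute names, rows as dicts)."""
    names, rows, in_data = [], [], False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute NAME TYPE" -> keep only NAME
                names.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(names, values)))
    return names, rows

# Hypothetical sample; the real file declares 12 attributes.
SAMPLE = """@relation churn
@attribute COLLEGE {zero,one}
@attribute INCOME numeric
@attribute LEAVE {LEAVE,STAY}
@data
zero,31953,STAY
one,36147,STAY
one,27273,LEAVE
"""

names, rows = read_arff(io.StringIO(SAMPLE))
class_counts = Counter(row["LEAVE"] for row in rows)
print(names)
print(class_counts)
```

To read the unzipped file instead of the sample, pass an open file handle, for example `with open("churn.arff") as f: names, rows = read_arff(f)`.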
-  * Recency: no. of days since last purchase
-  * Frequency: no. of visits (shopping in the supermarket) in the observation period
-  * Monetary value: total amount spent in purchases during the observation period.
-**Guidelines for the report:**
+**Guidelines:**
- * **Data Understanding**: useful as a preliminary step to capture data properties that can help the clustering analysis (distribution analysis, computation of statistics, suitable transformation of variables and elimination of redundant variables by correlation analysis);
+Each group (2-3 people) is required to deliver a report (max 10 pages including all figures) describing the methods adopted and discussing the results achieved with reference to the tasks listed below. Assume that the report is aimed at a //marketing strategist//, who is interested in learning the story that emerges from the various data mining analyses and in receiving suggestions on which actions to take as a consequence.
- * **Clustering Analysis by K-means**: identification of the best value of k and characterization of the obtained clusters by using both analysis of the k centroids and comparison of the statistics of variables within the clusters with those in the whole dataset.
+**1. Data Understanding**: useful as a preliminary step to capture basic data properties. Distribution analysis, statistical exploration, correlation analysis, suitable transformation of variables and elimination of redundant variables, management of missing values.
+**2. Market Basket Analysis. ** Problem: prepare data and extract interesting association rules and frequent patterns.  The report should discuss the parameters used for the analyses, justifying your findings related to the most interesting rules according to the different measures introduced in the course.
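As an illustration of the correlation analysis step, the sketch below flags pairs of numeric variables whose Pearson correlation is very high, so that one of each pair can be considered for removal. The column values and the 0.9 threshold are hypothetical choices for the example, not values from the dataset or the course.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def redundant_pairs(columns, threshold=0.9):
    """Return pairs of column names with |correlation| >= threshold."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]

# Hypothetical values for three of the numeric variables.
columns = {
    "INCOME": [20, 40, 60, 80],
    "HOUSE": [200, 410, 590, 800],
    "OVERAGE": [5, 1, 4, 2],
}
pairs = redundant_pairs(columns)
print(pairs)
```

A flagged pair is only a candidate: which variable to drop, if any, remains a modelling decision to justify in the report.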
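Support, confidence and lift are three commonly used rule measures; that these are the ones introduced in the course is an assumption here. The sketch below computes them for a single rule over a handful of hypothetical transactions, where each item is a discretized attribute value such as `OVERAGE=high`.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    sup_a = support(transactions, antecedent)
    sup_c = support(transactions, consequent)
    sup_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = sup_ac / sup_a if sup_a else 0.0
    lift = confidence / sup_c if sup_c else 0.0
    return {"support": sup_ac, "confidence": confidence, "lift": lift}

# Hypothetical transactions obtained by discretizing customer records.
transactions = [
    frozenset({"OVERAGE=high", "SATISFACTION=low", "LEAVE"}),
    frozenset({"OVERAGE=high", "LEAVE"}),
    frozenset({"OVERAGE=low", "SATISFACTION=high", "STAY"}),
    frozenset({"OVERAGE=low", "STAY"}),
]

m = rule_measures(transactions, {"OVERAGE=high"}, {"LEAVE"})
print(m)
```

A lift above 1 indicates that the antecedent and the consequent occur together more often than they would if they were independent, which is one way to argue that a rule is interesting.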
-**3. Churn analysis with decision trees. ** Problem: given a dataset of measurements over a set of customers of an e-commerce site, find a high-quality classifier, using decision trees, which predicts whether each customer will place only one or more orders to the shop. The explanation of the available variables is {{:dm:churnanalysislegenda.pdf|here}}. Provide a short document (max three pages in pdf, excluding figures/plots) which illustrates the input dataset, the adopted classification methodology and the decision tree validation and interpretation.
+**3. Customer segmentation with k-means.** Problem: find a high-quality clustering using K-means and discuss the profile of each cluster found (in terms of the properties that describe the customers of each cluster). The report should illustrate the adopted clustering methodology and the cluster interpretation. In particular, it is necessary to discuss the identification of the best value of k and the characterisation of the obtained clusters by using both analysis of the k centroids and comparison of the statistics of variables within the clusters with those in the whole dataset.
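One common way to look for a good value of k is to run K-means for several values and compare the sum of squared errors (SSE), looking for the "elbow" where further increases in k stop paying off. The sketch below is a plain-Python version of Lloyd's algorithm on hypothetical two-dimensional points; the course exercises use Knime, so this is an illustration of the idea rather than the prescribed tool.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm; returns (centroids, SSE)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)             # k distinct starting points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:            # converged
            break
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

# Hypothetical data: two well-separated groups of four points each.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (9, 9), (9, 10), (10, 9), (10, 10)]

sse_by_k = {k: kmeans(points, k)[1] for k in (1, 2, 3)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On real data the variables should be brought to comparable scales first, and the centroids returned for the chosen k can then be compared with the statistics of the whole dataset to profile each cluster.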
-**Guidelines for the report:** The report has to illustrate the input dataset, the adopted classification methodology and the decision tree validation and interpretation. Describe the process adopted to select the proposed tree, together with its quality evaluation.
+**4. Churn analysis with decision trees. ** Problem: find a high-quality decision tree that predicts whether each customer will STAY or LEAVE. The report should illustrate the adopted classification methodology and the decision tree validation and interpretation, also describing the process adopted to select the proposed tree, together with its quality evaluation.
-**Deadline**: the three documents must be sent by email to all instructors by **15 July 2014**. Specify [MAINS] in the subject of the email.
+**Deadline**: send the report by email to all instructors by **1 July 2015**. Specify [MAINS] in the subject of the email.
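The quality evaluation of a tree can be summarised with a few standard figures computed on held-out records. The sketch below assumes the predictions have already been exported as a list of labels; the two label lists are hypothetical, and treating LEAVE as the positive class is a choice made for the example.

```python
from collections import Counter

def evaluate(actual, predicted, positive="LEAVE"):
    """Accuracy, plus precision and recall for the positive class."""
    pairs = Counter(zip(actual, predicted))       # (actual, predicted) -> count
    tp = pairs[(positive, positive)]
    fp = sum(n for (a, p), n in pairs.items() if p == positive and a != positive)
    fn = sum(n for (a, p), n in pairs.items() if a == positive and p != positive)
    correct = sum(n for (a, p), n in pairs.items() if a == p)
    return {
        "accuracy": correct / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical test-set labels and tree predictions.
actual = ["LEAVE", "LEAVE", "STAY", "STAY", "STAY", "LEAVE"]
predicted = ["LEAVE", "STAY", "STAY", "LEAVE", "STAY", "LEAVE"]

scores = evaluate(actual, predicted)
print(scores)
```

Comparing these figures across candidate trees (for example with different pruning settings) on the same held-out data is one way to document how the proposed tree was selected.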
 ====== Exams ======
-The exam of the CRM module consists of the evaluation of the reports of the proposed exercises.
+The exam consists of the evaluation of the report of the proposed mining exercises.