Data Mining for Customer Relationship Management 2015
- Fosca Giannotti ISTI-CNR, Knowledge Discovery and Data Mining Lab fosca [dot] giannotti [at] isti [dot] cnr [dot] it
- Dino Pedreschi Università di Pisa, Knowledge Discovery and Data Mining Lab pedre [at] di [dot] unipi [dot] it
- Teaching assistant: Anna Monreale, Università di Pisa, Knowledge Discovery and Data Mining Lab annam [at] di [dot] unipi [dot] it
News
- Before Wednesday 13 May 2015: install KNIME (http://www.knime.org).
Goals
Organizations and businesses are overwhelmed by the flood of data continuously collected into their data warehouses and arriving from external sources, the Web above all. Traditional exploratory techniques may fail to make sense of the data because of its inherent complexity and size. Data mining and knowledge discovery techniques emerged as an alternative approach, aimed at revealing patterns, rules and models hidden in the data, and at supporting the analytical user in developing descriptive and predictive models for a number of business problems. This short course focuses on the main application scenarios of data mining for challenging problems in the broad domain of Customer Relationship Management (CRM).
Syllabus
- Clustering models for customer segmentation. Discussion of real cases. Hands-on project: segmentation of a base of anonymized customers from the retail industry. Clustering models for competitive intelligence.
- Patterns and association rule mining for market basket analysis. Hands-on project: mining association rules from sales data of the retail industry.
- Prediction models for promotion performance and churn analysis. Discussion of real cases. Hands-on project: churn prediction from a base of anonymized customers from the retail industry.
- Analysis of human mobility patterns by mobility data mining from big data. Mining official data for understanding human behavior.
- Social network analysis for understanding diffusion phenomena. Viral marketing.
- Application of data mining to geo-marketing. Analysis of innovators. Predictive models for fraud detection.
Textbooks
- Slides (see Calendar).
- Gordon S. Linoff and Michael J. Berry. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Wiley, 2011.
- Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Addison Wesley, ISBN 0-321-32136-7, 2006
Reading about the "data analyst" job
Calendar
No. | Date | Topic | Learning material | Instructor |
---|---|---|---|---|
01. | 13.05.2015 - 09:00-13:00 | Introduction to data mining and big data analytics | slides: intro; slides: case studies | Giannotti |
02. | 13.05.2015 - 14:00-18:00 | Data understanding; data preparation; KNIME tutorial | slides; slides: data understanding; KNIME tutorial; KNIME on Iris | Pedreschi, Monreale |
03. | 14.05.2015 - 09:00-13:00 | Pattern and association rule mining & market basket analysis | | Giannotti |
04. | 14.05.2015 - 14:00-18:00 | Clustering analysis & customer segmentation | slides: clustering; slides: customer segmentation | Pedreschi |
05. | 15.05.2015 - 09:00-13:00 | Pattern and association rule mining: exercises with KNIME | | Giannotti, Monreale |
06. | 15.05.2015 - 14:00-18:00 | Clustering analysis: exercises with KNIME | | Pedreschi, Monreale |
07. | 18.05.2015 - 09:00-13:00 | Classification & prediction | slides: classification | Pedreschi |
08. | 18.05.2015 - 14:00-18:00 | Prediction models for promotion performance and churn analysis | | Giannotti |
09. | 19.05.2015 - 09:00-13:00 | Classification & prediction: exercises with KNIME | | Pedreschi, Monreale |
10. | 19.05.2015 - 14:00-18:00 | Social network analysis: fundamentals | | Pedreschi |
11. | 20.05.2015 - 09:00-13:00 | Mobility data mining & big data analytics | | Giannotti |
12. | 20.05.2015 - 14:00-18:00 | Big Data Analytics: Privacy awareness | | Giannotti, Monreale |
Datasets
Exercises
DSB-Churn Dataset: The dataset consists of 20,000 examples (lines, rows) over 12 variables (fields, columns) and constitutes a two-class supervised learning problem. The class variable, LEAVE, is the last variable on each line, and its legal values are LEAVE and STAY. The header of churn.arff describes the legal values of each variable. Informally, the variables have the following meanings:
- COLLEGE: Is the customer college educated?
- INCOME: Annual income
- OVERAGE: Average overcharges per month
- LEFTOVER: Average % leftover minutes per month
- HOUSE: Value of dwelling (from census tract)
- HANDSET_PRICE: Cost of phone
- OVER_15MINS_CALLS_PER_MONTH: Average number of long (>15 mins) calls per month
- AVERAGE_CALL_DURATION: Average call duration
- REPORTED_SATISFACTION: Reported level of satisfaction
- REPORTED_USAGE_LEVEL: Self-reported usage level
- CONSIDERING_CHANGE_OF_PLAN: Was the customer considering changing his/her plan?
- LEAVE: Class variable: whether the customer left or stayed
The dataset is provided in ARFF format and is available here.
Each group of 2-3 people has to produce a report for each of the following tasks:
1. Data understanding. A preliminary step to capture properties of the data that can help the following steps, especially the clustering analysis: distribution analysis, computation of statistics, suitable transformation of variables, elimination of redundant variables by correlation analysis, handling of missing values, and so on.
2. Market basket analysis. Problem: given the above dataset, prepare the data for the extraction of the interesting association rules that can be derived from the frequent patterns. The report should discuss the parameters used for the analyses, the adopted frequent pattern algorithm, and the association rule analysis, justifying your findings about the most interesting rules by using the different measures introduced in the course.
3. Customer segmentation with k-means. Problem: given the above dataset, find a high-quality clustering using k-means and discuss the profile of each cluster found (in terms of the properties that describe the behaviour of the customers in each cluster). The report should illustrate the adopted clustering methodology and the cluster interpretation. In particular, it must discuss the identification of the best value of k and the characterisation of the obtained clusters, using both an analysis of the k centroids and a comparison of the statistics of the variables within the clusters with those in the whole dataset.
4. Churn analysis with decision trees. Problem: given the above dataset, find a high-quality classifier, using decision trees, that predicts whether each customer will STAY or LEAVE. The report should illustrate the adopted classification methodology and the decision tree validation and interpretation, also describing the process adopted to select the proposed tree, together with its quality evaluation.
Deadline: the reports must be sent by email to all instructors by 1 July 2015. Specify [MAINS] in the subject of the email.
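As an aid for the market basket analysis task, the following is a minimal Python sketch of three commonly used rule-interestingness measures: support, confidence and lift. It is only an illustration under stated assumptions. The course exercises use KNIME, not Python; the set of measures actually introduced in the course may differ; and the toy transactions, including the discretized values such as `OVERAGE=high`, are invented and are not taken from the DSB-Churn dataset.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]

# Hypothetical toy data: each customer is a set of "ATTRIBUTE=value" items
# obtained after discretization. All values here are invented for illustration.
TRANSACTIONS: List[Transaction] = [
    frozenset({"COLLEGE=yes", "OVERAGE=high", "LEAVE=LEAVE"}),
    frozenset({"COLLEGE=no", "OVERAGE=high", "LEAVE=LEAVE"}),
    frozenset({"COLLEGE=yes", "OVERAGE=low", "LEAVE=STAY"}),
    frozenset({"COLLEGE=no", "OVERAGE=low", "LEAVE=STAY"}),
    frozenset({"COLLEGE=yes", "OVERAGE=high", "LEAVE=STAY"}),
]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    ante = frozenset(antecedent)
    cons = frozenset(consequent)
    ante_support = support(ante, transactions)
    if ante_support == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(ante | cons, transactions) / ante_support


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         transactions: List[Transaction]) -> float:
    """Confidence divided by the support of the consequent.

    Values above 1 suggest a positive association, values below 1 a negative one.
    """
    cons = frozenset(consequent)
    cons_support = support(cons, transactions)
    if cons_support == 0:
        raise ValueError("consequent never occurs in the transactions")
    return confidence(antecedent, cons, transactions) / cons_support


if __name__ == "__main__":
    rule = ({"OVERAGE=high"}, {"LEAVE=LEAVE"})
    print("support   ", round(support(rule[0] | rule[1], TRANSACTIONS), 3))
    print("confidence", round(confidence(*rule, TRANSACTIONS), 3))
    print("lift      ", round(lift(*rule, TRANSACTIONS), 3))
```

Reporting lift next to confidence is useful here because a rule with high confidence can still be uninteresting when its consequent is frequent in the whole dataset anyway.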
Exams
The exam of the CRM module consists of the evaluation of the reports on the proposed exercises.